A Recommender System for Healthy and Personalized Recipe Recommendations

Florian Pecune* (florian.pecune@glasgow.ac.uk), University of Glasgow
Lucile Callebert* (lucile.callebert@glasgow.ac.uk), University of Glasgow
Stacy Marsella (stacy.marsella@glasgow.ac.uk), University of Glasgow

* The authors contributed equally to this paper.

ABSTRACT
Unhealthy eating behavior is a serious public health issue with massive repercussions on an individual's health. One potential solution to this problem is to help people change their eating behavior by developing systems able to recommend healthy recipes that can influence eating behavior. One challenge for such systems is to deliver healthy recommendations that take into account users' needs and preferences, while also informing users about the healthiness of the recommended recipes. In this paper, we investigate whether introducing a healthy bias in a recipe recommendation algorithm, and displaying a healthy tag on recipe cards, would influence people's decision making. To that end, we build three different recipe recommender systems: one that recommends recipes matching users' preferences, another that only recommends healthy recipes, and a third that recommends recipes that are both healthy and match users' preferences. We evaluate these three systems through an online user study in which we asked participants to select, from a list of recipes, the ones they like the most.

KEYWORDS
food recommender system; healthcare; collaborative filtering

HealthRecSys'20, September 26, 2020, Online, Worldwide
© 2020 Copyright for the individual papers remains with the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors.

1 INTRODUCTION
Unhealthy eating is a major public health burden that may be reduced in part by helping people make healthier dietary choices. However, picking appropriate food to eat involves complex decision-making processes [4], including being aware of healthy options and choosing among them [17]. As people grow increasingly familiar with interacting with machines in their everyday lives, one solution to overcome this issue and help people make healthier choices is to develop health-aware food recommender systems [21, 22]. One of the most important challenges for such a system is to deliver accurate and personalized recommendations to its users. Although most of the popular recipes found on the Internet are unhealthy as defined by the United Kingdom Food Standards Agency (FSA) [23], significant effort has recently been put into optimizing food recommendation algorithms and trying to reconcile users' preferences with healthy recipe recommendations [2, 8, 23]. By analyzing people's eating behavior, the authors of [9] found that the fat and calorie content of a recipe were the best rating predictors for people interested in eating healthily. However, this information is not always available, and research has shown how hard it is for people to infer the healthiness of a recipe simply from its picture [5], even when the recipe has been categorized as healthy [23]. Based on these findings, it becomes important to build systems that not only recommend healthy and personalized recipes, but that also precisely display how healthy these recipes are.

In this paper, we present an experiment in which we investigate whether people would be likely to select recipes that are healthier than the recipes they usually cook. More specifically, our work investigates how introducing a healthy bias in the recommendation algorithm and the presence of a healthy tag would influence users' likelihood of picking recommended recipes. We first describe the different recommendation algorithms we evaluate in our experiment. Then we describe our experimental design and present our results.

2 RELATED WORK
Food recommender systems traditionally rely on two distinct approaches to deliver personalized recipe recommendations. Systems relying on the content-based approach recommend recipes based on their description and users' preferences. In [6], the authors developed a system that infers people's preferred ingredients based on the recipes they like. The system then recommends new recipes containing the previously inferred ingredients. Rather than relying on recipe ingredients, [13] proposed an approach based on Ensemble Topic Modeling that relied on features previously extracted from a recipe database to deliver recommendations. Their system performed significantly better than a conventional content-based system. Another approach is described in [25], in which the authors implemented a goal-oriented recipe recommender system providing nutrition information. The system first collects the user's goal (e.g. "I want to prevent a cold") before finding a nutrient that matches that goal. The system then picks the ingredient containing the most of the previously selected nutrient. Finally, the system recommends a recipe containing that specific ingredient. YumMe, the recommender system developed in [27], relies on dietary information to recommend recipes that match users' needs. The system automatically extracts dietary information from pictures of recipes to form a user profile. The system then relies on this user profile to deliver subsequent recommendations.

Systems relying on the collaborative filtering approach predict recommendation ratings for a user based on ratings from other users. In [7], the authors developed a system that collects users' preferences by asking them to rate and tag the recipes they usually cook at home. The system then relies on users' preferences to rank recipes and deliver recommendations.
The authors found that their improved matrix factorization algorithm outperformed the content-based approach proposed by [6]. The extensive comparison performed in [24] confirms that collaborative filtering approaches perform better than content-based ones. The study also reveals that the FSA score of the recipe was the most important content feature, highlighting that people are usually consistent in their eating habits. Most of these systems focus on delivering personalized recommendations matching a user's profile. They do not aim to recommend recipes that not only match their users' preferences but are also healthy.

In [8], the authors try to solve this problem by extending their previous recommendation algorithm [7], introducing a health bias based on the balance between the calories that the user needs and the calories of the recipes. Another system reconciling healthiness and personalization is [3], in which the authors propose a method to recommend healthy recipes based on a subset of ingredients given by a user. The system first selects ingredients that are compatible with the given subset, and associates an optimal quantity with each of these ingredients. The system then generates a pseudo-recipe containing the ingredients with the healthiest nutritional value, before picking the existing recipe that best matches the pseudo-recipe. DietOS [1] proposes a solution to manage specific health conditions by recommending ingredients matching its users' health profile. The system also presents the nutritional properties of each ingredient as well as their benefits regarding users' health conditions. In [23], the authors weighted the outcome of their collaborative filtering algorithm based on the FSA and WHO scores associated with each recipe. The accuracy of this system was lower than that of the best unfiltered collaborative filtering algorithms, but still better than that of unfiltered algorithms such as MostPopularItem, UserKNN or ItemKNN. Although these systems present interesting approaches to reconcile health with users' preferences, none of them were evaluated by real users.

A subjective evaluation investigating users' preferences towards healthy food is proposed in [5]. The authors first paired specific recipes with their healthier versions, i.e. similar recipes with healthier substituted ingredients, following the method described in [20]. Then, they showed participants the different pairs and asked them to pick the one they preferred and the one they considered to be healthier. Results demonstrated that people were less inclined to pick the healthier recipe of the pair, and also how difficult it was for participants to judge the healthiness of a recipe. However, the recipe pairs were the same for all participants and were not related to their preferences. Therefore, we focus on the following research questions:
RQ1: Are people willing to pick recommendations that are both healthy and match their preferences?
RQ2: Does the presence of a healthy tag on the displayed recipes influence people's decision making?

3 MODEL
To investigate our research questions, our first step was to collect a recipe dataset we could use to build our recommender system. We describe our dataset and the recommender system we built in the following sections.

3.1 Recipe dataset
We collected our recipe dataset from allrecipes.com with a web crawler in April 2020, limiting ourselves to recipes that had been reviewed by at least 10 users. We collected a total of 13,515 recipes. We chose allrecipes.com as it is one of the most popular recipe websites in terms of traffic, with 25 million unique visitors each month, and it provides nutritional information for most of its recipes.
For each recipe, we collected: its title, image link, list of ingredients and quantities, preparation steps, preparation and cooking times, number of servings, nutritional information, ratings data (number of ratings, minimum and maximum rating, average rating) and list of comments (each associated with a unique user name and a rating). To reduce data sparsity issues, we selected a subset of recipes and users such that every recipe was rated at least 25 times and every user had rated at least 30 recipes. This results in a dense dataset of 1,169 recipes and 1,339 users, for a total of 70,945 ratings.
Similar to [5, 23], we used the standards provided by the Food Standards Agency (FSA, UK) and the green, orange and red traffic-light system to evaluate the healthiness of the recipes. The FSA provides standard ranges for low (green), medium (orange) and high (red) content of fat, saturates, sugar and sodium. To calculate a health score, we assign to a recipe, for each of the fat, saturates, sugar and sodium elements, 1 point if the element's quantity is within the low range, 2 for the medium range and 3 for the high range. The health score therefore ranges from 4 (best) to 12 (worst).
The recipes in our dataset are rather unhealthy: the health score ranges from 6 to 12, and 75.36% of the recipes have a health score of 8, 9 or 10. Only 2.31% of the recipes are healthy (green category). This is consistent with the observations from [5] showing that most popular recipes tend to be unhealthy (high fat content).

3.2 Recommender systems
To answer our research questions, we built a recommender system that takes into account both the users' preferences and the healthiness of the recipes. Users' preferences are learned via collaborative filtering (CF), a popular approach that relies on user ratings and that reports better results than content-based approaches [24]. CF methods allow a recommender system to rank recipes according to a score that represents how likely the recipe is to correspond to the user's preferences.
We used implicit feedback, converting all ratings into positive feedback from users, indicating a preference of the user for the rated recipes over the non-rated ones. The user ratings were then turned into confidence levels on how much the user actually liked the rated recipe. This preference-confidence approach has been shown to perform well [10].
We used the Implicit Python library and tested three popular CF algorithms: Alternating Least Squares (ALS) [19], Bayesian Personalized Ranking (BPR) [16] and Logistic Matrix Factorization (LMF) [12]. We also compared the performance of those algorithms with a simple Most Popular recommender. We split our dataset into training, cross-validation and test sets to evaluate the performance of each algorithm in terms of AUC [18]. We ran each experiment 100 times and report the average values in Table 1. The best-performing algorithm is ALS, with a performance comparable to the one reported in [24] on a similar dataset.

Table 1: Performance of the CF algorithms.
  Algorithm       AUC
  ALS             0.694
  BPR             0.649
  Most popular    0.644
  LMF             0.617

Table 2: Mean and most common value of the FSA health scores of recipes recommended to a user by our recommender systems, for different values of $w_p$ and $w_h$.
  System      $w_p$  $w_h$  Mean    Most common
  Pref        1      0      8.750   9
  Healthy     0      1      6.000   6
  Hybrid-11   1      1      6.275   6
  Hybrid-21   2      1      7.175   6
  Hybrid-31   3      1      7.662   8
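The traffic-light health score of Section 3.1, and the way Section 3.2 turns it into a ranking signal using the weights $w_p$ and $w_h$ listed in Table 2, can be illustrated with a short Python sketch. It is not the authors' code, and several details are assumptions: the per-100 g thresholds are modelled on the UK FSA front-of-pack guidance for foods (the paper does not list the values it used), the sodium limits are the salt limits divided by 2.5, the health score is min-max normalized over its theoretical 4 to 12 range, and the preference scores are invented numbers standing in for the ALS output. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed per-100 g thresholds in grams, modelled on the UK FSA front-of-pack
# guidance for foods; the paper does not list the exact values it used.
# (low_max, high_min): <= low_max is green, > high_min is red, otherwise orange.
FSA_THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "fat": (3.0, 17.5),
    "saturates": (1.5, 5.0),
    "sugar": (5.0, 22.5),
    "sodium": (0.12, 0.6),  # assumption: salt limits (0.3 g, 1.5 g) divided by 2.5
}


def traffic_light_points(nutrient: str, grams_per_100g: float) -> int:
    """1 point for green (low), 2 for orange (medium), 3 for red (high)."""
    low_max, high_min = FSA_THRESHOLDS[nutrient]
    if grams_per_100g <= low_max:
        return 1
    if grams_per_100g > high_min:
        return 3
    return 2


def fsa_health_score(nutrients: Dict[str, float]) -> int:
    """Sum over fat, saturates, sugar and sodium: 4 (best) to 12 (worst)."""
    return sum(traffic_light_points(n, nutrients[n]) for n in FSA_THRESHOLDS)


def normalized_health_score(score: int) -> float:
    """Assumed min-max normalization of the 4..12 range onto [0, 1] (0 = healthiest)."""
    return (score - 4) / 8


def combined_score(pref: float, health: float, w_p: float, w_h: float) -> float:
    """Weighted average of the preference score and (1 - normalized health score)."""
    return (w_p * pref + w_h * (1.0 - health)) / (w_p + w_h)


def top_k(pref: Dict[str, float], health: Dict[str, float],
          w_p: float, w_h: float, k: int = 5) -> List[str]:
    """Recipe ids sorted by combined score, highest first, truncated to k."""
    scores = {r: combined_score(pref[r], health[r], w_p, w_h) for r in pref}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    salad = {"fat": 2.0, "saturates": 0.5, "sugar": 3.0, "sodium": 0.1}
    lasagna = {"fat": 18.0, "saturates": 6.0, "sugar": 4.0, "sodium": 0.5}
    print(fsa_health_score(salad), fsa_health_score(lasagna))  # 4 9

    health = {"salad": normalized_health_score(4), "lasagna": normalized_health_score(9)}
    pref = {"salad": 0.2, "lasagna": 0.9}  # invented stand-ins for ALS preference scores
    for name, (w_p, w_h) in {"Pref": (1, 0), "Healthy": (0, 1), "Hybrid-31": (3, 1)}.items():
        print(name, top_k(pref, health, w_p, w_h, k=2))
```

With $w_h = 0$ the ranking follows the preference scores alone. With $w_p = 0$, every green recipe (health score 6 in this dataset) receives the same combined score, which is consistent with the healthy recommender having to pick five of the 27 green recipes at random. Raising $w_p$ relative to $w_h$, as in Hybrid-31, lets preference dominate except between recipes of similar preference.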
Our recommender system therefore relies on this algorithm to output, for each recipe $r$ and each user $u$, a preference score $s_p(r,u) \in [0,1]$. Each recipe is also assigned a health score $s_h(r) \in [0,1]$ that corresponds to the normalized FSA health score of the recipe as calculated in Section 3.1, and is independent of users' preferences. The preference and health scores are then combined to calculate a final score $s(r,u) \in [0,1]$ as in Equation 1:

$$ s(r,u) = \frac{w_p \, s_p(r,u) + w_h \, \bigl(1 - s_h(r)\bigr)}{w_p + w_h} \qquad (1) $$

where $w_p$ and $w_h$ are the weights assigned to the preference and health scores, respectively. Because a higher FSA score denotes a less healthy recipe, the health term enters as $1 - s_h(r)$, so that healthier recipes receive higher final scores. $s(r,u)$ is then used to rank the recipes and give a recommendation to the user.
We then implemented three different recommender algorithms by adjusting the weights $w_p$ and $w_h$.

Preference-based recommender. The preference-based recommender system only takes into account the preferences of the user, ignoring the health scores of the recipes, i.e. $w_h = 0$. In our pilot experiments, the average health score of the recipes recommended to users with this system was 8.750, while most recommended recipes had a health score of 9 (see Table 2).

Healthy recommender. As opposed to the preference-based recommender, the healthy recommender ignores the preferences of the user, i.e. $w_p = 0$. This system therefore always recommends healthy recipes according to the FSA standards (i.e. FSA green category, with an FSA health score of 6 or below). Our dataset contains 27 healthy recipes, all of them with a health score of 6. Therefore, our healthy recommender system randomly selects five of these 27 recipes to recommend to the user.

Hybrid recommender. To fulfil our objective of helping people gradually shift their eating behaviors towards healthier habits, the system should take into account users' preferences but recommend healthier recipes than the preference-based recommender. We tested different values for $w_p$ and $w_h$ and ran the system with 10 users for each condition. Table 2 sums up, for each system, the mean and most common health scores of the recipes recommended to the 10 users. Notice that with the Hybrid-11 and Hybrid-21 systems, the most common health score of the recommended recipes is 6, meaning that these systems mostly recommend healthy recipes. Yet, as explained in Section 3.1, healthy recipes represent only 2.31% of our dataset. This strongly limits the possibilities of personalization based on users' preferences and makes those two systems very similar to the healthy recommender. We therefore decided to use the Hybrid-31 recommender system.

4 EXPERIMENT
To answer our research questions RQ1 and RQ2, we designed an experiment investigating how our system's recommendation algorithm and the presence of a healthiness tag on the recipe card influenced users' recipe selection.

4.1 Experimental Design
For the sake of the experiment, we identified two independent variables. The first one represents our system's recommendation algorithm (Reco-Algo) as a between-subject independent variable with three levels: a preference level (pref-reco) in which the system delivers recommendations matching users' preferences, a health level (healthy-reco) in which the user only gets the healthiest recommendations, and a hybrid level (hybrid-reco) in which the system biases the preference-based recommendations towards slightly healthier options. Those three levels correspond to the three systems described in Section 3.2. The second between-subject variable (Tag-Mode) represents whether the recipe card displayed to the user contains a tag indicating how healthy the recipe is, and has two levels: a healthy-tag level (healthy-tag) in which such a tag is present, and a no-tag level (no-tag) in which the recipes do not contain any healthiness tags.
Our experiment has a 3x2 design with Reco-Algo and Tag-Mode as between-subject variables. In each of the six conditions, participants followed the same procedure. After agreeing to participate in our study via a consent form, participants were presented with a short description of the task. Each participant was then randomly assigned to a group according to the different independent variables. The task consisted of two steps. In the preference elicitation step, participants were asked to select the five recipes they preferred from a list of thirty recipes, as shown in Figure 1. The five selected recipes were sent as input to our recommender system, which delivered five recommendations in return. These five recommended recipes were then presented to the participants during the recommendation step, along with 25 randomly selected recipes. The position of the recommended recipes on the grid was randomized. As in the preference elicitation step, participants were asked to select the five recipes they preferred. Once their choice was made, participants were asked how satisfied they were with their choice and how easy it was to make this choice. The answers to these two questions were 7-point Likert items (anchors: 0 = very dissatisfied/difficult, 6 = very satisfied/easy). We also asked participants, using an open-ended question, what influenced their choice the most. At the end of their task, participants took three surveys: one about what is important to them when looking for a recipe online, another about their eating habits, and a demographics questionnaire.

Figure 1: Example of the list of recipes as presented to the participants during the recommendation step of the experiment. The cards highlighted in green correspond to the recipes already selected by the participant. In the preference elicitation step, the healthy tag is not displayed.

4.2 Measurements
We measured four different constructs in our experiment.
(a) We relied on the F1 score to measure the performance of our recommendation algorithm. The F1 score was computed by considering i) true positives as the recipes recommended by our system that were selected by the participants, ii) false positives as the recipes recommended by our system that were not selected by the participants, iii) false negatives as the randomly chosen recipes (i.e. not recommended by our system) that were selected by the participants, and iv) true negatives as the randomly chosen recipes that were not selected by the participants.
(b) To measure the healthiness of the recipes selected by the participants, we calculated the average FSA health score of the five recipes they selected during the recommendation step.
Given the nature of our experiment (i.e. selecting items from a list) and based on the results from [26], we also measured (c) whether the participants were satisfied with the five recipes they selected and (d) whether it was easy for them to select the five recipes they liked the most.

5 RESULTS
We recruited 118 participants on Amazon Mechanical Turk. We required participants to have at least a 90% HIT acceptance rate on at least 100 HITs. The recipes presented to the users used imperial measurements; therefore, we restricted our evaluation to participants located in the U.S. Participants spent on average 6 minutes and 56 seconds (std = 3 minutes and 47 seconds) on the task and were paid USD 1.20. Most participants were aged 29 to 47 years old, with 53% female and 48% male. The majority of participants (82%) were employed full-time. We conducted four different 3x2 factorial ANOVAs (i.e., analyses of variance) with Reco-Algo and Tag-Mode as between-subject factors. The dependent measures were the F1 score, the health score of the selected recipes, participants' satisfaction and participants' perceived choice easiness.

5.1 F1 score
The factorial ANOVA revealed a significant main effect of Reco-Algo (F(2, 112) = 8.251; p < .001) on the recommender system's accuracy. There was no main effect of Tag-Mode (F(1, 112) = .945; p = .33) on the recommender system's accuracy, and the interaction between the two variables was not significant (F(2, 112) = 0.358; p = .7). In our follow-up analysis, post hoc comparisons with Bonferroni correction indicated that the mean scores for both pref-reco (M = .235, std = .156) and hybrid-reco (M = .200, std = .164) were significantly better than for healthy-reco (M = .100, std = .129). This result shows that people are not likely to select healthy recipes if these recipes do not match their preferences or habits at all.
To better understand our results, we looked at and compared the recipes recommended by our system and the recipes selected by the users. We observed that the recipes selected by participants were much more diverse than those recommended by our system. For example, our system recommended only chicken-based recipes to a participant who eventually selected two recipes containing meat (chicken and pork), one vegetarian recipe and two desserts. As an objective similarity measure, we calculated for each user the cosine similarity $s_{i,j} \in [0,1]$ of every pair of recommended (resp. selected) recipe titles $R_i$ and $R_j$, obtaining a 5x5 similarity matrix with 1s on the diagonal (i.e. when $i = j$). We then averaged the values of the similarity matrix, thus obtaining for each user one similarity score $s \in [0,1]$ for the recommended (resp. selected) recipes. The average similarity value across all users is 0.288 (std = 0.093) for the recommended recipes and 0.255 (std = 0.053) for the selected recipes. A Student's t-test revealed that the difference in similarity values between the recommended recipes and the selected recipes is significant (t(117) = 3.969, p < .001). The low F1 score obtained by all three recommender systems could therefore be explained by the lack of diversity in the recommended recipes, which is consistent with the findings of [26].

5.2 Health Score
There was no main effect of Reco-Algo (F(2, 112) = 1.858; p = .16) or Tag-Mode (F(1, 112) = .060; p = .81) on the health score of the selected recipes. The interaction between the two variables was not significant (F(2, 112) = 2.362; p = .09). Although none of these results were significant, the interaction graph in Figure 2 depicts how the presence of a healthy tag on the recipe card had different effects on the health score of the selected recipes depending on the condition. In the pref-reco condition, people selected healthier recipes when the healthy tag was displayed, unlike in the healthy-reco condition, in which people selected less healthy recipes when the healthy tag was displayed. There was almost no impact of the healthy tag in the hybrid-reco condition.

                   Reco-Algo                                                  Tag-Mode
  Variable         pref-reco            hybrid-reco        healthy-reco           healthy-tag       no-tag
  F1 score         0.235 (±0.156)***    0.200 (±0.164)*    0.100 (±0.129)***/*    0.166 (±0.172)    0.193 (±0.147)
  Health score     9.185 (±0.631)       8.965 (±0.653)     8.889 (±0.8427)        9.000 (±0.735)    9.030 (±0.706)
  Satisfaction     5.000 (±0.961)       4.950 (±0.782)     4.868 (±1.234)         5.069 (±0.814)    4.817 (±1.142)
  Choice easiness  4.175 (±1.412)       4.375 (±1.314)     4.421 (±1.222)         4.362 (±1.347)    4.283 (±1.290)

Table 3: Summary of all means and standard deviations (in parentheses) for the four dependent variables across conditions.
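The two measures used in Sections 4.2 and 5.1, the F1 score and the title-similarity score, can be sketched in plain Python as follows. This is an illustration, not the authors' code: the paper does not say how recipe titles were turned into vectors, so the sketch assumes bag-of-words term counts over lowercase alphabetic tokens (TF-IDF weighting would be an equally plausible reading), and the function names and recipe identifiers are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Sequence


def f1_score(recommended: Sequence[str], selected: Sequence[str]) -> float:
    """F1 as defined in Section 4.2.

    TP: recommended and selected; FP: recommended but not selected;
    FN: randomly shown (not recommended) but selected. True negatives do not enter F1.
    """
    rec, sel = set(recommended), set(selected)
    tp = len(rec & sel)
    fp = len(rec - sel)
    fn = len(sel - rec)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def title_vector(title: str) -> Counter:
    """Bag-of-words term counts over lowercase alphabetic tokens (assumed vectorization)."""
    return Counter(re.findall(r"[a-z]+", title.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count vectors; lies in [0, 1] because counts are non-negative."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_title_similarity(titles: List[str]) -> float:
    """Average of the n x n pairwise similarity matrix, with 1s on the diagonal."""
    vectors = [title_vector(t) for t in titles]
    n = len(vectors)
    total = sum(
        1.0 if i == j else cosine(vectors[i], vectors[j])
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


if __name__ == "__main__":
    recommended = ["r1", "r2", "r3", "r4", "r5"]
    selected = ["r2", "r5", "x1", "x2", "x3"]  # two recommended picks, three random picks
    print(f1_score(recommended, selected))  # about 0.4 (TP = 2, so F1 = 2/5)

    chicken_only = ["Baked Chicken Thighs", "Chicken Parmesan", "Lemon Chicken"]
    varied = ["Chicken Parmesan", "Vegetarian Chili", "Apple Pie"]
    print(mean_title_similarity(chicken_only) > mean_title_similarity(varied))  # True
```

Because every participant received five recommendations and selected exactly five recipes, precision and recall are both TP/5, so the F1 score reduces to the fraction of recommended recipes that were selected. Likewise, since the 5x5 matrix is averaged including its diagonal of 1s, five titles sharing no words still score 5/25 = 0.2, which is the floor against which the reported averages of 0.288 and 0.255 should be read.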
In Table 3, the differences between the means are marked according to their level of significance (* for p < .05 and *** for p < .001).

In our post-study questionnaire, nine participants in the healthy-tag condition mentioned that they were mostly influenced by the healthy tag while choosing a recipe. This correlates with both i) a significantly higher F1 score (t(12.3) = −2.7, p < 0.05) for participants who mentioned they were influenced by the healthy tag (M = .29, std = .15) compared to the other participants (M = .14, std = .17), and ii) a significantly lower health score of the selected recipes (t(11.4) = 2.8, p < 0.05) for participants who mentioned they were influenced by the healthy tag (M = 8.42, std = .67) compared to the other participants (M = 9.11, std = .70). These differences between the two groups of participants appeared in the recommendation phase, in which the healthy tag was displayed, but not in the preference elicitation phase, in which it was not, which is consistent with the finding that people are poor judges of the healthiness of a recipe [5].

5.3 Perceived satisfaction and choice easiness
There was no main effect of Reco-Algo (F(2, 112) = 0.171; p = .84) or Tag-Mode (F(1, 112) = 1.850; p = .18) on participants' satisfaction. The interaction between the two variables was not significant (F(2, 112) = 0.186; p = .83). Descriptively, participants were more satisfied with their choices when the recommended recipes matched their preferences, and the presence of a healthy tag on the recipe card increased satisfaction regardless of the recommendation algorithm.
Regarding perceived choice easiness, there was no main effect of Reco-Algo (F(2, 112) = 0.391; p = .68) or Tag-Mode (F(1, 112) = .110; p = .74). The interaction between the two variables was not significant (F(2, 112) = 1.697; p = .19). Although the presence of a healthy tag lowered the perceived difficulty in both the healthy-reco and the pref-reco conditions, the tag made the selection more difficult for participants who were recommended recipes in the hybrid-reco condition.

5.4 Discussion
To answer our research question RQ1, our results show that people are slightly less inclined to select recommendations coming from our hybrid recommender than from the preference-based one. However, although the difference is minimal when no tags are displayed on the recipes, the presence of healthy tags accentuates it. The negative impact of the healthy tags on the F1 score of our hybrid algorithm may be linked to choice easiness: unlike in the healthy and preference-based conditions, the healthy tags made it more difficult for participants to select five recipes in the hybrid condition. One potential explanation is that all recipes in the hybrid condition had very similar health scores (tagged in orange), whereas the two other conditions introduced recipes tagged in green (for healthy-reco) or in red (for pref-reco). People who explicitly cared about the health tag were more likely to choose recipes recommended by our system in the hybrid-reco and healthy-reco conditions. This confirms the results found in [9] and highlights the need to accurately infer people's eating goals in order to adapt the recommendation algorithm accordingly. Both the hybrid and preference-based recommender systems had significantly better F1 scores than our healthy recommender system, which shows that people are not likely to select healthy recipes if these recipes do not match their preferences or habits at all.
There is no significant evidence related to the impact of healthy tags on participants' decision making that would help us answer RQ2. Indeed, only nine people out of 60 explicitly stated in our post-evaluation questionnaire that they were influenced by the healthy tag. However, the results described in Section 5.2 suggest that although people are more likely to pick healthier recipes than they would usually pick when informed about recipe healthiness, they are less likely to pick recipes tagged as very healthy. In other words, people seem to avoid recipes tagged as unhealthy (in red) as well as recipes tagged as healthy (in green). The first part can partially be explained by the fact that people usually associate a feeling of guilt with unhealthy food consumption [11]. Thus, people are less inclined to pick unhealthy recipes if they are explicitly informed about their unhealthiness. The second part can be explained by the "healthy = less tasty" effect, which describes how people tend to associate healthy food with low tastiness [15]. Hence, we assume that participants in our experiment were less inclined to pick recipes explicitly tagged as healthy because they thought such recipes would be tasteless. Overall, and again descriptively, people were more satisfied with their choices when informed about the recipes' healthiness.

Figure 2: Interaction graphs between Reco-Algo and Tag-Mode regarding the average (a) F1 score, (b) health score, (c) satisfaction and (d) choice easiness.

6 CONCLUSION
In this paper, we investigated whether introducing a healthy bias in a recipe recommendation algorithm, and displaying a healthy tag on recipe cards, would influence people's decision making. Our results show that the performance of a recommender system able to combine healthiness with personalization depends on its users' eating goals. People already interested in eating healthily are more likely to select recipes coming from such a recommendation system. For the others, our results also suggest that adding a simple yet accurate tag depicting how healthy recipes are might help them select healthier recipes than they would usually select. Explicitly informing people how unhealthy some recipes
are might help them consciously change their eating habits and keep them from picking unhealthy recipes.
One potential extension of this work would be to combine our CF approach with a knowledge-based approach to gain more control over the diversity of the recommended recipes. As explained in Section 5.1, a diverse set of recommendations can positively impact users' experience [26]. The results of a CF-based algorithm could, for instance, be post-filtered to force the presence of different categories of recipes (e.g. main, vegetarian, dessert) and/or different ingredients (e.g. chicken, pork) in the list of recommended recipes. A knowledge-based approach could be integrated by building a conversational recommender system that asks specific questions about users' requirements. Appropriate conversational skills can also improve users' experience as well as people's perception of recommended items [14]. Furthermore, such a conversational approach could also help us know whether users are initially interested in eating healthily, so that the system could adapt its recommendations accordingly.

REFERENCES
[1] Giuseppe Agapito, Mariadelina Simeoni, Barbara Calabrese, Ilaria Caré, Theodora Lamprinoudi, Pietro H Guzzi, Arturo Pujia, Giorgio Fuiano, and Mario Cannataro. 2018. DIETOS: A dietary recommender system for chronic diseases monitoring and management. Computer Methods and Programs in Biomedicine 153 (2018), 93–104.
[2] Devis Bianchini, Valeria De Antonellis, Nicola De Franceschi, and Michele Melchiori. 2017. PREFer: A prescription-based food recommender system. Computer Standards & Interfaces 54 (2017), 64–75.
[3] Meng Chen, Xiaoyi Jia, Elizabeth Gorbonos, Chnh T Hong, Xiaohui Yu, and Yang Liu. 2019. Eating healthier: Exploring nutrition information for healthier recipe recommendation. Information Processing & Management (2019), 102051.
[4] Sally Jo Cunningham and David Bainbridge. 2013. An analysis of cooking queries: Implications for supporting leisure cooking. (2013).
[5] David Elsweiler, Christoph Trattner, and Morgan Harvey. 2017. Exploiting food choice biases for healthier recipe recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 575–584.
[6] Jill Freyne and Shlomo Berkovsky. 2010. Intelligent food planning: personalized recipe recommendation. In Proceedings of the 15th International Conference on Intelligent User Interfaces. ACM, 321–324.
[7] Mouzhi Ge, Mehdi Elahi, Ignacio Fernández-Tobías, Francesco Ricci, and David Massimo. 2015.
[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for implicit feedback datasets. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 263–272.
[11] JungYun Hur and SooCheong Shawn Jang. 2015. Anticipated guilt and pleasure in a healthy food consumption context. International Journal of Hospitality Management 48 (2015), 113–123.
[12] Christopher C Johnson. 2014. Logistic matrix factorization for implicit feedback data. (2014), 78 pages.
[13] Mansura A Khan, Ellen Rushe, Barry Smyth, and David Coyle. 2019. Personalized, Health-Aware Recipe Recommendation: An Ensemble Topic Modeling Based Approach. arXiv preprint arXiv:1908.00148 (2019).
[14] Florian Pecune, Shruti Murali, Vivian Tsai, Yoichi Matsuyama, and Justine Cassell. 2019. A model of social explanations for a conversational movie recommendation system. In Proceedings of the 7th International Conference on Human-Agent Interaction. 135–143.
[15] Rajagopal Raghunathan, Rebecca Walker Naylor, and Wayne D Hoyer. 2006. The unhealthy = tasty intuition and its effects on taste inferences, enjoyment, and choice of food products. Journal of Marketing 70, 4 (2006), 170–184.
[16] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618 (2012).
[17] Benjamin Scheibehenne, Rainer Greifeneder, and Peter M Todd. 2010. Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research 37, 3 (2010), 409–425.
[18] Gunnar Schröder, Maik Thiele, and Wolfgang Lehner. 2011. Setting goals and choosing metrics for recommender system evaluations. In UCERSTI2 Workshop at the 5th ACM Conference on Recommender Systems, Chicago, USA, Vol. 23. 53.
[19] Gábor Takács and Domonkos Tikk. 2012. Alternating least squares for personalized ranking. In Proceedings of the Sixth ACM Conference on Recommender Systems. 83–90.
[20] Chun-Yuen Teng, Yu-Ru Lin, and Lada A Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference. 298–307.
[21] Thi Ngoc Trang Tran, Müslüm Atas, Alexander Felfernig, and Martin Stettinger. 2018. An overview of recommender systems in the healthy food domain. Journal of Intelligent Information Systems 50, 3 (2018), 501–526.
[22] Christoph Trattner and David Elsweiler. 2017. Food recommender systems: important contributions, challenges and future research directions. arXiv preprint arXiv:1711.02760 (2017).
[23] Christoph Trattner and David Elsweiler. 2017. Investigating the healthiness of internet-sourced recipes: implications for meal planning and recommender systems. In Proceedings of the 26th International Conference on World Wide Web. 489–498.
[24] Christoph Trattner and David Elsweiler. 2019. An Evaluation of Recommendation Algorithms for Online Recipe Portals. In HealthRecSys@RecSys. 24–28.
[25] Tsuguya Ueta, Masashi Iwakami, and Takayuki Ito. 2011. Implementation of a goal-oriented recipe recommendation system providing nutrition information. In 2011 International Conference on Technologies and Applications of Artificial Intelligence. IEEE, 183–188.
[26] Martijn C Willemsen, Mark P Graus, and Bart P Knijnenburg. 2016. Understanding
Using tags and latent factors in a food recommender system. the role of latent feature diversification on choice difficulty and satisfaction. User In Proceedings of the 5th International Conference on Digital Health 2015. ACM, Modeling and User-Adapted Interaction 26, 4 (2016), 347–389. 105–112. [27] Longqi Yang, Cheng-Kang Hsieh, Hongjian Yang, John P Pollak, Nicola Dell, [8] Mouzhi Ge, Francesco Ricci, and David Massimo. 2015. Health-aware food Serge Belongie, Curtis Cole, and Deborah Estrin. 2017. Yum-me: a personalized recommender system. In Proceedings of the 9th ACM Conference on Recommender nutrient-based meal recommender system. ACM Transactions on Information Systems. 333–334. Systems (TOIS) 36, 1 (2017), 7. [9] Morgan Harvey, Bernd Ludwig, and David Elsweiler. 2013. You are what you eat: Learning user tastes for rating prediction. In International Symposium on String Processing and Information Retrieval. Springer, 153–164.
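The conclusion suggests post-filtering the output of a CF-based algorithm so that different recipe categories appear in the recommended list. The Python sketch below illustrates one simple way to do this under stated assumptions: each candidate recipe carries a single category label and a CF relevance score, and the list is diversified greedily (the best-scoring recipe of each required category first, remaining slots filled by score). The `Recipe` class, the `diversify` function, and the example data are hypothetical illustrations, not the system evaluated in this paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Recipe:
    recipe_id: str
    category: str  # hypothetical label, e.g. "main", "vegetarian", "dessert"
    score: float   # hypothetical CF relevance score (higher is better)


def diversify(candidates: Sequence[Recipe], k: int, categories: Sequence[str]) -> List[Recipe]:
    """Select k recipes from CF candidates, forcing each category in
    `categories` to appear at least once when a candidate exists for it."""
    ranked = sorted(candidates, key=lambda r: r.score, reverse=True)
    chosen: List[Recipe] = []
    # Pass 1: reserve a slot for the best-scoring recipe of each required category.
    for cat in categories:
        if len(chosen) == k:
            break
        best = next((r for r in ranked if r.category == cat and r not in chosen), None)
        if best is not None:
            chosen.append(best)
    # Pass 2: fill the remaining slots purely by CF score.
    for r in ranked:
        if len(chosen) == k:
            break
        if r not in chosen:
            chosen.append(r)
    # Present the final list in descending score order.
    return sorted(chosen, key=lambda r: r.score, reverse=True)


candidates = [
    Recipe("r1", "main", 0.95),
    Recipe("r2", "main", 0.90),
    Recipe("r3", "main", 0.85),
    Recipe("r4", "dessert", 0.60),
    Recipe("r5", "vegetarian", 0.40),
]
top = diversify(candidates, k=3, categories=["main", "vegetarian", "dessert"])
print([r.recipe_id for r in top])  # ['r1', 'r4', 'r5']
```

A plain top-3 ranking of these hypothetical candidates would return three main dishes (r1, r2, r3); the post-filter trades some predicted relevance for category coverage. A required category with no candidate is simply skipped, so the function always falls back to the CF ranking.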