Learning user tastes: a first step to generating healthy meal plans?

Morgan Harvey
Computer Science (i8), Uni of Erlangen-Nuremberg
91058 Erlangen, Germany
morgan.harvey@cs.fau.de

Bernd Ludwig
Institute for Information and Media, Language and Culture, University of Regensburg
93053 Regensburg, Germany
bernd.ludwig@ur.de

David Elsweiler
Institute for Information and Media, Language and Culture, University of Regensburg
93053 Regensburg, Germany
david@elsweiler.co.uk

Paper presented at the Workshop on Recommendation Technologies for Lifestyle Change 2012, in conjunction with the 6th ACM conference on Recommender Systems. Copyright © 2012 for the individual papers by the papers’ authors. This volume is published and copyrighted by its editors. Lifestyle @RecSys’12, September 13, 2012, Dublin, Ireland

ABSTRACT
Poor nutrition is fast becoming one of the major causes of ill-health and death in the western world. It is caused by a variety of factors, including a lack of nutritional understanding that leads to poor choices being made when selecting which dishes to cook and eat. We wish to build systems which can recommend nutritious meal plans to users; however, a crucial pre-requisite is to be able to recommend dishes that people will like. In this work we investigate key factors contributing to how recipes are rated by analysing the results of a long-term study (n=123 users) in order to understand how best to approach the recommendation problem. In doing so we identify a number of important contextual factors which can influence the choice of rating and suggest how these might be exploited to build more accurate recipe recommender systems. We see this as a crucial first step in a healthy meal recommender. We conclude by summarising our thoughts on how we will combine recommended recipes into meal plans based on nutritional guidelines.

1. INTRODUCTION AND MOTIVATION
In the modern developed world people have the luxury of an abundance of choice with regard to the food they eat. While huge choice offers many advantages, making the decision of what to eat is not always straightforward, is influenced by several personal and social factors [11] and can be complex to the point of being overwhelming [15].

The evidence suggests that many people are making poor dietary choices with stark consequences for their health and well-being. Societal problems such as obesity [19], diabetes [18] and hypertension [14] are all becoming more prevalent, and these conditions are strongly linked to poor dietary habits. The nutritional science literature indicates that these kinds of conditions can be prevented and sometimes even reversed through positive nutritional change [12]. Two issues, though, are that people are generally poor at judging the healthiness of their own diet [8] and, even if they do recognise a problem, they lack the requisite nutritional understanding to implement positive dietary changes [4]. Therefore many people could benefit from assistance that allows them to strike a balance between a diet that is healthy and will keep them well and one that is appealing and that they will want to eat. After all, it is no good providing users with healthy diet plans if they do not cook and eat the dishes therein, but instead choose unhealthy meals which are more appealing to them.

We believe this is a problem for which recommender systems are ideally suited. If systems can predict dishes that the user would actually like to eat, this could be combined within a system modelling expert nutritional knowledge to provide meal recommendations that are both healthy and nutritious, but also appealing. Furthermore, complete meal plans for individual users, corresponding to nutritional guidelines given by experts, could be generated algorithmically to suit the user’s personal tastes. In this paper we work towards these goals via the following main contributions:

• We collect recipe ratings data in context, in a naturalistic setting over a relatively long time period

• Users not only provide ratings data, but specify the reasons behind their rating (i.e. the content and contextual features that led them to rate in this way)

• We analyse the collected data to determine which factors might help us to better understand a user’s preferences

• We discuss how these factors could be utilised to build systems which combine recipes into complete meal plans and the challenges this may present

These contributions all relate to the first aim of our work, that is, to better predict which recipes appeal to a given user and which that user is therefore likely to prepare and eat. We conclude the paper by outlining our plans for future work, summarising some ideas on how we may combine recipe recommendations into sensible meal plans.

2. RELATED WORK
The task of understanding user preferences and suggesting appropriate recipes from a collection can be seen as a novel variant of the well-researched recommender system problem [13, 7]. Although food recommendation is not a frequently studied domain, there is a small body of appropriate related work. Early attempts to design automated systems to plan or recommend meals include CHEF [5] and JULIA [6]. Both of these systems utilise case-based planning to plan a meal to satisfy multiple, interacting constraints.
[16] presented a hybrid recommender using fuzzy reasoning to recommend recipes; [9] recommended new food products to supermarket customers, and [17] proposed a system that recommends food items based on recipes recommended to groups of users, clustered by labels.

More recent efforts have tried to better understand the user’s tastes and improve recipe recommendations by breaking recipes down into individual ingredients. Freyne and her colleagues [1, 2, 3] demonstrate that this approach works well, with clear improvements over standard collaborative filtering approaches. We wish to build on the success of this work to explore if other content and contextual factors influence the ratings that people assign to recommended recipes.

It is our hypothesis that the process of rating a recipe is complex, that several factors beyond the user’s tastes combine to determine the rating assigned, and that these tastes must be carefully modelled. Both negative and positive ratings could be taken into account; for example, the user may really dislike tomatoes, so all recipes with this ingredient might be poorly rated. Furthermore, not just the existence or absence of explicit ingredients in a recipe but also the combination of those ingredients could be important, as could the complexity of the recipe and how long it might take to prepare. Other factors, such as how well the preparation steps are described and perhaps the nutritional properties of the dish and the availability of ingredients, could have a bearing on the user’s opinion of the recommendation. We believe that by building recommender algorithms that incorporate or exploit these kinds of aspects we will be better able to accurately predict ratings. However, we also believe that it is vitally important that such factors can be automatically ascertained from ratings data rather than relying on the users themselves. By doing so users can be left to focus on the task of rating recipes and the amount of potentially misleading bias can be minimised. Below we describe how data was collected and analysed to understand how content and contextual factors may influence the way a recipe is rated.

3. DATA COLLECTION
To collect data we developed a simple food recommender system, which selected recipes from a pool of 912 Internet-sourced recipes. This number was chosen as we believe it represents a good balance, providing a sufficient variety of dishes from which we may later be able to derive plans whilst, at the same time, being small enough that the resulting ratings matrix will not be too sparse. Users were given a personalised URL and, when this was accessed, they were presented with a recipe selected at random from a list filtered to match a very basic profile. For example, users who specified being vegetarian were only recommended recipes with meta-data indicating no meat; lactose intolerant users were not suggested recipes with milk, etc. Users were not made aware of the random nature of these “recommendations” and were under the impression that the choices were tailored to them. The web page invited the user to provide a rating for the recipe in context, i.e. either as a main meal or breakfast for the following day, with recipe meta-data being used to determine which meals should be recommended for which time period. This is important because, in contrast to previous data collection methods, the user is not only rating the recipe with respect to how appealing it is, but also how suitable the recipe is given a specific context. Approximately 3 main meals were recommended for every recommended breakfast.

In addition to collecting ratings, the web interface offered the users the chance to explain their ratings by clicking appropriate check boxes representing different reasons. These check boxes were grouped into reasons to do with personal preferences, reasons related to the healthiness of the recipe and reasons related to the preparation of the recipe (see Figure 1). Reasons contributing positively to the ratings were shown on the right-hand side of the screen and negative reasons to the left. The listed explanations were generated through a small user study, whereby 11 users rated recipes and explained their decisions in the context of an interview. The web interface also provided a free-text box for reasons not covered by the check boxes; however, this was only very infrequently used. We did not record any information regarding whether or not the recipe was later cooked or eaten. We were concerned simply with how appealing the recipe was to the user in the given context.

Figure 1: Screenshot of part of the user interface

After publicising the system on the Internet, through mailing lists and Twitter, 123 users from 4 countries provided 3672 ratings over a period of 9 months. The user population grew organically over time, with some users only using the system actively for a few weeks and others for longer periods, which is the kind of behaviour one would expect with a real system. We argue that although this is a relatively small and sparse data set, it is an improvement on previous recipe ratings data collection methods, which have used Mechanical Turk (where there are no validity controls) [1, 3] and surveys where participants rate large numbers of recipes or ingredients in a single session [2]. While surveys can offer the chance to collect data on general user preferences in short time periods, they cannot account for factors, such as food availability, preparation and cooking time, previously eaten meals etc., that would influence ratings if a recipe recommender was to be used in the wild.

Our dataset also differs from previous work in terms of matrix density. The number of ratings per user follows a Zipfian distribution (median = 7, mean = 29.93, max = 395, min = 1; 18 users have 1, 52 have 10+). Whereas previous food recommender papers report user-rating densities of between 22% and 35% [1, 2, 3], our dataset exhibits a user-rating density of 3.28%, which we believe to be much more realistic and more in line with standard recommender systems collections such as MovieLens and Netflix. In terms of ratings per recipe, our collection has a median of 3 ratings per recipe (mean = 4.04, max = 14, min = 2). Table 1 shows the breakdown of ratings (ratings of 0 were discounted as they were marked as not being suitable as a full meal).

Rating   0      1      2      3      4      5
Count    61     818    609    822    828    534
%        1.66   22.22  16.54  22.32  22.76  14.5

Table 1: Breakdown of ratings

Our dataset is, therefore, not only realistic in terms of size, but also a suitable platform for investigation and experimentation as it is both sparse and variant in terms of ratings (sd = 1.41).

4. EXPLORATORY ANALYSIS
To learn about the decision process undertaken when users rate recipes, as well as the factors that influence this process, we analysed the reasons provided by the users when they rated. The aim here was to take inspiration for the development of new and improved recommendation models. Figure 2 shows the frequency with which users indicated that particular reasons had influenced the rating they assigned. This figure demonstrates the complexity of the process, with several factors, both context and content related, being indicated as influential. Given that the focus of this work is to inform the development of recipe recommender systems, we focus primarily on factors which could be determined automatically by a system.

Figure 2: Reasons given for ratings (frequency with which each reason was clicked)

The most common reasons for negatively rating a recipe (shaded grey in the figure) were that the recipe contained a particular disliked ingredient, the combination of ingredients did not appeal, or the recipe would take too long to prepare and cook. The most common reasons for rating a recipe positively (shaded white) had to do with ease or quickness of preparation, the type of dish or the recipe being novel or interesting. Health related reasons, such as the recipe containing too many calories, the user not perceiving the recipe as being healthy enough, or positive factors like the recipe being balanced or easily digestible, were clicked less often overall. However, further analysis revealed that these were clicked very frequently for a particular subset of users. 16.3% of the recipes rated by users who clicked on health reasons at least once had a click on a health reason.

To help understand the relationships between the clicked factors and between the factors and the submitted rating we trained a number of linear models. The final model contained 23 factors in total, of which 17 were significant, i.e. the coefficient estimate is more than 2 standard errors away from 0. Highly significant factors (all p-values < 0.01) included the combination of ingredients in the recipe, whether the recipe would be suitable for vegetarians, how well the users felt the recipe fitted their own tastes and whether the recipe contained a specific ingredient the user particularly likes. All of these significant indicators point to the content of the recipes (in terms of ingredients) being a highly significant factor in the choice of rating and also suggest in many cases that this is dependent on the individual tastes of the users. This endorses the approach of Freyne et al., who tried to model ingredient preferences in their work. Nevertheless, the fact that ingredient factors can have both a positive and negative influence on ratings, and that the combination of ingredients can be important, suggests that more complicated models may be able to better exploit ingredient information when calculating predictions.

Other important factors included whether or not the recipe would be easy to prepare, whether it suited the time of day specified (i.e. breakfast or main meal) and whether the user already had the necessary ingredients at home. Interestingly, given the importance of how easy the recipe is to prepare, the perceived time required to cook the recipe was not a significant factor. This highlights the complexity of the decision process and the number of factors, both context related and content related, which influence how a recipe is rated.

A number of factors related to how healthy the user perceived the recipe to be, including whether the user felt it would be light and easy to digest and whether the user felt it was too unhealthy. In general these health factors did not contribute significantly to the predictive power of the linear models for all of the ratings together; however, we wanted to understand if they might help predict ratings on a per-user basis. We looked at the correlation between calorie and fat content of recipes and the ratings provided by two groups of users: those who had clicked on a health related factor once or more (Care-about-Health, n = 53, 2572 ratings), and those who never clicked on a health reason (Don’t-Care-About-Health, n = 70, 1110 ratings)¹. Figures 3 and 4 show clear differences between the rating behaviour exhibited in these groups. There is a clear trend that the higher the fat content of recipes (r² = 0.88, p = 0.012) or the higher the calorific content (r² = 0.87, p = 0.022), the lower users in the Care-about-Health group tend to rate the recipe. This trend is not present in the second group. If anything there seems to be a slight tendency toward the reverse trend, whereby recipes higher in fat (r² = 0.230, p = 0.643) and calories (r² = 0.73, p = 0.064) tend to be assigned a higher rating. This observation suggests that accounting for nutritional factors will allow more accurate recommendations to be generated.

¹ Nutritional content of recipes was calculated using the system as described in [10].

Figure 3: Influence of Calorific Content on Ratings (calories per gram against rating, 1 to 5, for the healthy group and the unhealthy group)

Figure 4: Influence of Fat Content on Ratings (fat component against rating, 1 to 5, for the healthy group and the unhealthy group)

To summarise, these analyses of the collected data demonstrate the complexity of deciding how suitable a recipe will be to cook in the near future. The results also hint that several factors could be exploited in recommendation algorithms for recipe recommendations.

5. BUILDING ON THESE RESULTS
In the previous section we uncovered several patterns in the data indicating that building recommendation algorithms able to account for specific content or contextual features may enable more accurate prediction of recipe ratings. Two important open questions are 1) how can we derive these contextual variables in real-life settings without asking the user to explicitly define their context? And 2) how can we best incorporate such features into recommendation models? We outline some of our thoughts on these points below.

The reasons given by users in our study and the corresponding ratings suggest the ingredients contained within a recipe are very important to the rating process. This finding endorses the approach of Freyne and her colleagues. However, it is clear from our data that ingredients can have either a positive or a negative influence on the rating. For example, if the user likes tomatoes and a recipe contains this ingredient it would be a reason for a high rating. On the other hand, if a user does not like tomatoes, our data shows this will negatively affect the recipe rating. Previous recommender algorithms do not account for this negative bias and we believe, based on our results, that including this would improve prediction accuracy. Future recommender models may also account for how important an ingredient is to a dish. For example, imagine a user who does not like tomatoes. For a recipe where tomato is merely a garnish, this dislike may not have a large influence on the rating. However, if the tomato is a vital ingredient in the recipe, e.g. in a tomato soup, then it is more likely to have a large influence.

Another point to consider with respect to ingredients is the coverage of particular ingredients within a collection. For example, Freyne et al.’s algorithm deals with ratings for individual ingredients. This means that if egg is rated highly, egg-white will not be treated in the same way. This is exacerbated in our case by the fact that our recipes are web-sourced and may have vocabulary mis-match issues. These kinds of relationships between terms could be identified via instances of nth order co-occurrence. This could be achieved via the use of dimensionality reduction techniques such as singular value decomposition.

Reducing the dimensionality of the feature space would likely have other advantages with respect to dealing with how ingredients are combined in a recipe. Our data show that the combination of ingredients can influence the rating applied to a recipe. For example, a user may rate recipes with tomato highly and recipes with pineapple similarly highly on average. However, recipes which combine these ingredients may be given a very low rating. On the other hand, tomato and basil are a combination that work well together and this may have an extra positive influence on the data. Dimensionality reduction techniques, such as SVD or Bayesian Latent Variable models, should implicitly deal with these kinds of patterns.

Our analyses further suggest that including nutritional information in recommendation models should allow more accurate prediction of ratings. We identified two groups of users who behaved very differently based on whether or not they at some point selected the healthiness of a recipe as an explanation for a rating. The “healthy group” tended to assign a lower rating to recipes higher in calorie and fat content, while the “unhealthy group” displayed, if anything, the opposite predisposition. The group to which a user should be assigned could be obtained explicitly from the user or, preferably, could be learned from ratings data. For example, recipes could be assigned a healthiness score based on nutritional guidelines from health experts, and a system could learn which group a user belongs to based on the way they rate recipes with high or low health scores. We acknowledge that the nutrition-aware models may improve performance by offering unhealthy dishes to the users that prefer such dishes and that this could be against our long-term goals. We would, however, deal with this issue when combining recipes into meal plans as explained below.

6. CONCLUSIONS AND FUTURE WORK
In this paper we have investigated the decisional process involved in rating recommended recipes. We collected ratings data for recipes and context and statistically analysed the reasons behind assigned ratings. Our future goals in the short term include building on this work to design models that better predict user food preferences using the ideas suggested above. We are continuing to collect data and hope to investigate how the performance of models changes as the collection size increases.

The presented work represents a single component in a much larger project aimed at building recommender systems that promote healthier dietary choices. In the longer term we plan to move beyond the recommendation of recipes in isolation to recommending dietary plans (7 to 30 days). This involves recommending sequences of recipes under constraints. These constraints will include user preferences of combining recipes and nutritional knowledge, such as the daily recommended intake suggested by the WHO, and user activity patterns. The WHO guidelines provide a means to calculate recommended calorie intake based on a user’s profile, as well as a breakdown of the percentage of energy that should come from different types of sources (proteins, fats, carbs, fibre etc.).

One way of modelling this situation is to view it as a graph problem, where the shortest paths should be computed in a graph where nodes correspond to meals. A week with three meals per day would be represented by a graph with 7 * 3 nodes where edges correspond to dishes (e.g. spaghetti carbonara is an edge from breakfast today to lunch today). A possible cost function could be the distance between the intake estimated from the ingredients and the portion size and the recommended daily value. Evaluating the output of such algorithms will be a challenge beyond algorithmics and will involve collaboration with nutritional scientists working on the project.

Acknowledgements
The authors would like to thank Mario Amrehn and Stefanie Mika for their hard work with the data collection.

7. REFERENCES
[1] J. Freyne and S. Berkovsky. Intelligent food planning: personalized recipe recommendation. In IUI ’10, pages 321–324, New York, NY, USA, 2010. ACM.
[2] J. Freyne and S. Berkovsky. Intelligent food planning: personalized recipe recommendation. In IUI ’10, pages 321–324, New York, NY, USA, 2010. ACM.
[3] J. Freyne, S. Berkovsky, and G. Smith. Recipe recommendation: accuracy and reasoning. In Proc. UMAP, UMAP’11, pages 99–110, Berlin, Heidelberg, 2011. Springer-Verlag.
[4] J. F. Guthrie, B. M. Derby, and A. S. Levy. America’s Eating Habits: Changes and Consequences. Agriculture Information Bulletin No. (AIB750), pages 243–280. US Department for Agriculture, 1999.
[5] K. Hammond. Chef: A model of case-based planning. In Proceedings of the National Conference on AI, 1986.
[6] T. Hinrichs. Strategies for adaptation and recovery in a design problem solver. In Proceedings of the Workshop on Case-Based Reasoning, 1989.
[7] D. Jannach, M. Zanker, A. Felfernig, and G. Friedrich. Recommender systems – An introduction. Cambridge Univ. Press, 2011.
[8] G. Johansson, A. Wikman, A. M. Ahrn, G. Hallmans, and I. Johansson. Underreporting of energy intake in repeated 24-hour recalls related to gender, age, weight, day of interview, educational level, reported food intake, smoking habits and area of living. Public Health Nutrition, 4(4):919–27, 2001.
[9] R. D. Lawrence, G. S. Almasi, V. Kotlyar, M. S. Viveros, and S. S. Duri. Personalization of supermarket product recommendations. Data Min. Knowl. Discov., 5(1-2):11–32, January 2001.
[10] M. Müller, M. Harvey, D. Elsweiler, and S. Mika. Ingredient matching to determine the nutritional properties of internet-sourced recipes. In Pervasive Health 2012, 2012.
[11] M. Nestle, R. Wing, L. Birch, L. DiSogra, A. Drewnowski, S. Middleton, M. Sigman-Grant, J. Sobal, M. Winston, and C. Economos. Behavioral and social influences on food choice. Nutrition Reviews, 56(5):50–64, 1998.
[12] D. Ornish, S.E. Brown, J.H. Billings, L.W. Scherwitz, W.T. Armstrong, T.A. Ports, S.M. McLanahan, R.L. Kirkeeide, K.L. Gould, and R.J. Brand. Can lifestyle changes reverse coronary heart disease?: The lifestyle heart trial. The Lancet, 336(8708):129–133, 1990.
[13] F. Ricci, L. Rokach, B. Shapira, and P.B. Kantor, editors. Rec. Systems Handbook. Springer, 2011.
[14] P. Scarborough, P. Bhatnagar, K. Wickramasinghe, K. Smolina, C. Mitchell, and M. Rayner. Coronary heart disease statistics. British Heart Foundation Health Promotion Research Group, Department of Public Health, University of Oxford, 2010.
[15] B. Scheibehenne, R. Greifeneder, and P. M. Todd. Can there ever be too many options? A meta-analytic review of choice overload. Journal of Consumer Research, 37:409–425, 2010.
[16] J. Sobecki, E. Babiak, and M. Slanina. Application of hybrid recommendation in web-based cooking assistant. In Knowledge-Based Intelligent Info. and Engineering Sys., pages 797–804. Springer, 2006.
[17] M. Svensson, J. Laaksolahti, K. Höök, and A. Waern. A recipe based on-line food store. In 5th Int. Conf. on Intelligent User Interfaces, IUI ’00, pages 260–263, New York, NY, USA, 2000. ACM.
[18] Diabetes UK. Reports and statistics on diabetes prevalence.
[19] WHO. World Health Organization: Chronic disease information sheet. http://www.who.int/mediacentre/factsheets/fs311/ (accessed Feb 14th 2012).
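As an illustration of the cost function outlined in Section 6, the following minimal sketch scores a one-day plan by the distance between its estimated nutrient intake and a recommended daily value. It is only a sketch under stated assumptions: the dish names, nutrient figures and daily targets are invented for illustration (they are not WHO values or data from our study), the function names are hypothetical, and the search is a plain exhaustive enumeration of one day's meals rather than the shortest-path formulation, which we have not yet implemented.

```python
from itertools import product
from math import sqrt

# Hypothetical daily targets; illustrative numbers only, not WHO values.
DAILY_TARGET = {"kcal": 2000.0, "fat_g": 70.0, "protein_g": 50.0}

# Hypothetical per-portion nutrient estimates for a handful of dishes.
DISHES = {
    "porridge": {"kcal": 300.0, "fat_g": 6.0, "protein_g": 10.0},
    "fried breakfast": {"kcal": 800.0, "fat_g": 55.0, "protein_g": 35.0},
    "lentil soup": {"kcal": 450.0, "fat_g": 10.0, "protein_g": 25.0},
    "spaghetti carbonara": {"kcal": 900.0, "fat_g": 40.0, "protein_g": 30.0},
    "grilled fish with rice": {"kcal": 700.0, "fat_g": 20.0, "protein_g": 40.0},
}

# Hypothetical candidate dishes for each meal slot of a single day.
SLOTS = {
    "breakfast": ["porridge", "fried breakfast"],
    "lunch": ["lentil soup", "spaghetti carbonara"],
    "dinner": ["grilled fish with rice", "spaghetti carbonara"],
}


def daily_intake(dish_names, dishes, target):
    """Sum the estimated intake of each targeted nutrient over the dishes."""
    return {n: sum(dishes[d][n] for d in dish_names) for n in target}


def plan_cost(intake, target):
    """Euclidean distance between intake and target, using deviations
    relative to the target so that nutrients on different scales are
    comparable."""
    return sqrt(sum(((intake[n] - target[n]) / target[n]) ** 2 for n in target))


def best_daily_plan(slots, dishes, target):
    """Enumerate every combination of one dish per slot and return the
    cheapest plan together with its cost (assumption: brute force is
    acceptable for a day with a few candidates per slot)."""
    slot_names = list(slots)
    best_plan, best_cost = None, float("inf")
    for choice in product(*(slots[name] for name in slot_names)):
        cost = plan_cost(daily_intake(choice, dishes, target), target)
        if cost < best_cost:
            best_plan, best_cost = dict(zip(slot_names, choice)), cost
    return best_plan, best_cost


if __name__ == "__main__":
    plan, cost = best_daily_plan(SLOTS, DISHES, DAILY_TARGET)
    print(plan, round(cost, 3))
```

Relative rather than absolute deviations are used here so that calories do not dominate the distance simply because they are measured in larger numbers. A realistic system would also need to weight this cost against the predicted ratings of the dishes, which the sketch ignores.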