The Cholesterol Factor: Balancing Accuracy and Health in Recipe Recommendation Through a Nutrient-Specific Metric∗

ALAIN STARKE, Wageningen University & Research, The Netherlands and University of Bergen, Norway
CHRISTOPH TRATTNER, HEDDA BAKKEN, MARTIN JOHANNESSEN, and VEGARD SOLBERG, University of Bergen, Norway

Whereas many food recommender systems optimize for users’ preferences, health is another, often overlooked objective. This paper aims to recommend relevant recipes that avoid nutrients that contribute to high levels of cholesterol, such as saturated fat and sugar. We introduce a novel metric called ‘The Cholesterol Factor’, based on nutritional guidelines from the Norwegian Directorate of Health, that can balance accuracy and health through linear re-weighting in post-filtering. We tested popular recommender approaches by evaluating a recipe dataset from AllRecipes.com, in which a CF-based SVD method outperformed content-based and hybrid methods. Although we found that increasing the healthiness of a recommended recipe set came at the cost of Precision and Recall, putting only a little weight (10-15%) on our Cholesterol Factor can significantly improve the healthiness of a recommendation set with minimal accuracy losses.

Additional Key Words and Phrases: Recipes, Recommender Systems, Health, Offline Evaluation, Nutrients

1 INTRODUCTION

Most food recommender systems to date focus on recommending foods that users like [47]. This includes content-based approaches that are based on historical data, for example by suggesting dairy products to a user who has previously bought milk. Even though individual ingredients are considered this way [9], the actual nutritional needs of users are often not incorporated [48]. In fact, dietary constraints of users that stem from underlying health conditions, such as hypertension and high levels of cholesterol, have not received much attention to date [37, 47].
Over 90 million adults (20 years or older) in the United States had cholesterol levels of 200 mg/dL or higher in 2020 [51], which is considered unhealthy. Among them, more than 35 million had levels of 240 mg/dL or higher, which puts them at risk for heart disease. Such persons are commonly advised to change their exercise regimen and to attain healthier eating habits. With regard to food recommendation, this would require an approach that incorporates the nutritional content of the recommended internet-sourced recipes [43, 46, 49]. However, an important pitfall is that multiple studies have observed that popular recipes tend to be unhealthy [46], which applies to internet-sourced recipes [29, 35], as well as to popular recipes in other media [30, 39]. This, in turn, can lead to a popularity bias in a recommender system that is at odds with the objective of healthy recipe recommendation (cf. [1]).

This work-in-progress aims to increase the healthiness of recipe recommendations, while maintaining a decent level of accuracy. We focus on mitigating nutrient intake that is associated with high levels of cholesterol, by introducing a ‘cholesterol factor’, a metric that is based on nutritional guidelines to limit fat, saturated fat, and sugar intake. We apply this in a post-filtering approach to refine an initial set of recipe recommendations, based on competing objectives (cf. [33]), to avoid recipes that contain high levels of nutrients that are associated with high cholesterol. Whereas some multi-objective optimization approaches are tedious to implement and require a lot of computational power [32, 54], we set out a simple post-filtering or post-processing approach that involves linear multiplication (cf. [46]). This is consistent with previous studies that, among others, seek to mitigate popularity bias or increase recommendation diversity through a post-filtering re-ranking [1–4].

Through offline evaluation, we assess the accuracy of three recipe recommendation approaches: collaborative filtering (CF), content-based (CB), and hybrid. We use our Cholesterol Factor in the most accurate approach (CF; Matrix Factorization SVD) to post-filter the predicted recommendation set on both accuracy and health, aiming to avoid the intake of high levels of fat and sugar. We examine the following research questions:

• RQ1: Which recipe recommendation approach has the best performance in terms of different accuracy metrics?
• RQ2: To what extent can a nutrient-based post-filtering approach in recipe recommendation balance accuracy and healthiness?

∗ Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Presented at the MORS workshop held in conjunction with the 15th ACM Conference on Recommender Systems (RecSys), 2021, in Amsterdam, Netherlands.
MORS 2021, September, 2021, Amsterdam, NL

1.1 Contribution

This study presents a nutrient-based post-filtering method for recipe recommendation. Our recommender system is grounded in previous research, testing recommendation methods such as SVD/Matrix Factorization that have performed well in several independent studies. A food recommender post-filtering approach has also been used by Trattner and Elsweiler [46]. The novelty of our approach is found in our Cholesterol Score, which we have designed based on the guidelines of the Norwegian Directorate of Health, formulated in their ‘Diet Manual’ [15]. Instead of considering the general healthiness of recipes or meals as done in previous studies, such as through an aggregate health indicator or calorie counting [12, 43, 46], we assess the presence of multiple nutrients (i.e., fat, saturated fat, and sugar). This way, we specifically target cardiovascular diseases, by re-ranking recipe predictions based on a cholesterol-related metric.
2 RELATED WORK ON FOOD RECOMMENDER SYSTEMS

The earliest meal-planning systems, which used case-based reasoning, date back to the 1980s [13, 17]. A more contemporary categorization of food recommender systems differentiates between two types of approaches [28]: one that is optimized towards a user’s preferences and one that considers a user’s nutritional needs. Considering how self-actualization and changes in user preferences come about when interacting with recommender systems [22, 24, 38, 41], only presenting healthy recommendations is an ineffective strategy if these do not align with a user’s preferences, unless users are highly motivated (cf. [27, 38]). A third type of approach, suggested by Tran et al. [45] among others, is to balance user preferences and nutritional needs [7]. The recommender system presented in this work falls into this third category.

Type 1 – User Preferences. This type of food recommender system aims to suggest foods that a user is most likely to enjoy, which is a common approach in this line of research [47]. For example, Freyne and Berkovsky [9] evaluate the performance of different CB, CF, and hybrid approaches, showing that a content-based approach that deconstructs recipe ratings into ingredient ratings performs best. This means that users are inclined to like recipes that contain similar ingredients (e.g., onion) as recipes they liked in the past. Follow-up studies have improved this approach by accounting for negative evaluations [14], using a hybrid approach of Singular Value Decomposition with user and item biases.

Type 2 – Nutritional Needs. A second type of food recommender system is optimized towards the nutritional needs of the user [38]. Although the relation between unhealthy food intake and adverse health conditions is well-studied (cf. [10]), how recommender systems can help users to make healthier choices has received less attention [45].
An early example is described by Mankoff et al. [26], who generate food recommendations based on an analysis of the users’ food receipts. The system would suggest foods to buy based on the nutrient a user was lacking. In a more goal-oriented approach, Ueta et al. [50] propose a recommender system that allows the user to disclose a specific health problem that she wants to be addressed, for which the system retrieves the nutrient(s) that co-occur more often with that health problem. From there, it would suggest meals that avoid specific nutrients.

Type 3 – Optimizing Between User Preferences and Nutritional Needs. Although healthy food and ‘tasty food’ are not mutually exclusive categories, there is often an optimization trade-off between nutrient intake and user preferences [47]. Whereas some approaches aim to balance these two factors simultaneously when retrieving recipes [12], most approaches rely on either pre-filtering or post-filtering based on one or more health indicators [5].

Pre-filtering approaches typically involve constraint-based recommender systems, although these are relatively rare in the food domain [47]. Yang et al. [52] describe an interface in which a user could disclose dietary constraints (e.g., halal, vegetarian, or vegan) that led to an initial selection of meals, after which user preferences were elicited to re-rank the initial set. In a similar vein, Toledo et al. [44] employ a multi-criteria decision analysis to filter out foods that do not meet a user’s health requirements, before considering the user’s overall preferences.

A more common approach is to apply post-filtering in food recommender systems [47]. Such recommenders retrieve a relevant set of recipes based on user preferences (e.g., through content-based similarity [9]), after which a feature-based or nutrient-based re-ranking or multiplication is conducted [42, 43, 46]. Elsweiler et al.
[8] set out such a re-ranking approach, by retrieving all recipes that score above a certain user preference threshold and re-ranking them on one or more health indicators afterwards. Trattner and Elsweiler [46] describe a post-filtering approach that re-weights the predicted score of a user X for a recipe Y, based on an aggregate health indicator (i.e., a recipe’s WHO or FSA score; see also [43]). While some approaches have post-filtered on health by considering a meal’s calorie content [12], this study focuses on nutrient intake, because it is a more accurate predictor of health outcomes [6, 31].

3 METHODOLOGY

We assessed to what extent accuracy and health (in terms of cholesterol-related nutrient intake) could be balanced in recipe recommendation through a nutrient-based post-filtering approach. We first describe which recipe dataset and which recommender approaches were used. Subsequently, we explain the rationale of our cholesterol post-filtering metric and how we performed offline evaluation of our results.

3.1 Recipe Dataset

We employed a dataset that comprised 1,031 unique recipes, which were obtained from the website Allrecipes.com, one of the largest recipe websites. Recipes were annotated with nutrient-specific metadata, including the contents in grams (i.e., carbohydrates, (saturated) fat, fiber, protein, sugar), as well as a recipe’s caloric content. Moreover, the dataset specified ingredients, cooking directions, and the average recipe rating given by users on the website.

In total, we had access to 50,681 ratings given to recipes in our dataset. This only included users that had given at least 20 ratings (M = 63.02 ratings, SD = 54.97). The provided ratings, which were given on a 5-point scale, were relatively high: 55.95% of the given ratings were 5 out of 5 and 30.9% were 4 out of 5 (Mean = 4.39, SD = 0.82). We therefore expected that classification metrics (i.e., precision, recall) would reach relatively high values.
3.2 Recommendation Approaches

We evaluated our dataset through three recommender approaches. Each approach was grounded in previous research conducted in the food recommender domain, comparing approaches from [14], [9], and a hybrid approach.

3.2.1 SVD (Matrix Factorization). We used a Matrix Factorization model to discover latent factors in our recipe dataset (cf. Koren et al. [23] for mathematical details). It involved the SVD algorithm [11], as defined in Scikit Surprise [18]. This approach was analogous to probabilistic matrix factorization (cf. [36]), but also included additional bias parameters for users and items. A related study by Harvey et al. [14] on recipe recommendation showed that a singular value decomposition algorithm, which could be considered a method analogous to Matrix Factorization SVD, outperformed other non-hybrid recommender approaches.

3.2.2 Content-Based. We also employed a content-based algorithm (CB) that exploited the available item descriptions. Content-based approaches were typically used in food recommender systems that optimized for user preferences [28]. For one, Freyne and Berkovsky [9] used an algorithm that deconstructed the recipe ratings into ingredient ratings. For example, if a recipe was given 4 stars out of 5, this rating would be counted for its ingredients (e.g., a rating of 4 for tomato and cucumber). We employed a similar approach by predicting recipe ratings based on the average ratings given by a user u to the recipe’s j ingredients ing_j. Analogous to [9], we predicted a user’s rating for a recipe r using the average of the ingredient ratings score(u, ing_j), which was computed as follows:

$$ pred(u, r) = \frac{\sum_{j \in r} score(u, ing_j)}{j} $$    (1)

3.2.3 Hybrid. Our hybrid approach combined our two other algorithms. In line with [9], we used a collaborative approach to overcome sparsity issues of the content-based approach for recipes with few ratings or ingredients.
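Both the content-based recommender and the hybrid rely on the ingredient-averaging step of Eq. 1. The following is a minimal sketch of that step, not the implementation used in this study. The function names, the dictionary-based data layout, and the choice to skip ingredients that a user has not rated yet are assumptions made for illustration only.

```python
from collections import defaultdict


def ingredient_scores(user_ratings, recipe_ingredients):
    """Spread each recipe rating of one user over the recipe's ingredients.

    user_ratings: {recipe_id: rating} for a single user (hypothetical layout).
    recipe_ingredients: {recipe_id: [ingredient, ...]} (hypothetical layout).
    Returns {ingredient: mean rating of the rated recipes that contain it}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for recipe_id, rating in user_ratings.items():
        for ingredient in recipe_ingredients[recipe_id]:
            totals[ingredient] += rating
            counts[ingredient] += 1
    return {ing: totals[ing] / counts[ing] for ing in totals}


def predict_rating(scores, ingredients):
    """Eq. 1: average the user's ingredient scores over a recipe's ingredients.

    Assumption: ingredients without a score are skipped, and None is returned
    when no ingredient of the recipe has been rated by the user.
    """
    known = [scores[ing] for ing in ingredients if ing in scores]
    if not known:
        return None
    return sum(known) / len(known)


# Made-up toy data, for illustration only.
recipes = {
    "salad": ["tomato", "cucumber"],
    "soup": ["tomato", "onion"],
    "stew": ["tomato", "onion", "cucumber"],
}
scores = ingredient_scores({"salad": 4, "soup": 2}, recipes)
stew_prediction = predict_rating(scores, recipes["stew"])
```

Returning None for recipes without any rated ingredient makes the sparsity problem explicit, which is the gap that the collaborative step of the hybrid is meant to fill.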
Whereas Freyne and Berkovsky [9] used a Nearest Neighbor approach, we fitted our SVD recommender to a training set to estimate recipe scores for all user-recipe pairs that did not have a true rating. Due to the training splits used, this expanded the training data by a factor of 32. Subsequently, we fitted the content-based recommender to the expanded training set as described above, which led to a drastically longer computation time than for the other approaches.

3.3 Designing a Post-Filtering Metric: The Cholesterol Factor

We sought to balance user preferences and nutritional needs through a post-filtering approach. On the one hand, if too much weight is put on nutritional needs, users are likely to abandon the recommender system due to a mismatch in taste. On the other hand, if weight is placed only on user preferences (as done in some food recommenders [28]), healthiness might be lost due to the popularity of unhealthy internet-sourced recipes [48].

3.3.1 Nutrient Intake Guidelines. To help users to avoid nutrient intake associated with high levels of cholesterol, we designed a metric to assess a recipe’s healthiness. We followed nutritional guidelines from the Norwegian Directorate of Health [15, p.173], an organization tasked with monitoring research in the field of nutrition. Its guidelines were in line with other nutrition authorities in Europe, such as the Dutch Voedingscentrum¹. Whereas in many European countries (e.g., United Kingdom²) guidelines are along the lines of “Saturated fats should be swapped with unsaturated fats”, the Norwegian advice is formulated more specifically in terms of nutrient intake levels as a percentage of calories per day.

Table 1 provides an overview of nutrient-specific guidelines that the Norwegian Directorate of Health has formulated for people with high cholesterol.
The guidelines are formalized as percentages of the total calorie content of a meal, either as a recommended interval or as an upper bound only (the amount for fiber was denoted in grams). Three important nutrients (i.e., sugar, fat, saturated fat) did not have an explicit lower bound [15], which made them particularly useful to include in a continuous health score, for which a positive score would indicate that nutrient intake was below the recommended guidelines, while a negative score would indicate that intake exceeded those guidelines.

¹ https://www.voedingscentrum.nl/nl/service/vraag-en-antwoord/aandoeningen/wat-mag-ik-eten-bij-een-te-hoog-cholesterol-.aspx
² https://www.gov.uk/government/news/reducing-saturated-fat-lowers-blood-cholesterol-and-risk-of-cvd

Table 1. Daily nutrient intake guidelines for people with a high level of cholesterol, as formulated by the Norwegian Directorate of Health [15, p.173]. Guidelines used for our Cholesterol Factor (fat, saturated fat, and sugar) are marked with an asterisk (*).

Macronutrient    Amount    Nutrient-specific Guidelines
Carbohydrates    55-60%    Sugar: <10% *
Protein          10-15%    Fiber: 25-35g
Fat              <30% *    Monounsaturated Fat: 10-15%; Omega-3: 1%; Polyunsaturated Fat: 4-9%; Saturated Fat: <10% *

The dataset was made compatible with these guidelines by converting grams to daily calorie allowance percentages. While a gram of fat amounted to 9 kcal, a gram of protein or carbohydrates was equivalent to 4 kcal [25]. By dividing these by the total amount of kcal per recipe, we obtained the kcal percentage of a meal.

3.3.2 Post-filtering Through a Cholesterol Factor. We proposed a Cholesterol Factor to apply post-filtering on a recipe recommendation set. This entailed a re-weighting of all the predicted scores based on a recipe’s nutrient content, in relation to avoiding high levels of cholesterol. Recipes with relatively low levels of fat, saturated fat, and sugar received higher rating predictions, and vice versa.
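The conversion from grams to kcal percentages described in Section 3.3.1 can be sketched as follows. This is a minimal illustration under stated assumptions: the function and constant names are hypothetical, and sugar is counted at the carbohydrate rate of 4 kcal per gram.

```python
# Energy densities as stated in Section 3.3.1 [25].
KCAL_PER_GRAM_FAT = 9.0
KCAL_PER_GRAM_CARBOHYDRATE = 4.0  # assumption: sugar counts as a carbohydrate


def kcal_percentage(grams, kcal_per_gram, total_kcal):
    """Share of a recipe's total kcal that stems from one nutrient, in percent."""
    if total_kcal <= 0:
        raise ValueError("total_kcal must be positive")
    return 100.0 * grams * kcal_per_gram / total_kcal


def nutrient_percentages(fat_g, saturated_fat_g, sugar_g, total_kcal):
    """kcal percentages of the three nutrients that the Cholesterol Factor uses."""
    return {
        "fat": kcal_percentage(fat_g, KCAL_PER_GRAM_FAT, total_kcal),
        "saturated_fat": kcal_percentage(saturated_fat_g, KCAL_PER_GRAM_FAT, total_kcal),
        "sugar": kcal_percentage(sugar_g, KCAL_PER_GRAM_CARBOHYDRATE, total_kcal),
    }


# A made-up recipe of 200 kcal with 10 g fat, 3 g saturated fat, and 2 g sugar.
example = nutrient_percentages(fat_g=10, saturated_fat_g=3, sugar_g=2, total_kcal=200)
```

For the made-up recipe above, 2 g of sugar amounts to 8 kcal, which is 4% of the recipe's 200 kcal.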
The Cholesterol Factor is composed of two factors: a ‘Cholesterol Weight’ and a ‘Cholesterol Score’. We used the following formula to post-filter our predicted ratings:

Post-Filtered Rating = Predicted Rating + (Cholesterol Score × Cholesterol Weight)    (2)

The Cholesterol Weight is a number that can be chosen depending on how much a system designer wants to cater to a user’s health objectives. Higher levels of Cholesterol Weight will lead to higher rating predictions for healthy recipes, presumably at the cost of accuracy. The assigned Cholesterol Weight must be considered relative to the weight attributed to the rating predictions (e.g., a weight of 1 balances health and preference ratings).

In contrast, the Cholesterol Score is computed per recipe, based on the nutritional content for fat, saturated fat, and sugar. This is formulated as follows:

Cholesterol Score = Fat Points + Saturated Fat Points + Sugar Points    (3)

Table 2 shows how the points for the three nutrient categories are scored on a scale from -5 to 5. If a recipe scores above 0 in a nutrient category, it indicates that the intake for that nutrient complies with healthy eating guidelines (cf. [15, p.173]). The total Cholesterol Score is the sum score of all three categories and, as such, ranges from -15 to 15. This implies that negative sugar scores could be compensated by positive fat scores. Compared to the commonly used FSA score metric (4-12) for nutrient intake [43, 48], our metric had a larger scale resolution and did not consider salt.

3.4 Evaluation

We performed our recommender evaluation using 5-fold cross validation in Scikit Surprise [18]. To assess our recommender system approaches (i.e., SVD, content-based, hybrid), we used three performance metrics from Scikit-learn [34]: Precision (P), Recall (R), and Mean Absolute Error (MAE). K was set at 10, evaluating the top-10 retrieved recipes in terms of their relevance.
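The post-filtering step of Eqs. 2 and 3 can be sketched as follows, using the thresholds listed in Table 2. This is an illustrative sketch rather than the code used in this study. The function names are hypothetical, and percentages that fall between the last positive threshold and the first negative threshold (e.g., fat between 30% and 40%) are assumed to score 0 points.

```python
# Thresholds transcribed from Table 2 (percentages of a recipe's total kcal).
# "upper" bounds earn 5, 4, 3, 2, 1 points; each "lower" bound reached costs 1 point.
THRESHOLDS = {
    "fat": {"upper": (1, 5, 10, 20, 30), "lower": (40, 50, 60, 70, 80)},
    "saturated_fat": {"upper": (1, 3, 5, 7, 10), "lower": (13, 16, 19, 22, 25)},
    "sugar": {"upper": (1, 3, 5, 7, 10), "lower": (13, 16, 19, 22, 25)},
}


def nutrient_points(nutrient, pct):
    """Points between -5 and 5 for one nutrient, given its kcal percentage."""
    bounds = THRESHOLDS[nutrient]
    for points, upper in zip((5, 4, 3, 2, 1), bounds["upper"]):
        if pct <= upper:
            return points
    # Above the guideline: subtract one point per lower bound that is reached.
    return -sum(1 for lower in bounds["lower"] if pct >= lower)


def cholesterol_score(percentages):
    """Eq. 3: sum of fat, saturated fat, and sugar points (range -15 to 15)."""
    return sum(nutrient_points(name, percentages[name]) for name in THRESHOLDS)


def post_filtered_rating(predicted_rating, score, weight):
    """Eq. 2: linear re-weighting of a predicted rating."""
    return predicted_rating + score * weight


# Made-up recipe: 45% fat, 13.5% saturated fat, and 4% sugar (kcal percentages).
score = cholesterol_score({"fat": 45.0, "saturated_fat": 13.5, "sugar": 4.0})
rating = post_filtered_rating(predicted_rating=4.0, score=score, weight=0.1)
```

Because the re-weighting is a single addition per predicted rating, it can be applied to the output of any rating predictor without retraining the underlying model.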
Recommendations were deemed relevant if their rating was at least 4 out of 5. In addition, our recommender approaches were also compared to a Random Item Ranking baseline.

Table 2. Scoring table per nutrient to compute the Cholesterol Score of a recipe, which was the sum score. A negative score indicates that a recipe was relatively unhealthy. The amounts denote the percentage of the total caloric content. For example, a recipe with 200 kcal and 2 g of sugar contains 8 kcal of ‘sugar calories’, which is 4% of the recipe, resulting in 3 sugar points. The scores are based on nutritional guidelines from the Norwegian Directorate of Health [15].

Fat                 Saturated Fat       Sugar
Amount   Points     Amount   Points     Amount   Points
<=1       5         <=1       5         <=1       5
<=5       4         <=3       4         <=3       4
<=10      3         <=5       3         <=5       3
<=20      2         <=7       2         <=7       2
<=30      1         <=10      1         <=10      1
>30       0         >10       0         >10       0
>=40     -1         >=13     -1         >=13     -1
>=50     -2         >=16     -2         >=16     -2
>=60     -3         >=19     -3         >=19     -3
>=70     -4         >=22     -4         >=22     -4
>=80     -5         >=25     -5         >=25     -5

Accuracy of classifications. We used the common classification metrics Precision and Recall to assess how representative the predicted Top-N recommendations were [19, p.180]. Precision indicated the proportion of recommended recipes that were relevant (Hits) for user u in the top-10 recommendation set (RecSet), while Recall referred to the proportion of relevant recommended recipes (Hits) compared to the total set of relevant recommendations (RelevantSet):

$$ Precision = \frac{Hits_u}{RecSet_u} \qquad Recall = \frac{Hits_u}{RelevantSet_u} $$    (4)

Accuracy of predictions. We evaluated the capacity of each algorithm to accurately predict to what extent a user will like a certain recipe. We employed the Mean Absolute Error (MAE) to assess the accuracy of the rating predictions in our full TestSet [19, p.179].
Across all users U and recipes i, the mean deviation between the actual scores given by users and their predicted scores was computed as follows:

$$ MAE = \sum_{u \in U} \frac{\sum_{i \in TestSet_u} |rec_i(u, i) - r_i|}{|TestSet_u|} $$    (5)

Healthiness of recommendations. Finally, we assessed the healthiness of the predicted recommendation set through the Cholesterol Score, as computed in Table 2. Hence, it could take values between -15 and 15. To contextualize our findings, the average ‘Health Score’ of the dataset was found to be -.53. Please note that because of this scale range, the MAE was less appropriate to evaluate accuracy for post-filtered scores (cf. Table 4).

4 RESULTS

We first evaluated our different recommender approaches in terms of classification and accuracy metrics (RQ1). Subsequently, we investigated how recommender accuracy and health could be balanced using a post-filtering approach through our Cholesterol Factor (RQ2).

4.1 Evaluation of Recommender Approaches in terms of Accuracy (RQ1)

We evaluated our recommender models through different metrics. Table 3 describes the values for recommender classification (Precision@10, Recall@10)³, accuracy (Mean Absolute Error (MAE)), and the Cholesterol Score. Although all approaches clearly outperformed the baseline in terms of Precision, Recall, and MAE, there was little difference between them. Due to the high proportion of 5-star ratings in the dataset, we focused on precision and MAE as the key indicators, also because the domain of recipe recommendation is less concerned with false positives [16]. Therefore, we proceeded to our post-filtering evaluation using the SVD algorithm, also because the hybrid approach was computationally more demanding without showing improvements over SVD.

Table 3. Evaluation of three recommender approaches, as well as baseline (random item ranking).
Method            Precision@10    Recall@10    MAE      Cholesterol Score
SVD               .9687           .9789        .5908    -.1218
Hybrid            .9669           .9771        .6162    -.5615
Content-Based     .9676           .9779        .6117    -.2561
Baseline (RIR)    .6978           .3912        1.728    -.3585

4.2 Balancing Accuracy and Health through Post-Filtering (RQ2)

We moved on to examine to what extent a nutrient-based post-filtering approach could balance accuracy and health in recipe recommendation. We performed multiple evaluations of our Matrix Factorization/SVD algorithm by using different values of the Cholesterol Weight in our post-filtering approach. Table 4 describes Precision, Recall, MAE_f (for post-filtered scores), and recommendation healthiness (i.e., the Cholesterol Score), each for different values of the Cholesterol Weight.

It became evident that even for small weight values up to .1, the Cholesterol Score increased noticeably without sacrificing too much accuracy. In particular, Precision and Recall hardly changed, while a sharp increase in the Cholesterol Score (+5 on a 30-point scale) could be observed. Further increasing the Cholesterol Weight in Table 4 did not increase the Cholesterol Score significantly, but did lead to smaller values of Precision and Recall. Moreover, MAE_f increased sharply for weight values above .2, as the Cholesterol Score seemed to further affect the rating predictions.

Although this showed that an increase in recommendation healthiness did come at the cost of accuracy for high values of the Cholesterol Weight, the health gains were already achieved for small weight values. Moreover, as argued earlier, Precision might be more important in this case than Recall, which rendered the swap of accuracy for health a decent tradeoff when a recommender system designer would like to also focus on healthiness, instead of only accuracy. Table 4 only reports values of the Cholesterol Weight up to 2, as this seemed feasible to apply in a recommender context where accuracy and health are balanced.
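The weight sweep described above can be illustrated with a small sketch that re-ranks a candidate set for several Cholesterol Weight values and reports the mean Cholesterol Score of the resulting top-k list. The candidate data below is made up for illustration and is not taken from the Allrecipes dataset, and the function names are hypothetical.

```python
def rerank_top_k(candidates, weight, k):
    """Return the top-k candidates after linear post-filtering.

    candidates: list of (recipe_id, predicted_rating, cholesterol_score) tuples.
    The sort key is: predicted rating + cholesterol score * weight.
    """
    ranked = sorted(candidates, key=lambda c: c[1] + c[2] * weight, reverse=True)
    return ranked[:k]


def mean_cholesterol_score(recommendations):
    """Average Cholesterol Score of a recommendation list."""
    return sum(c[2] for c in recommendations) / len(recommendations)


# Made-up candidates: (recipe_id, predicted_rating, cholesterol_score).
candidates = [("a", 4.8, -6), ("b", 4.6, 2), ("c", 4.5, 7), ("d", 4.1, 9)]

results = {}
for weight in (0.0, 0.05, 0.1):
    top = rerank_top_k(candidates, weight, k=2)
    results[weight] = ([c[0] for c in top], mean_cholesterol_score(top))
    print(weight, results[weight])
```

In this toy example, a small weight is already enough to replace the least healthy recipe in the top-2 list, which mirrors the pattern reported for small weight values in Table 4.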
It was not possible to achieve a healthiness score that was significantly higher than 5, which was arguably due to the healthiness of the recipes available in the dataset. As the average healthiness across the entire dataset was -.53, the findings in Table 4 indicated that our Cholesterol Factor not only improved the healthiness of the predicted recommendations compared to the health score of our baseline SVD approach, but also compared to the mean healthiness of the dataset.

³ The values for Recall depended on the k-value, in the sense that shorter lists led to lower levels of Recall. These changes, however, were proportional across the different approaches.

Table 4. Evaluation of an SVD algorithm (as defined in Scikit Surprise [18]) in terms of accuracy and healthiness metrics (i.e., Cholesterol Score), for different values of linear, post-filtering weighting, and k = 5. Note: MAE_f denotes the rating prediction accuracy after post-filtering, which can take values beyond the original rating scale of 1 to 5, as the Cholesterol Score scale ranged from -15 to +15. Therefore, MAE_f should only be used for comparisons within the table.
Cholesterol Weight    Precision@5    Recall@5    MAE_f     Cholesterol Score
0                     .9695          .8989       .5918     -.1602
.01                   .9692          .8975       .5921     .7151
.02                   .9695          .8991       .5955     1.9675
.03                   .9686          .8986       .5985     2.7395
.05                   .9691          .8976       .6121     3.8540
.1                    .9686          .8979       .6781     4.7999
.12                   .9686          .8977       .7144     4.7235
.13                   .9690          .8989       .7377     4.6228
.15                   .9675          .8975       .7807     4.7858
.2                    .9624          .8933       .9017     4.8817
.25                   .9498          .8766       1.0360    5.1436
.3                    .9259          .8531       1.1783    4.9227
.5                    .8214          .7491       1.7878    4.9614
.7                    .7317          .6587       2.4284    5.1686
1                     .6513          .5821       3.4098    5.0381
1.5                   .6225          .5531       5.0640    4.8586
2                     .5549          .4878       6.7267    4.9128

Considering the health-accuracy tradeoff, Table 4 suggests that a Cholesterol Weight of .1 is a decent estimate to balance user preferences and recommendation healthiness, particularly if nothing is known about a user’s health preferences. This is the point where the relative increase in health becomes smaller, while the loss of accuracy and the percentage of correct classifications was steady or even increased. This applied to the Allrecipes dataset, which was representative of U.S.-based and Western European recipes.

5 DISCUSSION

Food recommender systems face a distinct multi-objective optimization problem [28], which particularly applies to internet-sourced recipes. The challenge at hand is how to optimize between a user’s preferences and the healthiness of the food presented. This can be particularly challenging due to the unhealthiness of many popular recipes [48], which also applied to the Allrecipes dataset used in the current study, as well as to users who have liked unhealthy recipes in the past and have changed their preferences [24, 41].

Our main contribution is the development of a metric to balance accuracy and health in a recommender approach; in this study through post-filtering. We have aimed to ‘serve’ recipes to users that are healthier, while maintaining relevance.
In doing so, we have taken a nutrient-based approach to optimize the healthiness of recipe recommendations using the Cholesterol Factor, which is based on nutritional guidelines from the Norwegian Directorate of Health. Although such exact guidelines vary between countries, they are rather representative of European countries. Based on our metric, we find many recipes in our dataset that are rather unhealthy, which are best avoided in recommendation sets if users wish to meet nutritional guidelines. This is in line with other work that uses an Allrecipes.com dataset [46].

In terms of recommendation approaches, we have followed the work of Freyne and Berkovsky [9]. In doing so, we have observed that a Matrix Factorization SVD algorithm is able to provide decent predictions for our recommendations, compared to hybrid and content-based approaches. This finding is consistent with studies that show the merits of a Singular Value Decomposition approach [14], but different from those that used a content-based approach [9]. Also considering the Cholesterol Score of the retrieved recommendation set prior to nutrient-based filtering (cf. Table 3), we have found MF-SVD to be the best option to examine further in our post-filtering approach.

The good performance of the recommender algorithms in terms of Precision, Recall, and MAE served as a solid foundation for the post-filtering process. We have been able to increase the healthiness of the predicted recommendation set, while maintaining acceptable prediction accuracy. Whereas the healthiness (i.e., the Cholesterol Score) for the top-10 predicted recipes for users started out at -0.38 for the SVD model, we have been able to increase this through post-filtering. Using a weight of 0.12, we have already been able to increase the healthiness score to 4.7.
This is still not an extremely healthy score on a scale of [-15; +15], but it is a rather significant increase at a small accuracy cost, and most scores above 0 fall within the guidelines recommended by the Norwegian Directorate of Health. We encourage other researchers to also employ a nutrient-based post-filtering approach to food recommendation, and to expand our work.

The extent to which we can generalize our results is somewhat limited, for we have only used a dataset from a single website. Although the website Allrecipes.com is representative of internet-sourced recipes in the United States and, arguably, some European countries, recent studies suggest that there may be differences in how people perceive and interact with recipes online, depending on their cultural background [21, 53]. Whereas Norwegian nutritional guidelines indicate that our US-based recipe dataset is rather unhealthy on average, US food guidelines tend to be more lenient in terms of some nutritional guidelines [40]. Moreover, our current approach has been specific to the platform and did not consider any user characteristics, which is a limitation shared by other food recommender studies [20]. We are seeking to perform follow-up studies that consider such cultural differences in food, by using different research populations (e.g., in crowdsourcing studies) and datasets from different countries.

In a similar vein, we also wish to investigate how our metric performs compared to other summary indicators. For one, the FSA and WHO scores have been used in a similar fashion [46], by estimating the healthiness of recipes on specific nutrients, albeit with a smaller scale size.

Future studies could also consider further developing the hybrid algorithm used in the current study. The approach in the current study is based on the work of Freyne and Berkovsky [9], which turned out to be computationally demanding.
In contrast, we have used a post-filtering approach with linear weights to optimize for health, which is computationally more efficient than incorporating health in the main user model [32]. Although the hybrid algorithm did not outperform SVD on the metrics in the current study, we aim to further explore its merits in combination with either a pre- or post-filtering approach. In doing so, we aim to investigate whether this leads to better outcomes in terms of accuracy and health, despite its computational demands.

ACKNOWLEDGEMENTS

This work was supported by the Research Council of Norway with funding to MediaFutures: Research Centre for Responsible Media Technology and Innovation, through the Centres for Research-based Innovation scheme, project number 309339.

REFERENCES

[1] Himan Abdollahpouri, Robin Burke, and Bamshad Mobasher. 2019. Managing popularity bias in recommender systems with personalized re-ranking. In The thirty-second international flairs conference. 413–418.
[2] Himan Abdollahpouri, Masoud Mansoury, Robin Burke, and Bamshad Mobasher. 2020. Addressing the Multistakeholder Impact of Popularity Bias in Recommendation Through Calibration. arXiv preprint arXiv:2007.12230 (2020).
[3] Gediminas Adomavicius and YoungOk Kwon. 2011. Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering 24, 5 (2011), 896–911.
[4] Arda Antikacioglu and R Ravi. 2017. Post processing recommender systems for diversity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 707–716.
[5] Devis Bianchini, Valeria De Antonellis, Nicola De Franceschi, and Michele Melchiori. 2017. PREFer: A prescription-based food recommender system. Computer Standards & Interfaces 54 (2017), 64–75.
[6] Ruth E Brown, Karissa L Canning, Michael Fung, Dishay Jiandani, Michael C Riddell, Alison K Macpherson, and Jennifer L Kuk. 2016.
Calorie estimation in adults differing in body weight class and weight loss status. Medicine and science in sports and exercise 48, 3 (2016), 521.
[7] Jefferson Caldeira, Ricardo S Oliveira, Leandro Marinho, and Christoph Trattner. 2018. Healthy menus recommendation: optimizing the use of the pantry. In Proceedings of the 3rd International Workshop on Health Recommender Systems Co-Located with ACM RecSys. CEUR, Aachen, DE, 6.
[8] David Elsweiler, Morgan Harvey, Bernd Ludwig, and Alan Said. 2015. Bringing the “healthy” into Food Recommenders. In DMRS. CEUR, Aachen, DE, 33–36.
[9] Jill Freyne and Shlomo Berkovsky. 2010. Recommending food: Reasoning on recipes and ingredients. In International Conference on User Modeling, Adaptation, and Personalization. Springer, 381–386.
[10] G Frost, AA Leeds, CJ Dore, S Madeiros, S Brading, and A Dornhorst. 1999. Glycaemic index as a determinant of serum HDL-cholesterol concentration. The Lancet 353, 9158 (1999), 1045–1048.
[11] Simon Funk. 2006. Netflix update: Try this at home.
[12] Mouzhi Ge, Francesco Ricci, and David Massimo. 2015. Health-aware food recommender system. In Proceedings of the 9th ACM Conference on Recommender Systems. ACM, New York, NY, USA, 333–334.
[13] Kristian J Hammond. 1986. CHEF: A Model of Case-based Planning. In AAAI. 267–271.
[14] Morgan Harvey, Bernd Ludwig, and David Elsweiler. 2013. You are what you eat: Learning user tastes for rating prediction. In International symposium on string processing and information retrieval. Springer, 153–164.
[15] Helsedirektoratet. 2016. Kosthåndboken. https://www.helsedirektoratet.no/veiledere/kosthandboken
[16] Jonathan L Herlocker, Joseph A Konstan, Loren G Terveen, and John T Riedl. 2004. Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS) 22, 1 (2004), 5–53.
[17] Thomas R Hinrichs. 1989. Strategies for adaptation and recovery in a design problem solver. In Proceedings of the Workshop on Case-Based Reasoning.
343–348.
[18] Nicolas Hug. 2020. Surprise: A Python library for recommender systems. Journal of Open Source Software 5, 52 (2020), 2174. https://doi.org/10.21105/joss.02174
[19] Dietmar Jannach, Markus Zanker, Alexander Felfernig, and Gerhard Friedrich. 2010. Recommender systems: an introduction. Cambridge University Press.
[20] Mansura A Khan, Barry Smyth, and David Coyle. 2021. Addressing the complexity of personalized, context-aware and health-aware food recommendations: an ensemble topic modelling based approach. Journal of Intelligent Information Systems (2021), 1–41.
[21] Kyung-Joong Kim and Chang-Ho Chung. 2016. Tell me what you eat, and I will tell you where you come from: A data science approach for global recipe data on the web. IEEE Access 4 (2016), 8199–8211.
[22] Bart P Knijnenburg, Saadhika Sivakumar, and Daricia Wilkinson. 2016. Recommender systems for self-actualization. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, New York, NY, USA, 11–14.
[23] Yehuda Koren. 2009. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 447–456.
[24] Yu Liang. 2019. Recommender system for developing new preferences and goals. In Proceedings of the 13th ACM Conference on Recommender Systems. ACM, New York, NY, USA, 611–615.
[25] National Agricultural Library. [n.d.]. How many calories are in one gram of fat, carbohydrate, or protein? https://www.nal.usda.gov/fnic/how-many-calories-are-one-gram-fat-carbohydrate-or-protein
[26] Jennifer Mankoff, Gary Hsieh, Ho Chak Hung, Sharon Lee, and Elizabeth Nitao. 2002. Using low-cost sensing to support nutritional awareness. In International conference on ubiquitous computing. Springer, 371–378.
[27] Jutta Mata, Marlene N Silva, Paulo N Vieira, Eliana V Carraça, Ana M Andrade, Sílvia R Coutinho, Luis B Sardinha, and Pedro J Teixeira. 2011.
Motivational “spill-over” during weight control: Increased self-determination and exercise intrinsic motivation predict eating self-regulation. 1 (2011), 49–59.
[28] Stefanie Mika. 2011. Challenges for nutrition recommender systems. In Proceedings of the 2nd Workshop on Context Aware Intel. Assistance, Berlin, Germany. Citeseer, 25–33.
[29] Cataldo Musto, Alain D Starke, Christoph Trattner, Amon Rapp, and Giovanni Semeraro. 2021. Exploring the Effects of Natural Language Justifications in Food Recommender Systems. In Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization. ACM, New York, NY, USA, 147–157.
[30] Yandisa Ngqangashe, Charlotte De Backer, Christophe Matthys, and Nina Hermans. 2018. Investigating the nutrient content of food prepared in popular children’s TV cooking shows. British Food Journal 120 (2018), 2102–2115. Issue 9.
[31] Ruairi O’Driscoll, Jake Turicchi, Kristine Beaulieu, Sarah Scott, Jamie Matu, Kevin Deighton, Graham Finlayson, and James Stubbs. 2020. How well do activity monitors estimate energy expenditure? A systematic review and meta-analysis of the validity of current technologies. British Journal of Sports Medicine 54, 6 (2020), 332–340.
[32] Umberto Panniello and Michele Gorgoglione. 2011. Context-Aware Recommender Systems: A Comparison Of Three Approaches. In DART@AI*IA. CEUR, Aachen, DE, 12.
[33] Umberto Panniello, Alexander Tuzhilin, Michele Gorgoglione, Cosimo Palmisano, and Anto Pedone. 2009. Experimental comparison of pre-vs. post-filtering approaches in context-aware recommender systems. In Proceedings of the third ACM conference on Recommender systems. 265–268.
[34] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python.
Journal of Machine Learning Research 12 (2011), 2825–2830.
[35] Markus Rokicki, Eelco Herder, and Christoph Trattner. 2017. How editorial, temporal and social biases affect online food popularity and appreciation. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 11. AAAI, 7. Issue 1.
[36] Ruslan Salakhutdinov and Andriy Mnih. 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning. 880–887.
[37] Hanna Schäfer, Santiago Hors-Fraile, Raghav Pavan Karumur, André Calero Valdez, Alan Said, Helma Torkamaan, Tom Ulmer, and Christoph Trattner. 2017. Towards health (aware) recommender systems. In Proceedings of the 2017 international conference on digital health. 157–161.
[38] Hanna Schäfer and Martijn C Willemsen. 2019. Rasch-based tailored goals for nutrition assistance systems. In Proceedings of the 24th International Conference on Intelligent User Interfaces. 18–29.
[39] Elizabeth P Schneider, Emily E McGovern, Colleen L Lynch, and Lisa S Brown. 2013. Do food blogs serve as a source of nutritionally balanced recipes? An analysis of 6 popular food blogs. Journal of nutrition education and behavior 45, 6 (2013), 696–700.
[40] Marco Springmann, Luke Spajic, Michael A Clark, Joseph Poore, Anna Herforth, Patrick Webb, Mike Rayner, and Peter Scarborough. 2020. The healthiness and sustainability of national and global food based dietary guidelines: modelling study. BMJ 370 (2020).
[41] Alain Starke. 2019. RecSys Challenges in achieving sustainable eating habits. In HealthRecSys@RecSys. 29–30.
[42] Alain D Starke, Elias Kløverød Brynestad, Sveinung Hauge, and Louise Sandal Løkeland. 2021. Nudging Healthy Choices in Food Search Through List Re-Ranking. In Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization. 293–298.
[43] Alain D Starke, Martijn C Willemsen, and Christoph Trattner. 2021.
Nudging Healthy Choices in Food Search Through Visual Attractiveness. Frontiers in Artificial Intelligence 4 (2021), 20.
[44] Raciel Yera Toledo, Ahmad A Alzahrani, and Luis Martinez. 2019. A food recommender system considering nutritional information and user preferences. IEEE Access 7 (2019), 96695–96711.
[45] Thi Ngoc Trang Tran, Müslüm Atas, Alexander Felfernig, and Martin Stettinger. 2018. An overview of recommender systems in the healthy food domain. Journal of Intelligent Information Systems 50, 3 (2018), 501–526.
[46] Christoph Trattner and David Elsweiler. 2017. Investigating the healthiness of internet-sourced recipes: implications for meal planning and recommender systems. In Proceedings of the 26th international conference on world wide web. 489–498.
[47] Christoph Trattner and David Elsweiler. 2019. Food Recommendations. In Collaborative recommendations: Algorithms, practical challenges and applications. World Scientific, 653–685.
[48] Christoph Trattner, David Elsweiler, and Simon Howard. 2017. Estimating the healthiness of internet recipes: a cross-sectional study. Frontiers in public health 5 (2017), 16.
[49] Christoph Trattner, Dominik Moesslang, and David Elsweiler. 2018. On the predictability of the popularity of online recipes. EPJ Data Science 7, 1 (2018), 1–39.
[50] Tsuguya Ueta, Masashi Iwakami, and Takayuki Ito. 2011. A recipe recommendation system based on automatic nutrition information extraction. In International Conference on Knowledge Science, Engineering and Management. Springer, 79–90.
[51] Salim S Virani, Alvaro Alonso, Emelia J Benjamin, Marcio S Bittencourt, Clifton W Callaway, April P Carson, Alanna M Chamberlain, Alexander R Chang, Susan Cheng, Francesca N Delling, et al. 2020. Heart disease and stroke statistics—2020 update: a report from the American Heart Association. Circulation 141, 9 (2020), e139–e596.
[52] Longqi Yang, Cheng-Kang Hsieh, Hongjian Yang, John P Pollak, Nicola Dell, Serge Belongie, Curtis Cole, and Deborah Estrin. 2017. Yum-me: a personalized nutrient-based meal recommender system. ACM Transactions on Information Systems (TOIS) 36, 1 (2017), 1–31.
[53] Qing Zhang, Christoph Trattner, Bernd Ludwig, and David Elsweiler. 2019. Understanding Cross-Cultural Visual Food Tastes with Online Recipe Platforms. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 13. 671–674.
[54] Yong Zheng. 2018. Context-aware mobile recommendation by a novel post-filtering approach. In The Thirty-First International Flairs Conference. AAAI, 4.