A Recommender System for Healthy and Personalized Recipe
                      Recommendations
                 Florian Pecune *                                      Lucile Callebert *                               Stacy Marsella
         florian.pecune@glasgow.ac.uk                           lucile.callebert@glasgow.ac.uk                  stacy.marsella@glasgow.ac.uk
              University of Glasgow                                  University of Glasgow                          University of Glasgow

ABSTRACT                                                                             is not always available, and research has shown how hard it is for
Unhealthy eating behavior is a serious public health issue with mas-                 people to infer the healthiness of a recipe simply from its picture [5],
sive repercussions on an individual’s health. One potential solution                 even when the recipe has been categorized as healthy [23]. Based
to this problem is to help people change their eating behavior by                    on these findings, it becomes important to build systems that not
developing systems able to recommend healthy recipes that can in-                    only recommend healthy and personalized recipes, but that also
fluence eating behavior. One challenge for such systems is to deliver                precisely display how healthy these recipes are.
healthy recommendations that take into account users’ needs and                         In this paper, we present an experiment in which we investigate
preferences, while also informing users about the healthiness of the                 whether people would be likely to select recipes that are healthier
recommended recipes. In this paper, we investigate whether intro-                    than the recipes they usually cook. More specifically, our work
ducing a healthy bias in a recipe recommendation algorithm, and                      focuses on investigating how introducing a healthy bias in the
displaying a healthy tag on recipe cards would have an influence                     recommendation algorithm and the presence of a healthy tag would
on people’s decision making. To that end, we build three differ-                     influence users’ likelihood to pick recommended recipes. We first
ent recipe recommender systems: one that recommends recipes                          describe the different recommendation algorithms we evaluate in
matching users’ preferences, another one that only recommends                        our experiment. Then we describe our experimental design and
healthy recipes, and a third one that recommends recipes that are                    present our results.
both healthy and match users’ preferences. We evaluate these three
systems through a user study in which we asked participants online
to select from a list of recipes the ones they like the most.
                                                                                     2    RELATED WORK
KEYWORDS                                                                             Food recommender systems traditionally rely on two distinct ap-
                                                                                     proaches to deliver personalized recipe recommendations. Systems
food recommender system; healthcare; collaborative filtering
                                                                                     relying on the content-based approach recommend recipes based
                                                                                     on their description and users’ preferences. In [6] the authors de-
1 INTRODUCTION                                                                       veloped a system that infers people’s preferred ingredients based
Unhealthy eating is a major public health burden that may be                         on the recipes they like. The system then recommends new recipes
reduced in part by helping people select healthier dietary choices.                  containing the previously inferred ingredients. Rather than relying
However, picking appropriate food to eat implies complex decision                    on recipe ingredients, [13] proposed an Ensemble Topic Model-
making processes [4], including being aware of healthy options                       ing based approach that relied on features that were previously
and choosing among them [17]. With people growing increasingly                       extracted from a recipe database to deliver recommendations. Their
familiar with interacting with machines in their everyday life, one                  system performed significantly better than a conventional content-
solution to overcome this issue and help people to make healthier                    based system. Another approach is described in [25], in which the
choices is to develop health-aware food recommender systems                          authors implemented a goal-oriented recipe recommender system
[21, 22]. One of the most important challenges for such a system                     providing nutrition information. The system first collects the user’s
is to deliver accurate and personalized recommendations to their                     goal (e.g. I want to prevent a cold) before finding a nutrient that
users. Although most of the popular recipes found on the Internet                    matches that goal. The system then picks the ingredient containing
are unhealthy as defined by the United Kingdom Food Standards                        the most of the nutrient previously selected. Finally, the system
Agency (FSA) [23], significant effort has recently been put into                     recommends a recipe containing that specific ingredient. YumMe,
optimizing food recommendation algorithms and trying to reconcile                    the recommender system developed in [27], relies on dietary infor-
users’ preferences with healthy recipe recommendation [2, 8, 23].                    mation to recommend recipes that would match users’ needs. The
By analyzing people’s eating behavior, authors in [9] found that the                 system automatically extracts dietary information from pictures of
fat and calorific content of a recipe were the best rating predictors                recipes to form a user profile. The system then relies on this user
for people interested in eating healthy. However, this information                   profile to deliver subsequent recommendations.
∗ The authors contributed equally to this paper.
                                                                                        Systems relying on the collaborative filtering approach predict
                                                                                     recommendation ratings for a user based on ratings from other
                                                                                     users. In [7], the authors developed a system that collects users’
HealthRecSys’20, September 26, 2020, Online, Worldwide                               preferences by asking them to rate and tag the recipes they usu-
© 2020 Copyright for the individual papers remains with the authors. Use permitted   ally cook at home. The system then relies on users’ preferences
under Creative Commons License Attribution 4.0 International (CC BY 4.0). This
volume is published and copyrighted by its editors.                                  to rank recipes and deliver recommendations. Authors found that
                                                                                     their improved matrix factorization algorithm outperformed the


content-based approach proposed by [6]. The extensive compar-             3.1    Recipe dataset
ison performed in [24] confirms that the collaborative filtering          We collected our recipes dataset from allrecipes.com with a web
approaches perform better than content-based ones. The study              crawler in April 2020, limiting ourselves to recipes that had been
also reveals that the FSA score of the recipe was the most important      reviewed by at least 10 users. We collected a total of 13,515 recipes.
content feature, highlighting that people are usually consistent          We chose allrecipes.com as it is one of the most popular recipe websites
in their eating habits. Most of these systems focus on delivering         in terms of traffic, with 25 million unique visitors each month, and
personalized recommendations matching a user’s profile. They do           it provides nutritional information for most of its recipes.
not intend to recommend recipes that not only match their users’              For each recipe, we collected: its title, image link, list of ingre-
preferences but are also healthy.                                         dients and quantities, preparation steps, preparation and cooking
   In [8], the authors try to solve this problem by extending their       times, number of servings, nutritional information, ratings data
previous recommendation algorithm [7], introducing a health bias          (number of ratings, ratings min and max, average rating) and list
based on the balance between the calories that the user needs and         of comments (associated with a unique user name and a rating). To
the calories of the recipes. Another system reconciling healthiness       reduce data sparsity issues, we selected a subset of recipes/users so
and personalization is [3], in which the authors propose a method to      that each recipe was rated at least 25 times and each user had rated at
recommend healthy recipes based on a subset of ingredients given          least 30 recipes. This results in a dense dataset of 1,169 recipes and
by a user. The system first selects ingredients that are compatible       1,339 users for a total of 70,945 ratings.
with the given subset, and associates an optimal quantity with each           Similar to [5, 23], we used the standards provided by the Food
of these ingredients. The system then generates a pseudo-recipe           Standards Agency (FSA, UK) and the green, orange and red traffic-
containing the ingredients with the healthiest nutritional value,         light system to evaluate the healthiness of the recipes. The FSA
before picking the existing recipe best matching the pseudo-recipe.       provides standard ranges for low content (green), medium content
DietOS [1] proposes a solution to manage specific health conditions       (orange) or high content (red) of fat, saturates, sugar and sodium.
by recommending ingredients matching its users’ health profile. The       To calculate a health score, we assign to a recipe, for each of the fat,
system also presents the nutritional properties for each ingredient       saturates, sugar and sodium elements, one point if the element’s
as well as their benefits regarding users’ health conditions. In [23]     quantity is within the low range, two for the medium range and three for
the authors weighted the outcome of their Collaborative Filtering         the high range. The health score therefore ranges from 4 (best) to
algorithm based on the FSA and WHO scores associated with each            12 (worst).
recipe. The accuracy of this system was lower than that of the                The recipes in our dataset are rather unhealthy: the health score
best unfiltered collaborative filtering algorithms, but still better      ranges from 6 to 12, and 75.36% of the recipes have a health score
than unfiltered algorithms such as MostPopularItem, UserKNN or            of 8, 9 or 10. Only 2.31% of the recipes are healthy (green category).
ItemKNN. Although these systems present interesting approaches            That is consistent with the observations from [5] showing that most
to reconcile health with users’ preferences, none of them were            popular recipes tend to be unhealthy (high fat content).
evaluated by real users.
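
The health score described in section 3.1 can be sketched in a few lines of Python. This is only an illustration: the cut-off values in THRESHOLDS are assumptions (commonly cited FSA front-of-pack limits per 100 g, with the sodium limits derived from the salt limits), since the text does not list the exact ranges that were used, and the function names are hypothetical.

```python
# Assumed per-100 g cut-offs in grams: (upper bound of "low", upper bound of "medium").
# These values are illustrative and are not taken from the paper.
THRESHOLDS = {
    "fat": (3.0, 17.5),
    "saturates": (1.5, 5.0),
    "sugar": (5.0, 22.5),
    "sodium": (0.12, 0.6),
}


def element_points(amount, low_max, medium_max):
    """Return 1 (green), 2 (orange) or 3 (red) for one nutritional element."""
    if amount <= low_max:
        return 1
    if amount <= medium_max:
        return 2
    return 3


def fsa_health_score(per_100g):
    """Sum the points of the four elements: 4 is the best score, 12 the worst."""
    return sum(
        element_points(per_100g[name], low_max, medium_max)
        for name, (low_max, medium_max) in THRESHOLDS.items()
    )


# Hypothetical recipe: medium fat, high saturates, low sugar, high sodium.
example = {"fat": 10.0, "saturates": 6.0, "sugar": 4.0, "sodium": 0.7}
print(fsa_health_score(example))
```

Passing the thresholds as a dictionary keeps the scoring rule separate from the cut-offs, so the same function can be reused if different ranges are preferred.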
   A subjective evaluation investigating users’ preferences towards
healthy food is proposed in [5]. The authors first paired specific        3.2    Recommender systems
recipes with their healthier version, i.e. similar recipes with health-   To answer our research questions, we built a recommender system
ier substituted ingredients following the method described in [20].       that takes into account both the users’ preferences and the healthi-
Then, they showed participants the different pairs and asked the          ness of the recipes. Users’ preferences are learned via collaborative
latter to pick the one they preferred, and the one they considered        filtering (CF), a popular approach that relies on user ratings and that
to be the healthiest. Results demonstrated that people were less          reports better results compared to content-based approaches [24].
inclined to pick the healthier recipe of the pair, but also how diffi-    CF methods allow a recommender system to rank recipes according
cult it was for participants to judge the healthiness of a recipe.        to a score that represents how likely the recipe is to correspond to
However, the recipe pairs were the same for all the participants          the user’s preferences.
and were not related to their preferences. Therefore, we focus on             We used implicit feedback, converting all the ratings into positive
the following research questions:                                         feedback from users, indicating a preference of the user for the
   RQ1: Are people willing to pick recommendations that are both          rated recipes compared to the not-rated ones. The user ratings were
healthy and match their preferences?                                      then turned into confidence levels on how much the user actually
   RQ2: Does the presence of a healthy tag on the displayed recipes       liked the rated recipe. This preference-confidence approach has
have an influence on people’s decision making?                            been shown to perform well [10].
                                                                              We used the Implicit Python library and tested three popular CF
                                                                          algorithms: Alternating Least Squares (ALS) [19], Bayesian Person-
                                                                          alized Ranking (BPR) [16] and Logistic Matrix Factorization (LMF)
                                                                          [12]. We also compared the performances of those algorithms with
3    MODEL                                                                a simple Most Popular recommender. We split our dataset into
To investigate our research questions, our first step was to collect a    training, cross-validation and test sets to evaluate the performance
recipe dataset we could use to build our recommender system. We           in terms of AUC [18] of each algorithm. We ran each experiment
describe our dataset and the recommender system we built in the           100 times and report the average values in Table 1. The best per-
following section.                                                        forming algorithm is ALS, with a performance comparable to the


                          Algorithm        AUC
                          ALS              0.694
                          BPR              0.649
                          Most popular     0.644
                          LMF              0.617
          Table 1: Performance of the CF algorithms

                          System       $w_p$   $w_h$   Mean     Most common
                          Pref         1       0       8.750    9
                          Healthy      0       1       6.000    6
                          Hybrid-11    1       1       6.275    6
                          Hybrid-21    2       1       7.175    6
                          Hybrid-31    3       1       7.662    8
          Table 2: Mean and most common value of FSA health scores of recipes recommended to a user by our recommender systems for different values of $w_p$ and $w_h$.

one reported in [24] on a similar dataset. Our recommender system
therefore relies on this algorithm to output, for each recipe and for
each user, a preference score $s(r, u)_p \in [0, 1]$.
                                                                             4     EXPERIMENT
   Each recipe is also assigned a health score $s(r)_h \in [0, 1]$ that cor-
responds to the normalized FSA health score of the recipe as calculated      To answer our research questions RQ1 and RQ2, we designed
in section 3.1 and is independent of users’ preferences.                     an experiment investigating how our system’s recommendation
   The preference and health scores are then combined to calculate           algorithm and the presence of a healthiness tag in the recipe card
a final score $s(r, u) \in [0, 1]$ as in equation 1:                         influenced users’ recipe selection.

      $s(r, u) = \frac{w_p \, s(r, u)_p + w_h \, (1 - s(r)_h)}{w_p + w_h}$   (1)   4.1    Experimental Design
                                                                             For the sake of the experiment, we identified two different indepen-
where $w_p$ and $w_h$ are the weights assigned to the preference and health  dent variables. The first one represents our system’s recommen-
scores respectively. $s(r, u)$ is then used to rank the recipes and give     dation algorithm (Reco-Algo) as a between-subject independent
a recommendation to the user.                                                variable with three levels: a preference level (pref-reco) in which
   We then implemented three different recommender algorithms                the system delivers recommendations matching users’ preferences,
by adjusting the weights $w_p$ and $w_h$.                                    a health level (healthy-reco) in which the user only gets the health-
                                                                             iest recommendation, and a hybrid level (hybrid-reco) in which
   Preference-based recommender. The preference-based recommender
                                                                             the system biases the preference-based recommendations towards
system only takes into account the preferences of the user, ignoring
                                                                             slightly healthier options. Those three levels correspond to the
the health scores of the recipes; i.e. $w_h = 0$. In our pilot experi-
                                                                             three systems described in 3.2. The second between-subject vari-
ments, the average health score of the recipes recommended to
                                                                             able (Tag-Mode) represents whether the recipe card displayed to
users with this system was 8.750 while most recipes recommended
                                                                             the user contains a tag representing how healthy the recipe is and
had a health score of 9 (see Table 2).
                                                                             has two levels: a healthy-tag level (healthy-tag) in which such a
   Healthy recommender. As opposed to the preference-based rec-              healthy tag is present, and a no-tag level (no-tag) in which the
ommender, the healthy recommender ignores the preferences of the             recipes do not contain any healthiness tags.
user; i.e. $w_p = 0$. This system therefore always recommends healthy           Our experiment has a 3x2 design with Reco-Algo and Tag-
recipes according to the FSA standards (i.e. FSA green category with         Mode as between-subject variables. In each of the six conditions,
FSA health score of 6 or below). Our dataset contains 27 healthy             participants followed the same procedure. After agreeing to partici-
recipes, all of them associated with a health score of 6. Therefore          pate in our study via a consent form, participants were presented
our healthy recommender system randomly selects five recipes                 with a short description of the task. Each participant was then ran-
amongst these 27 ones to recommend to the user.                              domly assigned to a group according to the different independent
                                                                             variables. The task consisted of two steps. In the preference elici-
   Hybrid recommender. To fulfil our objective of helping people to          tation step, participants were asked to select five recipes that they
gradually shift their eating behaviors towards healthier habits, the         prefer amongst a list of thirty recipes as represented in fig.1. The
system should take into account users’ preferences but recommend             five selected recipes were sent as the input to our recommender sys-
healthier recipes compared to the Preference-based recommender.              tem which delivered five recommendations in return. The latter five
We tested different values for $w_p$ and $w_h$ and ran the system with       recipes corresponding to the output of our recommendation system
10 users for each condition. Table 2 sums up, for each system, the           were then presented to the participants during the recommenda-
mean and most common health scores of the recipes recommended                tion step along with 25 randomly selected recipes. The position of
to the 10 users. Notice that with the Hybrid-11 and Hybrid-21                the recommended recipes on the grid was randomized. As in the
systems, the most common health score value of recommended                   preference elicitation step, participants were asked to select the five
recipes is 6, meaning that the systems mostly recommend healthy              recipes they preferred. Once their choice was made, participants
recipes. Yet, as explained in section 3.1, healthy recipes represent         were asked how satisfied they were with their choice and how easy
only 2.31% of our database. This strongly limits the possibilities of        it was to make this choice. The answers for these two questions
personalization based on users’ preferences and makes those two              were 7-point Likert items (anchors: 0 = very dissatisfied/difficult,
systems very similar to the healthy recommender. We therefore                6 = very satisfied/easy). We also asked participants what influenced
decided to use the Hybrid-31 recommender system.                             them the most for their choice using an open-ended question. After
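
To make the role of the two weights concrete, the following Python sketch applies equation 1 to rank recipes. It is a minimal illustration under stated assumptions: the min-max normalization of the FSA score over the range 4 to 12 is our guess at what "normed" means, the preference scores would come from the ALS model but are hard-coded here, and all names and example values are hypothetical.

```python
def normalize_health(fsa_score, best=4, worst=12):
    """Map an FSA health score (4 = best, 12 = worst) to [0, 1] (assumed min-max scaling)."""
    return (fsa_score - best) / (worst - best)


def final_score(pref, health, w_p, w_h):
    """Equation 1: weighted mean of the preference score and the inverted health score."""
    return (w_p * pref + w_h * (1.0 - health)) / (w_p + w_h)


def recommend(pref_scores, fsa_scores, w_p, w_h, k=5):
    """Return the k recipe ids with the highest final score."""
    ranked = sorted(
        pref_scores,
        key=lambda r: final_score(
            pref_scores[r], normalize_health(fsa_scores[r]), w_p, w_h
        ),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical preference scores in [0, 1] and FSA health scores for three recipes.
prefs = {"lasagna": 0.9, "chicken salad": 0.6, "lentil soup": 0.2}
fsa = {"lasagna": 12, "chicken salad": 8, "lentil soup": 6}

print(recommend(prefs, fsa, w_p=1, w_h=0))  # preference-based weights
print(recommend(prefs, fsa, w_p=0, w_h=1))  # healthy weights
print(recommend(prefs, fsa, w_p=3, w_h=1))  # Hybrid-31 weights
```

With $w_p = 0$ every recipe sharing the same health score receives the same final score, so the order among them is arbitrary in this sketch; the healthy recommender described above draws randomly among the 27 healthy recipes instead.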


                                                                         with 53% female and 48% male. The majority of participants (82%)
                                                                         was employed full-time.
                                                                            We conducted four different 3x2 factorial ANOVAs (i.e., analysis
                                                                         of variance) with Reco-Algo and Tag-Mode as between-subject
                                                                         factors. The dependent measures were the F1 score, selected recipes
                                                                         health score, participant’s satisfaction and participant’s perceived
                                                                         choice easiness.

                                                                         5.1    F1 score
                                                                         The factorial ANOVA revealed a significant main effect of Reco-
                                                                         Algo (F(2, 112) = 8.251; p < .001) on the recommender system’s
                                                                         accuracy. There was no main effect of Tag-Mode (F(1, 112) = .945;
                                                                         p = .33) on the recommender system’s accuracy and the interaction
                                                                         between the two variables was not significant (F(2, 112) = 0.358;
                                                                         p = .7). For our follow-up analysis, post hoc comparisons after
                                                                         Bonferroni correction indicated that the mean score for both pref-
                                                                         reco (M=.235, std=.156) and hybrid-reco (M=.200, std=.164) were
Figure 1: Example of the list of recipes as presented to the             significantly better than the healthy-reco (M=.100, std=.129). This
participants during the recommendation step of the exper-                result shows that people are not likely to select healthy recipes if
iment. The cards highlighted in green correspond to the                  these recipes do not match with their preferences/habits at all.
recipes already selected by the participant. In the preference-              To better understand our results, we looked at and compared the
elicitation step, the healthy tag is not displayed.                      recipes recommended by our system and the recipes selected by the
                                                                         users. We observed that the recipes selected by participants were
                                                                         much more diverse than those recommended by our system. For
the end of their task, participants took three surveys: one about        example, our system recommended only chicken-based recipes to
what is important to them when looking for a recipe online, another      a participant who eventually selected two recipes containing meat
one about their eating habits, and the last one is a demographics        (chicken and pork), one vegetarian recipe and two desserts. As an
questionnaire.                                                           objective similarity measure, we calculated for each user the cosine
                                                                         similarity $s_{i,j} \in [0, 1]$ of every pair of recommended (resp. selected)
4.2     Measurements                                                     recipe titles $R_i$ and $R_j$, obtaining a 5x5 similarity matrix with 1s
We measured four different constructs in our experiment. (a) We          on the diagonal (i.e. when $i = j$). We then averaged the values of
relied on the F1 score to measure the performance of our recom-          the similarity matrix, thus obtaining for each user one similarity
mendation algorithm. The F1 score was computed by considering            score $s \in [0, 1]$ for the recommended (resp. selected) recipes. The
i) true positives as the recipes recommended by our system that          average similarity value for all users for the recommended recipes
were selected by the participants, ii) false positives as the recipes    is 0.288 (std=0.093) and the average similarity value for all users for
recommended by our system that were not selected by the partici-         the selected recipes is 0.255 (std=0.053). A Student t-Test revealed
pants, iii) false negatives as the recipes randomly chosen (i.e. not     that the difference in similarity values of the recommended recipes
recommended by our system) that were selected by the participants        and the selected recipes is significant (t(117) = 3.969, 𝑝 < .001).
and iv) true negatives as the recipes randomly chosen that were not          The low F1 score obtained by all three recommender systems
selected by the participants. (b) To measure the healthiness of the      could therefore be explained by the lack of diversity in the recipes
recipes selected by the participants, we calculated the average FSA      recommended, which is consistent with the findings of [26].
health score for the five recipes they selected during the recom-
mendation step. Given the nature of our experiment (i.e. selecting       5.2    Health Score
items in a list) and based on the results from [26], we also measured    There was no main effect of Reco-Algo (F(2, 112) = 1.858; p =
(c) whether the participants were satisfied with the five recipes they   .16) or Tag-Mode (F(1, 112) = .060; p = .81) on the selected recipes
selected and (d) whether that was easy for them to select the five       health score. The interaction between the two variables was not
recipes they liked the most.                                             significant (F(2, 112) = 2.362; p = .09).
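
The F1 measure defined in section 4.2 can be illustrated with a short Python sketch. The function name and the recipe identifiers are hypothetical; the counts follow the definitions of true positives, false positives and false negatives given in the text.

```python
def f1_score(recommended, selected):
    """F1 of one participant's selection against the recommended recipes."""
    recommended, selected = set(recommended), set(selected)
    tp = len(recommended & selected)   # recommended and selected
    fp = len(recommended - selected)   # recommended but not selected
    fn = len(selected - recommended)   # randomly chosen but selected
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ids: five recommended recipes, five recipes picked by the participant.
recommended = ["r1", "r2", "r3", "r4", "r5"]
selected = ["r1", "r2", "x7", "x8", "x9"]
print(f1_score(recommended, selected))
```

When exactly five recipes are recommended and exactly five are selected, precision and recall are equal, so the F1 score reduces to the fraction of recommended recipes that the participant picked.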
                                                                            Although none of these results were significant, the interaction
5     RESULTS                                                            graph in Fig.2 depicts how the presence of a healthy tag on the
We recruited 118 participants on Amazon Mechanical Turk. We              recipe card had different effects on the health score of the selected
required that participants had at least a 90% HIT acceptance rate on     recipes depending on the conditions. In the pref-reco condition,
at least 100 HITs. The recipes presented to the users used imperial      people selected healthier recipes when the healthy-tag was dis-
measurements, therefore, we restricted our evaluation to partici-        played, unlike in the healthy-reco condition, in which people se-
pants located in the U.S. Participants spent on average 6 minutes        lected recipes that were less healthy when the healthy-tag was
and 56 seconds (std=3 minutes and 47 seconds) on the task and            displayed. There was almost no impact of the healthy-tag in the
were paid USD1.20. Most participants were aged 29 to 47 years old,       hybrid-reco condition.


                                                                Reco-Algo                                       Tag-Mode
                 Variable                 pref-reco          hybrid-reco        healthy-reco        healthy-tag             no-tag
                 F1 score             0.235(±0.156) ∗∗∗     0.200(±0.164) ∗   0.100(±0.129) ∗∗∗/∗   0.166(±0.172)      0.193(±0.147)
                 Health score          9.185(±0.631)         8.965(±0.653)     8.889(±0.8427)       9.000(±0.735)      9.030(±0.706)
                 Satisfaction          5.000(±0.961)         4.950(±0.782)     4.868(±1.234)        5.069(±0.814)      4.817(±1.142)
                 Choice easiness       4.175(±1.412)         4.375(±1.314)     4.421(±1.222)        4.362(±1.347)      4.283(±1.290)
Table 3: Summary of all means and standard deviations (in parentheses) for the four dependent variables across conditions. The
differences between the means are marked according to their level of significance (* for p < .05 and *** for p < .001)


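The t statistics reported in this section for the post-study questionnaire have fractional degrees of freedom (e.g. t(12.3)), which is consistent with Welch's unequal-variance t-test. The sketch below recomputes such a test from summary statistics only. It is an illustration, not the authors' analysis script: the group sizes (9 and 51) are an assumption inferred from the statement that nine people out of 60 reported being influenced by the tag, and the function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean.
    se2_a = std_a ** 2 / n_a
    se2_b = std_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# F1 score: other participants (assumed n=51) vs. tag-influenced (n=9).
t_f1, df_f1 = welch_t(0.14, 0.17, 51, 0.29, 0.15, 9)

# Health score: other participants vs. tag-influenced participants.
t_health, df_health = welch_t(9.11, 0.70, 51, 8.42, 0.67, 9)
```

Under these assumed group sizes the sketch yields t of roughly −2.7 for the F1 score and roughly 2.8 for the health score, with about 12 and 11 degrees of freedom respectively. These are close to the reported values; small differences are expected because the inputs are rounded.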
   In our post-study questionnaire, nine participants of the healthy-tag
condition mentioned that they were mostly influenced by the
healthy tag while choosing a recipe. This correlates with both i) a
significantly higher F1 score (t(12.3)=−2.7, p<0.05) for participants
who mentioned they were influenced by the healthy tag (M=.29,
std=.15) compared to the other participants (M=.14, std=.17) and ii)
a significantly lower health score of the selected recipes (t(11.4)=2.8,
p<0.05) for participants who mentioned they were influenced by the
healthy tag (M=8.42, std=.67) compared to the other participants
(M=9.11, std=.70). Both the F1 score and the health score were
significantly different between the two Tag-Mode groups in the
recommendation phase but not in the preference elicitation phase,
confirming that people are poor judges of the healthiness of a recipe
[5].

5.3    Perceived satisfaction and choice easiness
There was no main effect of Reco-Algo (F(2, 112) = 0.171; p = .84)
or Tag-Mode (F(1, 112) = 1.850; p = .18) on participants’ satisfaction.
The interaction between the two variables was not significant
(F(2, 112) = 0.186; p = .83). Overall, participants were more satisfied with
their choices when the recommended recipes matched their preferences.
The presence of a healthy tag on the recipe card increased
satisfaction regardless of the recommendation algorithm.
   Regarding the perceived ease of use, there was no main effect
of Reco-Algo (F(2, 112) = 0.391; p = .68) or Tag-Mode (F(1, 112)
= 0.110; p = .74) on participants’ perceived choice easiness. The
interaction between the two variables was not significant (F(2, 112)
= 1.697; p = .19). Although the presence of a healthy tag lowered
the perceived difficulty in both the healthy-reco and the pref-reco
conditions, such a tag made the selection more difficult for people
who were recommended recipes in the hybrid-reco condition.

5.4    Discussion
To answer our research question RQ1, our results show that
people are slightly less inclined to select recommendations coming
from our hybrid recommender compared to the preference-based
one. However, although the difference is minimal when no tags
are displayed on the recipes, the presence of healthy tags accentuates
the difference. The negative impact of the healthy tags on
the F1 score of our hybrid algorithm can be linked to the choice
easiness. Unlike in the healthy and preference-based conditions, the
healthy tags made it more difficult for participants to select five
recipes in the hybrid condition. One potential explanation is that
all recipes in the hybrid condition had very similar health scores
(in orange), whereas the two other conditions introduced recipes
tagged in green (for the healthy-algo) or in red (for the pref-algo).
People who explicitly cared about the health tag were more likely
to choose recipes recommended by our system in the hybrid-reco
and healthy-reco conditions. That confirms the results found in [9]
and highlights the need to accurately infer people’s eating goals to
adapt the recommendation algorithm accordingly. Both hybrid and
preference-based recommender systems had significantly better
F1 scores than our healthy recommender system, which shows that
people are not likely to select healthy recipes if these recipes do
not match with their preferences/habits at all.
   There is no significant evidence related to the impact of healthy
tags on participants’ decision making that would help us answer
RQ2. Indeed, only nine people out of 60 explicitly stated they were
influenced by the healthy tag in our post-evaluation questionnaire.
However, the results described in section 5.2 suggest that although
people are more likely to pick healthier recipes compared to what
they would usually pick when informed about recipes’ healthiness,
they are less likely to pick recipes tagged as very healthy. In other
words, people will avoid recipes tagged as unhealthy (in red) as well
as recipes tagged as healthy (in green). The first part can partially
be explained by the fact that people usually associate a feeling of
guilt with unhealthy food consumption [11]. Thus, people are less
inclined to pick unhealthy recipes if they are explicitly informed
about their unhealthiness. The second part can be explained by
the "healthy = less tasty" effect, which describes how people tend to
associate healthy food with low tastiness [15]. Hence, we assume
that participants in our experiment were less inclined to pick recipes
explicitly tagged as healthy because they thought such recipes
would be tasteless. Overall, people were more satisfied with their
choices when informed about the recipes’ healthiness.

6    CONCLUSION
In this paper, we investigated whether introducing a healthy bias in
a recipe recommendation algorithm and displaying a healthy tag on
recipe cards would have an influence on people’s decision making.
Our results show that the performance of a recommender system
able to combine healthiness with personalization depends on its
users’ eating goals. People already interested in eating healthy are
more likely to select recipes coming from such a recommendation
system. For the others, our results also suggest that adding a simple
yet accurate tag depicting how healthy recipes are might help them
to select healthier recipes compared to what they would usually
select. Explicitly informing people how unhealthy some recipes
are might help them to consciously change their eating habits and
prevent them from picking unhealthy recipes.
    One potential extension of this work would be to combine our CF
approach with a knowledge-based approach to have more control
over the diversity of the recommended recipes. As explained in
section 5.1, a diverse set of recommendations can positively impact
users’ experience [26]. The results of a CF-based algorithm could
for instance be post-filtered to force the presence of different categories
of recipes (e.g. main, vegetarian, dessert) and/or different
ingredients (e.g. chicken, pork) in the list of recommended recipes.
    The integration of a knowledge-based approach could be done
by building a conversational recommender system asking specific
questions about users’ requirements. Appropriate conversational
skills can also improve users’ experience as well as people’s perception
of recommended items [14]. Furthermore, such a conversational
approach could also help us to know whether users are
initially interested in eating healthy so that the system could adapt
its recommendations accordingly.

Figure 2: Interaction graphs between Reco-Algo and Tag-Mode regarding the average (a) F1 score, (b) health score,
(c) satisfaction and (d) choice easiness. [Four panels; x-axis: Tag-Mode; legend: Reco-Algo.]

REFERENCES
 [1] Giuseppe Agapito, Mariadelina Simeoni, Barbara Calabrese, Ilaria Caré, Theodora
     Lamprinoudi, Pietro H Guzzi, Arturo Pujia, Giorgio Fuiano, and Mario Cannataro.
     2018. DIETOS: A dietary recommender system for chronic diseases monitoring
     and management. Computer methods and programs in biomedicine 153 (2018),
     93–104.
 [2] Devis Bianchini, Valeria De Antonellis, Nicola De Franceschi, and Michele Melchiori.
     2017. PREFer: A prescription-based food recommender system. Computer
     Standards & Interfaces 54 (2017), 64–75.
 [3] Meng Chen, Xiaoyi Jia, Elizabeth Gorbonos, Chnh T Hong, Xiaohui Yu, and Yang
     Liu. 2019. Eating healthier: Exploring nutrition information for healthier recipe
     recommendation. Information Processing & Management (2019), 102051.
 [4] Sally Jo Cunningham and David Bainbridge. 2013. An analysis of cooking queries:
     Implications for supporting leisure cooking. (2013).
 [5] David Elsweiler, Christoph Trattner, and Morgan Harvey. 2017. Exploiting food
     choice biases for healthier recipe recommendation. In Proceedings of the 40th
     International ACM SIGIR Conference on Research and Development in Information
     Retrieval. 575–584.
 [6] Jill Freyne and Shlomo Berkovsky. 2010. Intelligent food planning: personalized
     recipe recommendation. In Proceedings of the 15th international conference on
     Intelligent user interfaces. ACM, 321–324.
 [7] Mouzhi Ge, Mehdi Elahi, Ignacio Fernández-Tobías, Francesco Ricci, and David
     Massimo. 2015. Using tags and latent factors in a food recommender system.
     In Proceedings of the 5th International Conference on Digital Health 2015. ACM,
     105–112.
 [8] Mouzhi Ge, Francesco Ricci, and David Massimo. 2015. Health-aware food
     recommender system. In Proceedings of the 9th ACM Conference on Recommender
     Systems. 333–334.
 [9] Morgan Harvey, Bernd Ludwig, and David Elsweiler. 2013. You are what you eat:
     Learning user tastes for rating prediction. In International Symposium on String
     Processing and Information Retrieval. Springer, 153–164.
[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for
     implicit feedback datasets. In 2008 Eighth IEEE International Conference on Data
     Mining. IEEE, 263–272.
[11] JungYun Hur and SooCheong Shawn Jang. 2015. Anticipated guilt and pleasure
     in a healthy food consumption context. International Journal of Hospitality
     Management 48 (2015), 113–123.
[12] Christopher C Johnson. 2014. Logistic matrix factorization for implicit feedback
     data. (2014), 78 pages.
[13] Mansura A Khan, Ellen Rushe, Barry Smyth, and David Coyle. 2019. Personalized,
     Health-Aware Recipe Recommendation: An Ensemble Topic Modeling Based
     Approach. arXiv preprint arXiv:1908.00148 (2019).
[14] Florian Pecune, Shruti Murali, Vivian Tsai, Yoichi Matsuyama, and Justine Cassell.
     2019. A model of social explanations for a conversational movie recommendation
     system. In Proceedings of the 7th International Conference on Human-Agent
     Interaction. 135–143.
[15] Rajagopal Raghunathan, Rebecca Walker Naylor, and Wayne D Hoyer. 2006. The
     unhealthy = tasty intuition and its effects on taste inferences, enjoyment, and
     choice of food products. Journal of Marketing 70, 4 (2006), 170–184.
[16] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme.
     2012. BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint
     arXiv:1205.2618 (2012).
[17] Benjamin Scheibehenne, Rainer Greifeneder, and Peter M Todd. 2010. Can there
     ever be too many options? A meta-analytic review of choice overload. Journal of
     consumer research 37, 3 (2010), 409–425.
[18] Gunnar Schröder, Maik Thiele, and Wolfgang Lehner. 2011. Setting goals and
     choosing metrics for recommender system evaluations. In UCERSTI2 workshop at
     the 5th ACM conference on recommender systems, Chicago, USA, Vol. 23. 53.
[19] Gábor Takács and Domonkos Tikk. 2012. Alternating least squares for personalized
     ranking. In Proceedings of the sixth ACM conference on Recommender systems.
     83–90.
[20] Chun-Yuen Teng, Yu-Ru Lin, and Lada A Adamic. 2012. Recipe recommendation
     using ingredient networks. In Proceedings of the 4th Annual ACM Web Science
     Conference. 298–307.
[21] Thi Ngoc Trang Tran, Müslüm Atas, Alexander Felfernig, and Martin Stettinger.
     2018. An overview of recommender systems in the healthy food domain. Journal
     of Intelligent Information Systems 50, 3 (2018), 501–526.
[22] Christoph Trattner and David Elsweiler. 2017. Food recommender systems:
     important contributions, challenges and future research directions. arXiv preprint
     arXiv:1711.02760 (2017).
[23] Christoph Trattner and David Elsweiler. 2017. Investigating the healthiness
     of internet-sourced recipes: implications for meal planning and recommender
     systems. In Proceedings of the 26th international conference on world wide web.
     489–498.
[24] Christoph Trattner and David Elsweiler. 2019. An Evaluation of Recommendation
     Algorithms for Online Recipe Portals. In HealthRecSys@RecSys. 24–28.
[25] Tsuguya Ueta, Masashi Iwakami, and Takayuki Ito. 2011. Implementation of a
     goal-oriented recipe recommendation system providing nutrition information.
     In 2011 International Conference on Technologies and Applications of Artificial
     Intelligence. IEEE, 183–188.
[26] Martijn C Willemsen, Mark P Graus, and Bart P Knijnenburg. 2016. Understanding
     the role of latent feature diversification on choice difficulty and satisfaction. User
     Modeling and User-Adapted Interaction 26, 4 (2016), 347–389.
[27] Longqi Yang, Cheng-Kang Hsieh, Hongjian Yang, John P Pollak, Nicola Dell,
     Serge Belongie, Curtis Cole, and Deborah Estrin. 2017. Yum-me: a personalized
     nutrient-based meal recommender system. ACM Transactions on Information
     Systems (TOIS) 36, 1 (2017), 7.