=Paper= {{Paper |id=Vol-2439/6-paginated |storemode=property |title=An Evaluation of Recommendation Algorithms for Online Recipe Portals |pdfUrl=https://ceur-ws.org/Vol-2439/6-paginated.pdf |volume=Vol-2439 |authors=Christoph Trattner,David Elsweiler |dblpUrl=https://dblp.org/rec/conf/recsys/TrattnerE19 }} ==An Evaluation of Recommendation Algorithms for Online Recipe Portals== https://ceur-ws.org/Vol-2439/6-paginated.pdf
        An Evaluation of Recommendation Algorithms for Online
                             Recipe Portals
                            Christoph Trattner                                                                 David Elsweiler
                            University of Bergen                                                           University of Regensburg
                                   Norway                                                                          Germany
                          christoph.trattner@uib.no                                                         david.elsweiler@ur.de

ABSTRACT
Better models of food preferences are required to realise the oft-touted potential of food recommenders to aid with the obesity crisis. Many of the food recommender evaluations in the literature have been performed with small convenience samples, which limits our confidence in the generalisability of the results. In this work we test a range of collaborative filtering (CF) and content-based (CB) recommenders on a large dataset crawled from the web consisting of naturalistic user interaction data over a 15 year period. The results reveal strengths and limitations of different approaches. While CF approaches consistently outperform CB approaches when testing on the complete dataset, our experiments show that CF methods require a large number of users (> 637 when sampling randomly) to improve on CB methods. Moreover, the results show different facets of recipe content to offer utility. In particular, one of the strongest content-related features was a measure of health derived from guidelines from the UK Food Standards Agency. This finding underlines the challenges we face as a community to develop recommender algorithms which improve the healthfulness of the food people choose to eat.

KEYWORDS
Online recipes; recommender systems

ACM Reference Format:
Christoph Trattner and David Elsweiler. 2019. An Evaluation of Recommendation Algorithms for Online Recipe Portals. In Proceedings of the 4th International Workshop on Health Recommender Systems co-located with 13th ACM Conference on Recommender Systems (HealthRecSys’19) (HealthRecSys ’19), 5 pages.

1    INTRODUCTION
Food recommenders (e.g. [11, 15]) and studies of online recipes (e.g. [24, 40]) have received increased research attention of late. A key motivation for this is often health, with recommender systems being touted as a means to help people change dietary habits and address costly societal problems, such as diabetes and obesity [7, 11].
   Diverse studies have been published, offering insight into the contextual factors influencing recipe preference [28, 40] and the future popularity of recipes [36], as well as providing an understanding of the links between recipe preference and incidence of eating related illness [37]. A further strand of research has attempted to incorporate health in the food recommendation problem by, for example, building nutritional content into the recommendation process [15, 19, 34] or by recommending meal plans, which tailor recommendations to users’ nutritional needs over time [6].
   Providing healthful food recommendations using any of the suggested strategies necessitates, however, that we can accurately model and predict the food individual users would actually like to eat. We have as yet limited understanding as to which recommender algorithms work best [33], and the studies that have been performed typically focus on one approach in isolation (e.g. recipe ingredients [11] or properties of the associated image [14]). Moreover, past work has tended to employ datasets derived from small scale user studies [11, 19], limiting our confidence in the generalisability of the results. In this work, we test a number of competitive collaborative filtering (CF) and content-based (CB) recommenders on a large scale naturalistic dataset similar to those that have been studied for cultural [24, 40] or epidemiological [37] reasons using data science methods. We formulate the problem as is typically done in recommendation experiments, using past feedback from a given user to predict future interactions by that same user [26]. The aim is not only to compare and contrast different models, but also to examine the utility of different facets of content - which are diverse in the case of online recipes - and establish how these influence the recommendation performance. The main findings include that:
• CF methods consistently outperform CB methods over the full dataset.
• CF requires either a small number of highly active users or over six hundred users, selected randomly, to achieve competitive performance.
• There is a useful signal in the CB facets, which would be useful in cold-start situations.
• One of the most robust content features is the nutritional healthiness of the recipe as defined by a measure derived from the United Kingdom Food Standards Agency (FSA). This highlights that users are typically consistent in their nutritional preferences over time and emphasizes the challenges faced to change eating habits.
   The remainder of the paper is structured as follows: Sections 3 and 4 describe the data basis and experimental setup, respectively. Section 5 reports the results of two rounds of experiments, the first of which uses the full dataset and the second employs a bootstrapping approach to test algorithms on sub-samples of the data of various sizes. Section 6 summarises the findings and sets these in context against the literature, which is reviewed in the following section.

HealthRecSys ’19, September 20, 2019, Copenhagen, Denmark
© 2019 Copyright for the individual papers remains with the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors.






2     RELATED WORK
In this section two bodies of related work are reviewed. The first focuses on the evaluation of food recommender algorithms. The second summarises studies of user interaction with online recipe portals, which provides insight into human food preference and the variables influencing this.

2.1    Food Recommendation
Efforts to design automated systems to recommend meals can be traced to the mid-1980s where case-based planning was employed [18, 21]. More recent efforts have focused on rating prediction, using either aspects of recipe content or ratings data using collaborative filtering approaches. Freyne et al. [11] showed the recommendations could be improved by decomposing recipes into individual ingredients and building user profiles comprising ingredients users liked based on ratings for the recipes containing these ingredients. Harvey et al. extended the approach and improved performance by creating positive and negative profiles for users and reducing the dimensionality of the matrices [19].
   Other CB approaches have employed visual signals. Yang and colleagues demonstrated that algorithms designed to extrapolate important visual aspects of food images outperform baseline methods [42, 43]. Elsweiler et al. [8] also show that automatically extracted low-level image features, such as brightness, colourfulness and sharpness, can be useful for predicting user food preference.
   A second approach has been to exploit ratings data using collaborative filtering (CF) techniques. Freyne and Berkovsky tested a nearest neighbour approach, which offered poorer performance than the content approach described above [11]. Ge et al. [15] tested a matrix factorization solution that fuses ratings information and user supplied tags to achieve significantly better prediction accuracy than content-based and standard matrix factorization baselines. Several studies report that the best results are achieved when CF and CB approaches are combined in hybrid models [11, 14, 19].
   A common motivator for food recommendation work has been to promote healthy nutrition. One approach is to rely on rules derived from domain experts to meet daily energy requirements [13] or focus on the nutritional requirements of specific groups such as those in elderly care [10] or body-builders [38]. Others have tailored recommendations based on the user’s calorific or other nutritional needs [15, 16, 34], existing nutritional habits [31] or combine recommendations to meet requirements [6]. Again, approaches have been published for specific target groups, e.g. diabetics [25].

2.2    Studies of Food Behaviour using Online Recipe Portals
While not focusing on recommendation, a large body of recent work sheds light on food preferences by studying interactions with online food portals. Analysing the nutritional content of these portals using metrics derived from the World Health Organisation (WHO) and the United Kingdom Food Standards Agency (FSA) has found recipes to be mainly unhealthy, although healthy recipes can be found [35]. Overall, people tend to interact with the least healthy recipes most often [34]. There is, nevertheless, heterogeneity in the user-base with respect to the nutritional properties of recipes interacted with, and a growing body of evidence reports correlations between recipes accessed via search engines, recipe portals and social media and incidence of diet-related illness [1, 3, 29, 37]. Moreover, clear weekly and seasonal trends can be observed in the way users interact with recipes, both in terms of the contained ingredients and the nutritional value of the recipes (fat, proteins, carbohydrates, and calories) [23, 40]. Other work has reported different interaction patterns for users of different gender [28, 39] and for users who live in different geographical areas within a country [40, 44]. The number of variables shown to relate to eating habits highlights just how challenging a problem food recommendation is.
   The brief review of literature above has highlighted the increasing popularity of food recsys research and that a key motivator is the desire to build systems to promote healthy nutrition. Key takeaways from the review are as follows:
• While several evaluations have CF and CB baselines, no extensive comparison of CF and CB approaches in the food recsys domain has been published.
• Moreover, no detailed investigation of different aspects of content that may be useful is available and much of the recipe content (recipe description, cooking steps, cooking time etc.) has not been evaluated.
• Finally, the evaluations performed to date have typically been performed on small artificially generated test collections.

3    MATERIALS
To address the identified gaps in the literature, in this work, we make use of a web crawl of the online platform Allrecipes.com to evaluate diverse CF and CB approaches in the recipe recommendation context.
   The platform was crawled between 20th and 24th of July, 2015. We retrieved 60,983 recipes published by 25,037 users between the years 2000 and 2015 through the sitemap that is available in the robots.txt file of the website. In this paper we only make use of the 58,263 recipes where nutrition information was available. The basic statistics of this dataset can be found in Table 1.
   In addition to the core recipe components - such as recipe title, ingredient list, number of servings and instructions - we also collected for each recipe the according image, comments provided by users, rating information and nutrition facts1, such as total energy (kCal), protein (g), carbohydrate (g), sugar (g), salt (g), fat (g) and saturated fat (g) content (measured per 100g of recipe).

Table 1: Basic statistics of the Internet recipes dataset obtained from Allrecipes.com.

       Total published recipes                                   60,983
       Recipes containing nutrition information                  58,263
       Recipes rated                                             46,713
       Ratings                                                1,032,226
       Users providing ratings                                  125,762

1 Allrecipes.com estimates the nutritional facts for an uploaded recipe by matching the contained ingredients with those in the ESHA research database [9]. The ESHA system is used by popular companies such as McDonald’s and Kellogg’s.






   Allrecipes.com is just one of many online recipe portals. Other popular sites include Food.com, Epicurious.com, Yummly.com and Cooks.com. We chose Allrecipes.com because, at the time of writing, it claims to be the world’s largest food-focused social network: the site has a community of over 40 million users from 24 countries who annually visit 3 billion recipes [2]. This claim has been corroborated by services such as eBizMBA, which ranks Allrecipes.com as the most popular recipe website [5]. This means that we not only analyze a large scale dataset, but also the most popular recipe platform on the Web.

4    EXPERIMENTAL SETUP
We ran a series of experiments evaluating the performance of 6 prominent recommender algorithms on the rating data using the LibRec2 framework. The algorithms tested are: Random item ranking (our baseline), Most Popular item ranking (MostPopular), user- and item-based collaborative filtering (denoted as UserKNN and ItemKNN) [30], Bayesian Personalized Ranking (BPR) [26], Weighted matrix factorization (WRMF) [22] and Latent Dirichlet Allocation (LDA) [17].
   For the content-based approaches we induced in total 20 different features, which we used to compute similarities between recipes. Below we briefly summarise these features and their corresponding sets:
• Title: For the title feature set, we derived 5 similarity features, based on Levenshtein distance, Longest Common Subsequence (LCS), Jaro-Winkler distance and bi-gram distance. To obtain a similarity value between two recipes based on these features we calculate 1 − dist(r_i, r_j). Furthermore, we employ LDA topic modelling on the recipe titles using Mallet with Gibbs sampling. The number of topics was set to 100. Hence for each recipe we induce a vector of dimension one hundred capturing the topic distribution. To calculate similarities between recipes we employ the cosine similarity metric.
• Image: For the image feature set we employed, on the one hand, image attractiveness measures such as image brightness, sharpness, contrast, colorfulness and entropy and, on the other hand, deep convolutional neural network (CNN) features from a pre-trained VGG-16 model [32]. For each image we derive one embedding vector of dimension 4096 and calculate cosine similarity between recipes on these vectors. To measure the similarity between two recipes based on the image attractiveness metrics [36] we employ the inverse Manhattan distance, i.e. 1 − |metric(r_i) − metric(r_j)|.
• Ingredients: To calculate similarities between recipes on ingredient level, we induced four different features. On the one hand, the text itself was used and brought to a TF-IDF representation to calculate cosine similarity between recipes. On the other hand, we also chose to employ LDA again to derive a topic distribution and to calculate cosine similarity between recipes on those vectors. Finally, we employed the normalized ingredient strings to calculate similarities between recipes using cosine similarity and Jaccard. In the case of cosine we normalized the quantities of each ingredient to 100g of a recipe and used the normalized quantity values as frequency indicator.
• Directions: From the directions block we computed two similarity features based again on an LDA topic vector representation of the text as well as on a TF-IDF vector representation. Similarities were again computed employing the cosine similarity measure on these vectors.
• Ratings: Here we rely on the number of ratings of a recipe as well as the average rating. To compute similarities between recipes on these indicators we rely again on the inverse Manhattan distance, i.e. 1 − |metric(r_i) − metric(r_j)|.
• Health: In order to measure the healthiness of a recipe we rely on the following macronutrients: ‘fat’, ‘saturated fat’, ‘sugar’ and ‘salt’ (measured per 100g of recipe). This allows us to measure the healthiness of a recipe according to international standards as introduced in 2007 by the Food Standards Agency (FSA) [12]. There are also other standards that can be applied, such as the ones provided by the World Health Organization (WHO) [41] or the HEI metric as proposed by the CDC [20]. We employ the standards provided by the FSA, as this is currently the most robust method to estimate the healthiness of online recipes. The metric was also used in related work [34]. The scale ranges from 4 for very healthy recipes to 12 for very unhealthy recipes. Throughout the paper we refer to this metric as ‘FSA score’.
   For each of the features described above, we derive a scoring function that is computed as follows:

$$\mathrm{score}(u, i)_{feature} = \frac{\sum_{p \in P_u} \mathrm{sim}(i, p)}{|P_u|}, \tag{1}$$

where P_u is the set of items of a user u, i an arbitrary item, and sim(i, p) is any of the above mentioned similarity metrics between item i and p.
   For each feature set we calculate scores based on the linear combination of the similarities3.
   As in previous work [26], we operationalise the experiments as a personalized ranking problem (item recommendation). The aim here is to provide a user with a ranked list of items where the ranking has to be inferred from the implicit behavior of the user (e.g. recipes rated in the past). Implicit feedback systems, such as those studied in [26], are challenging as only positive observations are available. The non-observed user-item pairs - e.g. a user has not cooked a recipe yet - are a mixture of real negative feedback (the user is not interested in cooking the recipe) and missing values (the user might want to cook the recipe in the future). We use 5-fold cross validation as protocol for all the experiments and report the recommendation performance results employing AUC as a performance metric [27].
   To reduce data sparsity issues, a well-known issue in collaborative filtering-based methods [27], in the first experiments we apply a p-core filter approach [4] using only user profiles with at least 20 rating interactions4 and recipes that have been rated at least 20 times by the users, resulting in a final dense dataset comprising 1273 users, 1031 items and 50,681 interactions. To study the effects of different levels of users on performance we report a second set

2 http://www.librec.net/
3 Parameters were tuned to the optimum using grid search.
4 We transfer all ratings to positive feedback, i.e. any rating is counted as positive feedback and any non-interaction as negative feedback. This makes sense as 95% of all ratings in the Allrecipes.com dataset are 5-star ratings, see also [36].
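
The content-based scoring of Equation (1) in Section 4 can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the toy values and the assumption that scalar metrics are scaled to [0, 1] before the inverse Manhattan distance is applied are all introduced here for illustration only.

```python
import math


def inverse_manhattan(a: float, b: float) -> float:
    """Similarity of two scalar metrics, 1 - |a - b| (assumes values in [0, 1])."""
    return 1.0 - abs(a - b)


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(profile, candidate, sim) -> float:
    """Equation (1): mean similarity between a candidate item and the items in P_u."""
    if not profile:
        return 0.0  # assumption of this sketch: no history gives a neutral score
    return sum(sim(candidate, p) for p in profile) / len(profile)


# Hypothetical toy data: brightness values of three recipes a user has rated,
# scored against a candidate recipe with brightness 0.5.
rated_brightness = [0.2, 0.4, 0.9]
print(score(rated_brightness, 0.5, inverse_manhattan))
```

The same `score` function works for vector-valued features (e.g. LDA topic vectors or image embeddings) by passing `cosine` as the similarity.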




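
The p-core filtering step described in Section 4 can be sketched as follows. The paper does not state whether the filter is applied once or repeated until stable; this sketch assumes the latter, which is the usual reading of a p-core. The function name and the toy data are hypothetical. Following footnote 4, every rating counts as positive feedback, so the rating value is dropped.

```python
from collections import Counter


def p_core(interactions, p=20):
    """Iteratively drop users and items with fewer than p interactions.

    `interactions` is an iterable of (user, item) pairs. The loop repeats
    because removing a sparse item can push a user below the threshold,
    and vice versa (an assumption of this sketch, not stated in the paper).
    """
    current = set(interactions)
    while True:
        user_counts = Counter(u for u, _ in current)
        item_counts = Counter(i for _, i in current)
        kept = {
            (u, i)
            for u, i in current
            if user_counts[u] >= p and item_counts[i] >= p
        }
        if len(kept) == len(current):
            return kept
        current = kept


# Hypothetical ratings as (user, recipe, stars); the stars are ignored.
ratings = [
    ("u1", "r1", 5), ("u1", "r2", 4),
    ("u2", "r1", 5), ("u2", "r2", 5),
    ("u3", "r1", 5),
]
dense = p_core([(u, r) for u, r, _ in ratings], p=2)
print(sorted(dense))
```

With `p=2`, user `u3` has only one interaction and is removed, while the remaining users and recipes each keep two.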
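
AUC, the metric reported in Section 4, can be read as the probability that a held-out recipe the user interacted with is ranked above a recipe the user did not interact with. The sketch below computes it for a single user. The experiments in the paper use the LibRec framework, whose evaluator may differ in detail; the function name, the tie handling and the toy scores here are assumptions.

```python
def user_auc(scores, test_positives, train_positives=frozenset()):
    """AUC for one user: fraction of (positive, negative) pairs ranked correctly.

    scores: dict mapping item -> predicted score for this user.
    test_positives: held-out items the user interacted with.
    train_positives: items seen during training, excluded from the candidates.
    Ties count as half a correct pair (an assumption of this sketch).
    """
    positives = [i for i in scores if i in test_positives]
    negatives = [
        i for i in scores
        if i not in test_positives and i not in train_positives
    ]
    if not positives or not negatives:
        return None  # AUC is undefined without both classes
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if scores[pos] > scores[neg]:
                correct += 1.0
            elif scores[pos] == scores[neg]:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical scores for five recipes: r1 was used for training,
# r2 and r4 are held-out interactions, r3 and r5 were never rated.
scores = {"r1": 0.9, "r2": 0.8, "r3": 0.6, "r4": 0.4, "r5": 0.1}
auc = user_auc(scores, test_positives={"r2", "r4"}, train_positives={"r1"})
print(auc)
```

In a 5-fold cross-validation setting, a value like this would be computed per user on each held-out fold and then averaged.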

[Figure 1: panel (A) "Dense Data Samples (p-core=20)" and panel (B) "Sparse Data Samples (no p-core)". Each panel plots AUC (y-axis) against Number of Users [%] (x-axis, ticks at 1, 5, 10, 20, ..., 100) for the algorithms BPR and CB:All.]

Table 2: Results of the recommender experiment - collaborative (CF) vs content-based (CB) - in the dense data sample with all users. Best features in each set (CF and CB) are bolded. Top-5 (↑) and Bottom-5 (↓) single content features are also marked.

       Method      Algorithm                         AUC
       CF          BPR                               .7094
                   WRMF                              .6881
                   UserKNN                           .6962
                   ItemKNN                           .6909
                   MostPopular                       .6864
                   LDA                               .6863
                   Title:Levenstein-Distance         .5468 (↑)
                   Title:Bigram-Distance             .5500 (↑)
                                                                              Figure 1: (A) shows the results in the dense data samples (=
                   Title:LCS-Distance                .5424
                                                                              p-core iltered) where each user has at least 20 item interac-
                   Title:LDA-Text-Cosine             .5353
                                                                              tions and each item is at least 20-times interacted with, (B)
                   Title:Jaro-Winkler-Distance       .5324
                                                                              shows the results in the sparse data samples (=no p-core).
                   Title:All                         .5523
                   Image:Cosine-Embeddings           .5322
                   Image:Colorfulness-Distance       .5072 (↓)
                   Image:Contrast-Distance           .5175
                   Image:Sharpness-Distance          .5109
                   Image:Entropy-Distance            .5080 (↓)                AUC scores of > .686. This compares to .5883 achieved by the linear
                   Image:Brightness-Distance         .4991 (↓)                combination of content features (= CB:All).
       CB




                   Image:All                         .5425                       Examining the performance of diferent aspects of content (title,
                   Ingredients:Cosine-Text           .5547                    image, ingredients, direction and health) shows that there is a signal
                   Ingredients:Cosine-LDA-Text       .5653 (↑)                in each of these aspects. This is a sign of the consistency, in terms
                   Ingredients:Jaccard               .5502                    of the properties of recipes, which individual users tend to rate.
                   Ingredients:Cosine                .5575                    The fact that the combined model łAllž does not achieve a high
                   Ingredients:All                   .5718                    improvement on these signals individually is perhaps an indication
                                                                              that a linear combination is not the best means to combine these
                   Directions:Cosine-LDA-Text        .5606 (↑)                signals. One of the strongest content-based features is the FSA score
                   Directions:Cosine-Text            .5210                    (AUC=.5775). Again, this hints at consistency in user preference,
                   Directions:All                    .5731                    this time in terms of the healthiness of recipes, which individual
                   Ratings:Number-Distance           .4789 (↓)                users interact with.
                   Ratings:Average-Distance          .4832 (↓)                   To complement these initial results and better understand the
                   Ratings:All                       .5249                    relationship between CF and CB methods and the amount of data
                                                                              required to achieve strong recommendation performance with these
                   Health:FSA                        .5775 (↑)
                                                                              approaches, we performed the bootstrapping study as described
                   CB:All                            .5883                    above. The results are presented in Figure 1.
                   Random                            .4989                       In a irst test, see Figure 1 (A), we sampled only from active
                                                                              users, that is, we derived a test size of various sizes where users
                                                                              had rated at least 20 items and the items involved had also achieved
of bootstrapped experiments using smaller dense samples of heavy              at least 20 ratings. Taking this dense sample showed that even a
users (using the same criteria as above), and varying collection sizes        small number of users can attain stable performance. With only 1%
using standard random sampling, referred to as ‘sparse samples’ in            of all users (N=13) the CF technique (BPR) is able to outperform the
the text. These experiments were repeated 100 times each and the              content approach. Nevertheless, when users are selected at random
average performance reported.                                                 from the dataset and no p-core ilter is applied, see Figure 1 (B) ś
                                                                              which we argue is a much more realistic setup [4] ś many more
5   RESULTS                                                                   users are required on average to achieve an equivalent perform-
                                                                              ance. Whereas the CB approaches achieve a consistent performance
The results of the experiments on the full dataset are shown in
                                                                              (AUC=> .54) regardless of the number of users studied, half of the
Table 2. The CF methods clearly outperform the content-based
                                                                              dataset (50%, N=637) is required before the CF methods outperform
approaches. The best performing CF method (BPR) achieved an
                                                                              the CB approach.
AUC score of .7094 and the remaining CF methods demonstrated
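
The dense and sparse sampling setups described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes that interactions are available as (user, item) pairs, that the p-core filter is applied iteratively until both thresholds hold, and that each repetition draws a fraction of users without replacement. The function names `p_core`, `sample_users` and `bootstrap`, and the caller-supplied `evaluate` function (for example, one that trains a recommender and returns its AUC), are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def p_core(interactions, k):
    """Keep only (user, item) pairs whose user and item both have at
    least k interactions in the remaining data (assumed iterative)."""
    pairs = set(interactions)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = {(u, i) for u, i in pairs
                if user_counts[u] >= k and item_counts[i] >= k}
        if kept == pairs:  # nothing was removed, so the core is stable
            return kept
        pairs = kept


def sample_users(interactions, fraction, rng):
    """Draw a random fraction of users and return their interactions."""
    users = sorted({u for u, _ in interactions})
    n = max(1, round(fraction * len(users)))
    chosen = set(rng.sample(users, n))
    return {(u, i) for u, i in interactions if u in chosen}


def bootstrap(interactions, fraction, evaluate, repeats=100, seed=0):
    """Average a caller-supplied evaluate(sample) score over repeats."""
    rng = random.Random(seed)
    return mean(evaluate(sample_users(interactions, fraction, rng))
                for _ in range(repeats))
```

Under these assumptions, a dense run would call `bootstrap(p_core(data, 20), 0.01, evaluate)` and a sparse run would call `bootstrap(data, 0.01, evaluate)`, repeating for each user fraction shown in Figure 1.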




An Evaluation of Recommendation Algorithms for Online Recipe Portals HealthRecSys ’19, September 20, 2019, Copenhagen, Denmark


6     SUMMARY & CONCLUSION

In this work we have tested competitive recommendation algorithms
on a large online recipe dataset. While algorithms of these types
have been evaluated before (e.g. [11, 19]), no systematic evaluation
has been performed on naturalistic data of this type for recipes only,
and no results have been published with respect to what signal can
be offered by different facets of recipe content.
   Our primary finding is that CF outperformed CB in our experiments.
This differs from the literature: both [11] and [19] report
ingredient-based CB methods outperforming CF baselines. The small
datasets used in these past studies, however, suggest that the results
are compatible. It is only after data for several hundred users (637
in our experiment) is available that CF methods start to outperform CB.
   With respect to recipe content, the performance of FSA highlights
the challenge in changing people’s habits. This aligns with
past work revealing that the majority of users tend to prefer
unhealthy food while a smaller group prefers healthy recipes, and that
both groups are consistent in their judgments over time [19]. As a
community we need to think hard about how these group members can
be targeted with recommendations that might alter this situation.

REFERENCES
 [1] Sofiane Abbar, Yelena Mejova, and Ingmar Weber. You Tweet What You Eat:
     Studying Food Consumption Through Twitter. In Proc. of CHI ’15.
 [2] Allrecipes. 2016. Allrecipe.com Press report. available at
     http://press.allrecipes.com/. Last accessed on 22.03.2019. (2016).
 [3] Munmun De Choudhury and Sanket S Sharma. Characterizing Dietary Choices,
     Nutrition, and Language in Food Deserts via Social Media. In Proc. of CSCW ’16.
 [4] Stephan Doerfel, Robert Jäschke, and Gerd Stumme. 2016. The role of cores in
     recommender benchmarking for social bookmarking systems. ACM Transactions
     on Intelligent Systems and Technology (TIST) 7, 3 (2016), 40.
 [5] Ebizma. 2017. Ebizma rankings for recipe websites. available at
     http://www.ebizmba.com/articles/recipe-websites. Last accessed on 22.03.2019. (2017).
 [6] David Elsweiler and Morgan Harvey. Towards automatic meal plan
     recommendations for balanced nutrition. In Proc. of RecSys ’15. 313–316.
 [7] David Elsweiler, Morgan Harvey, Bernd Ludwig, and Alan Said. Bringing the
     "healthy" into Food Recommenders. In Proc. of DRMS ’15. 33–36.
 [8] David Elsweiler, Christoph Trattner, and Morgan Harvey. Exploiting Food Choice
     Biases for Healthier Recipe Recommendation. In Proc. of SIGIR ’17. 575–584.
 [9] ESHA. 2016. Nutrition Labeling Software. available at http://www.esha.com/.
     Last accessed on 22.03.2019. (2016).
[10] Vanesa Espín, María V Hurtado, and Manuel Noguera. 2016. Nutrition for Elder
     Care: a nutritional semantic recommender system for the elderly. Expert Systems
     33, 2 (2016), 201–210.
[11] Jill Freyne and Shlomo Berkovsky. Intelligent Food Planning: Personalized Recipe
     Recommendation. In Proc. of IUI ’10. 321–324.
[12] FSA. 2016. Guide to creating a front of pack (FoP) nutrition label for pre-packed
     products sold through retail outlet. available at
     https://www.food.gov.uk/sites/default/files/multimedia/pdfs/pdf-ni/fop-guidance.pdf.
     Last accessed on 22.03.2019. (2016).
[13] Dhomas Hatta Fudholi, Noppadol Maneerat, and Ruttikorn Varakulsiripunth.
     2009. Ontology-based daily menu assistance system. In Proc. of ECTICON ’09.
     694–697.
[14] Xiaoyan Gao, Fuli Feng, Xiangnan He, Heyan Huang, Xinyu Guan, Chong Feng,
     Zhaoyan Ming, and Tat-Seng Chua. 2018. Visually-aware Collaborative Food
     Recommendation. arXiv preprint arXiv:1810.05032 (2018).
[15] Mouzhi Ge, Francesco Ricci, and David Massimo. Health-aware Food
     Recommender System. In Proc. of RecSys ’15. 333–334.
[16] Elizabeth Gorbonos, Yang Liu, and Chính T Hoàng. 2018. NutRec: Nutrition
     Oriented Online Recipe Recommender. In Proc. of WI ’18. 25–32.
[17] Tom Griffiths. 2002. Gibbs sampling in the generative model of latent dirichlet
     allocation. (2002).
[18] Kristian J Hammond. 1986. CHEF: A Model of Case-based Planning. In AAAI.
     267–271.
[19] Morgan Harvey, Bernd Ludwig, and David Elsweiler. 2013. You are what you eat:
     Learning user tastes for rating prediction. In International Symposium on String
     Processing and Information Retrieval. Springer, 153–164.
[20] HEI. 2016. Healthy Eating Index. (Oct. 2016).
     https://www.cnpp.usda.gov/healthyeatingindex
[21] Thomas R Hinrichs. 1989. Strategies for adaptation and recovery in a design
     problem solver. In Proc. of Workshop CBR ’89. 115–118.
[22] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit
     feedback datasets. In Proc. of ICDM ’08. 263–272.
[23] Tomasz Kusmierczyk, Christoph Trattner, and Kjetil Nørvåg. Temporality in
     online food recipe consumption and production. In Proc. of WWW ’15.
[24] Tomasz Kusmierczyk, Christoph Trattner, and Kjetil Nørvåg. 2016. Understanding
     and Predicting Online Food Recipe Production Patterns. In Proc. of HT ’16.
     243–248.
[25] Maiyaporn Phanich, Phathrajarin Pholkul, and Suphakant Phimoltares. 2010.
     Food recommendation system using clustering analysis for diabetic patients. In
     Proc. of ISA ’10. 1–8.
[26] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme.
     BPR: Bayesian personalized ranking from implicit feedback. In Proc. of UAI ’09.
     452–461.
[27] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to
     recommender systems handbook. Springer.
[28] Markus Rokicki, Eelco Herder, Tomasz Kusmierczyk, and Christoph Trattner.
     Plate and Prejudice: Gender Differences in Online Cooking. In Proc. of UMAP ’16.
     207–215.
[29] Alan Said and Alejandro Bellogín. You are What You Eat! Tracking Health
     Through Recipe Interactions. In Proc. of RSWeb ’14.
[30] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based
     collaborative filtering recommendation algorithms. In Proc. of WWW ’01.
     285–295.
[31] Hanna Schäfer and Martijn C Willemsen. 2019. Rasch-based tailored goals for
     nutrition assistance systems. In Proc. of IUI ’19. 18–29.
[32] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks
     for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[33] Christoph Trattner and David Elsweiler. 2017. Food recommender systems:
     important contributions, challenges and future research directions. arXiv preprint
     arXiv:1711.02760 (2017).
[34] Christoph Trattner and David Elsweiler. 2017. Investigating the healthiness
     of internet-sourced recipes: implications for meal planning and recommender
     systems. In Proc. of WWW ’17. 489–498.
[35] Christoph Trattner, David Elsweiler, and Simon Howard. 2017. Estimating the
     healthiness of internet recipes: a cross-sectional study. Frontiers in public health
     5 (2017), 16.
[36] Christoph Trattner, Dominik Moesslang, and David Elsweiler. 2018. On the
     predictability of the popularity of online recipes. EPJ Data Science 7, 1 (2018), 20.
[37] Christoph Trattner, Denis Parra, and David Elsweiler. 2017. Monitoring obesity
     prevalence in the United States through bookmarking activities in online food
     portals. PloS one 12, 6 (2017), e0179144.
[38] Piyaporn Tumnark, Filipe Almeida da Conceição, João Paulo Vilas-Boas, Leandro
     Oliveira, Paulo Cardoso, Jorge Cabral, and Nonchai Santibutr. 2013.
     Ontology-based personalized dietary recommendation for weightlifting. In Proc. of
     Int. WS on Computer Science in Sports. 44–49.
[39] Claudia Wagner and Luca Maria Aiello. 2015. Men eat on Mars, Women on
     Venus?: An Empirical Study of Food-Images. In Proc. of WebSci ’15. 63–1.
[40] Claudia Wagner, Philipp Singer, and Markus Strohmaier. 2014. The nature and
     evolution of online food preferences. EPJ Data Science 3, 1 (2014), 1–22.
[41] Joint WHO and FAO Expert Consultation. 2003. Diet, nutrition and the prevention
     of chronic diseases. World Health Organ Tech Rep Ser 916, i-viii (2003).
[42] Longqi Yang, Yin Cui, Fan Zhang, John P Pollak, Serge Belongie, and Deborah
     Estrin. 2015. Plateclick: Bootstrapping food preferences through an adaptive
     visual interface. In Proc. of CIKM ’15. 183–192.
[43] Longqi Yang, Cheng-Kang Hsieh, Hongjian Yang, John P Pollak, Nicola Dell,
     Serge Belongie, Curtis Cole, and Deborah Estrin. 2017. Yum-Me: A Personalized
     Nutrient-Based Meal Recommender System. ACM Transactions on Information
     Systems (TOIS) 36, 1 (2017), 7.
[44] Yu-Xiao Zhu, Junming Huang, Zi-Ke Zhang, Qian-Ming Zhang, Tao Zhou, and
     Yong-Yeol Ahn. 2013. Geography and similarity of regional cuisines in China.
     PloS one 8, 11 (2013), e79161.



