Advancing Visual Food Attractiveness Predictions for Healthy Food Recommender System

Ayoub El Majjodi¹,*, Sohail Ahmed Khan¹, Alain D. Starke¹,², Mehdi Elahi¹ and Christoph Trattner¹

¹ MediaFutures, University of Bergen, Lars Hilles Gate 30, Bergen, Norway
² Amsterdam School of Communication Research (ASCoR), University of Amsterdam, Amsterdam, Netherlands


                                       Abstract
                                       The visual representation of food on digital platforms affects the foods chosen by users, including in the context
                                       of recommender systems. Previous studies show that small changes in visual features can influence human
                                       decision-making, regardless of whether the food is healthy. This paper reports on a study aimed at better
                                       understanding how users perceive the attractiveness of food recipe images in the digital world. In an online
                                       mixed-methods survey (𝑁 = 192), users provided visual attractiveness ratings of food images on a 7-point scale,
                                       along with textual assessments. We found robust correlations between fundamental visual features (e.g., contrast,
                                       colorfulness) and perceived image attractiveness. The analysis also revealed that, among other user factors,
                                       cooking skills positively affected perceived image attractiveness. Regarding food image dimensions, appearance
                                       and perceived healthiness were significantly correlated with user ratings of food image attractiveness.

                                       Keywords
                                       Food recommender systems, User modeling, Image attractiveness, Health, Personalization, Digital nudges


                         1. Introduction
                         Visual cues and attractiveness play a crucial role in everyday food choices [1]. Even when only presented
                         with a food image, humans tend to instantly assess a food’s energy density, expected taste and other
                         characteristics [2]. As such, images are one of the key determinants of food preferences [2, 3], tapping
                         into emotional and hedonic processes of an individual [4].
                            The importance of visual attractiveness also applies to digital choice contexts, including food
                         recommender systems [5]. Our previous research has shown the capability of recommender systems to
                         influence food behaviors via visual features, including the promotion of either high-fat or low-fat food
                         choices [6], as well as encouraging the search for healthier options [3]. Additionally, our earlier work
                         has established that visual attractiveness significantly contributes to predicting the online popularity of
                         food items [7], and these visual features can also be leveraged to infer cultural backgrounds [8].
                            What is currently missing is an in-depth examination of image feature modeling. Although previous
                         studies have extracted image features and examined the relation between those features, visual attrac-
                         tiveness and user preferences [6, 3], these models have not been optimized. Moreover, to date, image
                         features have not been related to user characteristics (e.g., demographics, food knowledge), which are
                         also important determinants of food preferences [9].
                            We present the results of a mixed-method study that explores the determinants of visual attractiveness
                         in digital recipe images more comprehensively. Our approach builds upon previous work by modeling
                         perceived visual attractiveness based on low-level image features [10, 11, 3]. Additionally, we seek to
                         optimize this model by integrating user characteristics that have been employed in knowledge-based
                         food recommender systems to promote healthier recipe choices [12, 13, 14].

                         HealthRecSys’24: The 6th Workshop on Health Recommender Systems co-located with ACM RecSys 2024
                         * Corresponding author.
                         Email: ayoub.majjodi@gmail.com (A. E. Majjodi); sohail.khan@uib.no (S. A. Khan); a.d.starke@uva.nl (A. D. Starke);
                         mehdi.elahi@uib.no (M. Elahi); christoph.trattner@uib.no (C. Trattner)
                         URL: http://ayoubmajjodi.info/ (A. E. Majjodi); https://mediafutures.no/2021/09/13/sohali-ahmed-khan/ (S. A. Khan);
                         https://mediafutures.no/2021/06/21/alain-starke/ (A. D. Starke); https://mediafutures.no/2020/11/25/mehdi-elahi/ (M. Elahi);
                         https://christophtrattner.info/ (C. Trattner)
                         ORCID: 0000-0002-7478-5811 (A. E. Majjodi); 0000-0001-5351-2278 (S. A. Khan); 0000-0002-9873-8016 (A. D. Starke);
                         0000-0003-2203-9195 (M. Elahi); 0000-0002-1193-0508 (C. Trattner)
                         © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                         CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

  Finally, we examine users' qualitative justifications for their visual attractiveness ratings by
asking them to explain their quantitative judgments. We formulate the following research questions:

        Figure 1: Steps of the user flow designed for the online survey: User Factors (User Demographics,
        User Profile, User Knowledge), followed by Recipe Ratings and Judgment, followed by Food Image Dimensions.


    • RQ1: To what extent do the latest deep learning methods predict visual attractiveness compared
      to state-of-the-art low-level features?
    • RQ2: To what extent do user characteristics, including demographics, food knowledge, and eating
      goals, predict food image attractiveness?
    • RQ3: What dimensions determine the attractiveness of a food image?

1.1. Contributions
Compared to our extensive previous work in the field of visual attractiveness and food recommender
systems [6, 7, 8, 13], this study offers novel insights into several key aspects:

    • Previous work mostly relied on low-level image attractiveness features, while this study shows
      how recent deep-learning models compare to these established features.
    • To our knowledge, this is the first work to show to what extent demographic features play a
      role in predicting visual food attractiveness.
    • Finally, this study tries to go beyond traditional quantitative black-box approaches and reveals
      why images are rated as less or more attractive.


2. Study Design
To perform our study, we employed a dataset sourced from the well-known recipe website
AllRecipes.com, with the addition of new recipe photos [3, 14]. The dataset comprised various recipe
features, including image URL, ingredients, amounts of fat and sugar, and instructions.
To generate a diverse set of images, we randomly selected 200 recipes from the dataset
of 58,000. As most images in this dataset were relatively unattractive [3], we used the recipe’s title in
search engines and image websites (e.g., Unsplash) to look for more attractive images for 100 of these
recipes. To validate this process, three computational food researchers, including a co-author, voted on
which of the two photos was the more attractive, to ensure a diverse set of recipe images in terms of
expected attractiveness.
   The study involved a survey design, as depicted in Figure 1. Participants first provided demographic
information and responded to items that measured their subjective food knowledge (4 items)
and cooking skills (6 items), using 5-point Likert scales based on earlier work [15, 16, 17]. We also
used questions from earlier work on a knowledge-based food recommender [14] to ask about other
user characteristics, including recipe website usage, home cooking frequency, cooking experience,
and dietary goals. Afterwards, users were invited to rate the visual attractiveness of 12 semi-randomly
selected recipe images on 7-point attractiveness scales. In addition, to address RQ3, they were asked
to write at least one sentence about why they had given each rating. Finally, to support our examination
of RQ3, we used 5-point Likert scales on food image dimensions [18] to ask to what extent a recipe’s
appearance, expected taste, healthiness, and familiarity affected their attractiveness ratings.
   We employed the Prolific crowdsourcing platform to recruit 192 users (65% male; 𝑀_age = 33.54)
to participate in our study. The study took approximately 11 min to complete and participants were
reimbursed with GBP 1.65.¹

       Table 1
       Linear regression models predicting visual attractiveness ratings for recipe images: (A) with low-level
       image visual features, (B) with deep learning-based visual features. ***𝑝 < 0.001, **𝑝 < 0.01, *𝑝 < 0.05.

           (A) Low-level Image Features

           Feature                            𝛽 (S.E.)
           Colourfulness                      6.725 (1.521)***
           Brightness                         2.136 (0.155)***
           Naturalness                        1.925 (0.530)***
           Entropy                            1.026 (0.154)***
           Saturation                         −3.976 (1.020)***
           Sharpness                          −1.182 (1.187)*
           RGBContrast                        −1.782 (3.808)
           Contrast                           7.401 (11.101)
           Constant                           −6.884 (1.243)***
           𝑅²                                 0.110***
           RMSE                               1.753

           (B) Image Features Extractor

                                              VGG16        ResNet       CLIP
           𝑅²                                 0.351***     0.349***     0.357***
           RMSE                               1.500        1.491        1.501


3. Results
To address our research questions, we primarily employed linear regression models. This helped to
understand the principal impacts of image attributes and user characteristics on image attractiveness,
the latter derived from user ratings. For our thematic analysis, the images were split into attractive
and unattractive based on the mid-point of the rating scale (4) (𝑀 = 4.33, 𝑆𝐷 = 1.80). The materials
and analyses are available online [19].
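
The mid-point split can be sketched as follows. This is a minimal illustration, assuming each image is labelled by its mean rating and that a mean exactly at the mid-point counts as unattractive; both choices, the function name, and the sample ratings are assumptions and not details reported in the study.

```python
from statistics import mean

MIDPOINT = 4  # mid-point of the 7-point attractiveness scale


def split_by_midpoint(ratings_by_image, midpoint=MIDPOINT):
    """Label each image by comparing its mean rating to the scale mid-point."""
    attractive, unattractive = [], []
    for image_id, ratings in ratings_by_image.items():
        # Assumption: only means strictly above the mid-point count as attractive.
        if mean(ratings) > midpoint:
            attractive.append(image_id)
        else:
            unattractive.append(image_id)
    return attractive, unattractive


# Hypothetical ratings on the 7-point scale, keyed by image identifier.
example = {"img_a": [6, 5, 7], "img_b": [2, 3, 4], "img_c": [4, 4, 4]}
attractive, unattractive = split_by_midpoint(example)
```

How ties at the mid-point are handled changes which justifications end up in each group, so that threshold is worth stating explicitly in any replication.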

3.1. RQ1: Predicting Visual Attractiveness
We first modeled perceived visual attractiveness based on the underlying image features. We extracted
diverse low-level visual features using the OpenIMAJ Java Framework (cf. [7]). Subsequently, we
conducted a linear regression analysis to predict attractiveness based on these extracted visual features.
The results are outlined in Table (1.A), revealing that several image features significantly affected the
attractiveness of a recipe image: 𝐹(8, 2100) = 32.66, 𝑝 < 0.001. Specifically, Colourfulness, Brightness,
Naturalness, and Entropy demonstrated a positive association with image attractiveness. In contrast,
Saturation and Sharpness were negatively associated with image attractiveness; RGBContrast also had a
negative coefficient, but it was not significant. In line with [3], these
results suggested that users perceived colorful, bright, and naturalistic food images as more attractive.
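
To make the modelling step concrete, the sketch below fits an ordinary least squares model with an intercept and reports R² and RMSE. It is an illustration only: the feature values and ratings are invented and noise-free, the helper names are hypothetical, RMSE is assumed to divide by the number of observations, and the study itself extracted features with OpenIMAJ and does not report which regression software was used.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, ratings):
    """Ordinary least squares with an intercept; returns (coefficients, R^2, RMSE)."""
    x = [[1.0] + list(row) for row in features]  # prepend the constant term
    p = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, ratings)) for i in range(p)]
    beta = solve(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    mean_y = sum(ratings) / len(ratings)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(ratings, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ratings)
    r2 = 1.0 - ss_res / ss_tot
    rmse = (ss_res / len(ratings)) ** 0.5  # assumption: divide by n, not n - p
    return beta, r2, rmse


# Hypothetical (colourfulness, brightness) pairs; ratings are constructed as
# 1 + 4 * colourfulness + 2 * brightness so the fit is easy to verify.
features = [(0.2, 0.5), (0.4, 0.3), (0.6, 0.8), (0.8, 0.4), (0.5, 0.9)]
ratings = [2.8, 3.2, 5.0, 5.0, 4.8]
beta, r2, rmse = fit_ols(features, ratings)
```

With real ratings the fit would be far from perfect; the constructed data only serve to show that the coefficients are recovered.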

¹ Our study complied with the ethical guidelines of the Research Council of Norway and the guidelines of the University of Bergen
  for scientific research. It was judged to pass without further extensive review.
   Going beyond low-level visual image features, we also extracted features with deep learning models.
Our toolkit included established models, such as VGG16 [20] and ResNet [21], along with CLIP² [23], a
well-known transformer-based [22] architecture for visual feature extraction. Table (1.B) outlines the
performance of these models, which outperformed the regression model based on low-level features in
terms of both 𝑅² and RMSE. This aligns with previous research in which deep learning embeddings also
outperformed low-level visual features within the context of food [7, 24].
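
The size of the improvement can be read off Table 1 with a little arithmetic. The snippet below is only a convenience sketch: the values are copied from Table 1, while the dictionary layout and the function name are illustrative assumptions.

```python
# Values copied from Table 1: panel (A) baseline and panel (B) extractors.
BASELINE = {"r2": 0.110, "rmse": 1.753}
EXTRACTORS = {
    "VGG16": {"r2": 0.351, "rmse": 1.500},
    "ResNet": {"r2": 0.349, "rmse": 1.491},
    "CLIP": {"r2": 0.357, "rmse": 1.501},
}


def relative_rmse_reduction(baseline_rmse, model_rmse):
    """Fraction by which a model lowers RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


for name, scores in EXTRACTORS.items():
    gain_r2 = scores["r2"] - BASELINE["r2"]
    drop = relative_rmse_reduction(BASELINE["rmse"], scores["rmse"])
    print(f"{name}: R2 +{gain_r2:.3f}, RMSE -{drop:.1%}")

best_by_rmse = min(EXTRACTORS, key=lambda n: EXTRACTORS[n]["rmse"])
best_by_r2 = max(EXTRACTORS, key=lambda n: EXTRACTORS[n]["r2"])
```

On these figures, each extractor lowers RMSE by roughly 14 to 15 percent relative to the low-level model, and the three extractors differ from each other only marginally.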

3.2. RQ2: User Characteristics and Image Attractiveness
We further examined whether user factors affected the perceived visual attractiveness of images.
Accordingly, we divided user characteristics into three categories: User demographics; User profile,
which represents the backbone of a knowledge-based food recommender system; and User knowledge,
which measures the user’s food knowledge and cooking skills. A confirmatory factor analysis, reported in
Table 2, showed that both subjective food knowledge and cooking skills adhered to internal consistency
guidelines (𝛼 > 0.70) and met the guidelines for convergent validity (AVE > 0.5).

       Table 2
       Results of the principal component factor analysis across different subjective food knowledge and
       cooking skills. Items were measured on 5-point Likert scales. Cronbach’s Alpha is denoted by 𝛼, AVE
       is the average variance explained. Items in grey and without loading were omitted.

         Aspect / Item                                                                                   Loading

         Subjective Food Knowledge (𝛼 = 0.866, AVE = 0.858)
           Compared with an average person, I know a lot about healthy eating.                           0.777
           I think I know enough about healthy eating to feel pretty confident when choosing a recipe.   0.885
           I know a lot about how to evaluate the healthiness of a recipe.                               0.773
           I do not feel very knowledgeable about healthy eating.                                        0.932

         Cooking skills (𝛼 = 0.783, AVE = 0.591)
           I can confidently cook recipes with basic ingredients.                                        0.751
           I can confidently follow all the steps of simple recipes.                                     -
           I can confidently taste new foods.                                                            0.737
           I can confidently cook new foods and try new recipes.                                         0.869
           I enjoy cooking food.                                                                         0.655
           I am satisfied with my cooking skills.                                                        0.816
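
The two scale diagnostics in Table 2 can be sketched as follows, assuming AVE is computed as the mean of the squared loadings and Cronbach's alpha follows its usual variance-based formula. The function names and the response matrix are illustrative assumptions; only the cooking-skills loadings come from Table 2.

```python
from statistics import variance


def average_variance_extracted(loadings):
    """Mean of the squared factor loadings."""
    return sum(value ** 2 for value in loadings) / len(loadings)


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of responses per item."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-respondent sum
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Loadings of the five retained cooking-skills items in Table 2.
cooking_loadings = [0.751, 0.737, 0.869, 0.655, 0.816]
ave = average_variance_extracted(cooking_loadings)

# Hypothetical 5-point responses (3 items x 4 respondents), not study data.
responses = [[1, 2, 4, 5], [2, 2, 4, 5], [1, 3, 5, 5]]
alpha = cronbach_alpha(responses)
```

Under this assumed definition, the cooking-skills loadings give an AVE of about 0.591, which agrees with the value listed for that scale.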

   Table (3.A) presents the outcomes of the linear regression model predicting the attractiveness
of recipe images: 𝐹(9, 2090) = 3.60. Among the various user factors examined, only two
significantly affected recipe attractiveness: cooking skills (𝛽 = 0.34, 𝑝 = 0.00021) and recipe
website usage (𝛽 = 0.18, 𝑝 = 0.020). None of the other user aspects affected user
ratings for a given recipe image. We also analyzed a combined model of image features
and user factors, but this led to results similar to the separate models reported in Tables 1 and 3.A.
This suggested that low-level visual features had a larger impact on food image attractiveness
than user features, largely in line with preliminary findings in previous research [3, 18].
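
The overall F statistic of a linear regression can be recovered from its R², the number of predictors, and the residual degrees of freedom, which allows a rough consistency check of the reported models. The sketch below uses the values reported for Tables 1.A and 3.A; the function name is an illustrative assumption.

```python
def f_from_r2(r2, n_predictors, df_residual):
    """Overall F statistic of a linear regression, recovered from its R^2."""
    return (r2 / n_predictors) / ((1.0 - r2) / df_residual)


# R^2 and degrees of freedom as reported for Tables 1.A and 3.A.
f_image_features = f_from_r2(0.110, 8, 2100)
f_user_factors = f_from_r2(0.015, 9, 2090)
```

These evaluate to about 32.4 and 3.5, close to the reported 32.66 and 3.60; note that the inputs are R² values rounded to three decimals.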

3.3. RQ3: Justifications for Visual Attractiveness
To assess the influence of different food image dimensions on user ratings for food images, we modeled
visual attractiveness based on the reported importance of food image dimensions. Table 3 outlines the
results of the regression model: 𝐹 (4, 21) = 2.41.
   Two factors significantly impacted attractiveness. First, appearance significantly impacted user
ratings (𝛽 = 0.12, 𝑝 = 0.03). Second, the expected healthiness from the images also demonstrated a
significant impact (𝛽 = 0.07, 𝑝 = 0.03). However, perceived taste and familiarity did not impact user
ratings.

² Contrastive Language-Image Pre-training (CLIP).

    Table 3
    Linear regression models predicting user rating for recipe image attractiveness: (A) with user factors,
    (B) with food image dimensions. ***𝑝 < 0.001, **𝑝 < 0.01, *𝑝 < 0.05.

     (A) User Factors

     Factor                           𝛽 (S.E.)
     User Demographic
       Age                            −0.047 (0.116)
       Education                      −0.424 (0.320)
       Gender                         −0.077 (0.088)
     User Profile
       Recipe Website Usage           0.201 (0.086)*
       Home Cooking                   −0.009 (0.078)
       Cooking Experience             −0.052 (0.079)
       Eating Goals                   0.019 (0.063)
     User Knowledge
       Subjective Food Knowledge      −0.213 (0.138)
       Cooking Skills                 0.315 (0.086)***
     Constant                         4.001 (0.570)***
     𝑅²                               0.015***
     RMSE                             1.845

     (B) Food Image Dimension

     Dimension                        𝛽 (S.E.)
     Appearance                       0.129 (0.061)*
     Healthiness                      0.077 (0.035)*
     Taste                            −0.005 (0.050)
     Familiarity                      0.0231 (0.038)
     Constant                         3.487 (0.365)***
     𝑅²                               0.011***
     RMSE                             1.855
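
The significance levels for the food image dimensions can be approximated from the coefficients and standard errors in Table 3, panel (B). The sketch below assumes a normal approximation to the t distribution, which is reasonable only when the residual degrees of freedom are large; the function and variable names are illustrative.

```python
import math


def two_sided_p(beta, std_error):
    """Two-sided p-value for a coefficient under a normal approximation."""
    z = beta / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Coefficients and standard errors from Table 3, panel (B).
dimensions = {
    "Appearance": (0.129, 0.061),
    "Healthiness": (0.077, 0.035),
    "Taste": (-0.005, 0.050),
    "Familiarity": (0.0231, 0.038),
}
p_values = {name: two_sided_p(b, se) for name, (b, se) in dimensions.items()}
```

Under this approximation, appearance and healthiness both come out near 𝑝 = 0.03, while taste and familiarity are far from significance, in line with the pattern reported above.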
   To gain insights into the reasons behind the visual attractiveness ratings, we collected qualitative
justifications from users. Natural Language Processing (NLP) preprocessing steps, namely punctuation
removal, repeated character elimination, and stop word filtering, were applied to 2,019 user justifications
for both attractive and unattractive images. From these responses, we generated two word clouds to
highlight the most prominent terms.
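
A minimal sketch of such a preprocessing pipeline is shown below. The stop-word list, the rule for collapsing repeated characters, and the sample sentences are all assumptions for illustration; the study does not report the exact tools or lists it used.

```python
import re
import string
from collections import Counter

# Illustrative stop-word list; the list used in the study is not reported.
STOP_WORDS = {"a", "and", "i", "is", "it", "of", "the", "to", "with"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, collapse repeated characters, strip punctuation, drop stop words."""
    text = text.lower()
    # Assumption: runs of three or more identical characters shrink to two.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [token for token in text.split() if token not in stop_words]


def term_frequencies(justifications):
    """Count terms across justifications; counts like these can feed a word cloud."""
    counts = Counter()
    for text in justifications:
        counts.update(preprocess(text))
    return counts


# Hypothetical justifications, loosely modelled on the style of user comments.
sample = ["It looks sooo crispy!!!", "Crispy and juicy, with simple ingredients."]
counts = term_frequencies(sample)
```

Running the same counting separately on justifications for attractive and unattractive images would give the two term distributions that a word-cloud tool needs as input.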

         Figure 2: Word clouds of terms in the user judgments: (A) judgments for attractive images, (B)
         judgments for unattractive images.


  Figure 2 presents the most frequent responses associated with both attractive and unattractive images.
These findings are discussed in relation to the themes of ‘appearance’ and ‘health’ (cf. Table (3.B)).

3.3.1. Appearance-based justifications
Figure 3 shows a few examples of user textual justifications. Several participants, including user (U𝑎),
used the term ‘crispy’ in their assessments of attractive images, mainly referring to appearance.
The word ‘simple’ was frequently used by users, such as user (U𝑏), to convey the simplicity of recipe
content. In contrast, ‘mess’ was more commonly associated with judgments of unattractive food images,
indicating their unappealing appearance. Moreover, the repeated use of the term ‘fat’ suggested that
fatty foods were generally perceived as unattractive, as in judgments by users (U𝑐) and (U𝑑).


 (U𝑎): “looks juicy with nice crispy bits, which is nice and clear in the picture”
 (U𝑏): “Interesting, slightly unusual, and does look visually appealing with simple ingredients presented well”
 (U𝑐): “It looks messy and unappealing”
 (U𝑑): “Too much carbs/fat”
        Figure 3: Examples of images used in the study, associated with users’ textual judgments related to
        the appearance. (U𝑎) and (U𝑏) are textual justifications for attractive images, while (U𝑐) and (U𝑑) are
        justifications for unattractive images.


3.3.2. Healthiness-based justifications
Judgments related to health frequently appeared in connection with the food’s appearance, such
as by user (U𝑒) in Figure 4. The term ‘restaurant’ was employed in various user judgments, often
associated with presentation and healthiness, as described by user (U𝑓). Conversely, the concept of
unhealthiness was linked to fatty foods and messy presentation, as evident in the judgments of users
(U𝑔) and (Uℎ) in Figure 4.


 (U𝑒): “Healthy salad option with balanced nutrients. It’s is also quite colorful”
 (U𝑓): “The dish looks very nice, like in a restaurant. It is colorful and looks very healthy”
 (U𝑔): “It looks a bit mushy and brown and I don’t like Turkey”
 (Uℎ): “Chicken is unhealthy and gross”
        Figure 4: Examples of images used in the study, associated with users’ textual judgments related to
        healthiness. (U𝑒) and (U𝑓) are textual justifications for attractive images, while (U𝑔) and (Uℎ) are
        justifications for unattractive images.


4. Conclusion & Future work
This work has explored different aspects of the relationship between the user and food images. Through
an online user study, we have found that various visual features can predict the attractiveness of a
given image (e.g., colorfulness, brightness, naturalness). This prediction accuracy could be slightly
improved with image features extracted using deep learning techniques (RQ1). In line with earlier work
[11, 3, 18], this suggests that the visual attractiveness of food images can be enhanced by increasing
their colorfulness, brightness, and naturalness, while decreasing other features, such as saturation and
sharpness. Obviously, there may be tradeoffs between these features when altering them.
   Regarding user characteristics, none of the user demographics were related to food image
attractiveness. In contrast, the use of online recipe websites and cooking skills were positively associated
with the attractiveness of food images (RQ2). More novel is our contribution on the user justifications, for
which we have found image appearance and perceived healthiness to be important dimensions of visual
attractiveness ratings (RQ3). It seems that attractive images were related to expected taste or hedonic food
goals (e.g., ‘crispy’), while unattractive images were associated with poor presentation and disliked ingredients.
   Our study offers valuable insights into techniques for selecting attractive images for various goals
and domains. In particular, these techniques can be leveraged to persuade or nudge users towards
specific eating goals, such as health [3, 25]. We believe that the visual appeal of attractive
images can support such nudges. Our future studies will focus on designing image selection pipelines for
the application of food recommender systems tailored to guide people toward healthy food choices
without compromising the benefits of personalization. We aim to analyze and categorize the collected
textual judgments through thematic analysis to build word dictionaries related to image dimensions.
These dictionaries can then be used to train learning models, enabling the evaluation of food image
attractiveness based on user textual inputs.


Acknowledgments
This work was supported by industry partners and the Research Council of Norway with funding to
MediaFutures: Research Centre for Responsible Media Technology and Innovation, through the centers
for Research-based Innovation scheme, project number 309339.

The authors acknowledge the use of ChatGPT [26] for checking and correcting the grammar
of this article. No new content was generated this way; only existing text was checked and, if needed,
corrected.


References
 [1] I. Vermeir, G. Roose, Visual design cues impacting food choice: A review and future research
     agenda, Foods 9 (2020) 1495.
 [2] C. Spence, K. Motoki, O. Petit, Factors influencing the visual deliciousness/eye-appeal of food,
     Food Quality and Preference 102 (2022) 104672.
 [3] A. D. Starke, M. C. Willemsen, C. Trattner, Nudging healthy choices in food search through visual
     attractiveness, Frontiers in Artificial Intelligence 4 (2021) 20.
 [4] R. Cadario, P. Chandon, Which healthy eating nudges work best? a meta-analysis of field
     experiments, Marketing Science 39 (2020) 465–486.
 [5] D. Elsweiler, H. Hauptmann, C. Trattner, Food recommender systems, Springer
     US, New York, NY, 2022, pp. 871–925. URL: https://doi.org/10.1007/978-1-0716-2197-4_23.
     doi:10.1007/978-1-0716-2197-4_23.
 [6] D. Elsweiler, C. Trattner, M. Harvey, Exploiting food choice biases for healthier recipe
     recommendation, in: Proceedings of the 40th International ACM SIGIR Conference on Research and
     Development in Information Retrieval, 2017, pp. 575–584.
 [7] C. Trattner, D. Moesslang, D. Elsweiler, On the predictability of the popularity of online recipes,
     EPJ Data Science 7 (2018) 1–39.
 [8] Q. Zhang, D. Elsweiler, C. Trattner, Visual cultural biases in food classification, Foods 9 (2020) 823.
 [9] B. Scheibehenne, L. Miesler, P. M. Todd, Fast and frugal food choices: Uncovering individual
     decision heuristics, Appetite 49 (2007) 578–589.
[10] A. Khosla, A. Das Sarma, R. Hamid, What makes an image popular?, in: Proceedings of the 23rd
     international conference on World wide web, 2014, pp. 867–876.
[11] J. San Pedro, S. Siersdorfer, Ranking and classifying attractiveness of photos in folksonomies, in:
     Proceedings of the 18th international conference on World wide web, 2009, pp. 771–780.
[12] C. Musto, C. Trattner, A. Starke, G. Semeraro, Towards a knowledge-aware food recommender
     system exploiting holistic user models, in: Proceedings of the 28th ACM conference on user
     modeling, adaptation and personalization, ACM, New York, NY, USA, 2020, pp. 333–337.
[13] A. D. Starke, C. Musto, A. Rapp, G. Semeraro, C. Trattner, “tell me why”: using natural language
     justifications in a recipe recommender system to support healthier food choices, User Modeling
     and User-Adapted Interaction (2023) 1–34.
[14] A. El Majjodi, A. D. Starke, M. Elahi, C. Trattner, et al., The interplay between food knowledge,
     nudges, and preference elicitation methods determines the evaluation of a recipe recommender
     system, in: Proceedings of the 10th Joint Workshop on Interfaces and Human Decision Making
     for Recommender Systems (IntRS 2023), 2023, pp. 1–18.
[15] L. R. Flynn, R. E. Goldsmith, A short, reliable measure of subjective knowledge, Journal of business
     research 46 (1999) 57–66.
[16] Z. Pieniak, J. Aertsens, W. Verbeke, Subjective and objective knowledge as determinants of organic
     vegetables consumption, Food quality and preference 21 (2010) 581–588.
[17] N. Frans, Development of cooking skills questionnaire for EFNEP participants in Kansas, Ph.D.
     thesis, Kansas State University, 2017.
[18] Q. Zhang, D. Elsweiler, C. Trattner, Understanding and predicting cross-cultural food preferences
     with online recipe images, Information Processing & Management 60 (2023) 103443.
[19] A. El Majjodi, S. A. Khan, A. D. Starke, M. Elahi, C. Trattner, Examining the visual attractiveness
     of digital recipe images: Material, 2024. URL: https://github.com/ayoubGL/Health-RecSys-2024.
[20] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition,
     arXiv preprint arXiv:1409.1556 (2014).
[21] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of
     the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[22] A. Vaswani, N. M. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin,
     Attention is All you Need, in: Neural Information Processing Systems, 2017.
     URL: https://api.semanticscholar.org/CorpusID:13756489.
[23] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin,
     J. Clark, et al., Learning transferable visual models from natural language supervision, in:
     International conference on machine learning, PMLR, 2021, pp. 8748–8763.
[24] J.-j. Chen, C.-W. Ngo, T.-S. Chua, Cross-modal recipe retrieval with rich food attributes, in:
     Proceedings of the 25th ACM international conference on Multimedia, 2017, pp. 1771–1779.
[25] L. Yang, C.-K. Hsieh, H. Yang, J. P. Pollak, N. Dell, S. Belongie, C. Cole, D. Estrin, Yum-me:
     a personalized nutrient-based meal recommender system, ACM Transactions on Information
     Systems (TOIS) 36 (2017) 1–31.
[26] OpenAI, ChatGPT, 2024. URL: https://openai.com/chatgpt/, accessed: 2024-09-16.