Nudging Towards Health in a Conversational Food
Recommender System Using Multi-Modal Interactions and
Nutrition Labels
Giovanni Castiglia¹, Ayoub El Majjodi², Federica Calò¹, Yashar Deldjoo¹, Fedelucio Narducci¹,
Alain Starke²,³ and Christoph Trattner²
¹ Polytechnic University of Bari, Bari, Italy
² Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
³ Marketing and Consumer Behaviour Group, Wageningen University & Research, Wageningen, The Netherlands


                                          Abstract
                                          Humans engage with other humans and their surroundings through various modalities, most notably speech, sight, and
                                          touch. In a conversation, all these inputs provide an overview of how another person is feeling. When translating these
                                          modalities to a digital context, most of them are unfortunately lost. The majority of existing conversational recommender
                                          systems (CRSs) rely solely on natural language or basic click-based interactions.
                                              This work is one of the first studies to examine the influence of multi-modal interactions in a conversational food
                                          recommender system. In particular, we examined the effect of three distinct interaction modalities: pure textual, multi-modal
                                          (text plus visuals), and multi-modal supplemented with nutritional labeling. We conducted a user study (𝑁=195) to evaluate
                                          the three interaction modalities in terms of how effectively they supported users in selecting healthier foods. Structural
                                          equation modelling revealed that users engaged more extensively with the multi-modal system that was annotated with
                                          labels, compared to the system with a single modality, and in turn evaluated it as more effective.

                                          Keywords
                                          Personalization, Health, Food recommendation, Digital Nudges, Nutrition labels


1. Introduction and Context

Conversational recommender systems (CRSs) represent a hotly debated area of study in the field of information seeking [1, 2]. They combine the power of recommendation algorithms with conversational strategies. Using multi-turn conversations, CRSs are able to collect users’ nuanced and dynamic preferences in more depth, which can enhance recommendation outcomes and user experience. CRSs are utilized in a variety of domains, including medical diagnosis [3], e-commerce [4], and entertainment [5, 6]. Only a few studies have investigated their merit for food recommendation [7], and in particular for encouraging users to make healthier food decisions.

Over 60% of all deaths are caused by non-communicable diseases, which are preventable by tackling risk factors, for instance through a healthy food intake [8]. While our food decisions are driven by our overall preferences, the food selection process is extremely contextual and influenced by a variety of factors, such as the user’s mood and dietary constraints. Moreover, many of the decisions are made spontaneously and consumers’ judgments are influenced by factors unrelated to the food content, such as their perception of the food’s visual characteristics [9]. For instance, the packaging of items with nutritional labels can serve to highlight the nutritious nature of the food (cf. [10]). Moreover, people generally prefer food that is presented in a visually appealing way [11]. People are willing to pay extra for food whose ingredients are attractively arranged, and restaurants strive to generate Instagram-friendly photographs by enhancing the color composition of their plates.

To surface effective and healthy food recommendations it is crucial to understand these underlying decision factors. Regrettably, the large majority of existing conversational recommender systems [12, 13] only consider a single type of interaction, such as natural language or click-based interaction, thereby neglecting a wealth of information in the actual imagery of meals [14]. The goal of the present work is to employ a new conversational model for food recommendation that permits more natural, multi-modal user-system interaction.

4th Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) Workshop @ RecSys 2022, September 18–23 2022, Seattle, WA, USA.
Email: g.castiglia@studenti.poliba.it (G. Castiglia); ayoub.majjodiu@uib.no (A. E. Majjodi); f.calo8@studenti.poliba.it (F. Calò); yashar.deldjoo@poliba.it (Y. Deldjoo); fedelucio.narducci@poliba.it (F. Narducci); alain.starke@uib.no (A. Starke); christoph.trattner@uib.no (C. Trattner)
Website: https://www.christophtrattner.info/ (C. Trattner)
ORCID: 0000-0002-7478-5811 (A. E. Majjodi); 0000-0002-6767-358X (Y. Deldjoo); 0000-0002-9255-3256 (F. Narducci); 0000-0002-9873-8016 (A. Starke); 0000-0002-1193-0508 (C. Trattner)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

To attain this goal, this paper introduces a multi-modal conversational food recommender system (MM-CFRS). It implements different user-system interaction modes, along with nutrition labelling, in order to assist the user in making dietary decisions. Our objective is to examine the effects of three distinct interaction modes: pure textual, multi-modal (text plus visuals), and multi-modal supplemented with nutritional labeling. While multi-modal conversational information seeking (MMCIS) is gaining attention from the RecSys/IR/HCI research communities [15, 1, 16], only a few practical studies have been published, and these focus on topics other than food and health, such as conversational systems for tourism [17] and fashion [18, 19]. In the field of food recommendation, Elsweiler et al. [20] provide a good frame of reference for recent advances in food recommender systems in general. Specifically for conversational systems, Barko-Sherif et al. [21] investigate the possibility for conversational preference elicitation in a food recommender environment, using a Wizard of Oz study design (see also [22]). Using a between-groups approach, they compare spoken and text-input chat interfaces and report that such interfaces are useful for users to describe their needs and preferences. In other studies, Samagaio et al. [23] present a RASA-based chatbot that can recognize and categorize user intentions in a conversation aimed at eliciting food preferences for recommendation purposes. Another study by Samagaio et al. [24] applies more knowledge-based elements based on word embedding to optimize conversational ingredient retrieval. These studies, however, focus less on aspects pertaining to health, health labelling, or elicitation modalities. In a non-conversational recommender context, El Majjodi et al. [25] recently indicated that nutritional labels can reduce users’ choice difficulty. The primary distinction between our work and previous studies is that the latter lack multiple modalities (typically only text is used), and that only a few of them (e.g., [25]) have used nutrition labelling.

To summarize, the goal of this study is to compare the impact of three user-system interaction and explanation modalities (textual, multi-modal, and multi-modal with nutritional labels) on both behavioral aspects (what type of recipe is chosen? How healthy is that recipe?) and evaluation aspects (how does the user evaluate the system or their chosen recipe?). Using a mediation analysis (structural equation modelling), we answer the following research question:

     • RQ: To what extent do different interaction modalities affect a user’s recipe choices and evaluation in a conversational food recommendation scenario?

To address this question, we consider different dimensions of analysis. This includes system interaction length, presentation time, healthiness of recipes chosen and a user’s level of choice satisfaction and experienced system effectiveness.

2. System Design

In this section we describe the features of our conversational food recommender system, which supports users in making healthier choices.¹ We designed a system-driven conversation in which the system requires user feedback (response/input) to continue. The main steps of the conversational flow are shown in Figure 1. Users can interact with the system using both buttons and textual messages.² The main steps of the interaction are reported below:

¹ Code and recipe data used for implementing the chatbot are available at https://github.com/giocast/MMCFRS
² A video demo of the three versions of our system is available at https://tinyurl.com/mtzxr2sw

     • Food category acquisition: The user was presented with a choice of four different food categories that were considered in this work: Pasta, Salad, Dessert, and Snack.
     • User constraints acquisition: The user was then prompted to indicate any potential dietary constraints. Initially, the system used an interface with a single checkbox for each of the most prevalent intolerances and allergies: Lactose, Meat, Alcohol, Seafood, Reflux, Cholesterol, Diabetes. Afterwards, the system asked the user to disclose a list of ingredients she could not consume.
     • Preference elicitation: According to the constraints specified by the user, the user was prompted to submit preferences for five of the dishes proposed by the system. Each dish was accompanied by two buttons: “Like” and “Skip”. The skip option was provided to encourage users to inspect an additional dish, which was retrieved from the randomly sorted menu. The retrieval was based on a random active learning strategy. This way, users were encouraged to like five dishes they were interested in, after which the user profile was built by the system.
     • Processing: The system constructed the user profile by analyzing the user’s five preferences from the previous stage. The cosine similarity was computed between the user profile and each of the available foods in the catalog, to provide a list of dishes from which recommendations would be selected. The algorithm also provided a list of dishes ranked according to their healthiness (based on their FSA score; see Section 3).
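As a rough illustration of the Processing step, the following Python sketch builds a TF-IDF dish-by-ingredient representation, forms a user profile, and ranks the catalog both by cosine similarity and by FSA score. It is not the authors' implementation: the toy catalog, the FSA values, the smoothed IDF formula, and the choice to average the liked dishes' vectors into a profile are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy catalog (dish -> ingredients); not taken from the paper's data.
CATALOG = {
    "tuna salad": ["tuna", "lemon", "olive oil", "lettuce"],
    "carrot salad": ["carrot", "lemon", "olive oil"],
    "caesar salad": ["lettuce", "chicken", "parmesan", "croutons"],
    "greek salad": ["tomato", "cucumber", "feta", "olive oil"],
}
# Hypothetical FSA scores: 4 is healthiest, 12 is least healthy.
FSA_SCORE = {"tuna salad": 6, "carrot salad": 4, "caesar salad": 9, "greek salad": 7}


def tfidf_vectors(catalog):
    """Return {dish: {ingredient: weight}} using a smoothed IDF (an assumption)."""
    n_dishes = len(catalog)
    doc_freq = Counter(ing for ings in catalog.values() for ing in set(ings))
    vectors = {}
    for dish, ings in catalog.items():
        counts = Counter(ings)
        vectors[dish] = {
            ing: (c / len(ings)) * (math.log((1 + n_dishes) / (1 + doc_freq[ing])) + 1.0)
            for ing, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_profile(liked, vectors):
    """Average the TF-IDF vectors of the liked dishes (assumed aggregation)."""
    profile = Counter()
    for dish in liked:
        for ing, weight in vectors[dish].items():
            profile[ing] += weight / len(liked)
    return dict(profile)


def rank_by_similarity(profile, vectors, exclude=()):
    """Rank dishes not in `exclude` by decreasing similarity to the profile."""
    scored = [(d, cosine(profile, v)) for d, v in vectors.items() if d not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


vectors = tfidf_vectors(CATALOG)
liked = ["tuna salad"]  # the study collects five likes; one keeps the toy example small
profile = build_profile(liked, vectors)
by_similarity = rank_by_similarity(profile, vectors, exclude=set(liked))
by_health = sorted(FSA_SCORE, key=FSA_SCORE.get)  # lowest (healthiest) score first

print(by_similarity)
print(by_health)
```

In this sketch the two rankings are kept separate, mirroring the description of one list ordered by similarity and another ordered by healthiness.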
Figure 1: Our conversational recommender system flow.

       For each food category we built a matrix containing the TF-IDF representation (dish vs. ingredient) of dishes in the catalog. The higher the TF-IDF score, the greater the ingredient’s significance to this dish (as opposed to other dishes).
     • Recommendation and explanation: The system provided two personalized recommendations, based on the user’s preferences. The system constrained the retrieval to ensure that the two options differed in terms of healthiness, so that one option was healthier than the other. The algorithm then provided a description of the suggested dishes. Specifically, it explained why the second dish was healthier than the first and why the advice was given. The user would then be prompted to select one or request a new recommendation. The two recommended dishes were chosen using the following strategy: the first dish would be the most similar to the user profile, while the second dish (the healthier alternative) was selected from a list of most similar dishes ranked on their FSA scores, selecting the healthiest one (i.e., with the lowest FSA score).

Three different interaction modes were implemented by modifying the values associated with the two manipulated variables: interaction 𝐼 and explanation 𝐸, according to Table 1.

Table 1
Differences between three implementations of the system (I = interaction, E = explanation).

Interaction Mode                      I     E
Pure text (T)                         T     T
Multi-modal (MM)                      MM    T
Multi-modal with labels (MM-Label)    MM    MM

In the Pure text version (T + T), the system communicates with the user solely through text, displaying simply the dish titles and offering textual explanations of the food recommendations. In the Multi-modal version (MM + T), the system engages the user in a multi-modal manner by displaying the name and image of each dish throughout the dialogue. However, the supplied explanation remains textual. For the first dish, the explanation can be like “I recommend these dish because I know that you have diet constraints due to: meat, zucchini. The first dish I proposed contains ingredients that you might like: carrot, lemon, tuna, olive oil”. For the second recommendation, the explanation further provides information about the macronutrient quantities of the two recommended dishes and can be in the form of “The second dish I proposed has less calories (54 Kcal) than the first one (123 Kcal) and has less fats than the first one.” The third version, MM-Label (MM + MM), likewise employs a multi-modal interaction approach, but it also makes use of nutritional explanations in the form of a front-of-package nutrition label with FSA’s Multiple Traffic Lights (MTL) [25]. MTL nutrition labels depicted the intake adequacy of a dish in terms of energy and nutritional content, along five dimensions: energy (kcal), fat, saturates, sugars, and salt. This adequacy, per serving and per 100g, was depicted using the colors green, yellow and red, where green indicated that a dish adhered to the nutritional intake guideline, while red indicated that the content was unacceptable. These labels were generated
for each dish by following the directives of the Food Standards Agency and the UK Department of Health [26].

Figure 2 depicts a snapshot of the chatbot prototype, visualizing the different interaction phases.

Figure 2: The three implementations of the system. Some details displayed on the interface, such as the chatbot’s and authors’ names, are anonymized and will be added after peer review.

In the Textual (T) version, the user received recommendations identified by only the names of the dishes (e.g., Cupcake Princess’ Vanilla Cupcakes, Floating Island II). The recommendations were followed by textual explanations, based on the ingredients in the dish that the user likes. A comparative analysis of the nutritional facts (e.g., ‘less sugars’) would also be provided. In the Multi-modal (MM) version, the system additionally provided images of the recommended dishes. The explanation was similar to the one presented in the T version. Finally, the Multi-modal with labels (MM-Label) version provided nutritional labels attached to the depicted images (e.g., Sugar 2.3g, Fat 10.7g, etc.), presented in red, yellow, and/or green according to the FSA score. As stated previously, following the presentation of the recommendations, we provide the user with an explanation that helps her comprehend the health benefits of the second alternative over the first, which is the dish that best matches her preferences. This is accomplished either by text (T and MM variants) or by a multiple traffic light nutritional label (MM-Label).

The user can accept one of the two dishes proposed or can ask for another recommendation.

3. Experimental Evaluation

To evaluate the extent to which different versions of the chatbot affected users’ evaluations and decisions, we recruited 195 participants from Amazon MTurk to use our system. Participants had to have a HIT approval rate of at least 95% and were compensated with 2 dollars. On average, users required around 15 minutes to complete the study.³ Users performed the processes outlined in Section 2, interacting with our chatbot for preference elicitation, evaluating recipe recommendations, selecting one recipe, and evaluating the experience. A user’s experience was evaluated through choice satisfaction and system effectiveness, using questionnaire items that were evaluated on 5-point Likert scales.

³ The research conformed to the ethical standards of the Norwegian Centre for Research Data (NSD). The collected data is available in the project’s GitHub repository.

Chosen recipes were evaluated according to their healthiness. This was evaluated using the FSA score [27]. Each recipe was scored between 4 and 12, where 4 indicated that all four nutrients (sugar, fat, saturated fat, salt) adhered to nutritional guidelines per 100g [9, 28], while 12 would indicate that a recipe was unhealthy because of all nutritional contents being too high.

The responses to the evaluation questionnaire items were submitted to a confirmatory factor analysis (CFA; see Table 2). Unfortunately, we could not infer a reliable construct for choice satisfaction, as the variance explained by the questionnaire items was too low, while Cronbach’s Alpha was only acceptable (0.60). Other items were dropped from the system effectiveness aspect because of low factor loadings.

Table 2
Questionnaire items used in the confirmatory factor analysis. Alpha denotes Cronbach’s Alpha, AVE denotes the Average Variance Extracted, indicating construct validity if AVE > 0.5. Items in gray and without loading were omitted from analysis. Choice Satisfaction did not form a sensible aspect, because of a lack of construct validity.

Aspect                  Item                                                                            Loading
Choice Satisfaction     I think, I would enjoy eating the dish I have chosen in the end
                        I would recommend the dish I’ve chosen in the end to others
                        My chosen dish could become my favorite

System Effectiveness    It was easy to make my final choice on the dish                                 0.737
(Alpha = 0.740,         I interacted a lot with the system before getting the dish of my choice
AVE = 0.534)            The explanation influenced my final choice of dish
                        I think, that I would use this system frequently
                        I found the system easy to use and understand                                   0.724
                        I felt very confident using the system                                          0.661
                        I would imagine that most people would learn to use this system very quickly    0.722

We organized the different factors (e.g., conversation time, condition factors) and aspects (i.e., system effectiveness) into a path model using Structural Equation Modelling. Figure 3 depicts the resulting model, which had decent fit statistics: 𝜒²(17) = 28.064, 𝑝 < 0.05, CFI = 0.969, TLI = 0.954, RMSEA = 0.058, 90% CI: [0.009, 0.095]. The AVEs of the relevant aspects were sufficiently high to form a path model [29].

Figure 3: Structural Equation Model (SEM). Numbers on the arrows represent the 𝛽-coefficients; standard errors are denoted between brackets. Effects between the subjective constructs are standardized and can be considered as correlations; other effects show regression coefficients. Aspects are grouped by color: objective system aspects are purple, behavioral indicators are blue (note: the FSA score represents recipe unhealthiness) and experience aspects are orange. Thinner arrows denote non-significant relations. Significance levels: *** 𝑝 < 0.001, ** 𝑝 < 0.01, * 𝑝 < 0.05.

Our analysis revealed that the multi-modal condition with nutrition labels (MM-Label) stood out in terms of how long users interacted with our chatbot. Figure 3 illustrates this, while the use of multi-modal approaches alone had no effect on the interaction or evaluation factors considered. Our mediation analysis suggested that in the MM-Label condition, the conversation duration was significantly longer (𝑝 < 0.05) than in the text-based condition. This indicated that the usage of nutrition labels affected conversation time, on top of the other modalities.

The duration of the conversation affected, in turn, the evaluation of the user. Based on the construct inferred from our confirmatory factor analysis (cf. Table 2), users who interacted with the chatbot for longer periods of time indicated greater levels of system effectiveness (𝑝 < 0.01). This suggested that extended engagement did not frustrate users; rather, they appeared to be enthusiastic about using the system. Figure 3 also shows that the healthiness of chosen recipes was not significantly related to any of the other aspects or factors. Note that the MM-Label condition led to the healthiest recipe choices, but the differences with the other conditions were not significant.

4. Conclusion and Future Work

We have presented a novel chatbot-like recommender system that introduces multi-modality in the interaction with the user, the presentation of results, and the explanation of recommendations with nutrition labels in a conversational scenario. We have designed and analyzed the impact of three distinct versions of our chatbot: pure textual, multi-modal (use of text and images), and multi-modal supplemented with nutritional labels.

Our experimental evaluation reveals that our chatbot is the most effective when accompanied by explanatory labels. This is indicated by the length of conversation, as well as by the user’s evaluation of the system effectiveness.

This study has several limitations. In terms of analysis, we have been unable to infer the choice satisfaction evaluation aspect. Other research has demonstrated that decision satisfaction is a good predictor of post-interaction engagement with the selected item, such as for household energy conservation [30]. Moreover, rather than relying solely on system-driven interaction, it might be intriguing and natural to investigate user-driven scenarios in which users might query the system with an image and a textual query. The food categories considered in this work (pasta, salad, dessert, snack) could additionally be expanded to include more meal categories and their combinations, such as to create a complete meal (first dish, second dish and vegetables). On top of that, the distinctions between various label modalities are an additional intriguing topic we wish to investigate more in-depth [31].

References

 [1] H. Zamani, J. R. Trippas, J. Dalton, F. Radlinski, Conversational information seeking, arXiv preprint arXiv:2201.08808 (2022).
 [2] D. Jannach, A. Manzoor, W. Cai, L. Chen, A survey on conversational recommender systems, ACM Computing Surveys 54 (2022) 1–36. doi:10.1145/3453154.
 [3] P. Cordero, M. Enciso, D. López, A. Mora, A conversational recommender system for diagnosis using fuzzy rules, Expert Systems with Applications 154 (2020) 113449. doi:10.1016/j.eswa.2020.113449.
 [4] D. Griol, J. Milina, From voicexml to multimodal mobile apps: development of practical conversational interfaces, ADCAIJ Adv. Distrib. Comput. Artif. Intell. J. 5 (2016) 43.
 [5] F. Narducci, P. Basile, M. de Gemmis, P. Lops, G. Semeraro, An investigation on the user interac-
     Adapt. Interact. 30 (2020) 251–284. URL: https://doi.org/10.1007/s11257-019-09250-7. doi:10.1007/s11257-019-09250-7.
 [6] A. Iovine, F. Narducci, G. Semeraro, Conversational recommender systems and natural language: A study through the converse framework, Decis. Support Syst. 131 (2020) 113250. URL: https://doi.org/10.1016/j.dss.2020.113250. doi:10.1016/j.dss.2020.113250.
 [7] C. Trattner, D. Elsweiler, Food recommendations, in: Collaborative recommendations: Algorithms, practical challenges and applications, World Scientific, 2019, pp. 653–685.
 [8] R. Y. Toledo, A. A. Alzahrani, L. Martinez, A food recommender system considering nutritional information and user preferences, IEEE Access 7 (2019) 96695–96711.
 [9] A. D. Starke, M. C. Willemsen, C. Trattner, Nudging healthy choices in food search through visual attractiveness, Frontiers in Artificial Intelligence 4 (2021) 621743.
[10] E. J. Van Loo, C. Grebitus, J. Roosen, Explaining attention and choice for origin labeled cheese by means of consumer ethnocentrism, Food Quality and Preference 78 (2019) 103716.
[11] Y. Peng, J. B. Jemott III, Feast for the eyes: Effects of food perceptions and computer vision features on food photo popularity, International Journal of Communication (19328036) 12 (2018).
[12] C. Zhou, Y. Jin, K. Zhang, J. Yuan, S. Li, X. Wang, Musicrobot: Towards conversational context-aware music recommender system, in: Interna-
     tion modes of conversational recommender sys-
                                                                                     tional Conference on Database Systems for Ad-
     tems for the music domain, User Model. User
     vanced Applications, Springer, 2018, pp. 817–820.              ingredient retrieval, in: 3rd Conference on Lan-
[13] J. Schaffer, T. Hollerer, J. O’Donovan, Hypothetical           guage, Data and Knowledge (LDK 2021), Schloss
     recommendation: A study of interactive profile ma-             Dagstuhl-Leibniz-Zentrum für Informatik, 2021.
     nipulation behavior for recommender systems, in:          [25] A. El Majjodi, A. D. Starke, C. Trattner, Nudging
     The Twenty-Eighth International Flairs Conference,             towards health? examining the merits of nutrition
     2015, pp. 507–512.                                             labels and personalization in a recipe recommender
[14] Y. Deldjoo, M. Schedl, P. Cremonesi, G. Pasi, Rec-             system, in: Proceedings of the 30th ACM Confer-
     ommender systems leveraging multimedia content,                ence on User Modeling, Adaptation and Personal-
     ACM Computing Surveys (CSUR) 53 (2020) 1–38.                   ization, 2022, pp. 48–56.
[15] Y. Deldjoo, J. R. Trippas, H. Zamani, Towards multi-      [26] Department of Health and Social Care UK, Front
     modal conversational information seeking, in: Pro-             of Pack nutrition labelling guidance, 2016. URL:
     ceedings of the 44th International ACM SIGIR con-              https://www.gov.uk/government/publications/
     ference on research and development in Informa-                front-of-pack-nutrition-labelling-guidance.
     tion Retrieval, 2021, pp. 1577–1587.                      [27] D. of Health UK, F. S. Agency,                 Guide
[16] R. G. Sousa, P. M. Ferreira, P. M. Costa, P. Azevedo,          to creating a front of pack (fop) nutri-
     J. P. Costeira, C. Santiago, J. Magalhaes, D. Semedo,          tion label for pre-packed products sold
     R. Ferreira, A. I. Rudnicky, et al., ifetch: Multimodal        through retail outlets (2016). URL: https:
     conversational agents for the online fashion market-           //assets.publishing.service.gov.uk/government/
     place, in: Proceedings of the 2nd ACM Multimedia               uploads/system/uploads/attachment_data/file/
     Workshop on Multimodal Conversational AI, 2021,                566251/FoP_Nutrition_labelling_UK_guidance.pdf.
     pp. 25–26.                                                [28] C. Trattner, D. Elsweiler, Investigating the healthi-
[17] L. Liao, L. H. Long, Z. Zhang, M. Huang, T.-S. Chua,           ness of internet-sourced recipes: implications for
     Mmconv: an environment for multimodal conversa-                meal planning and recommender systems, in: Pro-
     tional search across multiple domains, in: Proceed-            ceedings of the 26th international conference on
     ings of the 44th International ACM SIGIR Confer-               world wide web, ACM, New York, NY, USA, 2017,
     ence on Research and Development in Information                pp. 489–498.
     Retrieval, 2021, pp. 675–684.                             [29] B. P. Knijnenburg, M. C. Willemsen, Evaluating
[18] S. Moon, S. Kottur, P. A. Crook, A. De, S. Pod-                recommender systems with user experiments, in:
     dar, T. Levin, D. Whitney, D. Difranco, A. Beirami,            Recommender systems handbook, Springer, 2015,
     E. Cho, et al., Situated and interactive multimodal            pp. 309–352.
     conversations, arXiv preprint arXiv:2006.01460            [30] A. Starke, M. Willemsen, C. Snijders, Effective user
     (2020).                                                        interface designs to increase energy-efficient behav-
[19] Y. Yuan, W. Lam, Conversational fashion image                  ior in a rasch-based energy recommender system,
     retrieval via multiturn natural language feedback,             in: Proceedings of the eleventh ACM conference
     in: Proceedings of the 44th International ACM SI-              on recommender systems, 2017, pp. 65–73.
     GIR Conference on Research and Development in             [31] Y. Deldjoo, M. Schedl, B. Hidasi, Y. Wei, X. He, Mul-
     Information Retrieval, 2021, pp. 839–848.                      timedia recommender systems: Algorithms and
[20] D. Elsweiler, H. Hauptmann, C. Trattner, Food                  challenges, in: Recommender systems handbook,
     recommender systems, in: Recommender Systems                   Springer, 2022, pp. 973–1014.
     Handbook, Springer, 2022, pp. 871–925.
[21] S. Barko-Sherif, D. Elsweiler, M. Harvey, Conversa-
     tional agents for recipe recommendation, in: Pro-
     ceedings of the 2020 Conference on Human Infor-
     mation Interaction and Retrieval, 2020, pp. 73–82.
[22] A. Steinfeld, O. C. Jenkins, B. Scassellati, The oz
     of wizard: simulating the human for interaction
     research, in: Proceedings of the 4th ACM/IEEE in-
     ternational conference on Human robot interaction,
     2009, pp. 101–108.
[23] Á. Mendes Samagaio, H. Lopes Cardoso, D. Ribeiro,
     A chatbot for recipe recommendation and prefer-
     ence modeling, in: EPIA Conference on Artificial
     Intelligence, Springer, 2021, pp. 389–402.
[24] Á. M. Samagaio, H. Lopes Cardoso, D. Ribeiro, En-
     riching word embeddings with food knowledge for