=Paper= {{Paper |id=Vol-1953/healthRecSys17_paper_11 |storemode=property |title=Investigating Substitutability of Food Items in Consumption Data |pdfUrl=https://ceur-ws.org/Vol-1953/healthRecSys17_paper_11.pdf |volume=Vol-1953 |authors=Sema Akkoyunlu,Cristina Manfredotti,Antoine Cornuéjols,Nicolas Darcel,Fabien Delaere |dblpUrl=https://dblp.org/rec/conf/recsys/AkkoyunluMCDD17 }} ==Investigating Substitutability of Food Items in Consumption Data== https://ceur-ws.org/Vol-1953/healthRecSys17_paper_11.pdf
 Investigating substitutability of food items in consumption data
Sema Akkoyunlu
UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay
Paris, France
sema.akkoyunlu@agroparistech.fr

Cristina Manfredotti
UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay
Paris, France
cristina.manfredotti@agroparistech.fr

Antoine Cornuéjols
UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay
Paris, France
antoine.cornuejols@agroparistech.fr

Nicolas Darcel
UMR PNCA, AgroParisTech, INRA, Université Paris-Saclay
Paris, France
nicolas.darcel@agroparistech.fr

Fabien Delaere
Danone Nutricia Research
Palaiseau, France
fabien.delaere@danone.com

ABSTRACT
Food based dietary guidelines are insufficiently followed by consumers. One of the principal
explanations of this failure is that they are too general and do not take into account eating
habits. Providing personalized dietary recommendations via a nutrition recommender system can
hence help people improve their eating habits. Understanding eating habits is a keystone in
order to build a context aware recommender system that delivers personalized dietary
recommendations. As a first step towards this goal, we explore food relationships on real-world
data using the INCA 2 dataset, a French consumption survey. We particularly focus on extracting
food substitutions, i.e., food items that can replace each other. We consider that two food
items can be substituted if they are consumed in similar contexts. We define the context in the
nutrition field and we introduce a measure of substitutability between food items based on
consumption data that encodes the context.

CCS CONCEPTS
• Information systems → Information extraction; Recommender systems;

KEYWORDS
recommender system, substitution, food consumption, nutrition

ACM Reference format:
Sema Akkoyunlu, Cristina Manfredotti, Antoine Cornuéjols, Nicolas Darcel, and Fabien Delaere.
2017. Investigating substitutability of food items in consumption data. In Proceedings of the
Second International Workshop on Health Recommender Systems co-located with ACM RecSys 2017,
Como, Italy, August 2017 (HealthRecSys’17), 5 pages.

∗ International Workshop on Health Recommender Systems, August 2017, Como, Italy.
©2017. Copyright for the individual papers remains with the authors. Copying permitted for
private and academic purposes. This volume is published and copyrighted by its editors.

1    INTRODUCTION
Nutritional quality of diets is proven to be an important factor in health dysfunctions. The
risk of developing modern chronic diseases such as cardiovascular diseases, obesity or diabetes
is linked to unhealthy eating habits [13]. In order to promote healthy and sustainable diets and
prevent chronic diseases, dietary guidelines targeted to the general population are produced by
public health agencies. However, it has been noted that compliance with the guidelines is
usually low although awareness of the food based dietary recommendations is rather good [9].
Nutrition related knowledge does not imply adherence to dietary guidelines. Several causes
explain this phenomenon: cultural and personal preferences, difficulty of implementing dietary
changes, availability of food items [12]. Nutritionists stress the fact that it is crucial to
understand consumers’ behaviours in order to make practical food-based recommendations because
making changes is challenging [4].
    One fair assumption is that people are more likely to follow recommendations if these are
acceptable from their point of view. We hypothesize that user acceptance is a prerequisite for
compliance and could be improved by producing user-tailored recommendations that take into
account dietary habits. In the long term, our objective is to build a nutrition recommender
system taking into account dietary habits in order to encourage people toward healthier
alternatives with high compliance.
    In food related recommender systems, the recommended items are recipes [6], [8] or food
items themselves [7]. We rather want to build a food item based recommender system that delivers
messages such as "instead of eating X, eat Y". In order to deliver relevant recommendations, it
is important for a recommender system to know substitutability relationships between items
[11], [14]. This is important in food recommender systems as well.
    Moreover, it has been shown that context-aware recommender systems produce better
recommendations than recommender systems that do not take into account the context [2]. In
order to extract meaningful relationships between food items, in our model we consider
contextual information. To the best of our knowledge, only one study has tackled the subject of
food substitutability based on real-world consumption data [1]. However, it does not take into
account contextual information such as the type of meal, across which substitutability
relationships can differ markedly.
    In this paper, we specifically investigate food substitutability. To do that, we define the
concept of dietary context as the set of food items a food is consumed with and the concept of
food intake
HealthRecSys’17, August 2017, Como, Italy                                                                                      Akkoyunlu et al.


context as the setting of food consumption. Our intuition is that two food items are
substitutable if they are consumed in similar dietary contexts and that substitutability differs
according to the food intake context.
    The rest of the paper is organized as follows. Section 2 describes our methodology.
Section 3 reports the results. Finally, in Section 4, we discuss our results and present our
future perspectives.

2 OUR APPROACH
2.1 Notations and problem statement
Let X be the set of food items. A meal is a collection of food items consumed in the same
timeframe. For instance, {coffee, bread, jam, juice} is a meal. The meal database DB is the set
of all meals. Let us denote $DB_{breakfast}$ the database of breakfasts and $DB_{lunch}$ the
database of lunches.
    Our objective is to mine the food pair substitutions applied by consumers when they compose
their meals. Given a database of meals, we want to extract substitutability relationships based
on the way people consume food. No nutritional information is used during this process. Instead,
contextual information is used in order to extract meaningful substitutability relationships.

2.2    Defining Context
The notion of context is quite complex and difficult to define universally. In the field of
recommender systems, the context is usually defined according to the field of application of the
system.
    In the nutrition field, we define two types of contexts: the dietary context and the food
intake context. The dietary context of a food item x is the set of food items c with which x is
consumed. For instance, in the meal {coffee, bread, jam, juice}, the dietary context of {coffee}
is {bread, jam, juice}. We think that the dietary context is fundamental when seeking
substitutability of food items because the way people compose their meals is intrinsically
dependent on the relationships between the items.
    The food intake context is defined as the set of all variables such as the type of the meal
(breakfast, lunch, dinner, snack), the location (home, workplace, restaurant), the participants
(family, friend, coworkers, alone). This corresponds to the notion of context usually used in
context-aware recommender systems [2].
    There are three paradigms for incorporating context in recommender systems: contextual
pre-filtering, contextual post-filtering and contextual modelling [2]. Contextual
pre (post)-filtering consists in splitting the dataset according to contextual variables before
(after) applying algorithms. Contextual modelling consists in incorporating contextual
information in the algorithm. In our framework, dietary context is used in order to model
substitutability whereas the food intake context is used for contextual pre-filtering.
    Our objective is to investigate substitutability among food items based on the assumption
that two food items are highly substitutable if they are consumed in similar dietary contexts
and in the same intake context.
    Investigating all possible dietary contexts of a food item is computationally expensive
because the number of possible dietary contexts is exponential in the number of food items and
the length of the dietary context. The number of interesting contexts is actually limited by the
characteristics of the available data. Instead of investigating all the dietary contexts of a
food item, we decided to explore collections of meals that differ only by one item. We define
the dietary context c of a meal database as the intersection of a set of meals $S_m$ such that:

    $\mathrm{len}(c) = \max_{x \in S_m} \mathrm{len}(x) - 1$                     (1)

Let us define the substitutable set $S_c$ associated to a dietary context c as the set of food
items such that the context c plus one item of $S_c$ can be effectively consumed together. For
instance, the substitutable set of the dietary context c = {bread, jam, juice} might be
$S_c$ = {coffee, tea, yogurt}.

2.3    Mining substitutable items
To efficiently retrieve interesting sets of dietary contexts and their substitutable sets, in
this paper, we propose an approach based on graph mining techniques. Let us denote the meal
graph G = (V, E) where V is the set of nodes representing meals from the database and E is the
set of edges such that two nodes are connected if there is at most one item that changes between
them. A meal should appear at least once in the database in order to appear as a node in the
graph. Figure 1 is a simple illustration of a meal network.

    Figure 1: Example of a simple meal network

    Designed in this way, the nodes of the substitutable set of a dietary context are adjacent.
They form a sub-graph that is completely connected. Such an object is called a clique in graph
mining. More specifically, the nodes form a maximal clique. A maximal clique is a clique to
which another node cannot be added. In our setting, discovering substitutable sets is similar to
mining maximal cliques in a graph. In this paper we use the algorithm of Bron-Kerbosch [5] to
search for maximal cliques.
    Not all discovered maximal cliques are interesting for our study. We want cliques such that
the intersection of the nodes is a dietary context as defined above. We denote these cliques as
substitutable cliques. However, we may encounter cliques as in Figure 2. In this case, the
intersection of the nodes is {A} and we cannot derive a substitutable set from this clique.
    To avoid retrieving uninteresting cliques, we apply Algorithm 1, which keeps only the
substitutable cliques.
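As an illustration only, the mining procedure of Section 2.3 can be sketched in Python. The function names (build_meal_graph, bron_kerbosch, substitutable_set), the toy meals, and the reading of "at most one item changes" as one swap, addition or removal are assumptions made for this sketch and are not taken from the authors' implementation. The clique search is the basic Bron-Kerbosch recursion without pivoting, and the filter applies the length condition of Equation (1).

```python
from itertools import combinations


def differ_by_at_most_one_item(a, b):
    """Assumed edge rule: b is a with one item swapped, added or removed."""
    return a != b and len(a - b) <= 1 and len(b - a) <= 1


def build_meal_graph(meals):
    """Adjacency map of the meal graph, one node per distinct meal."""
    nodes = set(meals)
    graph = {meal: set() for meal in nodes}
    for a, b in combinations(nodes, 2):
        if differ_by_at_most_one_item(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bron_kerbosch(graph, r=frozenset(), p=None, x=None):
    """Yield every maximal clique (basic Bron-Kerbosch recursion, no pivoting)."""
    if p is None:
        p, x = set(graph), set()
    if not p and not x:
        yield r
        return
    for v in list(p):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p.remove(v)
        x.add(v)


def substitutable_set(clique):
    """Return (context, items) if the clique satisfies Eq. (1), else None."""
    if not clique:
        return None
    context = frozenset.intersection(*clique)
    if max(len(meal) for meal in clique) - len(context) != 1:
        return None
    # Each meal adds zero or one item to the context; zero is reported as "nothing".
    items = {next(iter(meal - context)) if meal - context else "nothing"
             for meal in clique}
    return context, items


# Toy meals (hypothetical): a breakfast clique and a triangle sharing only "rice".
meals = [frozenset(m) for m in (
    {"bread", "butter"},
    {"bread", "butter", "coffee"},
    {"bread", "butter", "tea"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "jam"},
    {"rice", "fish", "peas"},
    {"rice", "fish", "corn"},
    {"rice", "peas", "corn"},
)]
graph = build_meal_graph(meals)
found = [s for s in map(substitutable_set, bron_kerbosch(graph)) if s is not None]
print(found)
```

On these toy meals the breakfast clique is kept, with context {bread, butter}, while the triangle is discarded because its intersection is two items shorter than its longest meal.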


    Figure 2: Example of an uninteresting clique (meal nodes ABC, ABD and AED)

Algorithm 1 Find substitutable clique
  function isSubclique(clique)
      context = getContext(clique)          # intersection of the meals in the clique
      lenmax = max(len(x) for x in clique)
      if lenmax - len(context) = 1 then
          return True
      else
          return False

    For instance, when we apply our algorithm to the example of Figure 1, we find that this
graph is a maximal clique and, more particularly, a substitutable clique. The context is
{bread, butter} and the substitutable set associated to this context is {coffee, tea, milk, jam,
nothing}. In this particular case, it is possible to substitute an item by nothing because
{bread, butter} can be consumed as such.

2.4    Computing a substitutability score
Substitutability is not a binary relationship because there are different degrees of
substitutability. Moreover, if two items are consumed together, they are less substitutable
because they might be associated. Therefore, we need a function to quantify the relationship of
substitutability that incorporates the possibility of associativity. Our hypothesis is that two
items are highly substitutable if they are consumed in similar dietary contexts.
    We want to compute a substitutability score such that:

    (1) Two items are highly substitutable if they are consumed in similar contexts.
    (2) Two items are less substitutable if they are consumed together.
    (3) Substitutability is a symmetrical relationship.

    Let us denote, for an item x, the context set $C_x$ as the set of dietary contexts in which
x is a substitutable item. If the cardinality of $C_x$, denoted as $|C_x|$, is high, then x is
substitutable in many dietary contexts.
    For two items x and y, condition (1) is described by the intersection of $C_x$ and $C_y$.
If $|C_x \cap C_y|$ is high, then x and y are consumed in similar contexts.
    We denote $A_{x:y}$ the set of contexts of x where y appears:

    $A_{x:y} = \{ c \in C_x \mid y \in c \}$                      (2)

The cardinality of $A_{x:y}$ denotes how strongly y is associated to x.
    Taking into account these considerations, we propose the substitutability score inspired by
the Jaccard index [10]:

    $f(x, y) = \frac{|C_x \cap C_y|}{|C_x \cup C_y| + |A_{x:y}| + |A_{y:x}|}$                      (3)

The score equals 1 when x and y appear in exactly the same contexts and
$A_{x:y} = A_{y:x} = \emptyset$. If x and y never share a dietary context
($C_x \cap C_y = \emptyset$), then the score equals 0. The higher $|A_{x:y}| + |A_{y:x}|$ is,
the higher the association of x and y is and the lower the score is.

3 EXPERIMENTS
3.1 The INCA 2 database
The French dataset INCA 2¹ is the result of a survey conducted during 2006-2007 about individual
food consumption. Individual 7-day food diaries are reported for 2624 adults and 1455 children
over several months, taking into account possible seasonality in eating habits. A day is
composed of three main meals: breakfast, lunch and dinner. The moments in between are denoted as
snacking. For the main meals, the location (home, work, school, outdoor) and the companion
(family, friends, coworkers, alone) are registered.
    The 1280 food entries are organized in 44 groups and 110 sub-groups of food items. We chose
to consider the medium level of the hierarchy, i.e. the sub-groups, in order to capture both
inter-group and intra-group substitution relationships.
    Only adults are considered in this paper. All meals are gathered in a meal database
$DB_{meals}$ regardless of the type of meal. The database can be split according to contextual
information in order to get better results [3]. We compare the results of our methodology on
three datasets: $DB_{breakfastlunch}$ (breakfasts and lunches together), $DB_{breakfast}$ and
$DB_{lunch}$.

3.2     Results
Applying our algorithm on $DB_{breakfast}$ yields 2368 contexts. Some of these and their
substitutable sets are given in Table 1. Our results are coherent. For example, either bread,
rusk or viennoiserie can be consumed for breakfast with coffee, sugar and water.

    Context                         Substitutable set
    coffee, sugar, water, butter    bread, rusk, viennoiserie
    tea/infusions, donuts           yogurt, sugar, jam/honey, nothing

    Table 1: Results of context and substitutable set retrieval for breakfasts

    We applied our algorithm to the three datasets. The results are reported in Table 2. We can
see that we obtain inter-group substitutions such as {potatoes ⇒ green beans} but also
intra-group substitutions such as {bread ⇒ rusk}.
    The substitutions proposed are consistent with regard to eating habits. Substitutes of
drinks are also drinks: the substitutes of coffee are tea, cocoa and chicory. It is also the
case for spreadable food items: the substitutes for butter for breakfast are spreadable items.
No semantic information describing how a food item can be eaten is available in the dataset and
yet considering the dietary context helps us retrieve this kind of information.
    Substitutions between food items of the same nutritional food groups are found. For
instance, the substitutes for potatoes are pasta and rice: they all contain starches.

¹ https://www.data.gouv.fr/fr/datasets/donnees-de-consommations-et-habitudes-alimentaires-de-letude-inca-2-3/
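The substitutability score of Equations (2) and (3) in Section 2.4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the helper names (build_context_sets, association, substitutability) and the toy contexts are hypothetical, and the input is assumed to be a list of (context, substitutable set) pairs such as the mining step of Section 2.3 would produce.

```python
from collections import defaultdict


def build_context_sets(pairs):
    """Map each item x to C_x, the set of dietary contexts where x is substitutable."""
    context_sets = defaultdict(set)
    for context, items in pairs:
        for item in items:
            context_sets[item].add(frozenset(context))
    return context_sets


def association(contexts_x, y):
    """A_{x:y} of Eq. (2): the contexts of x in which y itself appears."""
    return {c for c in contexts_x if y in c}


def substitutability(x, y, context_sets):
    """Score f(x, y) of Eq. (3); returns 0.0 when neither item has any context."""
    cx = context_sets.get(x, set())
    cy = context_sets.get(y, set())
    denom = len(cx | cy) + len(association(cx, y)) + len(association(cy, x))
    return len(cx & cy) / denom if denom else 0.0


# Hypothetical (context, substitutable set) pairs, for illustration only.
pairs = [
    ({"bread", "butter"}, {"coffee", "tea", "milk"}),
    ({"bread", "jam"}, {"coffee", "tea"}),
    ({"rusk", "coffee"}, {"tea", "butter"}),  # coffee appears in a context of tea
]
context_sets = build_context_sets(pairs)
print(substitutability("coffee", "tea", context_sets))     # 2 / (3 + 0 + 1) = 0.5
print(substitutability("coffee", "butter", context_sets))  # no shared context: 0.0
```

In this toy example coffee and tea share two of three contexts, but coffee also appears inside one context of tea, which adds 1 to the denominator and lowers the score from 2/3 to 0.5, as intended by condition (2).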

          Breakfast and lunch                   Breakfast                 Lunch
Food item Substitute item               Score   Substitute item   Score   Substitute item              Score
Bread     Rusk                          0.2234  Rusk              0.3716  Fruits                       0.0497
          Viennoiserie                  0.1359  Viennoiserie      0.2010  Yogurt                       0.0490
          Cakes                         0.0745  Cakes             0.1243  Potatoes                     0.0468
Coffee    Tea                           0.2799  Tea               0.4219  Sodas                        0.065
          Cocoa                         0.1729  Chicory           0.2550  Yogurt                       0.0642
          Chicory                       0.1486  Cocoa             0.2255  Fruits                       0.0633
Tea       Coffee                        0.2799  Coffee            0.4219  Cakes                        0.0536
          Cocoa                         0.1721  Chicory           0.1965  Viennoiserie                 0.0417
          Chicory                       0.1289  Cocoa             0.1462  Coffee                       0.0412
Cocoa     Chicory                       0.2171  Chicory           0.2211  Cereal bars                  0.25
          Coffee                        0.1729  Coffee            0.2077  Preprocessed vegetables      0.0526
          Tea                           0.1289  Tea               0.1965  Hamburgers                   0.0256
Butter    Margarine                     0.2413  Margarine         0.4030  Margarine                    0.0602
          Honey/jam                     0.0924  Chocolate spread  0.1240  Fruits                       0.0431
          Chocolate spread              0.0786  Honey/jam         0.1175  Sauces                       0.0431
Milk      Juice                         0.1409  Yogurt            0.1815  Doughnut                     0.0869
          Yogurt                        0.1264  Juice             0.1504  Other milk                   0.0666
          Sugar                         0.1089  Tap water         0.1361  Milk in powder               0.0625
Wine      Sodas                         0.0814  /                 /       Sodas                        0.0860
          Beer                          0.0704  /                 /       Tap water                    0.0755
          Tap water                     0.0412  /                 /       Beer                         0.0746
Pizza     Sandwich baguette             0.2429  /                 /       Sandwiches baguette          0.2810
          Other sandwiches              0.1729  /                 /       Other sandwiches             0.2177
          Meals with pasta or potatoes  0.1513  /                 /       Meal with pasta or potatoes  0.1658
Potatoes  Pasta                         0.1111  /                 /       Pasta                        0.1142
          Green beans                   0.0922  /                 /       Green beans                  0.0941
          Rice                          0.0602  /                 /       Rice                         0.0616

Substitute items are ordered by score within each dataset; "/" marks items with no substitutes reported for breakfast.
Table 2: Top 3 substitutable items for several items for breakfast and lunch
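The three datasets compared in Table 2 result from contextual pre-filtering on the type of meal. A minimal sketch of that split is given below; the record layout (a meal type paired with a list of items) and the function name prefilter_by_meal_type are assumptions made for illustration and do not reflect the actual INCA 2 file format.

```python
def prefilter_by_meal_type(records):
    """Split (meal_type, items) records into the three meal databases."""
    breakfast = [frozenset(items) for meal_type, items in records
                 if meal_type == "breakfast"]
    lunch = [frozenset(items) for meal_type, items in records
             if meal_type == "lunch"]
    return {
        "breakfast": breakfast,
        "lunch": lunch,
        "breakfast+lunch": breakfast + lunch,
    }


# Hypothetical diary records; dinners and snacks are left out of all three databases.
records = [
    ("breakfast", ["coffee", "bread", "butter"]),
    ("lunch", ["pizza", "sodas"]),
    ("dinner", ["soup", "bread"]),
    ("breakfast", ["tea", "rusk"]),
]
databases = prefilter_by_meal_type(records)
print({name: len(meals) for name, meals in databases.items()})
```

Each resulting database can then be mined and scored separately, which is what makes the scores in the three column groups of Table 2 differ.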



4   DISCUSSION AND CONCLUSIONS
We proposed a score of substitutability based on consumption data that can be used in a
recommender system together with other scores such as a nutritional score that takes into
account the nutritional contribution of the substitution and a user preference score. The
substitutability score is based on the assumption that two items are substitutable if they are
consumed in similar contexts. Preliminary results on the INCA 2 dataset show that this
assumption helps retrieve substitutability relationships based on consumption data.
    When we split the dataset according to the contextual variable "type of meal", the
substitutes and the scores are different. Coffee can be substituted by tea, chicory and cocoa
for breakfast whereas for lunch, it can be substituted by sodas, yogurt and fruits. Food items
are consumed differently according to the type of meal. The relationship of substitutability is
therefore different too.
    A difference of scale in scores is noted according to the type of meal. It may be due to the
fact that the diversity of food items consumed during lunch is higher than during breakfast. A
rescaling factor based on the diversity of the type of meal can be introduced. As future work we
plan to investigate this aspect and implement the nutritional score and the user preference
related score.

5   ACKNOWLEDGEMENT
This study was funded by Danone Nutricia Research.

REFERENCES
[1] Achananuparp, P., and Weber, I. Extracting food substitutes from food diary via
    distributional similarity. CoRR abs/1607.08807 (2016).
[2] Adomavicius, G., and Tuzhilin, A. Context-aware recommender systems. In Recommender Systems
    Handbook, F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, Eds. Springer US, 2011,
    pp. 217–253.
[3] Baltrunas, L., and Ricci, F. Context-based splitting of item ratings in collaborative
    filtering. In Proceedings of the Third ACM Conference on Recommender Systems (New York, NY,
    USA, 2009), RecSys ’09, ACM, pp. 245–248.
[4] Bier, D. M., Derelian, D., German, J. B., Katz, D. L., Pate, R. R., and Thompson, K. M.
    Improving compliance with dietary recommendations. Nutrition Today 43, 5 (Sep 2008),
    180–187.
[5] Bron, C., and Kerbosch, J. Algorithm 457: Finding all cliques of an undirected graph.
    Commun. ACM 16, 9 (Sept. 1973), 575–577.
[6] Freyne, J., and Berkovsky, S. Intelligent food planning: personalized recipe
    recommendation. In Proceedings of the 15th International Conference on Intelligent User
    Interfaces, IUI 2010, Hong Kong, China, February 7-10, 2010 (2010), pp. 321–324.
[7] Ge, M., Elahi, M., Fernández-Tobías, I., Ricci, F., and Massimo, D. Using tags and latent
    factors in a food recommender system. In Proceedings of the 5th International Conference on
    Digital Health 2015 (New York, NY, USA, 2015), DH ’15, ACM, pp. 105–112.
[8] Harvey, M., Ludwig, B., and Elsweiler, D. You are what you eat: Learning user tastes for
    rating prediction. In String Processing and Information Retrieval - 20th International
    Symposium, SPIRE 2013, Jerusalem, Israel, October 7-9, 2013, Proceedings (2013),
    pp. 153–164.
[9] Ivens, B. J., and Smith Edge, M. Translating the Dietary Guidelines to Promote Behavior
    Change: Perspectives from the Food and Nutrition Science Solutions Joint Task Force. J Acad
    Nutr Diet 116, 10 (Oct 2016), 1697–1702.
[10] Jaccard, P. The distribution of the flora in the alpine zone.1. New Phytologist 11, 2
     (1912), 37–50.
[11] McAuley, J. J., Pandey, R., and Leskovec, J. Inferring networks of substitutable and
     complementary products. In Proceedings of the 21th ACM SIGKDD International Conference on
     Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015 (2015),
     pp. 785–794.
[12] Webb, D., and Byrd-Bredbenner, C. Overcoming consumer inertia to dietary guidance. Adv
     Nutr 6, 4 (Jul 2015), 391–396.
[13] World Health Organization. Diet, nutrition and the prevention of chronic diseases: report
     of a joint WHO/FAO expert consultation. Tech. rep., 2003.
[14] Zheng, J., Wu, X., Niu, J., and Bolivar, A. Substitutes or complements: another step
     forward in recommendations. In Proceedings 10th ACM Conference on Electronic Commerce
     (EC-2009), Stanford, California, USA, July 6–10, 2009 (2009), pp. 139–146.