You are What You Eat! Tracking Health Through Recipe Interactions

Alan Said
TU-Delft, The Netherlands
alansaid@acm.org

Alejandro Bellogín
Universidad Autónoma de Madrid, Spain
alejandro.bellogin@uam.es

ABSTRACT
On today's World Wide Web, social recommender systems have become a commodity regardless of application domain. Even tangible items such as food and clothes have become social. Alongside a seemingly endless number of personalization and recommender systems for movies, music, and consumer products, recipe recommender systems are attracting many users looking for inspiration on the next thing to purchase or cook. There is however a conceptual difference between recommending consumer goods for leisure and entertainment, and recommending food. What people eat has a direct effect on their health, an aspect commonly overlooked in the context of recommendation.

In this work, we present an early analysis of users' interactions with recipes (ratings) on the online social network Allrecipes.com. We compare the interaction patterns of users from locations known to have poor health to those of users from locations known to have good health in order to identify whether there is an observable difference between the two populations.

Our results point to a statistically significant difference between the healthy and unhealthy groups, a difference that could potentially be used to create health-conscious, personalized recommendation services to aid people in their daily lives.

Categories and Subject Descriptors
H.3.5 [Information Storage and Retrieval]: Online Information Services - Commercial Services; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information filtering; H.1.2 [Models and Principles]: User/Machine Systems - Human Factors; K.4.1 [Computers and Society]: Public Policy Issues - Computer-related Health Issues

General Terms
Human Factors; Experimentation; Design

Keywords
Personalization; Food Recommendation; Health; Human-Data Interaction; Recommender Systems; Persuasion; Social Web

1. INTRODUCTION
Today, Internet users turn to the Web for help with the planning and selection of many daily tasks, whether choosing what music to listen to (Spotify), what consumer products to purchase (Amazon), what movies to watch (Netflix), or what food to prepare (Allrecipes). Consumers put a considerable amount of trust into systems which are able to simplify their information needs, no matter the type of information (or products) sought. Often, these online services implement persuasion systems telling the users to buy, listen to, watch, or even eat items or products that their peers have interacted with. It should however be noted that there is a distinct conceptual difference between recommending a piece of information to be consumed online, e.g. a news article or a song, and recommending a tangible object, e.g. a computer or a car. Among the differences between the types of objects, we find aspects such as consumption cost (in terms of money, time, effort), the expected longevity of a product (a music track lasting a few minutes, a book lasting a week, a car lasting several years), etc. These aspects need to be accounted for when creating a personalized experience, whether for an online consumption case or for a real-world product.

In turn, when recommending food and recipes, there is an additional dimension of the recommendation that needs to be considered: the health aspect of what is being recommended to a specific user. A personalization system which has a (more or less) direct effect on the user's daily life and health, such as a recipe recommender, needs to be aware of the potential outcome of the recommendation, not only in terms of increased business value for the vendor and the general utility as experienced by the consumer, but also in terms of the well-being of the consuming user.

It is because of the above stated aspect that we, in this paper, focus on health aspects involved in personalizing users' experiences in a food-related online social network. We do so by taking into account the general health in the area where the user lives. By using data from County Health Rankings & Roadmaps¹ in combination with data from the recipe-focused online social network Allrecipes², we are able to show that there is a significant difference in consumption patterns between users from counties with a high health ranking and users from counties with a low health ranking. Our motivation is that these differences can be used to identify users with higher health risks, even in cases where the geographical location is not known.

The main contribution of our work is to show a significant correlation between recipe usage on an online social network and the reported health in users' geographic locations.

¹ www.countyhealthrankings.org
² www.allrecipes.com

Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster City, CA, USA. Copyright held by the authors.

2. RELATED WORK
Over the last decade, a massive body of work on multimedia recommender systems has been accumulated, e.g. movies [1], music [4], online news [9], and practically any other type of consumer products [2]. Food recommendation on the other hand, which also has been an online phenomenon for a long time, has only recently started gaining traction among information system and personalization researchers and practitioners, e.g.
improving the food preparation competence of cooks [14], dinner planning for groups [3], educating potential cooks on healthy foods [7], or diversifying the meals served in care facilities [5].

When personalizing the culinary experience, it is important to be aware of the conceptual difference between recommending a movie to watch or a song to listen to, compared to recommending a dish to eat or cook. The movies one watches and songs one listens to have no direct effect on the health of the subject receiving recommendations. Recommending food on the other hand, as mentioned in Section 1, means that the recommendation will indeed have an effect on the user's health, either by simply proposing that the user eat something unhealthy directly, or by attempting to alter a user's (long term) food habits, which might remain even after the user is no longer using the service. However, there exists only a limited body of work on food recommendation and personalization from a health-oriented aspect, e.g. Hsiao and Chang [12] show that by aiding in planning meals it is possible to improve the health of a system's users. Some research approaches food recommendation from the perspective of diet and exercise [8], attempting to understand the users' reasoning around recipes. More recently, Harvey et al. [10, 11] reported on a study attempting to identify the factors that affect the ratings given to recipes in order to leverage this information in a recipe recommender system able to recommend recipes which are not only nutritious, but also well-liked by the users.

In this work, we base our findings on geographical areas with good or bad health, inspired by the line of research known as Health Geography [13]. Here, Dummer showed that "Geography and health are intrinsically linked" [6]. With this in mind, we attempt to find whether it is possible to use concepts from information management and human-computer interaction to alleviate potential health effects in online recommendation services even when the location of the user is not known.

3. RECIPES & HEALTH DATA
To perform our analysis, we scraped the recipe-related social network Allrecipes.com. In this process, we collected user profiles, recipes, ingredients, recipe boxes (users collect and rate their recipes in virtual recipe boxes, making them easily accessible at later points in time), social connections, and demographic information on users (location, interests, hobbies, etc.). This data collection³ was performed during October 2013, and resulted in a dataset containing information on more than 170 thousand users, 54 thousand recipes, 8,400 ingredients, and 17 million recipe box assignments (which we refer to as ratings⁴).

Having collected the data, we used health rankings by county from County Health Rankings to identify users living in healthy and unhealthy counties. Our health focus was specifically on obesity, i.e. the percentage of adults suffering from obesity in each county. The health ranking dataset contains data for more than 3,400 US counties, including the percentage of obese adults.

The dataset collected from Allrecipes does not contain the counties where users live. In order to connect users to counties, we used a mapping of 42,000 US cities to 3,200 US counties⁵. This allowed us to link the recipe and health datasets to each other. It should be noted that users of the Allrecipes social network do not have to state their hometown, and when they choose to do so, this is done in free text. The implication is that it is not possible to automatically map all users to counties, e.g. some users state made-up cities or local slang names (Chicagoland for Chicago, The Big Apple for New York, etc.), or simply misspell the name of their hometown. Additionally, large cities (e.g. Dallas, TX) may span several counties, making the mapping of these cities onto distinct counties problematic unless additional information is available or manual mapping is performed. Furthermore, the counties in the county health ranking dataset and the city-to-county mapping dataset do not overlap perfectly: as noted above, the county health data contains 3,400 counties whereas the county mapping data contains 3,200. However, with some manual tuning (replacing e.g. Hollywoodland with Hollywood, The Big Apple with New York City, etc.) we were able to infer the counties for the majority of the users.

³ The scripts used to scrape the data from the Allrecipes website are available at github.com/alansaid/RecipeCrawler
⁴ Even though users can rate the recipes they put in their recipe boxes (if they wish), in the scope of this paper we have only analyzed the binary relationships between users and recipes.
⁵ www.farinspace.com/us-cities-and-state-sql-dump

4. MAPPING UNHEALTHY INGREDIENTS TO HEALTH DATA
In order to analyze whether it is indeed possible to use the county health ranking data in combination with food-oriented websites, e.g. Allrecipes, we focused on a relatively small number of healthy and unhealthy counties.

As a first step, we identified how often a certain ingredient is used by users in a certain county. This was accomplished by mapping each recipe onto its composing ingredients, and correspondingly mapping all ratings given by users (per county) on the recipes onto the ingredients of the recipes. This process was repeated for the one hundred and ten most used ingredients in each county. Following this, we calculated the percentage of how often an ingredient was used on average in the counties with low obesity and high obesity separately. This information allowed us to identify the five most obese and five least obese counties with available ingredient data. Due to the mapping procedure and dataset described in the previous section, the five counties with the lowest percentage of obese adults selected were within the top 15 of the least obese counties. Similarly, the counties with the highest percentage of obesity were within the top 100 of the most obese counties. The top counties together with statistics for each are shown in Table 1. It should be noted that the geographic distribution of the counties is not limited to an isolated geographical location within the US; instead the counties are spread throughout the country, as shown in Fig. 1. This should further strengthen the health aspect of the analysis, while minimizing potential effects of local food trends found in isolated geographical locations [13].

Figure 1: Map showing the US states where the analyzed counties lie. Blue counties indicate low adult obesity, red counties indicate high obesity. Note that two of the counties (Boulder, La Plata) with the lowest obesity are in Colorado, thus the figure only shows four blue states.

Table 1: The counties used in the analysis and the data available for each county. The top five (Table 1a) are counties with the lowest percentage of adults suffering from obesity, the bottom five (Table 1b) are counties with the highest percentage of adults suffering from obesity. Note that there are many power users with several hundred to several thousand rated recipes in their recipe boxes. Also note that the total number of recipes has been excluded as the individual recipes are not distinct across rows.

(a) Statistics for counties with low obesity percentage.
State           County      Adult obesity   Users   Ratings   Recipes
New Mexico      Santa Fe    14%                26      3009      2721
Colorado        Boulder     15%                99      9938      6614
New York        New York    15%               384     32468     14118
California      Marin       15%                12       570       537
Colorado        La Plata    16%                16      2439      2069
Total                                         537     48424

(b) Statistics for counties with high obesity percentage.
State           County      Adult obesity   Users   Ratings   Recipes
Mississippi     Lowndes     37%                11       827       783
Kansas          Wyandotte   38%                49      6924      5235
South Carolina  Berkeley    38%               159     12637      7539
Virginia        Portsmouth  39%                18      1512      1400
Michigan        Saginaw     40%                33      1315      1224
Total                                         149     46430

Table 2: The twenty most commonly used ingredients and their popularity as a percentage of how often they appear in counties with high (↑) and low (↓) obesity. The ingredients are sorted by the percentage of times they appear in recipes stored by cooks in counties with high obesity. Note, for instance, the difference between usage of olive oil and garlic vs. dairy products (milk, cheddar and cream cheeses) between the county types.

No.   Ingredient      ↑ Obesity   ↓ Obesity
 1    Salt            51.04%      55.30%
 2    Butter          33.72%      32.92%
 3    Sugar           30.67%      31.01%
 4    Eggs            27.25%      26.77%
 5    Flour           26.14%      25.68%
 6    Onions          23.93%      24.86%
 7    Garlic          22.79%      27.31%
 8    Water           21.96%      21.54%
 9    Pepper          20.65%      21.42%
10    Milk            14.96%      13.23%
11    Vanilla         14.85%      14.52%
12    Olive Oil       14.07%      18.04%
13    Brown Sugar     12.54%      12.56%
14    Chicken         10.20%       8.70%
15    Cinnamon         9.81%      10.00%
16    Parmesan         7.96%       8.25%
17    Baking Soda      7.89%       8.75%
18    Veg. Oil         7.29%       7.41%
19    Cheddar Ch.      6.81%       5.35%
20    Cream Ch.        6.79%       5.21%

5. ANALYSIS & RESULTS
For each group of counties, i.e. with high and low obesity percentage, we identified the top 110 most popularly used ingredients in both types of counties, i.e. the top intersecting ingredients used by users in both types of counties. Table 2 shows the 20 ingredients used most often in counties with high (↑) obesity and the corresponding percentage in counties with low (↓) obesity. Having this information, we performed a statistical significance analysis (t-test) on the vectors containing the percentages of how often the ingredients were used in both types of counties (the same ingredients appearing in the same places in both vectors). The justification for this is that, if the ingredients were in fact used differently in the two types of counties, we should be able to distinguish between high and low risk users independent of their geographical location, thus ensuring that high/low-risk users can be identified by their online recipe interaction patterns.

The obtained p-value from the t-test (p < 0.05) indicates that the ingredient usage in counties with high obesity is in fact different from that of counties with low obesity. The implication of this is that high-risk/low-risk users can be identified simply by their recipe interactions in an online social network. This information can in turn be used to personalize a food recommendation system based on the recorded interactions of a user.

6. DISCUSSION
In the previous sections, we have described our analysis of a health-related dataset and an analysis of a real-world recipe-focused online social network. Our results suggest that it is possible to identify users from high-risk (poor health) areas just from their recipe interactions. This suggests that, should a recommendation system be employed, it can be tailored to not only provide high-quality recipes to the user, but also take into consideration the potential health aspects of the user. The health effects can be mitigated either by filtering out recipes which can be deemed unhealthy, or by creating personalized recipes (altering the amounts of certain ingredients) while still fulfilling the users' expectations. This must, however, be done in such a way as to not lower the usability and quality of the system, as perceived by the user. A personalization approach of this type would serve as an assurance that the service neither causes nor contributes to any detrimental effects on the users' health. Given the increasing quality of recommender systems, a system being conscious of the (inferred) health of its users appears to be a plausible next step.

We are aware of the limits of our analysis, e.g. only analyzing the binary connections between a recipe and an ingredient, not taking into consideration the amount of the ingredient used. Nevertheless, we believe our results to be indicative of what can be attained when using the ingredient amount as well. This is currently the focus of our ongoing work; however, the ambiguous and non-standardized unit and ingredient declaration in recipes, e.g. one cucumber, half a cup of sugar, one glass of water, two crackers, etc., makes this a non-trivial task.

It should be noted that the results obtained in our analysis are the result of early work. We do however believe that this is a feasible approach to proactively care for the users of similar food- or otherwise health-oriented services. As mentioned in Section 1, there is a conceptual difference between recommending an entertainment-focused item (song, movie) and recommending in domains where the personalization system has a direct effect on the user's health.

7. CONCLUSION & FUTURE WORK
In this work, we have analyzed a recipe dataset and combined it with data reporting health aspects in US counties. We have identified counties that suffer from poor health (large percentage of adults suffering from obesity) and found that there exist statistically significant differences in how users from poor health counties interact with recipes compared to users from counties with good health (low percentage of adults suffering from obesity). Our work suggests a potential approach to health-oriented recommender systems which takes into account the possible adverse effects on a user, based on demographic information as well as on information on the recorded interactions (ratings) with the system.

As for future (and current) work paths, we are currently investigating whether there are other user-related features that also correlate with health aspects, e.g. inferring health through stated interests and hobbies. Similarly, we intend to investigate whether the social ties (follower/followee relationships) between users, a concept that has been proven to be useful in personalization and recommendation approaches in other domains, hold similar health-related information. Additionally, we plan to study whether the nutritional aspects of ingredients can help in identifying health-oriented aspects in individual users.

8. ACKNOWLEDGMENTS
This work was in part carried out during the tenure of an ERCIM "Alain Bensoussan" Fellowship Programme. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 246016.

The authors would like to thank Arjen P. de Vries and Jacco van Ossenbruggen from CWI for feedback during the work resulting in this paper.

9. REFERENCES
[1] X. Amatriain and J. Basilico. Netflix recommendations: Beyond the 5 stars (part 1) – the netflix tech blog. http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html (retrieved May 12, 2012), April 2012.
[2] C. Anderson. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, 2006.
[3] S. Berkovsky and J. Freyne. Group-based recipe recommendations: Analysis of data aggregation strategies. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 111–118, New York, NY, USA, 2010. ACM.
[4] Ò. Celma. Music Recommendation and Discovery in the Long Tail. PhD thesis, Universitat Pompeu Fabra, Barcelona, 2008.
[5] T. De Pessemier, S. Dooms, and L. Martens. A food recommender for patients in a care facility. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys '13, pages 209–212, New York, NY, USA, 2013. ACM.
[6] T. J. Dummer. Health geography: supporting public health policy and planning. Canadian Medical Association Journal, 178(9):1177–1180, 2008.
[7] J. Freyne and S. Berkovsky. Intelligent food planning: Personalized recipe recommendation. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI '10, pages 321–324, New York, NY, USA, 2010. ACM.
[8] J. Freyne, S. Berkovsky, and G. Smith. Recipe recommendation: Accuracy and reasoning. In Proceedings of the 19th International Conference on User Modeling, Adaption, and Personalization, UMAP'11, pages 99–110, Berlin, Heidelberg, 2011. Springer-Verlag.
[9] F. Garcin and B. Faltings. Pen recsys: A personalized news recommender systems framework. In Proceedings of the 2013 International News Recommender Systems Workshop and Challenge, NRS '13, pages 3–9, New York, NY, USA, 2013. ACM.
[10] M. Harvey, B. Ludwig, and D. Elsweiler. Learning user tastes: a first step to generating healthy meal plans? In Proceedings of the ECIR Workshop on Searching4Fun, Searching4Fun '12, 2012.
[11] M. Harvey, B. Ludwig, and D. Elsweiler. You are what you eat: Learning user tastes for rating prediction. In Proceedings of the 20th International Symposium on String Processing and Information Retrieval, SPIRE, pages 153–164. Springer, 2013.
[12] J.-H. Hsiao and H. Chang. Smartdiet: A personal diet consultant for healthy meal planning. In Proceedings of the 2010 IEEE 23rd International Symposium on Computer-Based Medical Systems, CBMS '10, pages 421–425, Washington, DC, USA, 2010. IEEE Computer Society.
[13] G. Moon. Health geography. In R. Kitchin and N. Thrift, editors, International Encyclopedia of Human Geography, volume 5, pages 35–55. Elsevier, July 2009.
[14] J. Wagner, G. Geleijnse, and A. van Halteren. Guidance and support for healthy food preparation in an augmented kitchen. In Proceedings of the 2011 Workshop on Context-awareness in Retrieval and Recommendation, CaRR '11, pages 47–50, New York, NY, USA, 2011. ACM.
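
The ingredient analysis described in Sections 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the function names `ingredient_usage` and `paired_t_statistic` are hypothetical, an ingredient's usage is taken to be the share of recipe box assignments whose recipe contains that ingredient, the t-test is assumed to be a paired test over the ingredients shared by both county groups, and the toy data is made up rather than drawn from the Allrecipes dataset.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def ingredient_usage(ratings, recipe_ingredients):
    """Percentage of ratings whose recipe contains each ingredient.

    ratings: iterable of (user, recipe) pairs for one group of counties.
    recipe_ingredients: dict mapping a recipe to its set of ingredients.
    """
    counts = Counter()
    total = 0
    for _user, recipe in ratings:
        total += 1
        counts.update(recipe_ingredients[recipe])
    return {ing: 100.0 * n / total for ing, n in counts.items()}


def paired_t_statistic(xs, ys):
    """t statistic and degrees of freedom for two aligned vectors (assumed paired test)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Toy, made-up data; binary user-recipe relationships as in the paper.
recipe_ingredients = {
    "r1": {"salt", "butter"},
    "r2": {"salt", "olive oil"},
    "r3": {"sugar", "butter"},
}
high_ratings = [("u1", "r1"), ("u2", "r1"), ("u2", "r3"), ("u3", "r2")]
low_ratings = [("u4", "r2"), ("u5", "r2"), ("u5", "r1"), ("u6", "r2")]

high = ingredient_usage(high_ratings, recipe_ingredients)
low = ingredient_usage(low_ratings, recipe_ingredients)

# Keep only ingredients used in both groups, in the same order in both vectors.
shared = sorted(set(high) & set(low))
t, df = paired_t_statistic([high[i] for i in shared], [low[i] for i in shared])
print(shared, round(t, 4), df)
```

Turning the statistic into a p-value requires the t distribution, which a statistics library can supply (for instance `scipy.stats.ttest_rel` for the paired case). Whether a paired or an independent-samples test matches the analysis in Section 5 is not stated in the paper, so the choice made here is only one reading of it.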