=Paper= {{Paper |id=None |storemode=property |title=You are What You Eat! Tracking Health Through Recipe Interactions |pdfUrl=https://ceur-ws.org/Vol-1271/Paper6.pdf |volume=Vol-1271 |dblpUrl=https://dblp.org/rec/conf/recsys/SaidB14b }} ==You are What You Eat! Tracking Health Through Recipe Interactions == https://ceur-ws.org/Vol-1271/Paper6.pdf

You are What You Eat!
Tracking Health Through Recipe Interactions

Alan Said, TU-Delft, The Netherlands, alansaid@acm.org
Alejandro Bellogín, Universidad Autónoma de Madrid, Spain, alejandro.bellogin@uam.es

ABSTRACT

On today’s World Wide Web, social recommender systems have become a commodity regardless of application domain. Even tangible items such as food and clothes have become social. Together with a seemingly endless number of personalization and recommender systems, ranging from movies and music to consumer products, recipe recommender systems are attracting many users looking for inspiration on the next thing to purchase or cook. There is, however, a conceptual difference between recommending consumer goods for leisure and entertainment, and recommending food. What people eat has a direct effect on their health, an aspect commonly overlooked in the context of recommendation.

In this work, we present an early analysis of users’ interactions with recipes (ratings) on the online social network Allrecipes.com. We compare the interaction patterns of users from locations known to have poor health to those of users from locations known to have good health, in order to identify whether there is an observable difference between the two populations.

Our results point to a statistically significant difference between the healthy and unhealthy groups, a difference that could potentially be used to create health-conscious, personalized recommendation services to aid people in their daily lives.

Categories and Subject Descriptors

H.3.5 [Information Storage and Retrieval]: Online Information Services - Commercial Services; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Information filtering; H.1.2 [Models and Principles]: User/Machine Systems - Human Factors; K.4.1 [Computers and Society]: Public Policy Issues - Computer-related Health Issues

General Terms

Human Factors; Experimentation; Design

Keywords

Personalization; Food Recommendation; Health; Human-Data Interaction; Recommender Systems; Persuasion; Social Web

Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), collocated with ACM RecSys 2014, 10/06/2014, Foster City, CA, USA. Copyright held by the authors.

1. INTRODUCTION

Today, Internet users turn to the Web for help with the planning and selection of many daily tasks, whether it is what music to listen to (Spotify), what consumer products to purchase (Amazon), what movies to watch (Netflix), or what food to prepare (Allrecipes). Consumers put a considerable amount of trust into systems which are able to simplify their information needs, no matter the type of information (or products) sought. Often, these online services implement persuasion systems telling the users to buy, listen to, watch, or even eat items or products that their peers have interacted with. It should, however, be noted that there is a distinct conceptual difference between recommending a piece of information to be consumed online, e.g. a news article or a song, and recommending a tangible object, e.g. a computer or a car. Among the differences between the types of objects, we find aspects such as consumption cost (in terms of money, time, effort), the expected longevity of a product (a music track lasting a few minutes, a book lasting a week, a car lasting several years), etc. These aspects need to be accounted for when creating a personalized experience, whether for an online consumption case or for a real-world product.

In turn, when recommending food and recipes, there is an additional dimension of the recommendation that needs to be considered: the health aspect of what is being recommended to a specific user. A personalization system which has a (more or less) direct effect on the user’s daily life and health, such as a recipe recommender, needs to be aware of the potential outcome of the recommendation, not only in terms of increased business value for the vendor and the general utility as experienced by the consumer, but also in terms of the well-being of the consuming user.

For this reason, this paper focuses on the health aspects involved in personalizing users’ experiences in a food-related online social network. We do so by taking into account the general health in the area where the user lives. By using data from County Health Rankings & Roadmaps¹ in combination with data from the recipe-focused online social network Allrecipes², we are able to show that there is a significant difference in consumption patterns between users from counties with a high health ranking and users from counties with a low health ranking. Our motivation is that these differences can be used to identify users with higher health risks, even in cases where the geographical location is not known.

The main contribution of our work is to show a significant correlation between recipe usage on an online social network and the reported health in users’ geographic locations.

¹ www.countyhealthrankings.org
² www.allrecipes.com

2. RELATED WORK

Over the last decade, a massive body of work on multimedia recommender systems has accumulated, e.g. movies [1], music [4], online news [9], and practically any other type of consumer product [2]. Food recommendation, on the other hand, which has also been an online phenomenon for a long time, has only recently started gaining traction among information system and personalization researchers and practitioners, e.g. improving the food preparation competence of cooks [14], dinner planning for groups [3], educating potential cooks on healthy foods [7], or diversifying the meals served in care facilities [5].

When personalizing the culinary experience, it is important to be aware of the conceptual difference between recommending a movie to watch or a song to listen to, and recommending a dish to eat or cook. The movies one watches and songs one listens to have no direct effect on the health of the subject receiving recommendations. Recommending food, on the other hand, as mentioned in Section 1, means that the recommendation will indeed have an effect on the user’s health, either by simply proposing that the user eat something unhealthy, or by attempting to alter a user’s (long-term) food habits, which might remain even after the user is no longer using the service. However, there exists only a limited body of work on food recommendation and personalization from a health-oriented perspective, e.g. Hsiao and Chang [12] show that by aiding in planning meals it is possible to improve the health of a system’s users. Some research approaches food recommendation from the perspective of diet and exercise [8], attempting to understand the users’ reasoning around recipes. More recently, Harvey et al. [10, 11] reported on a study attempting to identify the factors that affect the ratings given to recipes, in order to leverage this information in a recipe recommender system able to recommend recipes which are not only nutritious, but also well-liked by the users.

In this work, we base our findings on geographical areas with good or bad health, inspired by the line of research known as Health Geography [13]. Here, Dummer showed that “Geography and health are intrinsically linked” [6]. With this in mind, we attempt to find whether it is possible to use concepts from information management and human-computer interaction to alleviate potential health effects in online recommendation services even when the location of the user is not known.

3. RECIPES & HEALTH DATA

To perform our analysis, we scraped the recipe-related social network Allrecipes.com. In this process, we collected user profiles, recipes, ingredients, recipe boxes (users collect and rate their recipes in virtual recipe boxes, making them easily accessible at later points in time), social connections, and demographic information on users (location, interests, hobbies, etc.). This data collection³ was performed during October 2013, and resulted in a dataset containing information on more than 170 thousand users, 54 thousand recipes, 8,400 ingredients, and 17 million recipe box assignments (which we refer to as ratings⁴).

Having collected the data, we used health rankings by county from County Health Rankings to identify users living in healthy and unhealthy counties. Our health focus was specifically on obesity, i.e. the percentage of adults suffering from obesity in each county. The health ranking dataset contains data for more than 3,400 US counties, including the percentage of obese adults.

The dataset collected from Allrecipes does not contain the counties in which users live. In order to connect users to counties, we used a mapping of 42,000 US cities to 3,200 US counties⁵. This allowed us to link the recipe and health datasets to each other. It should be noted that users of the Allrecipes social network do not have to state their hometown, and when they choose to do so, this is done in free text. The implication is that it is not possible to automatically map all users to counties, e.g. some users state made-up cities or local slang names (Chicagoland for Chicago, The Big Apple for New York, etc.), or simply misspell the name of their hometown. Additionally, large cities (e.g. Dallas, TX) may be composed of several counties, making the mapping of these cities onto distinct counties problematic unless additional information is available or manual mapping is performed. Furthermore, the counties in the county health ranking dataset and the city-to-county mapping dataset do not overlap perfectly: as noted above, the county health data contains 3,400 counties whereas the county mapping data contains 3,200. However, with some manual tuning (replacing e.g. Hollywoodland with Hollywood, The Big Apple with New York City, etc.) we were able to infer the counties for the majority of the users.

Figure 1: Map showing the US states where the analyzed counties lie. Blue counties indicate low adult obesity, red counties indicate high obesity. Note that two of the counties (Boulder, La Plata) with the lowest obesity are in Colorado, thus the figure only shows four blue states.

4. MAPPING UNHEALTHY INGREDIENTS TO HEALTH DATA

In order to analyze whether it is indeed possible to use the county health ranking data in combination with food-oriented websites, e.g. Allrecipes, we focused on a relatively small number of healthy and unhealthy counties.

As a first step, we identified how often a certain ingredient is used by users in a certain county. This was accomplished by mapping each recipe onto its constituent ingredients, and correspondingly mapping all ratings given by users (per county) on the recipes onto the ingredients of the recipes. This process was repeated for the one hundred and ten most used ingredients in each county. Following this, we calculated how often, as a percentage, each ingredient was used on average in the counties with low obesity and in those with high obesity, separately. This information allowed us to identify the five

³ The scripts used to scrape the data from the Allrecipes website are available at github.com/alansaid/RecipeCrawler
⁴ Even though users can rate the recipes they put in their recipe boxes (if they wish), in the scope of this paper we have only analyzed the binary relationships between users and recipes.
⁵ www.farinspace.com/us-cities-and-state-sql-dump
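
The user-to-county linking described in Section 3 can be sketched as follows. This is a minimal illustration and not the scripts used for the paper: the alias table, the small city-to-county dictionary, and the function names `normalize_city` and `county_for` are assumptions made for this example, and the data values are illustrative only. Hometowns that cannot be resolved, or that belong to a city spanning several counties, are left unmapped (`None`), which corresponds to the cases that required manual handling.

```python
from typing import Optional

# Hypothetical alias table for slang names entered as free text.
ALIASES = {
    "the big apple": "new york city",
    "chicagoland": "chicago",
    "hollywoodland": "hollywood",
}

# Hypothetical, illustrative excerpt of a city-to-county mapping, keyed by
# (city, state). A city listed with more than one county is ambiguous.
CITY_TO_COUNTIES = {
    ("santa fe", "NM"): ["Santa Fe"],
    ("boulder", "CO"): ["Boulder"],
    ("dallas", "TX"): ["Dallas", "Collin", "Denton"],
}


def normalize_city(raw: str) -> str:
    """Lower-case, collapse whitespace, and resolve known slang names."""
    cleaned = " ".join(raw.strip().lower().split())
    return ALIASES.get(cleaned, cleaned)


def county_for(raw_city: str, state: str) -> Optional[str]:
    """Return the county for a free-text hometown, or None if unknown or ambiguous."""
    key = (normalize_city(raw_city), state.strip().upper())
    counties = CITY_TO_COUNTIES.get(key, [])
    return counties[0] if len(counties) == 1 else None
```

Returning `None` instead of guessing keeps ambiguous users out of the per-county statistics, at the cost of a smaller sample.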
Table 1: The counties used in the analysis and the data available for each county, the top five (Table 1a) are counties with the lowest percentage
of adults suffering from obesity, the bottom five (Table 1b) are counties with the highest percentage of adults suffering from obesity. Note
that there are many power users with several hundred to several thousand rated recipes in their recipe boxes. Also note that the total number
of recipes has been excluded as the individual recipes are not distinct across rows.

(a) Statistics for counties with low obesity percentage.

State            County       Adult obesity   Users   Ratings   Recipes
New Mexico       Santa Fe     14%                26      3009      2721
Colorado         Boulder      15%                99      9938      6614
New York         New York     15%               384     32468     14118
California       Marin        15%                12       570       537
Colorado         La Plata     16%                16      2439      2069
Total                                           537     48424

(b) Statistics for counties with high obesity percentage.

State            County       Adult obesity   Users   Ratings   Recipes
Mississippi      Lowndes      37%                11       827       783
Kansas           Wyandotte    38%                49      6924      5235
South Carolina   Berkeley     38%               159     12637      7539
Virginia         Portsmouth   39%                18      1512      1400
Michigan         Saginaw      40%                33      1315      1224
Total                                           149     46430
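
The ingredient-usage step described in Section 4 (mapping each rated recipe onto its ingredients and counting, per county, how often each ingredient occurs) can be sketched as follows. This is an illustrative sketch under assumptions: the input shapes, the function name `ingredient_usage`, and the toy data are hypothetical, and the percentage is assumed to be taken over a county's ratings. Only the binary recipe-ingredient relation is used, so ingredient amounts are ignored.

```python
from collections import Counter, defaultdict


def ingredient_usage(ratings, recipe_ingredients):
    """Percentage of ratings per county whose recipe contains each ingredient.

    ratings: iterable of (county, recipe_id) pairs, one per recipe-box entry.
    recipe_ingredients: dict mapping recipe_id to a set of ingredient names.
    """
    counts = defaultdict(Counter)  # county -> ingredient -> number of ratings
    totals = Counter()             # county -> total number of ratings
    for county, recipe_id in ratings:
        totals[county] += 1
        for ingredient in recipe_ingredients.get(recipe_id, ()):
            counts[county][ingredient] += 1
    return {
        county: {ing: 100.0 * n / totals[county] for ing, n in ing_counts.items()}
        for county, ing_counts in counts.items()
    }


# Toy data (hypothetical), only to show the input shapes.
recipes = {"r1": {"salt", "butter"}, "r2": {"salt", "olive oil"}}
ratings = [("Boulder", "r1"), ("Boulder", "r2"), ("Saginaw", "r1"), ("Saginaw", "r1")]
usage = ingredient_usage(ratings, recipes)
```

Averaging these per-county percentages over the low-obesity and high-obesity groups would then give two vectors of the kind compared in the analysis.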

Table 2: The twenty most commonly used ingredients and their popularity as a percentage of how often they appear in counties with high
(↑) and low (↓) obesity. The ingredients are sorted by the percentage of times they appear in recipes stored by cooks in counties with high
obesity. Note, for instance, the difference between usage of olive oil and garlic vs. dairy products (milk, cheddar and cream cheeses) between
the county types.

No.   Ingredient       ↑ Obesity   ↓ Obesity
1     Salt             51.04%      55.30%
2     Butter           33.72%      32.92%
3     Sugar            30.67%      31.01%
4     Eggs             27.25%      26.77%
5     Flour            26.14%      25.68%
6     Onions           23.93%      24.86%
7     Garlic           22.79%      27.31%
8     Water            21.96%      21.54%
9     Pepper           20.65%      21.42%
10    Milk             14.96%      13.23%
11    Vanilla          14.85%      14.52%
12    Olive Oil        14.07%      18.04%
13    Brown Sugar      12.54%      12.56%
14    Chicken          10.20%       8.70%
15    Cinnamon          9.81%      10.00%
16    Parmesan          7.96%       8.25%
17    Baking Soda       7.89%       8.75%
18    Veg. Oil          7.29%       7.41%
19    Cheddar Ch.       6.81%       5.35%
20    Cream Ch.         6.79%       5.21%
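
The significance analysis described in Section 5 compares two aligned vectors of usage percentages, one per county group, where each position holds the same ingredient. A minimal sketch of such a comparison follows, assuming a paired (dependent-samples) t-test. The paper only states "t-test", so the paired variant, the function name `paired_t_statistic`, and the toy vectors are assumptions. The sketch computes only the t statistic and the degrees of freedom; turning these into a p-value additionally requires the t distribution, for example from a statistics library.

```python
import math
import statistics


def paired_t_statistic(high, low):
    """Return (t, degrees_of_freedom) for a paired t-test on aligned vectors."""
    if len(high) != len(low) or len(high) < 2:
        raise ValueError("vectors must be aligned and contain at least two values")
    diffs = [h - l for h, l in zip(high, low)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Toy vectors (hypothetical percentages, not the paper's data).
t, dof = paired_t_statistic([3.0, 5.0, 7.0], [2.0, 3.0, 4.0])
```

With 110 ingredients the test would have 109 degrees of freedom; an unpaired test would instead treat the two vectors as independent samples and ignore the per-ingredient alignment.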

most obese and five least obese counties with available ingredient data. Due to the mapping procedure and dataset described in the previous section, the five counties with the lowest percentage of obese adults selected were within the top 15 of the least obese counties. Similarly, the counties with the highest percentage of obesity were within the top 100 of the most obese counties. The top counties together with statistics for each are shown in Table 1.

It should be noted that the geographic distribution of the counties is not limited to an isolated geographical location within the US; instead, the counties are spread throughout the country, as shown in Fig. 1. This should further strengthen the health aspect of the analysis, while minimizing potential effects of local food trends found in isolated geographical locations [13].

5. ANALYSIS & RESULTS

For each group of counties, i.e. with high and low obesity percentage, we identified the top 110 most popularly used ingredients in both types of counties, i.e. the top intersecting ingredients used by users in both types of counties. Table 2 shows the 20 ingredients used most often in counties with high (↑) obesity and the corresponding percentage in counties with low (↓) obesity. Having this information, we performed a statistical significance analysis (t-test) on the vectors containing the percentages of how often the ingredients were used in both types of counties (the same ingredients appearing in the same positions in both vectors). The justification for this is that, if the ingredients were in fact used differently in the two types of counties, we should be able to distinguish between high- and low-risk users independent of their geographical location, thus ensuring that high/low-risk users can be identified by their online recipe interaction patterns.

The obtained p-value from the t-test (p < 0.05) indicates that the ingredient usage in counties with high obesity is in fact different from that of counties with low obesity. The implication of this is that high-risk/low-risk users can be identified simply by their recipe interactions in an online social network. This information can in turn be used to personalize a food recommendation system based on the recorded interactions of a user.

6. DISCUSSION

In the previous sections, we have described our analysis of a health-related dataset and of a real-world recipe-focused online social network. Our results indicate that it is possible to identify users from high-risk (poor health) areas just from their recipe interactions. This suggests that, should a recommendation system be employed, it can be tailored to not only provide high-quality recipes to the user, but also take into consideration the potential health aspects of the user. The health effects can be mitigated either by filtering out recipes which can be deemed unhealthy, or by creating personalized recipes (altering the doses of certain ingredients) while still fulfilling the users’ expectations. This, however, needs to be done in such a way as to not lower the usability and quality of the system, as perceived by the user. A personalization approach of this type would serve as an assurance that the service would not be the cause of, or contribute to, any detrimental effects on the users’
health. Given the increasing quality of recommender systems, a system being conscious of the (inferred) health of its users appears as a plausible next step.

We are aware of the limits of our analysis, e.g. only analyzing the binary connections between a recipe and an ingredient, not taking into consideration the amount of the ingredient used. Nevertheless, we believe our results to be indicative of what can be attained when using the ingredient amount as well. This is currently the focus of our ongoing work; however, the ambiguous and non-standardized unit and ingredient declarations in recipes, e.g. one cucumber, half a cup of sugar, one glass of water, two crackers, etc., make this a non-trivial task.

It should be noted that the results obtained in our analysis are the result of early work; we do, however, believe that this is a feasible approach to proactively care for the users of similar food- or otherwise health-oriented services. As mentioned in Section 1, there is a conceptual difference between recommending an entertainment-focused item (song, movie) and recommending in domains where the personalization system has a direct effect on the user’s health.

7. CONCLUSION & FUTURE WORK

In this work, we have analyzed a recipe dataset and combined it with data reporting health aspects in US counties. We have identified counties that suffer from poor health (large percentage of adults suffering from obesity) and found that there exist statistically significant differences in how users from poor health counties interact with recipes compared to users from counties with good health (low percentage of adults suffering from obesity). Our work suggests a potential approach to health-oriented recommender systems which takes into account the possible adverse effects on a user, based on demographic information as well as on information on the recorded interactions (ratings) with the system.

As for future (and current) work, we are investigating whether there are other user-related features that also correlate with health aspects, e.g. inferring health through stated interests and hobbies. Similarly, we intend to investigate whether the social ties (follower/followee relationships) between users, a concept that has been proven to be useful in personalization and recommendation approaches in other domains, hold similar health-related information. Additionally, we plan to study whether the nutritional aspects of ingredients can help in identifying health-oriented aspects in individual users.

8. ACKNOWLEDGMENTS

This work was in part carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship Programme. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 246016.

The authors would like to thank Arjen P. de Vries and Jacco van Ossenbruggen from CWI for feedback during the work resulting in this paper.

9. REFERENCES

[1] X. Amatriain and J. Basilico. Netflix recommendations: Beyond the 5 stars (part 1) – the netflix tech blog. http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html (retrieved May 12, 2012), April 2012.
[2] C. Anderson. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, 2006.
[3] S. Berkovsky and J. Freyne. Group-based recipe recommendations: Analysis of data aggregation strategies. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, pages 111–118, New York, NY, USA, 2010. ACM.
[4] Ò. Celma. Music Recommendation and Discovery in the Long Tail. PhD thesis, Universitat Pompeu Fabra, Barcelona, 2008.
[5] T. De Pessemier, S. Dooms, and L. Martens. A food recommender for patients in a care facility. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys ’13, pages 209–212, New York, NY, USA, 2013. ACM.
[6] T. J. Dummer. Health geography: supporting public health policy and planning. Canadian Medical Association Journal, 178(9):1177–1180, 2008.
[7] J. Freyne and S. Berkovsky. Intelligent food planning: Personalized recipe recommendation. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI ’10, pages 321–324, New York, NY, USA, 2010. ACM.
[8] J. Freyne, S. Berkovsky, and G. Smith. Recipe recommendation: Accuracy and reasoning. In Proceedings of the 19th International Conference on User Modeling, Adaption, and Personalization, UMAP’11, pages 99–110, Berlin, Heidelberg, 2011. Springer-Verlag.
[9] F. Garcin and B. Faltings. Pen recsys: A personalized news recommender systems framework. In Proceedings of the 2013 International News Recommender Systems Workshop and Challenge, NRS ’13, pages 3–9, New York, NY, USA, 2013. ACM.
[10] M. Harvey, B. Ludwig, and D. Elsweiler. Learning user tastes: a first step to generating healthy meal plans? In Proceedings of the ECIR Workshop on Searching4Fun, Searching4Fun ’12, 2012.
[11] M. Harvey, B. Ludwig, and D. Elsweiler. You are what you eat: Learning user tastes for rating prediction. In Proceedings of the 20th International Symposium on String Processing and Information Retrieval, SPIRE, pages 153–164. Springer, 2013.
[12] J.-H. Hsiao and H. Chang. Smartdiet: A personal diet consultant for healthy meal planning. In Proceedings of the 2010 IEEE 23rd International Symposium on Computer-Based Medical Systems, CBMS ’10, pages 421–425, Washington, DC, USA, 2010. IEEE Computer Society.
[13] G. Moon. Health geography. In R. Kitchin and N. Thrift, editors, International Encyclopedia of Human Geography, volume 5, pages 35–55. Elsevier, July 2009.
[14] J. Wagner, G. Geleijnse, and A. van Halteren. Guidance and support for healthy food preparation in an augmented kitchen. In Proceedings of the 2011 Workshop on Context-awareness in Retrieval and Recommendation, CaRR ’11, pages 47–50, New York, NY, USA, 2011. ACM.