<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>HealthRecSys’19</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Personalized, Health-Aware Recipe Recommendation: An Ensemble Topic Modeling Based Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mansura A. Khan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barry Smyth</string-name>
          <email>barry.smyth@ucd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University College Dublin</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>20</volume>
      <issue>2019</issue>
      <fpage>4</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>Food choices are personal and complex and have a significant impact on our long-term health and quality of life. By helping users to make informed and satisfying decisions, Recommender Systems (RS) have the potential to support users in making healthier food choices. Intelligent user-modeling is a key challenge in achieving this potential. This paper investigates Ensemble Topic Modelling (EnsTM) based Feature Identification techniques for efficient user-modeling and recipe recommendation. It builds on findings in EnsTM to propose a reduced data representation format and a smart user-modeling strategy that make capturing user preferences fast, efficient and interactive. This approach enables personalization, even in a cold-start scenario. Through a user study with 48 participants and a large-scale, real-world corpus of 230,876 recipes, we compared three EnsTM based variations against a conventional Content Based (CB) approach. EnsTM based recommenders performed significantly better than the CB approach. Besides acknowledging multi-domain content such as taste, demographics and cost, our proposed approach also considers users’ nutritional preferences and assists them in finding recipes under diverse nutritional categories. Furthermore, it provides excellent coverage and enables implicit understanding of users’ food practices. Subsequent analysis also exposed correlations between certain features and a healthier lifestyle.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Recommender systems.</p>
      <p>HealthRecSys’19, September 20, 2019, Copenhagen, Denmark. © 2019 Copyright for the individual papers remains with the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors.</p>
    </sec>
    <sec id="sec-2">
      <title>1 INTRODUCTION</title>
      <p>
        Food has a direct, complex and multifaceted relationship with our lifestyle and personality. People have explicit preferences regarding activities around food, such as cooking, plating, grocery shopping and eating out. Studies show that people are becoming more mindful of healthier lifestyles and of the fact that healthy eating and cooking affect psychosocial and physical well-being [<xref ref-type="bibr" rid="ref6">6</xref>]. However, finding food ideas and recipes that acknowledge one’s circumstances and preferences remains a challenge for many people. Food Recommender Systems (FRS) have the potential to help users navigate the overwhelming amount of online resources on food and recipes, and to guide them towards healthier choices.
      </p>
      <p>
        Recommending food is challenging because our choices are defined by many cross-domain factors, including demographic and contextual factors, health awareness, and social and ethical factors, together with practical considerations such as cost, cooking time and methods, and the availability of ingredients. To develop effective FRS, we must design user-models that capture user data across these diverse factors. We also need approaches that enable Recommender Systems (RS) to map users’ preference data onto the massive information space around food. As Teng et al. note, there are millions of food items and recipes, as different ingredients are grown in different geographical locations and recipes originate from different cultural groups worldwide [<xref ref-type="bibr" rid="ref27">27</xref>]. In this context, coverage and diversity are important concerns, where coverage is the percentage of items for which an RS is able to generate a prediction [<xref ref-type="bibr" rid="ref15">15</xref>]. Higher coverage enables the RS to implement varying diversity approaches and to draw from more options. Taken together, these challenges call for FRS that can (1) identify the attributes or features that are significant for human food choices, (2) capture users’ preferences on the identified features, (3) filter a large information space, (4) generate recommendations efficiently and, finally, (5) guide users towards healthier choices.
      </p>
      <p>
        We explored Ensemble Topic Modelling (EnsTM) [<xref ref-type="bibr" rid="ref7">7</xref>], accompanied by a series of custom text-preprocessing steps, to extract significant food features. The aim was to identify representative content from the diverse domains connected to human food choice. In our study, 288 features and their corresponding significance scores were extracted from a corpus of 230,876 recipes; these later served as the basis for our intelligent user-modeling approach. As summarized in Table 1, the identified feature set is rich in content representing multiple domains. The paper also describes a compact data representation format, based on the extracted features, which aims to reduce the computational complexity of food recommendation.
      </p>
      <p>We implemented three distinct EnsTM based personalized FRS: a Food Feature based Recommender (FFbR), a Weighted Food Feature based Recommender (WFFbR), and a Food Feature based Collaborative Filtering (FFbCF) recommender. To evaluate these approaches, we conducted a user study comparing the EnsTM based recommenders to a conventional Content Based (CB) approach. Results show that all EnsTM based approaches significantly outperformed the CB approach. In contrast to prior work, the EnsTM approach also effectively supported recommendations across diverse social and cultural groups, even in a first-recommendation scenario. Finally, the strong weighting of dislikes across all three methods proved effective in implicitly identifying users’ food practices (e.g. vegetarian, halal) and filtering accordingly. Further exploratory analysis exposed previously unknown patterns in users’ interactions with certain features: some features are more popular than others among healthier user-groups. This correlation between healthier user-groups and certain food features argues for further research on feature based FRS with healthiness cues.</p>
    </sec>
    <sec id="sec-3">
      <title>2 RELATED WORK</title>
      <p>
        Previous research has produced seminal contributions towards FRS, aimed at respecting user preferences, supporting diversity and improving the nutritional quality of diets. Freyne et al. [<xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>] describe an ingredient-based approach in which they inferred a user’s preference for a new recipe as the cumulative sum of his/her preferences for each ingredient in that recipe. This formed the basis of their novel user-based K-NN Collaborative Filtering (CF) approach [<xref ref-type="bibr" rid="ref12">12</xref>], which has been influential and was applied by others, including [<xref ref-type="bibr" rid="ref19 ref26">19, 26</xref>]. Subsequently, more advanced methods emerged for tackling different challenges. For example, Teng et al. [<xref ref-type="bibr" rid="ref27">27</xref>] used item-centric CF and applied an ingredient network to identify similar recipes, where the ingredient network was generated from the co-occurrence of ingredients within recipes and menus. Kuo et al. [<xref ref-type="bibr" rid="ref21">21</xref>] proposed a weighted-graph based menu planning approach in which ingredients were grouped into subsets and each subset was treated as a unit of content. However, while these approaches are very interesting, they focus purely on ingredients.
      </p>
      <p>
        Ge et al. [<xref ref-type="bibr" rid="ref14">14</xref>] proposed a method that leverages tags and latent factors to recommend recipes. Pinxteren et al. [<xref ref-type="bibr" rid="ref34">34</xref>] adopted a different approach: first they added custom annotations to each recipe in their corpus, then asked users to rate individual recipes, and finally recommended recipes that share annotations with those the user rated positively. This method was successful in addressing more food-choice factors, but the annotation set was relatively small and specific to their recipe corpus. As the authors mention, this prevented their FRS from automatically adapting to new user groups. Further notable work includes: Gu et al.’s [<xref ref-type="bibr" rid="ref17">17</xref>] case-based FRS built on users’ previous consumption cases; Sobeck et al.’s [<xref ref-type="bibr" rid="ref26">26</xref>] hybrid FRS incorporating fuzzy inference with stereotype-based demographic filtering; and Bianca et al.’s [<xref ref-type="bibr" rid="ref8">8</xref>] hybrid model incorporating meta-heuristic and genetic algorithms. Elsweiler et al. [<xref ref-type="bibr" rid="ref10">10</xref>] and Ueta et al. [<xref ref-type="bibr" rid="ref33">33</xref>] discussed automatic meal planning approaches to support balanced nutrition. While effective in constrained contexts, each of these approaches depends on sufficient pre-existing user preference data, and they are thus susceptible to failure in cold-start scenarios [<xref ref-type="bibr" rid="ref8">8</xref>]. Trattner et al. [<xref ref-type="bibr" rid="ref31">31</xref>] proposed a novel method to recommend recipes to people in a cold-start scenario.
      </p>
      <p>
        There is also a significant body of research producing domain-specific knowledge to support future research. Trattner et al. [<xref ref-type="bibr" rid="ref29">29</xref>] present a seminal work summarizing to what extent current recommendation algorithms can support healthy recipe recommendation, and what resources are available. [<xref ref-type="bibr" rid="ref24 ref25 ref30 ref32">24, 25, 30, 32</xref>] showed how online recipe repositories can be sources for knowledge discovery to support personalized and group-based recipe recommendations. [<xref ref-type="bibr" rid="ref11 ref19 ref5">5, 11, 19</xref>] looked into patterns in users’ online activity around food.
      </p>
      <p>Contributions of This Work. The related work offers seminal solutions to the five dominant challenges in FRS research summarized in the introduction. Unlike our EnsTM based approach, which considers multi-domain food features, most of these solutions focus on ingredients when generating recommendations. While some existing work proposed significant approaches to consider sociocultural and contextual features, these are often limited to their food corpus and user group. Unlike our approach, many existing FRS approaches depend on pre-existing recipe ratings from users. Also, few works try to reduce the food data format so that the FRS can operate on a large recipe corpus (e.g., 230,000+ recipes). Our contributions are summarized as follows:
• a novel method to identify significant multi-domain Food Features from any food corpus.
• a Food Feature based intelligent user-modeling technique that supports personalization from the cold-start stage onwards.
• fine-grained recommendation algorithms that consider users’ preferences on multi-domain food features.
• a reduced data representation format that enables FRS to perform faster while preserving the integrity of the recipe information.
• a user study with 48 participants showing that the recommendation approach achieves the level of user satisfaction it strives for.</p>
    </sec>
    <sec id="sec-4">
      <title>3 RECOMMENDER STRATEGIES</title>
      <p>
        To create a recipe data-set, we developed a web-scraper for geniuskitchen.com [<xref ref-type="bibr" rid="ref2">2</xref>]. Our final data-set comprises 230,876 recipes. Each recipe was stored as a plain-text document that included information on ingredients, instructions, servings, cuisine, cooking time, cooking approach, cooking equipment, context, taste (e.g. sour or spicy) and nutrition data.
      </p>
      <p>
        The first aim of our work was to uncover common food features across the recipe data-set that could then be used to model user preferences and resolve user-to-recipe relationships. One traditional approach to achieving this is to apply TF-IDF [<xref ref-type="bibr" rid="ref23">23</xref>]. This produces a term-weight matrix that favours words which are frequent within a document but rare across the corpus. However, it does not produce any knowledge about a term beyond its occurrence frequency. Topic Modelling (TM) is an alternative and widely investigated approach, which attempts to discover the underlying thematic structure within a text corpus from the co-occurrences of words across documents [<xref ref-type="bibr" rid="ref7">7</xref>]. A Topic Model typically consists of k topics, each represented by a ranked list of strongly associated terms, and each topic represents a trend or theme in the content of the corpus. Belford et al. [<xref ref-type="bibr" rid="ref7">7</xref>] extended TM in their EnsTM. They built on evidence by Topchy et al. [<xref ref-type="bibr" rid="ref28">28</xref>] that ensemble procedures encourage diversity and improve quality by integrating results across multiple runs of individual algorithms.
      </p>
      <p>
        To extract a set of significant features from our recipe corpus, we applied EnsTM [<xref ref-type="bibr" rid="ref7">7</xref>], which generates and integrates the results of 100 runs of TM based on non-negative matrix factorization (NMF) [<xref ref-type="bibr" rid="ref20">20</xref>]. This produced a Topic-Term Weight Matrix in which each column is a topic, each row is a term, and each entry gives the level of association of the corresponding {Topic, Term} pair. To achieve a diverse and novel feature set, we selected the top 30 topics and the top 15 terms within each of these topics. We followed [<xref ref-type="bibr" rid="ref16">16</xref>] in deciding on the number of topics and the number of terms per topic: t = 15 terms gave the highest stability score [<xref ref-type="bibr" rid="ref16">16</xref>] for our recipe corpus. Some terms appeared in multiple topics, as they are involved in multiple food trends.
      </p>
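      <p>The NMF step and a simplified ensemble can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation or the exact EnsTM procedure of Belford et al. [<xref ref-type="bibr" rid="ref7">7</xref>]: it uses Lee-Seung multiplicative updates for NMF, and the hypothetical helper ensemble_topic_term_matrix simply stacks the topic rows of several seeded runs and factorizes the stack again to obtain consensus topics. The result is a topics × terms matrix, i.e. the transpose of the Topic-Term Weight Matrix described above.</p>
      <preformat>
```python
import random


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative documents x terms matrix V as W x H using
    Lee-Seung multiplicative updates for the squared Frobenius loss.
    Each row of H is a topic; each entry of H is a topic-term weight."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero.
        Wt = transpose(W)
        num_h, den_h = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num_h[a][j] / (den_h[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num_w, den_w = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num_w[i][a] / (den_w[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - W x H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def ensemble_topic_term_matrix(V, k, runs=10):
    """Hypothetical simplified ensemble: stack the topic rows produced by
    several differently seeded NMF runs, then factorize the stack again to
    obtain k consensus topics (a k x terms matrix)."""
    stacked = []
    for seed in range(runs):
        _, H = nmf(V, k, seed=seed)
        stacked.extend(H)
    _, H_ensemble = nmf(stacked, k, seed=runs)
    return H_ensemble
```
      </preformat>
      <p>A library implementation (for example scikit-learn’s NMF class) would normally replace the hand-written updates; running 100 base factorizations, as in the paper, corresponds to runs=100 here.</p>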
      <p>We consider the value of each {Topic, Term} pair in the Topic-Term Weight Matrix as the significance weight w<sub>i</sub> of term i within the corresponding topic. For terms that appear in multiple topics, we assigned w<sub>i</sub> as the cumulative sum of their weights over all the corresponding topics. This produced a final set of 288 unique terms representing diverse aspects of food, e.g. cooking approach, ingredient, equipment, serving techniques, preservation techniques and context. These 288 terms, summarized in Table 1, are our identified Food Features, and their corresponding weights are the proposed Feature Scores<sup>1</sup>.</p>
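      <p>The aggregation of feature scores described above can be sketched as follows, assuming (hypothetically) that the ensemble topic-term weights are available as a mapping from each topic to a {term: weight} dictionary. The helper keeps the top 15 terms of every topic and sums each term’s weights over all topics in which it appears, yielding w<sub>i</sub>.</p>
      <preformat>
```python
from collections import defaultdict


def feature_scores(topic_term_weights, top_terms=15):
    """Hypothetical helper. topic_term_weights maps each topic to a
    {term: weight} dict. Keeps each topic's top_terms highest-weighted terms
    and sums a term's weights over every topic it appears in (w_i)."""
    scores = defaultdict(float)
    for weights in topic_term_weights.values():
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        for term, weight in ranked[:top_terms]:
            scores[term] += weight  # cumulative sum across topics
    return dict(scores)
```
      </preformat>
      <p>With 30 topics and 15 terms per topic there are 450 term slots; terms shared between topics collapse into a single feature, which is how a smaller set of unique features (288 in this study) arises.</p>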
      <table-wrap>
        <caption>
          <p>Table 1: Summary<sup>1</sup> of the extracted features from EnsTM</p>
        </caption>
        <table>
          <thead>
            <tr><th>Feature type</th><th>Example features</th></tr>
          </thead>
          <tbody>
            <tr><td>context</td><td>holiday-food, beginner-cook, week-night, inexpensive, 6-people-or-more, potluck</td></tr>
            <tr><td>cuisine</td><td>italian, hawaiian, tex-mex, chinese, cajun</td></tr>
            <tr><td>equipment</td><td>saucepan, thermomix, wok, dutch-oven</td></tr>
            <tr><td>cooking process</td><td>few-steps-recipe, less-than-one-hour, fried, slow-cooked, marinated, 4-hours-or-more</td></tr>
            <tr><td>ingredient</td><td>poultry, feta, spaghetti, ham, shredded-meat</td></tr>
            <tr><td>category</td><td>risotto, lasagna, stew, appetizer, pot-roast</td></tr>
            <tr><td>nutrition</td><td>high-calcium, low-cholesterol, egg-free</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>In this work, we adopted a simple recipe-to-feature relationship by representing each recipe as a vector of 288 features, where each feature value is that feature’s TF-IDF score within the recipe. As shown in Figure 1, transforming the recipe corpus into a recipe-to-feature matrix reduces the volume of food data while still holding enough information to retrieve each recipe.</p>
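      <p>A minimal sketch of building the recipe-to-feature matrix is shown below. It assumes each recipe has already been tokenized so that multi-word features such as holiday-food appear as single tokens, and it uses tf = count / recipe length with idf = log(N / df), which is only one common TF-IDF variant; the paper does not specify which variant was used.</p>
      <preformat>
```python
import math
from collections import Counter


def recipe_feature_matrix(recipes, features):
    """Represent each tokenized recipe as a vector over the fixed feature
    vocabulary. Assumed TF-IDF variant: tf = count / recipe length,
    idf = log(N / df). Features absent from a recipe get 0.0."""
    n = len(recipes)
    counts = [Counter(tokens) for tokens in recipes]
    df = {f: sum(1 for c in counts if c[f] > 0) for f in features}
    matrix = []
    for tokens, c in zip(recipes, counts):
        row = []
        for f in features:
            if c[f] == 0:
                row.append(0.0)
            else:
                tf = c[f] / len(tokens)
                row.append(tf * math.log(n / df[f]))
        matrix.append(row)
    return matrix
```
      </preformat>
      <p>Because only the 288 feature columns are kept, each recipe row stays short regardless of how long the original plain-text recipe is.</p>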
      <p>[Figure 1: the recipe corpus (recipes R1 … Rn, stored as plain-text documents Document1 … Documentn) is transformed by EnsTM into a recipe-to-feature matrix with one row per recipe and one column per food feature (e.g. f1 takes the values 0.79, 0, …, 0.61 for R1, R2, …, Rn).]</p>
      <sec id="sec-4-2">
        <p>In the next step, we used the identified food features to learn users’ preferences. During their initial interaction with our FRS, users are asked to mark features as liked or disliked (there was no requirement for users to rate all 288 features). To build the user-to-feature matrix, the FRS assigns +5 to liked features, -5 to disliked features and 0 to any feature that has not been selected by the corresponding user. Unlike typical RS approaches, we assigned an extreme negative value to disliked features. This was an important design decision, made with a view to producing insights beyond users’ food preferences by enabling our system to implicitly capture important considerations such as nutritional restrictions or foods that users deliberately avoid.</p>
        <p><sup>1</sup> The complete set of 288 features, their corresponding weights and the set of food features correlated with a healthier lifestyle are available at https://github.com/MAK273/SupportingFileForHealthRecsys2019</p>
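        <p>The rule for building one row of the user-to-feature matrix can be written directly. The function name and the decision to reject a feature that is both liked and disliked are illustrative assumptions, not details given in the paper.</p>
        <preformat>
```python
LIKE, DISLIKE, UNRATED = 5, -5, 0


def user_feature_vector(features, liked, disliked):
    """Return one user's row of the user-to-feature matrix, in the order of
    `features`: +5 for liked, -5 for disliked, 0 for unselected features."""
    liked, disliked = set(liked), set(disliked)
    conflict = liked.intersection(disliked)
    if conflict:
        # Assumed validation rule: a feature cannot be both liked and disliked.
        raise ValueError(f"liked and disliked: {sorted(conflict)}")
    return [LIKE if f in liked else DISLIKE if f in disliked else UNRATED
            for f in features]
```
        </preformat>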
        <p>We implemented three EnsTM based recommendation algorithms: FFbR, WFFbR and FFbCF. Each uses the recipe-to-feature matrix to transform a user’s positive and negative scores on features into the user’s scores on recipes.</p>
        <p>• Food Feature based Recommender (FFbR): This strategy assigns a preference score P for user u<sub>a</sub> on a target recipe r<sub>n</sub> based on the cumulative sum of u<sub>a</sub>’s ratings (like or dislike) for all features f<sub>i</sub> (i = 1, 2, …, m) present in r<sub>n</sub>, where f<sub>i,ua</sub> is u<sub>a</sub>’s rating on feature f<sub>i</sub> and m is the total number of features in r<sub>n</sub>.</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>P(u_a, r_n) = \left( \sum_{i=1}^{m} f_{i,u_a} \right)'_{(0,5)}</tex-math>
          </disp-formula>
Here (·)′(0,5) denotes normalization to the range 0 to 5. Instead of taking an average, we normalized the cumulative sum in this way to favor recipes with more liked features over others. FFbR treats all food features equally, assuming that each feature has an equal impact on user preferences.
• Weighted Food Feature based Recommender (WFFbR): With WFFbR we aimed to account for the differing impact of different food features. It scales u<sub>a</sub>’s preference on a feature f<sub>i</sub> by its corresponding feature score w<sub>i</sub> and predicts u<sub>a</sub>’s preference on r<sub>n</sub> as the cumulative sum of the weighted preferences on all m features within r<sub>n</sub>.</p>
        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math>P(u_a, r_n) = \left( \sum_{i=1}^{m} f_{i,u_a} \times w_i \right)'_{(0,5)}</tex-math>
          </disp-formula>
        </p>
        <p>
          • Food Feature based Collaborative Filtering (FFbCF): FFbCF applies the CF approach proposed by Freyne et al. [<xref ref-type="bibr" rid="ref12">12</xref>] to extend the knowledge of a user’s preferences, predicting the user’s preference scores on food features that the user has neither liked nor disliked. When user u<sub>a</sub> first interacts with the system, FFbCF identifies u<sub>a</sub>’s nearest neighbours based on similar ratings on overlapping features. We applied KNN [<xref ref-type="bibr" rid="ref9">9</xref>] to identify the top n nearest neighbours of u<sub>a</sub>. For a new feature f<sub>b</sub>, FFbCF predicts u<sub>a</sub>’s preference as the average rating of these n neighbours:
        </p>
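        <p>
          <disp-formula>
            <label>(3)</label>
            <tex-math>P(f_b, u_a) = \frac{\sum_{i=1}^{n} f_{b,u_i}}{n}</tex-math>
          </disp-formula>
With this more densely populated user-to-feature matrix, FFbCF generates P(u<sub>a</sub>, r<sub>n</sub>) using Equation 1.</p>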
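        <p>The three scoring rules can be sketched in plain Python under several assumptions that the text does not fix: a feature counts as present in a recipe when its TF-IDF value is non-zero; the normalization (·)′(0,5) is taken to be a min-max rescaling over the candidate recipes; and FFbCF neighbours are ranked by cosine similarity over the features both users rated. All function names are hypothetical.</p>
        <preformat>
```python
import math


def present(recipe_row):
    """Indices of features present in a recipe (non-zero TF-IDF, assumed)."""
    return [i for i, v in enumerate(recipe_row) if v != 0]


def ffbr_raw(user_row, recipe_row):
    """Equation 1 before normalization: sum of the user's +5/-5/0 ratings."""
    return sum(user_row[i] for i in present(recipe_row))


def wffbr_raw(user_row, recipe_row, weights):
    """Equation 2 before normalization: ratings scaled by feature scores w_i."""
    return sum(user_row[i] * weights[i] for i in present(recipe_row))


def normalise_0_5(raw_scores):
    """Assumed reading of the (0,5) normalization: min-max rescaling."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        return [2.5] * len(raw_scores)  # no spread: neutral midpoint
    return [5 * (s - lo) / (hi - lo) for s in raw_scores]


def overlap_cosine(u, v):
    """Cosine similarity restricted to features rated by both users."""
    shared = [i for i in range(len(u)) if u[i] != 0 and v[i] != 0]
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    nu = math.sqrt(sum(u[i] ** 2 for i in shared))
    nv = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (nu * nv)


def ffbcf_fill(target, others, n_neighbours=2):
    """Equation 3: predict each unrated feature f_b of `target` as the mean
    rating of its n most similar users; rated features are kept unchanged."""
    ranked = sorted(others, key=lambda v: overlap_cosine(target, v), reverse=True)
    neighbours = ranked[:n_neighbours]
    filled = list(target)
    for b, value in enumerate(target):
        if value == 0 and neighbours:
            filled[b] = sum(v[b] for v in neighbours) / len(neighbours)
    return filled
```
        </preformat>
        <p>For FFbCF, the filled user row would then be scored with ffbr_raw and normalise_0_5, mirroring the use of Equation 1 described above.</p>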
        <p>
          To compare the proposed EnsTM based recommenders, we implemented the generic CB [<xref ref-type="bibr" rid="ref13">13</xref>] approach as our baseline.
• Content-Based (CB): CB predicts P(u<sub>a</sub>, r<sub>n</sub>) based on u<sub>a</sub>’s explicit preferences on the ingredients Ing<sub>i</sub> (i = 1, 2, …, m) comprising r<sub>n</sub>, where m is the total number of ingredients in r<sub>n</sub>.
        </p>
        <p>
          <disp-formula>
            <label>(4)</label>
            <tex-math>P(u_a, r_n) = \frac{\sum_{i=1}^{m} \mathit{Ing}_{i,u_a}}{m}</tex-math>
          </disp-formula>
        </p>
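        <p>Equation 4 reduces to an average over the recipe’s ingredients. The sketch assumes, hypothetically, that every ingredient a participant typed in scores 5, that other ingredients score 0, and that no prediction is made (None) when a recipe contains none of the participant’s ingredients; the paper does not state these details.</p>
        <preformat>
```python
def cb_score(recipe_ingredients, ingredient_prefs):
    """Equation 4: mean of the user's explicit ingredient preferences over the
    m ingredients of a recipe. Unlisted ingredients count as 0 and recipes
    sharing no ingredient with the profile get no prediction (assumptions)."""
    m = len(recipe_ingredients)
    if m == 0 or not any(i in ingredient_prefs for i in recipe_ingredients):
        return None
    return sum(ingredient_prefs.get(i, 0) for i in recipe_ingredients) / m
```
        </preformat>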
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 EVALUATION</title>
      <p>To test the EnsTM based FRS strategies, we conducted a user study with 48 users of varying nationality and ethnicity. The participants were aged 21 to 65 and comprised students, professionals and athletes; 45% identified themselves as female and 55% as male. Participants were recruited through social media groups within UCD. All participants were entered into a draw for a €50 gift voucher. Ethics permission for this study was provided by the UCD Office of Research Ethics.</p>
      <p>A smaller recipe corpus of 92,539 recipes with valid images was used as the primary recipe data-set. The study compared four approaches: the three EnsTM based FRS strategies and a CB approach. Each approach predicted the user’s preference for all 92,539 recipes. For each recommendation strategy, the top 2,100 recipes with the highest prediction scores were divided into 7 equal-sized epochs of 300 recipes, and one recipe was randomly selected from each epoch. This approach was taken to support diversity and give users more options.</p>
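      <p>The selection procedure above can be sketched as follows; the function name and the use of a seeded random generator are illustrative assumptions. With 2,100 recipes and 7 epochs, each epoch holds 300 consecutive recipes in ranked order.</p>
      <preformat>
```python
import random


def pick_from_epochs(scores, top_k=2100, n_epochs=7, seed=None):
    """Rank recipes by predicted score, keep the top_k, split them into
    n_epochs equal consecutive slices and draw one recipe from each slice.
    scores: {recipe_id: predicted preference}."""
    if top_k % n_epochs:
        raise ValueError("top_k must be divisible by n_epochs")
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    size = len(ranked) // n_epochs  # 300 for 2,100 recipes and 7 epochs
    return [rng.choice(ranked[e * size:(e + 1) * size]) for e in range(n_epochs)]
```
      </preformat>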
      <p>We developed a website<sup>2</sup> and hosted it under the university domain. Participants were first required to access the website, indicate their informed consent, and create a user-name and password. They could then log into a secure website that displayed an interactive panel of images representing all 288 features, in order of feature weight. They were asked to select at least 20 features that they like and at least 20 features that they dislike. This information was used to create a user profile. Once the profile was created, participants could log into it and browse the features to update their likes and dislikes. To populate users’ profiles for the baseline approach, participants were asked to list the ingredients they like or eat frequently; each user had to type in at least 20 ingredients. Participants also selected an appointment time for the main experiment.</p>
      <p>During the main experiment, participants were shown a series of four recommendation lists, one for each of our recommendation algorithms. Each list consisted of seven recipes. The order in which the recommendation lists were presented was fully counter-balanced across the 48 participants. Within each list, participants were required to rate each individual recipe on a 5-star rating scale, where 0 and 5 represented "not like at all" and "liked very much" respectively.</p>
      <p><sup>2</sup> A demo of the website can be found at https://youtu.be/ujaB0FiqRwk</p>
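      <p>Full counterbalancing of four lists means using all 4! = 24 presentation orders equally often, so with 48 participants each order is used exactly twice. A minimal sketch, assuming a simple round-robin assignment of orders to participants (the paper does not say how orders were assigned):</p>
      <preformat>
```python
from itertools import permutations

ALGORITHMS = ("CB", "FFbR", "WFFbR", "FFbCF")


def counterbalanced_orders(n_participants, conditions=ALGORITHMS):
    """Give participant p the (p mod 24)-th permutation of the conditions,
    so every order is used equally often (assumed round-robin scheme)."""
    orders = list(permutations(conditions))
    if n_participants % len(orders):
        raise ValueError("participants must be a multiple of the number of orders")
    return [orders[p % len(orders)] for p in range(n_participants)]
```
      </preformat>
      <p>A side effect of full counterbalancing is that each algorithm appears in each list position equally often (12 times for 48 participants), which spreads position and fatigue effects evenly across algorithms.</p>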
    </sec>
    <sec id="sec-6">
      <title>RESULTS</title>
      <p>Accuracy: The accuracy of the recommendations was evaluated based on participants’ ratings of recipes. For each participant, the average rating across the seven-item list generated by each recommendation strategy was calculated. Figure 2 shows the mean score of each algorithm across all users. The pure CB approach was the poorest performer, which was confirmed through statistical analysis. We first conducted a repeated measures analysis of variance comparing the mean ratings of participants across the four algorithms. The result, F(3,188) = 14.42229, p &lt; 0.001, indicates a significant difference within the results. Paired-sample t-tests were then conducted between the individual algorithms, with a null hypothesis in each case of no difference in the mean ratings. We did not find a significant difference between participants’ ratings across the EnsTM approaches, indicating that they performed equally well in terms of accuracy. There was, however, a significant difference in participants’ ratings between each of the EnsTM approaches and the CB baseline, with p &lt; 0.001 in each case. This suggests that each EnsTM based approach performed significantly better than the baseline CB approach.</p>
      <p>[Figure 2: mean recipe rating per algorithm across all users (rating axis 1.5 to 5): CB 2.8, FFbR 3.45, WFFbR 3.33, FFbCF 3.42.]</p>
      <p>Coverage: Here we consider the coverage achieved by each algorithm across all users, that is, the percentage of recipe-user pairs for which the algorithm was able to generate a prediction. Figure 3 details the coverage achieved by each algorithm. The notable outlier is CB, which produced a coverage of only 20%. FFbR and WFFbR both had users’ preferences for an average of 51 of our 288 features, and both produced a coverage of 91.57%. FFbCF, with a more densely populated user-to-feature matrix, provided 100% coverage, with predictions for all recipe-user pairs.</p>
      <p>[Figure: bar chart with a y-axis labelled “percentage of recipes”, ranging from 0 to 100.]</p>
      <p>Implicitly capturing food practices: Another practical aspect of knowledge building for an FRS is an algorithm’s ability to predict important aspects of a user’s food practices from the available user information. For example, while both vegetarians and vegans eat vegetables, eggs should only be recommended to vegetarians. Figure 4 shows that the CB baseline performed poorly in this regard. In contrast, FFbR and WFFbR identified users’ food practices with 100% accuracy. Here, the direct feature-to-recipe relationship makes the dislike property of the FRS an effective identification tool. FFbCF failed to predict the food practices of some users because of the collaborative effect of their neighbours’ food practices.</p>
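      <p>The per-participant averages, the paired comparisons and the coverage metric described in this section can be sketched as follows. This is a minimal illustration: paired_t returns only the paired-sample t statistic (its p-value would come from a t distribution with n - 1 degrees of freedom), the repeated-measures ANOVA is not reproduced, and the convention that a scorer returns None when it cannot make a prediction is an assumption.</p>
      <preformat>
```python
import math
from statistics import mean, stdev


def participant_means(ratings):
    """ratings: {participant: [star ratings for one algorithm's list]}.
    Returns each participant's average rating for that list."""
    return {p: mean(r) for p, r in ratings.items()}


def paired_t(a, b):
    """Paired-sample t statistic for matched per-participant means."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def coverage(predict, users, recipes):
    """Percentage of user-recipe pairs for which `predict` returns a score
    (None is assumed to mean that no prediction could be made)."""
    pairs = [(u, r) for u in users for r in recipes]
    made = sum(1 for u, r in pairs if predict(u, r) is not None)
    return 100.0 * made / len(pairs)
```
      </preformat>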
      <p>
        Correlation between lifestyle and food features: Further analysis of the data-set collected in the user study exposed interesting associations between users’ lifestyles and their feature preferences. Users were categorized into different health-groups based on three different healthiness measures: activity_level, BMI and average food_healthScore. A user’s activity_level was self-reported. BMI was calculated from users’ height and weight following [<xref ref-type="bibr" rid="ref4">4</xref>]. A user’s average food_healthScore was defined as the average FSA health score [<xref ref-type="bibr" rid="ref18">18</xref>] of all the recipes the user liked (rated 4 or more). Table 2 summarizes the category labels corresponding to each healthiness measure and the guideline associated with each categorization criterion.
      </p>
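      <p>BMI is weight in kilograms divided by the square of height in metres. The sketch below also applies the standard WHO adult cut-offs (18.5, 25 and 30), which may differ from the labels used in Table 2, and computes a user’s average food_healthScore from the recipes they rated 4 or more; the per-recipe FSA scores are assumed to be precomputed inputs, and all function names are hypothetical.</p>
      <preformat>
```python
from statistics import mean


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Standard WHO adult cut-offs (the paper's Table 2 labels may differ)."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    if value >= 18.5:
        return "normal"
    return "underweight"


def average_food_health_score(rated_recipes, health_score, liked_threshold=4):
    """Mean FSA health score of the recipes a user rated 4 or more.
    rated_recipes: {recipe: rating}; health_score: {recipe: FSA score}.
    Returns None if the user liked no recipe."""
    liked = [r for r, rating in rated_recipes.items() if rating >= liked_threshold]
    return mean(health_score[r] for r in liked) if liked else None
```
      </preformat>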
      <p>The activity_level and food_healthScore based categorizations agreed on the healthiness of users’ lifestyle preferences. Figure 5 illustrates the spread of the 48 participants over the different activity-based categories, together with the percentage of each food_healthScore based category within each activity-based category. The proportion of the LessHealthy user-group decreased as activity level increased. The BMI based categorization was not predictive of either the activity_level or the food_healthScore based categorization.</p>
      <p>[Table 2: category labels and categorization guidelines for each healthiness scale (Activity level, BMI, Average Food HealthScore).]</p>
      <p>The aim of the categorization was to investigate whether there is any pattern in the interactions between certain health-groups and particular food features. Finding the correlation between these two variables allows us to assess whether healthier users tend to like or dislike a particular feature. A natural approach for such an analysis is to apply machine learning classification algorithms to assess the predictive capabilities of these features; however, due to the small sample size (48 users) and the high degree of class imbalance across all three scales, a simple correlation analysis is used instead in this instance.</p>
      <p>[Figure 5: participants per activity-level category (sedentary, lightly_active, moderately_active, extra_active), showing the share of each Average Food HealthScore category within each.]</p>
      <p>
        Results revealed interesting associations between health-groups and features. Given that the group/category levels associated with activity_level and food_healthScore are ordinal in nature, we conducted a Spearman rank correlation analysis [<xref ref-type="bibr" rid="ref22">22</xref>] to find the degree of association between preference (positive/negative) for features and health-groups. Table 3 shows the strongest significant features (p &lt; 0.05) for a sample of 48 users. The coherence between users’ personality factors, food choices and activity levels suggests that the features popular among the healthier user-group could be leveraged as initial recommendations for new users who are looking for inspiration on healthier food ideas and recipes.
      </p>
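      <p>Spearman’s rank correlation can be computed as the Pearson correlation of the two variables’ ranks, with tied values given their average rank. In the sketch below (an illustration, not the authors’ analysis code), x would hold each user’s preference for one feature (+5, 0 or -5) and y the user’s ordinal health-group level, encoded for example as 0 to 3 for the activity categories; that encoding is an assumption.</p>
      <preformat>
```python
import math
from itertools import groupby


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 0
    for _, group in groupby(order, key=lambda i: values[i]):
        members = list(group)
        # Ranks position+1 .. position+len(members), averaged together.
        shared_rank = position + (len(members) + 1) / 2
        for i in members:
            ranks[i] = shared_rank
        position += len(members)
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```
      </preformat>
      <p>The p-values behind Table 3 require a significance test on top of rho; scipy.stats.spearmanr, for instance, returns both the coefficient and a p-value.</p>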
      <sec id="sec-6-1">
        <title>Average Food HealthScore</title>
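        <table-wrap>
          <table>
            <thead>
              <tr><th>Feature</th><th>Spearman r</th></tr>
            </thead>
            <tbody>
              <tr><td>peanut-butter</td><td>0.447989</td></tr>
              <tr><td>granola</td><td>0.365171</td></tr>
              <tr><td>lentil</td><td>0.360767</td></tr>
              <tr><td>indian</td><td>0.356347</td></tr>
              <tr><td>cauliflower</td><td>0.352353</td></tr>
              <tr><td>low-cholesterol</td><td>0.350818</td></tr>
              <tr><td>maple</td><td>0.321131</td></tr>
              <tr><td>vegetable</td><td>0.307459</td></tr>
              <tr><td>wheat</td><td>0.303326</td></tr>
              <tr><td>carrot</td><td>0.303052</td></tr>
            </tbody>
          </table>
        </table-wrap>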
      </sec>
      <sec id="sec-6-2">
        <title>Activity Level</title>
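        <table-wrap>
          <table>
            <thead>
              <tr><th>Feature</th><th>Spearman r</th></tr>
            </thead>
            <tbody>
              <tr><td>wing</td><td>0.441152</td></tr>
              <tr><td>tuna</td><td>0.430467</td></tr>
              <tr><td>tilapia</td><td>0.363502</td></tr>
              <tr><td>salmon</td><td>0.359852</td></tr>
              <tr><td>hawaiian</td><td>0.346401</td></tr>
              <tr><td>canadian</td><td>0.322470</td></tr>
              <tr><td>smoothy</td><td>0.314174</td></tr>
              <tr><td>chicken-thighs-legs</td><td>0.314059</td></tr>
              <tr><td>halibut</td><td>0.310990</td></tr>
              <tr><td>main-dish</td><td>0.303345</td></tr>
            </tbody>
          </table>
        </table-wrap>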
      </sec>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>This work presents an initial evaluation of EnsTM based FRS. Results show that EnsTM based approaches perform significantly better than a conventional CB approach. The work provides a general feature extraction approach that can generate a set of significant food features from any recipe, menu or food corpus. The features have the added advantage of being human-understandable, which allowed us to model user preferences directly. EnsTM based feature identification addresses the limitation of user-group dependency and is capable of making food recommendations for users of diverse nationality, ethnicity and culture. It allows recommendations to be generated without existing user ratings on recipes, helping to address the cold-start problem. By working with a reduced feature set, EnsTM also enables computationally efficient recommendation. Furthermore, the subset of nutritional features within our food features allows the proposed EnsTM approaches to personalize the recommendation list according to users’ nutritional preferences.</p>
      <p>While there was no significant difference between the three EnsTM based approaches in terms of users’ recipe ratings, the use of EnsTM in combination with CF provided the best coverage, predicting user preferences across 100% of our recipe corpus. However, the CF based approach performed more poorly in terms of implicitly understanding users’ food practices. In future work we aim to apply the EnsTM based recommenders to support diet and menu planning by incorporating health-aware filtering strategies, with a view to providing long-term, guided and healthier food choices. The positive and negative popularity of features among certain health-groups also inspired us to investigate food features in combination with healthiness cues for user modeling and recipe recommendation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] [n. d.].
          <source>FSA Nutrient and Food Guidelines</source>
          . https://www.ptdirect.com/training-design/nutrition/national-nutrition-guidelines-united-kingdom. Accessed: March
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] [n. d.].
          <source>Geniuskitchen</source>
          . http://www.geniuskitchen.com. Accessed: March
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <year>2009</year>
          .
          <article-title>FAO energy requirement guideline</article-title>
          . http://www.fao.org/3/y5686e/y5686e07.htm. Accessed: March
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <year>2009</year>
          . WHO:
          <article-title>Body mass index</article-title>
          . http://www.euro.who.int/en/health-topics/disease-prevention/nutrition/a-healthy-lifestyle/body-mass-index-bmi. Accessed: March
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <year>2019</year>
          .
          <article-title>Investigating and predicting online food recipe upload behavior</article-title>
          .
          <source>Information Processing and Management</source>
          <volume>56</volume>
          ,
          <issue>3</issue>
          (
          <year>2019</year>
          ),
          <fpage>654</fpage>
          -
          <lpage>673</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Carole A.</given-names>
            <surname>Bisogni</surname>
          </string-name>
          , Margaret Jastran,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Seligson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alyssa</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>How People Interpret Healthy Eating: Contributions of Qualitative Research</article-title>
          .
          <source>Journal of Nutrition Education and Behavior</source>
          <volume>44</volume>
          (July
          <year>2012</year>
          ),
          <fpage>282</fpage>
          -
          <lpage>301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Belford</surname>
          </string-name>
          , Brian MacNamee, and
          <string-name>
            <given-names>Derek</given-names>
            <surname>Greene</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Ensemble Topic Modeling via Matrix Factorization</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jesús</given-names>
            <surname>Bobadilla</surname>
          </string-name>
          , Fernando Ortega, Antonio Hernando, and
          <string-name>
            <given-names>Jesús</given-names>
            <surname>Bernal</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>A Collaborative Filtering Approach to Mitigate the New User Cold Start Problem</article-title>
          .
          <source>Know.-Based Syst</source>
          .
          <volume>26</volume>
          (Feb.
          <year>2012</year>
          ),
          <fpage>225</fpage>
          -
          <lpage>238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cover</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hart</surname>
          </string-name>
          .
          <year>1967</year>
          .
          <article-title>Nearest Neighbor Pattern Classification</article-title>
          .
          <source>IEEE Trans. Inf. Theor</source>
          .
          <volume>13</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>1967</year>
          ),
          <fpage>21</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>David</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          and
          <string-name>
            <given-names>Morgan</given-names>
            <surname>Harvey</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Towards Automatic Meal Plan Recommendations for Balanced Nutrition</article-title>
          .
          <source>In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15)</source>
          .
          <fpage>313</fpage>
          -
          <lpage>316</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>David</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Trattner</surname>
          </string-name>
          , and Morgan Harvey.
          <year>2017</year>
          .
          <article-title>Exploiting Food Choice Biases for Healthier Recipe Recommendation</article-title>
          .
          <source>In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)</source>
          .
          <fpage>575</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jill</given-names>
            <surname>Freyne</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shlomo</given-names>
            <surname>Berkovsky</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Intelligent Food Planning: Personalized Recipe Recommendation</article-title>
          .
          <source>In Proceedings of the 15th International Conference on Intelligent User Interfaces (IUI '10)</source>
          .
          <fpage>321</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jill</given-names>
            <surname>Freyne</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shlomo</given-names>
            <surname>Berkovsky</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Recommending Food: Reasoning on Recipes and Ingredients</article-title>
          .
          <source>In Proceedings of the 18th International Conference on User Modeling, Adaptation, and Personalization (UMAP'10)</source>
          .
          <fpage>381</fpage>
          -
          <lpage>386</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Mouzhi</given-names>
            <surname>Ge</surname>
          </string-name>
          , Mehdi Elahi, Ignacio Fernández-Tobías,
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ricci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David</given-names>
            <surname>Massimo</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Using Tags and Latent Factors in a Food Recommender System</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Digital Health 2015 (DH '15)</source>
          .
          <fpage>105</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Mouzhi</given-names>
            <surname>Ge</surname>
          </string-name>
          , Francesco Ricci, and
          <string-name>
            <given-names>David</given-names>
            <surname>Massimo</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Health-aware Food Recommender System</article-title>
          .
          <source>In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15)</source>
          .
          <fpage>333</fpage>
          -
          <lpage>334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Derek</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Derek</given-names>
            <surname>O'Callaghan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pádraig</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>How Many Topics? Stability Analysis for Topic Models</article-title>
          .
          <source>In Machine Learning and Knowledge Discovery in Databases</source>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Hanshen</given-names>
            <surname>Gu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dong</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>A Content-aware Fridge Based on RFID in Smart Home for Home-healthcare</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Advanced Communication Technology - Volume 2 (ICACT'09)</source>
          .
          <fpage>987</fpage>
          -
          <lpage>990</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Morgan</given-names>
            <surname>Harvey</surname>
          </string-name>
          , Bernd Ludwig, and David Elsweiler. [n. d.].
          <article-title>Learning user tastes: a first step to generating healthy meal plans?</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Morgan</given-names>
            <surname>Harvey</surname>
          </string-name>
          , Bernd Ludwig, and
          <string-name>
            <given-names>David</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>You Are What You Eat: Learning User Tastes for Rating Prediction</article-title>
          .
          <source>In Proceedings of the 20th International Symposium on String Processing and Information Retrieval - Volume 8214 (SPIRE 2013)</source>
          .
          <fpage>153</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          , Robert Bell, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Volinsky</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Matrix Factorization Techniques for Recommender Systems</article-title>
          .
          <source>Computer</source>
          <volume>42</volume>
          ,
          <issue>8</issue>
          (Aug.
          <year>2009</year>
          ),
          <fpage>30</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Fang-Fei</given-names>
            <surname>Kuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Cheng-Te</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Man-Kwan</given-names>
            <surname>Shan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Suh-Yin</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Intelligent Menu Planning: Recommending Set of Recipes by Ingredients</article-title>
          .
          <source>In Proceedings of the ACM Multimedia 2012 Workshop on Multimedia for Cooking and Eating Activities (CEA '12)</source>
          .
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Mavuto</given-names>
            <surname>Mukaka</surname>
          </string-name>
          .
          <year>2012</year>
          . Statistics Corner:
          <article-title>A guide to appropriate use of Correlation coefficient in medical research</article-title>
          .
          <source>Malawi Medical Journal: The Journal of Medical Association of Malawi</source>
          <volume>24</volume>
          (Sept.
          <year>2012</year>
          ),
          <fpage>69</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Juan</given-names>
            <surname>Ramos</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Using TF-IDF to determine word relevance in document queries</article-title>
          .
          (Jan.
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Markus</given-names>
            <surname>Rokicki</surname>
          </string-name>
          , Eelco Herder, Tomasz Kuśmierczyk, and
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Trattner</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Plate and Prejudice: Gender Differences in Online Cooking</article-title>
          .
          <source>In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization (UMAP '16)</source>
          .
          <fpage>207</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Markus</given-names>
            <surname>Rokicki</surname>
          </string-name>
          , Christoph Trattner, and
          <string-name>
            <given-names>Eelco</given-names>
            <surname>Herder</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The Impact of Recipe Features, Social Cues and Demographics on Estimating the Healthiness of Online Recipes</article-title>
          .
          <source>In ICWSM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Janusz</given-names>
            <surname>Sobecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Babiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Slanina</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Application of Hybrid Recommendation in Web-based Cooking Assistant</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part III (KES'06)</source>
          .
          <fpage>797</fpage>
          -
          <lpage>804</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Chun-Yuen</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yu-Ru</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lada A.</given-names>
            <surname>Adamic</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Recipe Recommendation Using Ingredient Networks</article-title>
          .
          <source>In Proceedings of the 4th Annual ACM Web Science Conference (WebSci '12)</source>
          .
          <fpage>298</fpage>
          -
          <lpage>307</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Topchy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anil K.</given-names>
            <surname>Jain</surname>
          </string-name>
          , and
          <string-name>
            <given-names>William F.</given-names>
            <surname>Punch</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Clustering ensembles: models of consensus and weak partitions</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>27</volume>
          ,
          <issue>12</issue>
          (
          <year>2005</year>
          ),
          <fpage>1866</fpage>
          -
          <lpage>1881</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Trattner</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Food Recommender Systems: Important Contributions, Challenges and Future Research Directions</article-title>
          .
          <source>CoRR</source>
          abs/1711.02760 (
          <year>2017</year>
          ). arXiv:1711.02760 http://arxiv.org/abs/1711.02760
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Trattner</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Investigating the Healthiness of Internet-Sourced Recipes: Implications for Meal Planning and Recommender Systems</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web (WWW '17)</source>
          .
          <fpage>489</fpage>
          -
          <lpage>498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Trattner</surname>
          </string-name>
          , Dominik Moesslang, and
          <string-name>
            <given-names>David</given-names>
            <surname>Elsweiler</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>On the predictability of the popularity of online recipes</article-title>
          .
          <source>EPJ Data Science</source>
          <volume>7</volume>
          ,
          <issue>1</issue>
          (July
          <year>2018</year>
          ),
          <fpage>20</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Trattner</surname>
          </string-name>
          , Markus Rokicki, and
          <string-name>
            <given-names>Eelco</given-names>
            <surname>Herder</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>On the Relations Between Cooking Interests, Hobbies and Nutritional Values of Online Recipes: Implications for Health-Aware Recipe Recommender Systems</article-title>
          .
          <source>In Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP '17)</source>
          .
          <fpage>59</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Tsuguya</given-names>
            <surname>Ueta</surname>
          </string-name>
          , Masashi Iwakami, and
          <string-name>
            <given-names>Takayuki</given-names>
            <surname>Ito</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A Recipe Recommendation System Based on Automatic Nutrition Information Extraction</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Knowledge Science, Engineering and Management (KSEM'11)</source>
          .
          <fpage>79</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Youri</given-names>
            <surname>van Pinxteren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Gijs</given-names>
            <surname>Geleijnse</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Kamsteeg</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Deriving a Recipe Similarity Measure for Recommending Healthful Meals</article-title>
          .
          <source>In Proceedings of the 16th International Conference on Intelligent User Interfaces (IUI '11)</source>
          .
          <fpage>105</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>