Bundle Recommender from Recipes to Shopping Carts -
Optimizing Ingredients, Kitchen Gadgets and their
Quantities
Chahak Sethi¹,†, Melvin Vellera¹,†, Diane Myung-kyung Woodbridge¹ and Joey Jonghoon Ahnn²

¹ University of San Francisco, San Francisco, California, USA
² Target, Sunnyvale, California, USA
† These authors contributed equally.

csethi2@usfca.edu (C. Sethi); mvellera@usfca.edu (M. Vellera); dwoodbridge@usfca.edu (D. M. Woodbridge); joey.ahnn@target.com (J. J. Ahnn)

ORSUM@ACM RecSys 2022: 5th Workshop on Online Recommender Systems and User Modeling, jointly with the 16th ACM Conference on Recommender Systems, September 23rd, 2022, Seattle, WA, USA

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.


Abstract

In this paper, we introduce a recommender system that automatically captures the context of what users or guests are looking for and recommends a bundle of products to be added to their shopping cart. The recommender system takes a user's selected recipes as input and, using neural networks, recommends a shopping cart containing the ingredients in optimized quantities as well as any kitchen gadgets that might be necessary to prepare the recipes efficiently. We propose a system architecture, dive deep into the individual components, and evaluate the performance of the information retrieval, semantic search, and quantity optimization algorithms. Using an ensemble methodology, we attained a mean average precision of over 0.9 for ingredient and quantity recommendations. The recipe-based bundle recommender system may be used not only to improve users' shopping experiences but also to enable and encourage healthier eating habits, aiming to provide personalized product recommendations.


1. Introduction

Recommendation systems provide personalized recommendations for customers using various data about customers and products to enhance the customer experience and maximize the conversion rate, significantly contributing to revenue growth in the retail industry. In 2020, the recommendation system market was valued at 1.77 billion US dollars globally, with a projected compound annual growth rate of 33.0% by 2028 [9].

Recommendation systems generally utilize two categories of algorithms: collaborative filtering-based recommendation [23][5] and content-based recommendation [16]. Collaborative filtering (CF) relies on users' historical data; it matches a user A with a similar user B and recommends to A what B liked. Today, most big e-commerce companies with massive amounts of data use collaborative filtering to recommend products to their customers [4]. Content-based recommendations create a profile to characterize each item. Industry applications of content-based recommendation systems include a jobs recommendation engine that suggests jobs to users by matching their interests and skills with the features of job postings [2]. IMDb uses information about movies and TV shows, including genre, language, cast, director, and popularity, to recommend them to its users [28].

With the COVID-19 pandemic, there has been increased need for and interest in meal preparation, including shopping and cooking. This even helped some people rediscover their liking for meal preparation and discover new recipes. However, recommending items for cooking recipes poses unique challenges, including: (1) recommending in-stock grocery items in the correct quantity, as recipes typically use volume while products use weight as a measure. For example, in Figure 1, the recipe for classic French toast uses four tablespoons of unsalted butter, whereas butter is sold in a 16-ounce pack; (2) recommending kitchen gadgets from the unstructured text data in the directions. For example, in Figure 1, the recipe for classic French toast requires the user to "whisk", which implicitly indicates that the user would need a whisk; (3) optimizing quantities for ingredients repeated across multiple recipes. For example, if the classic French toast recipe (Figure 1, Table 1) requires six slices of bread while another recipe requires eight slices, a total of one pack of bread is needed to complete both recipes.

In this research, we developed a content-based recommendation system with natural language processing to solve the three aforementioned sub-problems: (1) the developed system parses the ingredients and kitchen gadgets in the recipe corpus and recommends the most similar products from our product catalog database; (2) it parses the unstructured text in the recipe directions section to recommend the kitchen gadgets required to cook a recipe; (3) finally, it optimizes the number of products and bundles the recommendations together as a set of items that customers can purchase individually or as a bundle, a group of complementary items that can be purchased together [13][29].

Our research offers the convenience of personally curated shopping experiences to users by reducing the effort needed to find the right products as a bundle. The system helps users without any prior cooking experience easily purchase the required ingredients and kitchen gadgets. Automatic quantity optimization reduces wastage of resources and optimizes costs for users by suggesting the correct quantity of each product. We believe that this approach leads to increased user basket sizes, eventually raising business revenue. Furthermore, it can also aid in automated promotional emails to consumers containing all the ingredients and gadgets in a recipe, rather than hand-curated contents, which we find time-consuming.

In this paper, we discuss the related work in this field in Section 2. In Section 3, we provide an overview of the developed system, including mapping the kitchen gadgets and ingredients in recipes to relevant products from the product catalog database, optimizing the quantities to be recommended, and providing multiple candidates with their corresponding ranking to the user. The techniques we employed to measure the performance of the system are discussed in Section 4. We summarize our work and outline future work in Section 5. The authors make the code used for this research available to the public [24].
Figure 1: Text of an Example Recipe (Classic French Toast) and Recommended Ingredients and Kitchen Gadgets



Table 1
Shopping Cart Recommendation - French Toast

   Product                                            Qty   Price ($)
   Hood Heavy Cream - 1pt                             1     3.49
   McCormick Ground Cinnamon - 2.37oz                 1     1.99
   Arnold Stone Ground Wheat Bread - 16oz             1     2.59
   Grade A Large Eggs - 12ct - Good & Gather™         1     1.69
   Iodized Salt - 26oz - Good & Gather™               1     0.49
   Tillamook Unsalted Butter Spread - 16oz            1     4.19
   Squish 1.5qt Mixing Bowl                           1     8.99
   9” Whisk Stainless Steel                           1     6.00
   Westinghouse Cast Iron Seasoned Skillet 6.5-inch   1     23.50

2. Related Work

In late 2015, an American company that operates a grocery delivery and pick-up service in the United States and Canada integrated with AllRecipes, a top recipe site, to allow users to select a recipe and fill their cart with all the necessary ingredients [22]. Although the ingredient recommendations provided by an e-commerce company are accurate for a good number of recipes, we observed that the recommended quantities were not ideal for certain recipes. There were also cases where no matching product was found for a recipe ingredient, such as vegetable oil, even though closely related products exist, including olive oil and canola oil. In addition to these limitations, we also identified an opportunity for augmenting ingredient recommendations with kitchen gadget recommendations that could improve a user's cooking experience, especially for those new to cooking.
Our work has been primarily motivated by these use cases, and in this paper, we propose a methodology that generates accurate ingredient recommendations as well as kitchen gadget recommendations.

To our knowledge, there has not been any notable research regarding the recommendation of an optimized shopping cart of ingredients and kitchen gadgets based on recipes. We find that most of the existing literature in the food domain is related to recommending recipes or ingredient substitutes [25, 26, 11, 17]. Anirudh Jagithyala [11] developed a recommendation system that recommends recipes based on recipe ratings, ingredients, and review text. A number of approaches, including memory-based collaborative filtering and TF-IDF, were tried, along with similarity measures such as cosine similarity and Pearson correlation. The research evaluated multiple approaches using the mean average precision (mAP) and showed that collaborative filtering on recipe ratings performed best. Chantal Pellegrini et al. [17] explored the use of text and image embeddings for identifying ingredient substitutes. They generated context-free embeddings using word2vec as well as context-based embeddings using transformer-based models. The research showed that a transformer-based multi-modal approach using text and image embeddings together gave the best results, with a precision of 0.84 for the top 1000 most common ingredients. Chun-Yuen Teng et al. [25] explored the recommendation of recipes and ingredient substitutes using network structures. Their system identifies ingredient substitutes using a graph structure where nodes represent ingredients and edges represent the degree of substitutability. To derive pairs of related recipes, they computed the cosine similarity between the ingredient lists of two recipes, weighted by the inverse document frequency. Mayumi Ueda et al. [26] applied user preferences and ingredient quantities for recommending recipes. Their method breaks recipes down into their ingredients and scores them based on the ingredients' frequency of use and specificity.

Some of the existing algorithms for searching and ranking relevant items use classical information retrieval algorithms such as TF-IDF [19], BM25 [21], or GloVe [18], while others make use of deep learning models such as BERT [8]. BERT [8] is a language representation model that can give accurate contextual embeddings for words in most cases. Unfortunately, BERT does not generally give accurate representations for sentences, and the construction of BERT makes it unsuitable for semantic similarity search. To overcome these issues, we applied the Sentence-BERT [20] model, which was trained using Siamese BERT-Networks.

The preferred recommendation system architecture was one based on a two-stage approach consisting of candidate generation and ranking stages. This two-stage approach allows for recommendations from a very large corpus (millions of items) while still ensuring that recommendations are personalized and engaging for the user. Our research employed a two-stage approach with candidate generation (retriever stage) and ranking (re-ranker stage) [7].

3. System Overview

The proposed system first takes a recipe as input from a guest shopping on an e-commerce company's website. The recipe is then split into two sections: 1) cooking instructions and 2) ingredients. The cooking instructions are parsed in order to detect and extract any kitchen gadgets that might be required for the recipe, while the required ingredients and quantities are extracted from the recipe's ingredients section. The quantities and units of ingredients in the recipe text and the product catalog database are standardized for accurately matching varying units between recipes and products. The ingredients and kitchen gadgets required by the recipe are then fed into the recommender system to search for the best matching products from the product catalog database based on textual and quantity information. The system then adds these products to the shopping cart of the guest (Figure 2).

For advanced natural language processing (NLP), we utilized open-source software libraries, including spaCy [10] and NLTK [3], for extracting recipe ingredients and kitchen gadgets from the recipe text. The ingredients, along with the required quantities, were parsed from the ingredient section of the recipe (Figure 1) using regular expressions, after which these ingredients were preprocessed using the NLTK library for stop word removal, stemming, and 𝑛-gram expansion. The spaCy library was used for extracting kitchen gadgets from the recipe instructions using named entity recognition (NER). Once the ingredients and kitchen gadgets of the recipe are identified, they are compared against the products in the product catalog database to get the most relevant in-stock product for each ingredient and user. This process also involved the novel usage of a language representation model, Bidirectional Encoder Representations from Transformers (BERT) [8], which is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right contexts in all layers. The advantage of using a pre-trained architecture is that we can use transfer learning to transfer the already-trained features to the current data without the complexity of training heavy machine learning models [15].
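To make the parsing step concrete, the following is a minimal sketch of how an ingredient line can be parsed with a regular expression and preprocessed with NLTK. The pattern and helper names are simplified illustrations, not the production rules, which handle many more unit spellings and fraction formats.

```python
import re
from nltk.corpus import stopwords   # may require nltk.download("stopwords") first
from nltk.stem import PorterStemmer
from nltk.util import ngrams

# Simplified pattern: "<quantity> <unit> <ingredient>", e.g., "2 tablespoons sugar".
INGREDIENT_RE = re.compile(
    r"^(?P<qty>\d+(?:/\d+)?(?:\.\d+)?)\s*"
    r"(?P<unit>cups?|tablespoons?|teaspoons?|ounces?|pounds?|ml)?\s*"
    r"(?P<name>.+)$"
)

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def to_float(q: str) -> float:
    """Handle simple fractions such as '3/4'."""
    if "/" in q:
        num, den = q.split("/")
        return float(num) / float(den)
    return float(q)

def parse_ingredient(line: str):
    """Return (quantity, unit, search terms) for one ingredient line."""
    match = INGREDIENT_RE.match(line.strip().lower())
    if not match:
        return None
    tokens = [t for t in match.group("name").split() if t not in stop_words]
    stems = [stemmer.stem(t) for t in tokens]
    bigrams = [" ".join(g) for g in ngrams(stems, 2)]  # n-gram expansion
    return to_float(match.group("qty")), match.group("unit"), stems + bigrams

print(parse_ingredient("2 tablespoons unsalted butter"))
```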
We also developed an algorithm to recommend the optimal number of products in case a guest chooses more than one recipe using the same products. The quantity or weight of the common ingredients is recommended based on the sum of the required amounts, which helps create optimal baskets for the guests and leads to less wastage of resources.
Figure 2: System Overview




Figure 3: Ingredient Recommendation Workflow



3.1. Ingredient Recommendation

The developed system utilizes a combination of semantic search and information retrieval algorithms to recommend the most relevant products for the ingredients in a recipe. Semantic search can improve search accuracy by understanding the context and content of the search query. In contrast to conventional search algorithms, which only find documents based on lexical matches, semantic search can also find synonyms. Semantic search aims to embed all entries in the corpus into a vector space; the query is embedded into the same vector space in order to find the closest embeddings from the corpus. In our case, a recipe ingredient is the query, and the products are the entries in the corpus, where the products are embedded into the vector space and stored separately. During search time, the algorithm embeds a recipe ingredient into the same vector space and calculates the similarity between an ingredient (𝐼) and a product (𝑃) to find the most relevant products. These products would have a high semantic overlap with the ingredient. We used the cosine similarity in Equation 1 to find the closest embeddings from the corpus.

$$\mathrm{Sim}(I, P) = \frac{I \cdot P}{\lVert I \rVert \, \lVert P \rVert} = \frac{\sum_{j=1}^{e} I_j P_j}{\sqrt{\sum_{j=1}^{e} I_j^2}\,\sqrt{\sum_{j=1}^{e} P_j^2}} \qquad (1)$$

where 𝑒 is the embedding dimension of the ingredient and product vectors.

After computing the cosine similarity between the embedding of an ingredient and the embeddings of the products, the top 𝑚 products are retrieved.

For complex search tasks, the search can be significantly improved by using a retrieve and re-rank framework, where the top 𝑡 products are retrieved efficiently, followed by a re-ranker that ranks these 𝑡 products and recommends 𝑚 products.
3.1.1. Retriever

Given an ingredient, we first use a retrieval system that quickly retrieves 𝑡 products that are potentially relevant for the given ingredient. Then the 𝑡 products are re-ranked, and the top 𝑚 matches are sent through to the quantity recommendation module. For our retrieval system, we make use of the BM25 algorithm for lexical search [21]. For an ingredient query (𝐼) with terms 𝑖1 , … , 𝑖𝑛 , the BM25 score for a product text 𝑃 is given in Equation 2.

$$BM25(P, I) = \sum_{j=1}^{n} IDF(i_j) \cdot \frac{tf(i_j, P) \cdot (c + 1)}{tf(i_j, P) + c \cdot \left(1 - b + b \cdot \frac{|D|}{d_{avg}}\right)} \qquad (2)$$

where 𝑡𝑓 (𝑖𝑗 , 𝑃) is the number of times term 𝑖𝑗 of the ingredient text occurs in 𝑃, |𝐷| is the number of words in 𝑃, and 𝑑𝑎𝑣𝑔 is the average number of words per product text. 𝑏 and 𝑐 are saturation parameters for document length and term frequency, respectively. In general, values such as 0.5 < 𝑏 < 0.8 and 1.2 < 𝑐 < 2 are reasonably good in many circumstances [21].

Equation 3 describes the inverse document frequency (𝐼𝐷𝐹 (𝑖𝑗 )) for a corpus of 𝑁 products.

$$IDF(i_j) = \log \frac{N - N(i_j) + 0.5}{N(i_j) + 0.5} \qquad (3)$$

where 𝑁 (𝑖𝑗 ) is the number of product texts in the database that contain the term 𝑖𝑗 of the ingredient query.
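The scoring in Equations 2 and 3 can be sketched directly in a few lines. The snippet below is an illustrative implementation over whitespace-tokenized product texts, with 𝑏 and 𝑐 set inside the ranges suggested above; the production system scores preprocessed catalog text.

```python
import math
from collections import Counter

def bm25_scores(query_terms, product_docs, b=0.75, c=1.5):
    """Score each tokenized product text against the ingredient query terms
    using BM25 (Equations 2 and 3)."""
    N = len(product_docs)
    d_avg = sum(len(doc) for doc in product_docs) / N
    # N(i_j): number of product texts containing each query term
    df = {t: sum(1 for doc in product_docs if t in doc) for t in query_terms}
    scores = []
    for doc in product_docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))   # Equation 3
            num = tf[t] * (c + 1)
            den = tf[t] + c * (1 - b + b * len(doc) / d_avg)
            score += idf * num / den                             # Equation 2
        scores.append(score)
    return scores

docs = [p.lower().split() for p in
        ["Tillamook Unsalted Butter Spread 16oz",
         "Iodized Salt 26oz",
         "Grade A Large Eggs 12ct"]]
print(bm25_scores(["unsalted", "butter"], docs))
```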
For improving the accuracy of the retrieval stage, we combined the BM25 algorithm with a bi-encoder sentence transformer model that was fine-tuned using Microsoft's MiniLM model [27]. The MiniLM model is a compressed transformer model that uses an approach termed deep self-attention distillation to reduce the number of parameters required by a transformer model. It is twice as fast as BERT while retaining more than 99% accuracy on SQuAD 2.0 and several GLUE benchmark tasks, using only 50% of BERT's model parameters. The bi-encoder sentence transformer model [20] that uses the MiniLM model was trained using a dataset of 1 billion sentence pairs and a self-supervised contrastive learning objective: given a sentence from a sentence pair, the model should predict which out of a set of randomly sampled other sentences was actually paired with it in the dataset. A bi-encoder model performs two independent self-attentions for the query and the document, and the document is mapped to a fixed BERT representation regardless of the choice of query. This makes it possible for bi-encoder models to pre-compute document representations offline, significantly reducing the computational load per query at the time of inference [6]. A bi-encoder model can encode an input text, such as a recipe ingredient or a product, and output a vector (embedding) that captures semantic information. If a recipe ingredient embedding and a product embedding are similar, then the cosine similarity (Equation 1) between these two embeddings will be high. Hence, by comparing an ingredient embedding with all the product embeddings using cosine similarity, we can identify the most similar products for an ingredient.

In order to reduce the search space efficiently, hierarchical classification models were created for the following levels: class, subclass, and item-type. These models were trained using the preprocessed text of the products as feature vectors and the respective hierarchical level values as the target labels. A softmax activation function (Equation 4) was used in the final layer of the multi-class classification models, along with the cross-entropy loss function (Equation 5).

$$\mathrm{Softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{C} \exp(z_j)} \qquad (4)$$

where 𝐶 is the number of output classes and 𝑧𝑖 is an element of a vector 𝑧 of size 𝐶 corresponding to a particular class.

$$\text{Cross Entropy Loss} = -\frac{1}{N} \sum_{i=1}^{N} y_i \cdot \log(\hat{y}_i) \qquad (5)$$

where 𝑁 is the number of observations, 𝑦𝑖 is the true label vector, and 𝑦𝑖̂ is the predicted label probability vector.
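A minimal sketch of one such hierarchy-level classifier is shown below, using placeholder feature vectors and label ids in place of the preprocessed product text features. Note that PyTorch's CrossEntropyLoss applies the softmax of Equation 4 together with the loss of Equation 5.

```python
import torch
import torch.nn as nn

# One classifier per hierarchy level (class, subclass, item-type); this sketch
# shows a single level with illustrative dimensions.
num_features, num_classes = 512, 40

model = nn.Sequential(
    nn.Linear(num_features, 128),
    nn.ReLU(),
    nn.Linear(128, num_classes),  # outputs the logits z_1..z_C
)
loss_fn = nn.CrossEntropyLoss()   # softmax (Eq. 4) + cross-entropy (Eq. 5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(256, num_features)          # stand-in product text features
y = torch.randint(0, num_classes, (256,))   # stand-in class-level labels

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

# At inference, the predicted level narrows the product search space.
probs = torch.softmax(model(X[:1]), dim=1)  # Equation 4
predicted_class = probs.argmax(dim=1)
```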
3.1.2. Re-ranker

After retrieving the top 𝑡 products, the re-ranker stage ranks the products more accurately using a cross-encoder sentence transformer model that was fine-tuned using Microsoft's MS MARCO dataset [1], a large-scale information retrieval corpus created from real user search queries on the Bing search engine. In contrast to a bi-encoder model, which performs two independent self-attentions for the query and the document, a cross-encoder model performs full self-attention across the entire query-document pair. As a result, the cross-encoder can model the interaction between a query and a document, and the resulting representations contain contextualized embeddings [6]. In our use case, an ingredient and a product are passed simultaneously to the cross-encoder, which then outputs a score indicating how relevant the product is for the given ingredient. As the cross-encoder models the interaction between an ingredient and a product at inference time, it is slower than the bi-encoder and hence can only be used for a small subset of products. However, we can achieve higher accuracy, as it performs attention across the query and the document. After the re-ranker stage, the top 𝑚 matched ingredients are then optimized based on quantity.
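The retrieve-and-re-rank pipeline can be sketched with the sentence-transformers library as follows. The checkpoint names are examples of publicly available models, not the fine-tuned variants used in our system.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

products = ["Tillamook Unsalted Butter Spread - 16oz",
            "Iodized Salt - 26oz",
            "Buttermilk Pancake Mix - 32oz"]
# Product embeddings are precomputed offline in the real system.
product_embs = bi_encoder.encode(products, convert_to_tensor=True)

ingredient = "unsalted butter"
query_emb = bi_encoder.encode(ingredient, convert_to_tensor=True)

# Retriever: top-t candidates by cosine similarity (Equation 1).
hits = util.semantic_search(query_emb, product_embs, top_k=2)[0]

# Re-ranker: the cross-encoder scores each (ingredient, product) pair jointly.
pairs = [(ingredient, products[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)
ranked = sorted(zip(pairs, scores), key=lambda x: -x[1])
```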
Table 2
Commonly used units in recipes and corresponding conversion to the International System of Units (SI)

   Common Unit in Recipe   Converted Unit (SI)
   1 cup                   225 ml
   1 teaspoon              5 ml
   1 tablespoon            15 ml
   1 fluid ounce           30 ml

3.2. Unit Normalization and Optimization

Once the algorithm selects the top 𝑚 matched ingredients, the next important step is to recommend the optimal quantity of the product needed for the recipe (Figure 4). For this, we start by retrieving the ingredient quantity specified in the recipe and normalizing the units (Table 2); common recipe units include tablespoon (tbsp), teaspoon (tsp), milliliter (ml), cup, count, pound (lb), and ounce (oz). These standardized quantities are either weights or volumes, which are handled differently from each other.

As product descriptions generally utilize weight as a measure while most recipes use volumes, we converted the volume in the standardized unit to weight using the density (𝑑). For instance, the weight (𝑤) for 𝑛 cups of a grocery product in the recipe, where 1 cup is 225 ml, can be calculated as follows.

$$w = n \cdot 225 \cdot d \qquad (6)$$

Once the required weight is calculated, the system compares it against the weight of the recommended products in the product catalog database. The recommended number of units is then calculated for each matched product using Equation 7, where 𝑞 is the recommended quantity, 𝑤 is the number of ounces required in the recipe, and 𝑝 is the number of ounces sold or packaged.

$$q = \left\lceil \frac{w}{p} \right\rceil \qquad (7)$$

For fresh produce items that use count as a unit in a recipe, like two onions or three potatoes, the average weight of the given fruits and vegetables is used to convert the count to weight [14]. The reverse conversion is also applied if the unit for a product at e-commerce companies uses count and the recipe specifies weight instead.

Once the recommended quantity is known for the 𝑚 matched ingredients, we sort the 𝑚 ingredients by recommended quantity and price in ascending order. In addition, the system recommends lower-priced items if multiple packaging options are available for the same product. For instance, the system recommends one 1 lb pack rather than two 0.5 lb packs of flour if the 1 lb pack is more cost-efficient.

If a user selects multiple recipes, the quantity is optimized such that the minimum number of units of the common products is recommended. For example, for two recipes using one tablespoon and two tablespoons of salt respectively, the recommended number of salt cans/bottles will be optimized to one to reduce waste and cost.
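The following sketch combines Equation 6, Equation 7, and the multi-recipe aggregation just described. The unit table mirrors Table 2, while the density value is an illustrative placeholder for the per-ingredient densities the system looks up.

```python
import math

ML_PER_UNIT = {"cup": 225, "tablespoon": 15, "teaspoon": 5, "fluid ounce": 30}  # Table 2
GRAMS_PER_OUNCE = 28.3495

def required_ounces(qty: float, unit: str, density_g_per_ml: float) -> float:
    """Equation 6: convert a recipe volume to weight via density, in ounces."""
    grams = qty * ML_PER_UNIT[unit] * density_g_per_ml
    return grams / GRAMS_PER_OUNCE

def packs_needed(total_oz: float, pack_oz: float) -> int:
    """Equation 7: q = ceil(w / p)."""
    return math.ceil(total_oz / pack_oz)

# Two recipes use salt (1 tbsp and 2 tbsp): sum the amounts first, then
# apply the ceiling once, so the common product is only counted once.
salt_oz = sum(required_ounces(q, "tablespoon", density_g_per_ml=1.2) for q in (1, 2))
print(packs_needed(salt_oz, pack_oz=26))  # -> 1, a single 26oz container
```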
3.3. Kitchen Gadget Recommendation

The recommendations for kitchen gadgets follow an approach similar to the ingredient recommendations, using a combination of semantic search and information retrieval algorithms. The required kitchen gadgets are implicitly mentioned in the unstructured recipe instruction text, whereas ingredients are explicitly listed in the ingredient section. To identify the kitchen tools and methods after pre-processing the recipe instructions, a custom NER model from spaCy [10] was trained on these entities. The NER model identifies the kitchen gadgets (nouns) used in the recipe and the methods (verbs) that can identify a kitchen gadget. For example, for "Chop the garlic and add to the pan", a pan will be identified as a gadget, whereas chop will be identified as a method associated with gadgets including a knife and a chopping board.

Similar to ingredient recommendations, products are embedded into a vector space and stored separately. During search time, a gadget from the recipe instructions is also embedded into the same vector space, and the system searches for the most relevant products. These products would have a high semantic overlap with the kitchen gadgets. After computing the cosine similarity between the embedding of the kitchen gadget and the embeddings of all products, the top 𝑚 products with the maximum similarity are retrieved. For these search tasks, we used embeddings from RoBERTa [12], an improved and robustly trained version of BERT with further tuned hyperparameters. RoBERTa has achieved state-of-the-art results on GLUE, RACE, and SQuAD.

For complex search tasks, the search is significantly improved by using a retrieve (Section 3.1.1) and re-rank (Section 3.1.2) framework, just like for the ingredients (Figure 5). For quantity optimization, if the user selects multiple recipes, the common kitchen gadgets are only recommended once, along with all the other gadgets used in each recipe.
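A minimal sketch of training such a custom NER model with spaCy is shown below. The two annotated sentences, the entity labels, and the character offsets are illustrative placeholders for the 500 annotated recipes used for training.

```python
import random
import spacy
from spacy.training import Example

# Illustrative annotations: (start, end, label) character offsets.
TRAIN_DATA = [
    ("Chop the garlic and add to the pan",
     {"entities": [(0, 4, "METHOD"), (31, 34, "GADGET")]}),
    ("Whisk the eggs in a medium bowl",
     {"entities": [(0, 5, "METHOD"), (27, 31, "GADGET")]}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for _, annotations in TRAIN_DATA:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.initialize()
for _ in range(30):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

doc = nlp("Chop the garlic and add to the pan")
print([(ent.text, ent.label_) for ent in doc.ents])
```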
Figure 4: Quantity Recommendation Flow

Figure 5: Kitchen Gadget Recommendations

4. Result

For assessing the performance of the different algorithms, we identified the relevant products for the top 100 most common ingredients and kitchen gadgets (queries) and calculated the mean average precision (𝑚𝐴𝑃) at different values of 𝐾, where 𝐾 is the number of retrieved products.

$$mAP@K = \frac{\sum_{q=1}^{Q} AP@K(q)}{Q} \qquad (8)$$

where 𝑄 is the number of queries, 𝐾 is the number of retrieved products, and 𝐴𝑃@𝐾 is the average precision at 𝐾.

$$AP@K = \frac{1}{\min(K, R)} \sum_{k=1}^{K} P(k) \cdot rel(k) \qquad (9)$$

where 𝑅 is the number of relevant products, 𝐾 is the number of retrieved products, 𝑟𝑒𝑙(𝑘) is an indicator of whether the 𝑘-th item was relevant (𝑟𝑒𝑙(𝑘)=1) or not (𝑟𝑒𝑙(𝑘)=0), and 𝑃(𝑘) is the precision at 𝑘.

$$P(k) = \frac{\sum_{i=1}^{k} rel(i)}{k} \qquad (10)$$
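As a concrete sketch, Equations 8-10 can be computed as follows; `queries` pairs each query's set of relevant product ids with its ranked retrieval list (both illustrative).

```python
def average_precision_at_k(relevant: set, retrieved: list, k: int) -> float:
    """Equations 9 and 10: AP@K for a single query."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(retrieved[:k], start=1):
        rel = 1 if item in relevant else 0       # rel(k)
        hits += rel
        precision_sum += (hits / rank) * rel     # P(k) * rel(k)
    return precision_sum / min(k, len(relevant))

def mean_average_precision_at_k(queries, k: int) -> float:
    """Equation 8: mean of AP@K over all Q queries."""
    return sum(average_precision_at_k(rel, ret, k) for rel, ret in queries) / len(queries)

# Toy example: two ingredient queries with known relevant product ids.
queries = [({"p1", "p4"}, ["p1", "p2", "p4"]),
           ({"p7"}, ["p3", "p7", "p9"])]
print(mean_average_precision_at_k(queries, k=3))
```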
Figure 6: Mean Average Precision of Ingredient Search for Different 𝐾



4.1. Ingredient Search

For ingredient search, different algorithms, as well as an ensemble of these algorithms, were evaluated using the 𝑚𝐴𝑃@𝐾 metric. The BM25 algorithm was considered the baseline model against the more complex models. An interesting observation from Figure 6 is that the BM25 algorithm performs very well for 𝐾=1, since lexical search algorithms generally have high precision due to exact keyword matching. However, for higher values of 𝐾, the drop in mean average precision is quite large. The experiment results showed that transformer models such as MiniLM and MS MARCO are more general, with consistently high precision values across different 𝐾 values. Using BM25 along with the MS MARCO transformer model gave the best performance for all 𝐾 values.

Once the best performing model was identified, we further evaluated the final model with 100 recipes from the Recipe1M+ corpus, a large-scale, structured corpus of over one million cooking recipes and 13 million food images. The 𝑚𝐴𝑃@1 value for all the ingredients from these 100 recipes was 0.949, which is similar to the 𝑚𝐴𝑃@𝐾 values we see in Figure 6 for the top 100 ingredients.

4.2. Quantity Normalization and Optimization

For evaluating quantity normalization, we measured its accuracy using the most commonly used ingredients in 100,000 randomly chosen recipes. The three most commonly used ingredients, salt, sugar, and butter, were tracked to measure whether the recommended quantity after unit normalization is accurate (Table 3).

Table 3
Quantity Recommendation Accuracy Measurement

   Ingredient   Total recipes   Correct Qty   Percentage Correct
   Salt         32,506          32,443        99.806%
   Sugar        19,983          16,473        82.435%
   Butter       18,958          14,875        78.463%

We found that salt is used in 32,506 out of the chosen 100,000 recipes (Table 3). Out of the total 32,506 recipes, the quantity of salt is correctly recommended in 32,443 recipes, which is 99.806% of the total. Similarly, the accuracies of the recommended quantity were 82.435% and 78.463% for sugar and butter, respectively. The relatively low accuracy in quantity recommendations for sugar and butter is due to incorrect recipe text. For example, certain recipes say 34 cups of butter instead of 3/4 cups of butter. This is further discussed in Section 5.

Further, the quantity optimization process was also evaluated with 100 randomly chosen recipes from the Recipe1M+ corpus. The 𝑚𝐴𝑃@1 value for all the ingredients from these 100 recipes was 0.914. For combinations of recipes, the result was manually verified for 5 random sets of two recipes.

4.3. Kitchen Gadget

The NER model to identify kitchen gadgets was trained on 500 recipes and tested on 100 recipes. The custom entities, kitchen gadgets and methods, were marked with their positions in the text using regular expressions for these 600 recipes. A manual review of the annotations was performed to update any incorrect or missed annotations, and we achieved a test F1 score of 99.627% (Equation 11).

$$F_1 = \frac{2 \cdot precision \cdot recall}{precision + recall} \qquad (11)$$

where precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.

A custom mapping is used to convert cooking methods to kitchen gadgets, which, along with the other identified gadgets, form the search queries for the recipe. For evaluation, different algorithms were evaluated using 𝑚𝐴𝑃@𝐾, similar to ingredient search. We developed transformer models including MiniLM, MS MARCO, and RoBERTa for the semantic search tasks. The experiment results show that MS MARCO, a combination of a retriever and a re-ranker, performs consistently better than MiniLM and RoBERTa across all 𝐾 (Figure 7).

As an example, we present a recipe in Figure 8 that consists of two sections: ingredients and directions. The ingredients section is used for ingredient and quantity recommendations, whereas the directions section is used for kitchen gadget recommendations. The keywords used for kitchen gadget recommendations are highlighted in red.

The product recommendations based on the recipe in Figure 8 are given in Table 4.
Figure 7: Mean Average Precision of Kitchen Gadget Search for Different 𝐾

Ingredients:
- 1/2 cup oats
- 2 cups water
- 2 cups pancake mix
- 1/2 cup apples
- 2 tablespoons sugar
- 1/2 teaspoon cinnamon

Directions:
1. In a medium bowl combine rolled oats and water, let stand 5 minutes.
2. Meanwhile, heat large nonstick skillet or griddle to medium high heat (375F).
3. Grease lightly with oil. Add remaining ingredients to rolled oats mixture; stir just until all ingredients are moistened.
4. For each pancake, pour 1/4 cup batter into hot skillet.
5. Cook 1 to 1.5 minutes, turning when edges look cooked and bubbles begin to break on surface.
6. Continue to cook 1 to 1.5 minutes or until golden brown.
7. Serve with syrup and butter, if desired.

Figure 8: Apple Oat Breakfast Pancakes



Table 4
Shopping Cart Recommendation

   Product                                              Qty   Price ($)
   Red Delicious Apple                                  1     0.99
   McCormick Ground Cinnamon - 2.37oz                   1     1.99
   Good & Gather Organic Oats - 18oz                    1     2.39
   Buttermilk Pancake Mix - 32oz                        1     2.19
   Imperial Granulated Pure Sugar - 4lb                 1     2.19
   Good & Gather Alkaline Water - 1L                    1     0.99
   Squish 1.5qt Mixing Bowl - Green                     1     8.99
   Anchor 8oz Glass Measuring Cup                       1     3.49
   Nylon Ladle with Soft Grip - Made By Design          1     3.00
   Lakeside 10” Nonstick Aluminum Skillet with          1     16.47
   Faux Granite Finish

5. Conclusion

We presented a content-based recommendation system to recommend relevant retail products to a user or guest based on selected recipe contents. The recommended products are primarily based on the ingredients required by the recipe, which are extracted from the recipe text along with their associated quantities. More significantly, the system is also capable of recommending the relevant kitchen gadgets that might be required for making the recipe, based on the instructions provided in the recipe. When a user selects multiple recipes, the recommender system optimizes the quantities for each product, where the quantities are adjusted according to the amounts of the common ingredients and kitchen gadgets present in the recipes.

We conducted experiments evaluating the effectiveness of various algorithms for the recommendation system, such as BM25, MiniLM, MS MARCO, and RoBERTa. In our experiments, we compared these algorithms by assessing the top product recommendations for the 100 most frequently occurring ingredients in the 1M+ recipe corpus. For ingredient search, the ensemble approach of BM25 and MS MARCO gave the best performance, while for kitchen gadget search, the MiniLM model proved to be more accurate. The accuracy of quantity recommendations was measured by evaluating certain high-frequency ingredients, such as salt, sugar, and butter, across more than 50,000 recipes from the 1 million+ (1M+) recipe corpus.

There were a few challenges that we faced while implementing the system that we hope to address in the near future. Firstly, the recipe text used in the 1M+ recipe corpus has some inconsistencies where the required quantity is not accurately defined. For example, many recipes say 34 cups of an ingredient instead of 3/4 cups. In future work, we can parse the text with more scrutiny and rules to avoid such cases. Alternatively, we could also build an entirely new recipe scraping algorithm to handle such cases. Secondly, the quantity optimization for multiple recipes is in place, but there is no formal way to measure how well the process performs. As part of future work, we will devise a way to quantify the performance of this process.
We also identified extra features for the current system. First, the dietary restrictions of a user can be accounted for by considering the ingredients, nutritional information, and key allergens that might be present in the product, using natural language processing. Second, preference filters could be provided to users before they add a recipe to the shopping cart, covering dietary preferences such as whether they prefer only non-GMO or organic products. Third, instead of asking for preferences explicitly, the preferences of users could be determined automatically by analyzing their past shopping behaviors, such as clicks, views, or product purchases. Based on this implicit feedback, we could determine the user's preferences, such as whether the user is a vegetarian or is currently purchasing products specific to a particular diet, such as the ketogenic diet. Such information can then be used in the recommendation system for personalizing recommended products for different users.

The content-based recommendation system proposed in this paper could potentially be used to improve the shopping experience of users by providing them options to automatically add all the necessary products required by a recipe to their shopping cart. Kitchen gadget recommendations could be helpful not just for improving the shopping experiences of users but also their cooking experiences. These recommendations also allow a business to increase the basket sizes of its users, thereby increasing revenue. Moreover, the system could also aid in automated promotional emails that recommend products required by recipes that may be of interest to customers.

References

[1] Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset.

[2] Shivam Bansal, Aman Srivastava, and Anuja Arora. 2017. Topic Modeling Driven Content Based Jobs Recommendation Engine for Recruitment Industry. Procedia Computer Science 122 (2017), 865–872. 5th International Conference on Information Technology and Quantitative Management, ITQM 2017.

[3] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.

[4] Dan Chang, Hao Gui, Rui Fan, Ze Fan, and Ji Tian. 2019. Application of Improved Collaborative Filtering in the Recommendation of E-commerce Commodities. International Journal of Computers Communications & Control 14, 4 (2019), 489–502.

[5] Lin Chen, Rui Li, Yige Liu, Ruixuan Zhang, and Diane Myung-kyung Woodbridge. 2017. Machine learning-based product recommendation using Apache Spark. In 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). 1–6. https://doi.org/10.1109/UIC-ATC.2017.8397470

[6] Jaekeol Choi, Euna Jung, Jangwon Suh, and Wonjong Rhee. 2021. Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.

[7] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. 191–198.

[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

[9] Grand View Research. 2021. Recommendation Engine Market Size, Share & Trends Analysis. Recommendation Engine Market Report (2021).

[10] Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. (2017).

[11] Anirudh Jagithyala. 2014. Recommending recipes based on ingredients and user reviews. Ph.D. Dissertation. Kansas State University.

[12] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.

[13] Kevin F. McCardle, Kumar Rajaram, and Christopher S. Tang. 2007. Bundling retail products: Models and analysis. European Journal of Operational Research 177, 2 (2007), 1197–1217.

[14] Niklas. 2022. Average Weight of All Fruits and Vegetables. https://weightofstuff.com/average-weight-of-all-fruits-and-vegetables/

[15] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2010), 1345–1359.

[16] Michael J. Pazzani and Daniel Billsus. 2007. Content-Based Recommendation Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 325–341.

[17] Chantal Pellegrini, Ege Özsoy, Monika Wintergerst, and Georg Groh. 2021. Exploiting Food Embeddings for Ingredient Substitution. In HEALTHINF.

[18] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.

[19] Juan Ramos et al. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Vol. 242. Citeseer, 29–48.

[20] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

[21] Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Now Publishers Inc.

[22] Sarah Perez. 2015. Instacart And Allrecipes Now Let You Add A Meal's Ingredients To Your Grocery List With A Click. https://techcrunch.com/2015/10/12/instacart-and-allrecipes-now-let-you-add-a-meals-ingredients-to-your-grocery-list-with-a-click/

[23] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative Filtering Recommender Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 291–324.

[24] Chahak Sethi, Melvin Vellera, Diane Myung-kyung Woodbridge, and Joey Jonghoon Ahnn. 2022. Bundle Recommender from Recipes to Shopping Cart. https://github.com/dianewoodbridge/target_recipe_project/

[25] Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference. 298–307.

[26] Mayumi Ueda, Syungo Asanuma, Yusuke Miyawaki, and Shinsuke Nakajima. 2014. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. 1. 12–14.

[27] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. arXiv:2002.10957

[28] Chutian Wei, Xinyu Chen, Zhenning Tang, and Wen Cheng. 2021. Fully content-based IMDb movie recommendation engine with Pearson similarity. In International Conference on Green Communication, Network, and Internet of Things (GCNIoT 2021), Siting Chen and Jun Mou (Eds.), Vol. 12085. International Society for Optics and Photonics, SPIE, 132–137.

[29] Ruiliang Yan, Chris Myers, John Wang, and Sanjoy Ghose. 2014. Bundling products to success: The influence of complementarity and advertising. Journal of Retailing and Consumer Services 21, 1 (2014), 48–53.