Bundle Recommender from Recipes to Shopping Carts - Optimizing Ingredients, Kitchen Gadgets and their Quantities

Chahak Sethi1,†, Melvin Vellera1,†, Diane Myung-kyung Woodbridge1 and Joey Jonghoon Ahnn2
1 University of San Francisco, San Francisco, California, USA
2 Target, Sunnyvale, California, USA

Abstract
In this paper, we introduce a recommender system that automatically captures the context of what users or guests are looking for and recommends a bundle of products to be added to their shopping cart. The recommendation system takes a user's selected recipes as input and, using neural networks, recommends a shopping cart with ingredients in optimized quantities as well as any kitchen gadgets that might be necessary to prepare the recipes efficiently. We propose a system architecture, dive deep into the individual components, and evaluate the performance of information retrieval, semantic search, and quantity optimization algorithms. Using an ensemble methodology, we attained a mean average precision of over 0.9 for ingredient and quantity recommendations. The recipe-based bundle recommender system may be used not only to improve the user's shopping experience but also to enable and encourage healthier eating habits by providing personalized product recommendations.

1. Introduction

Recommendation systems provide personalized recommendations for customers using various data from customers and products to enhance the customer experience and maximize the conversion rate, contributing significantly to revenue growth in the retail industry. In 2020, the recommendation system market was valued at 1.77 billion US dollars globally, with a projected compound annual growth rate of 33.0% by 2028 [9].

Recommendation systems generally utilize two categories of algorithms: collaborative filtering-based recommendation [23][5] and content-based recommendation [16]. Collaborative filtering (CF) relies on the user's history data, matches a user A with a similar user B, and recommends to A what B liked. Today, most big e-commerce giants, with massive amounts of data, use collaborative filtering to recommend products to their customers [4]. Content-based recommendations create a profile to characterize each item. Industry applications of content-based recommendation systems include a service that suggests jobs to users by matching their interests and skills with the features of job postings [2]. IMDb uses information about movies or TV shows, including genre, language, cast, director, and popularity, to recommend them to its users [28].

With the COVID-19 pandemic, there has been an increased need for and interest in meal preparation, including shopping and cooking. This even helped some people rediscover their liking for meal preparation and discover new recipes. However, recommending items for cooking recipes poses unique challenges, including: (1) recommending in-stock grocery items in the correct quantity, as recipes use volume while products use weight as a measure. For example, in Figure 1, the recipe for classic French toast uses four tablespoons of unsalted butter, whereas butter is sold in a 16-ounce pack; (2) recommending kitchen gadgets from unstructured text data in the directions. For example, in Figure 1, the recipe for classic French toast requires the user to "whisk", which implicitly indicates that the user would need a whisk; (3) optimizing quantities for ingredients repeated across multiple recipes. For example, the classic French toast recipe (Figure 1, Table 1) requires six slices of bread, while another recipe requires eight slices, which indicates that a total of one pack of bread is needed to complete both recipes.
In this research, we developed a content-based recommendation system with natural language processing to solve the three aforementioned sub-problems: (1) the developed system parses ingredients and kitchen gadgets in the recipe corpus and recommends the most similar products from our product catalog database; (2) it parses unstructured text in the recipe direction section to recommend the kitchen gadgets required to cook a recipe; (3) finally, it optimizes the number of products and bundles the recommendations together as a set of items that customers can purchase individually or as a bundle, a group of complementary items that can be purchased together [13][29].

ORSUM@ACM RecSys 2022: 5th Workshop on Online Recommender Systems and User Modeling, jointly with the 16th ACM Conference on Recommender Systems, September 23rd, 2022, Seattle, WA, USA
† The authors contributed equally.
csethi2@usfca.edu (C. Sethi); mvellera@usfca.edu (M. Vellera); dwoodbridge@usfca.edu (D. M. Woodbridge); joey.ahnn@target.com (J. J. Ahnn)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Figure 1: Text of an Example Recipe (Classic French Toast) and Recommended Ingredients and Kitchen Gadgets

Table 1
Shopping Cart Recommendation - French Toast

Product                                            Qty  Price ($)
Hood Heavy Cream - 1pt                              1     3.49
McCormick Ground Cinnamon - 2.37oz                  1     1.99
Arnold Stone Ground Wheat Bread - 16oz              1     2.59
Grade A Large Eggs - 12ct - Good & Gather™          1     1.69
Iodized Salt - 26oz - Good & Gather™                1     0.49
Tillamook Unsalted Butter Spread - 16oz             1     4.19
Squish 1.5qt Mixing Bowl                            1     8.99
9" Whisk Stainless Steel                            1     6.00
Westinghouse Cast Iron Seasoned Skillet 6.5-inch    1    23.50

Our research offers the convenience of a personally curated shopping experience by reducing the effort users spend finding the right products as a bundle. The system helps users without any prior cooking experience easily purchase the required ingredients and kitchen gadgets. Automatic quantity optimization reduces wastage of resources and optimizes costs for users by suggesting the correct quantity of each product. We believe that this approach leads to increased user basket sizes, eventually raising business revenue. Furthermore, it can also aid in automated promotional emails to consumers containing all the ingredients and gadgets in a recipe, rather than hand-curated contents, which we find time-consuming.

In this paper, we discuss the related work already present in this field in Section 2. Next, in Section 3, we provide an overview of the developed system, including mapping kitchen gadgets and ingredients in recipes to relevant products from the product catalog database, optimizing the quantities to be recommended, and providing multiple candidates and their corresponding ranking to the user. The techniques we employed to measure the performance of the system are discussed in Section 4. We summarize our work and discuss future work in Section 5. The authors make the code used for the research available to the public [24].

2. Related Work

In late 2015, an American company that operates a grocery delivery and pick-up service in the United States and Canada integrated with AllRecipes, a top recipe site, to allow users to select a recipe and fill their cart with all the necessary ingredients [22]. Although the ingredient recommendations provided by an e-commerce company are accurate for a good number of recipes, we observed that the recommended quantities were not ideal for certain recipes. There were also cases where no matching product was found for a recipe ingredient, such as vegetable oil, even though there are other closely related products, including olive oil and canola oil.
In addition to these limitations, we also identified an opportunity to augment ingredient recommendations with kitchen gadget recommendations that could improve a user's cooking experience, especially for those new to cooking. Our work has been primarily motivated by these use cases, and in this paper, we propose a methodology that generates accurate ingredient recommendations as well as kitchen gadget recommendations.

To our knowledge, there has not been any notable research regarding the recommendation of an optimized shopping cart of ingredients and kitchen gadgets based on recipes. We find that most of the existing literature in the food domain is related to recommending recipes or ingredient substitutes [25, 26, 11, 17]. Anirudh Jagithyala [11] developed a recommendation system that recommends recipes based on recipe ratings, ingredients, and review text. A number of approaches, including memory-based collaborative filtering and TF-IDF, were tried along with similarity measures such as cosine and Pearson correlation. The research evaluated multiple approaches using the mean average precision (mAP) and showed that collaborative filtering on recipe ratings performed better. Chantal Pellegrini et al. [17] explored the use of text and image embeddings for identifying ingredient substitutes. They generated context-free embeddings using word2vec as well as context-based embeddings using transformer-based models. In the end, the research showed that the transformer-based multi-modal approach using text and image embeddings together gave the best results, with a precision of 0.84 for the top 1000 most common ingredients. Chun-Yuen Teng et al. [25] explored the recommendation of recipes and ingredient substitutes using network structures. Their system identifies ingredient substitutes using a graph structure where nodes represent ingredients and edges represent the degree of substitutability. To derive pairs of related recipes, they computed the cosine similarity between the ingredient lists of the two recipes, weighted by the inverse document frequency. Mayumi Ueda et al. [26] applied user preferences and ingredient quantity for recommending recipes. Their method breaks recipes down into their ingredients and scores them based on the ingredients' frequency of use and specificity.

Some of the existing algorithms for searching and ranking relevant items use classical information retrieval algorithms such as TF-IDF [19], BM25 [21], or GloVe [18], while others make use of deep learning models such as BERT [8]. BERT is a language representation model that can give accurate contextual embeddings for words in most cases. Unfortunately, BERT does not generally give accurate representations for sentences, and its construction makes it unsuitable for semantic similarity search. To overcome these issues, we applied the Sentence-BERT [20] model, which was trained using Siamese BERT networks.

The preferred recommendation system architecture was one based on a two-stage approach consisting of candidate generation and ranking stages. This two-stage approach allows for recommendations from a very large corpus (millions of items) while still ensuring that recommendations are personalized and engaging for the user. Our research employed a two-stage approach with candidate generation (retriever stage) and ranking (re-ranker stage) [7].

3. System Overview

The proposed system first takes a recipe as input from a guest shopping on an e-commerce company's website. The recipe then gets split into two sections: 1) cooking instructions and 2) ingredients. The cooking instructions are parsed in order to detect and extract any kitchen gadgets that might be required for the recipe, while the required ingredients and quantities are extracted from the recipe's ingredients section. The quantities and units of ingredients in the recipe text and the product catalog database are standardized for accurately matching varying units between recipes and products. The ingredients and kitchen gadgets required by the recipe are then fed into the recommender system to search for the best matching products from the product catalog database based on textual and quantity information. The system then adds these products to the shopping cart of the guest (Figure 2).

For advanced natural language processing (NLP), we utilized open-source software libraries, including spaCy [10] and NLTK [3], for extracting recipe ingredients and kitchen gadgets from the recipe text. The ingredients, along with the required quantities, were parsed from the ingredient section of the recipe (Figure 1) using regular expressions, after which these ingredients were preprocessed using the NLTK library for stop-word removal, stemming, and n-gram expansion. The spaCy library was used for extracting kitchen gadgets from the recipe instructions using named entity recognition (NER). Once the ingredients and kitchen gadgets of the recipe are identified, they are compared against the products in the product catalog database to get the most relevant product in stock for each ingredient and user. This process also involved the novel usage of a language representation model, Bidirectional Encoder Representations from Transformers (BERT) [8], which is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right contexts in all layers. The advantage of using a pre-trained architecture is that we can use transfer learning to transfer the already trained features to the current data without the complexity of training heavy machine learning models [15].
We also developed an algorithm to recommend the optimal number of products in case a guest chooses more than one recipe using the same products. The quantity or weight of the common ingredients is recommended based on the sum of the required amounts, which helps create optimal baskets for guests and leads to less wastage of resources.

Figure 2: System Overview

Figure 3: Ingredient Recommendation Workflow

3.1. Ingredient Recommendation

The developed system utilizes a combination of semantic search and information retrieval algorithms to recommend the most relevant products for the ingredients in a recipe. Semantic search can improve search accuracy by understanding the context and content of the search query. In contrast to conventional search algorithms, which only find documents based on lexical matches, semantic search can also find synonyms. Semantic search aims to embed all entries in the corpus into a vector space; the query is embedded into the same vector space in order to find the closest embeddings from the corpus. In our case, a recipe ingredient is the query, and the products are the entries in the corpus, where the products are embedded into the vector space and stored separately. During search time, the algorithm embeds a recipe ingredient into the same vector space and calculates the similarity between an ingredient (I) and a product (P) to find the most relevant products. These products would have a high semantic overlap with the ingredient. We used the cosine similarity in Equation 1 to find the closest embeddings from the corpus.

$$\mathrm{Sim}(I, P) = \frac{I \cdot P}{\|I\|\,\|P\|} = \frac{\sum_{j=1}^{e} I_j P_j}{\sqrt{\sum_{j=1}^{e} I_j^2}\,\sqrt{\sum_{j=1}^{e} P_j^2}} \qquad (1)$$

where $e$ is the embedding dimension of the ingredient and product vectors.

After computing the cosine similarity between the embedding of an ingredient and the embeddings of the products, the top m products are retrieved. For complex search tasks, the search can be significantly improved by using a retrieve and re-rank framework, where the top t products are retrieved efficiently, followed by a re-ranker that ranks these t products and recommends m products.
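Equation 1 and the top-m retrieval over pre-computed product embeddings can be sketched in a few lines of plain Python. The vectors below are toy values; in the real system the embeddings come from a sentence transformer model.

```python
from math import sqrt

def cosine_similarity(I, P):
    """Equation 1: dot(I, P) / (||I|| * ||P||) over the embedding dimension e."""
    dot = sum(i * p for i, p in zip(I, P))
    norm_i = sqrt(sum(i * i for i in I))
    norm_p = sqrt(sum(p * p for p in P))
    return dot / (norm_i * norm_p)

def top_m_products(ingredient_vec, product_vecs, m=3):
    """Rank pre-computed product embeddings against one ingredient embedding."""
    scored = sorted(
        ((cosine_similarity(ingredient_vec, vec), pid) for pid, vec in product_vecs.items()),
        reverse=True,
    )
    return [pid for _, pid in scored[:m]]
```

Because the product embeddings are computed offline, only the single ingredient embedding and the similarity scores are computed at query time.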
A bi-encoder model can encode an input text, such as a recipe ingredient or a product, and output a vector (embedding) that captures semantic information. If a recipe ingredient embedding and a product embedding are similar, then the cosine similarity (Equation 1) between these two embeddings will be high. Hence, by comparing an ingredient embedding with all the product embeddings using cosine similarity, we can identify the most similar products for an ingredient.

3.1.1. Retriever

Given an ingredient, we first use a retrieval system that quickly retrieves t products that are potentially relevant for the given ingredient. Then the t products are re-ranked, and the top m matches are sent through to the quantity recommendation module. For our retrieval system, we make use of the BM25 algorithm for lexical search [21]. For an ingredient query I with terms $i_1, \ldots, i_n$, the BM25 score for a product text P is given in Equation 2.

$$BM25(P, I) = \sum_{j=1}^{n} IDF(i_j) \cdot \frac{tf(i_j, P) \cdot (c + 1)}{tf(i_j, P) + c \cdot \left(1 - b + b \cdot \frac{|D|}{d_{avg}}\right)} \qquad (2)$$

where $tf(i_j, P)$ is the number of times term $i_j$ of the ingredient text occurs in P, $|D|$ is the number of words in P, $d_{avg}$ is the average number of words per product text, and $b$ and $c$ are the saturation parameters for document length and term frequency respectively. In general, values such as $0.5 < b < 0.8$ and $1.2 < c < 2$ are reasonably good in many circumstances [21]. Equation 3 describes the inverse document frequency ($IDF(i_j)$) for a corpus with N products containing term $i_j$.

$$IDF(i_j) = \log \frac{N - N(i_j) + 0.5}{N(i_j) + 0.5} \qquad (3)$$

where $N(i_j)$ is the number of product texts in the database that contain the term $i_j$ of the ingredient query.
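A minimal, self-contained version of Equations 2 and 3, keeping the paper's notation (c for term-frequency saturation, b for length normalization), might look like the following; the texts are assumed to be tokenized already.

```python
from math import log

def bm25_score(query_terms, doc_terms, corpus, b=0.75, c=1.2):
    """Score one tokenized product text against an ingredient query.

    corpus is the list of all tokenized product texts; b and c are the
    length-normalization and term-frequency saturation parameters of Equation 2.
    """
    N = len(corpus)
    d_avg = sum(len(d) for d in corpus) / N   # average product-text length
    D = len(doc_terms)                        # |D|: words in this product text
    score = 0.0
    for term in query_terms:
        n_t = sum(term in d for d in corpus)        # N(i_j): docs containing the term
        idf = log((N - n_t + 0.5) / (n_t + 0.5))    # Equation 3
        tf = doc_terms.count(term)                  # tf(i_j, P)
        score += idf * tf * (c + 1) / (tf + c * (1 - b + b * D / d_avg))
    return score
```

Products containing none of the query terms score zero, which is why BM25 alone cannot match "vegetable oil" to "canola oil" and must be complemented by the semantic models described next.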
For improving the accuracy of the retrieval stage, we combined the BM25 algorithm with a bi-encoder sentence transformer model that was fine-tuned from Microsoft's MiniLM model [27]. The MiniLM model is a compressed Transformer model that uses an approach termed deep self-attention distillation to reduce the number of parameters required by a transformer model. It is twice as fast as BERT while retaining more than 99% accuracy on SQuAD 2.0 and several GLUE benchmark tasks using only 50% of BERT's model parameters. The bi-encoder sentence transformer model [20] that uses the MiniLM model was trained on a dataset of 1 billion sentence pairs with a self-supervised contrastive learning objective: given a sentence from a sentence pair, the model should predict which one out of a set of randomly sampled other sentences was actually paired with it in the dataset. A bi-encoder model performs two independent self-attentions for the query and the document, and the document is mapped to a fixed BERT representation regardless of the choice of query. This makes it possible for bi-encoder models to pre-compute document representations offline, significantly reducing the computational load per query at the time of inference [6].

In order to reduce the search space efficiently, hierarchical classification models were created for the following levels: class, subclass, and item type. These models were trained using the preprocessed text of the products as feature vectors and the respective hierarchical level values as the target labels. A softmax activation function (Equation 4) was used in the final layer of the multi-class classification models, along with the cross-entropy loss function (Equation 5).

$$\mathrm{Softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{C} \exp(z_j)} \qquad (4)$$

where $C$ is the number of output classes and $z_i$ is an element of a vector $z$ of size $C$ corresponding to a particular class.

$$\text{Cross Entropy Loss} = -\frac{1}{N} \sum_{i=1}^{N} y_i \cdot \log(\hat{y}_i) \qquad (5)$$

where $N$ is the number of observations, $y_i$ is the true label vector, and $\hat{y}_i$ is the predicted label probability vector.

3.1.2. Re-ranker

After retrieving the top t products, the re-ranker stage ranks the products more accurately using a cross-encoder sentence transformer model that was fine-tuned using Microsoft's MS MARCO dataset [1], a large-scale information retrieval corpus created from real user search queries on the Bing search engine. In contrast to a bi-encoder model, which performs two independent self-attentions for the query and the document, a cross-encoder model performs full self-attention across the entire query-document pair. As a result, the cross-encoder can model the interaction between a query and a document, and the resulting representations contain contextualized embeddings [6]. In our use case, an ingredient and a product are passed simultaneously to the cross-encoder, which then outputs a score indicating how relevant the product is for the given ingredient. As the cross-encoder models the interaction between an ingredient and a product during inference time, it is slower than the bi-encoder, and hence it can only be used for a small subset of products. However, we can achieve a higher accuracy, as cross-encoders perform attention across the query and the document. After the re-ranker stage, the top m matched products are then optimized based on quantity.
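The two-stage pipeline can be sketched independently of any particular model by injecting the two scoring functions. In the actual system the retriever score would come from BM25 plus the MiniLM bi-encoder and the re-ranker score from the MS MARCO cross-encoder; the toy lexical scorers in the usage below are placeholders.

```python
def retrieve_and_rerank(ingredient, products, retriever_score, reranker_score, t=10, m=3):
    """Two-stage search: a fast retriever narrows the catalog to t candidates,
    then a slower, more accurate scorer re-ranks them and keeps the top m."""
    candidates = sorted(
        products, key=lambda p: retriever_score(ingredient, p), reverse=True
    )[:t]
    reranked = sorted(
        candidates, key=lambda p: reranker_score(ingredient, p), reverse=True
    )
    return reranked[:m]
```

This split is what makes the cross-encoder affordable: the expensive pairwise scoring runs only over the t retrieved candidates, never over the full catalog.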
3.2. Unit Normalization and Optimization

Once the algorithm selects the top m matched ingredients, the next important step is to recommend the optimal quantity of each product needed in the recipe (Figure 4). For this, we start by retrieving the ingredient quantity specified in the recipe and normalize the units to standard units (Table 2), including tablespoon (tbsp), teaspoon (tsp), milliliter (ml), cup, count, pound (lb), and ounce (oz). These standard quantities are either weights or volumes, which are handled differently from each other.

Figure 4: Quantity Recommendation Flow

Table 2
Commonly used units in recipes and corresponding conversion to the International System of Units (SI)

Common Unit in Recipe    Converted Unit (SI)
1 cup                    225 ml
1 teaspoon               5 ml
1 tablespoon             15 ml
1 fluid ounce            30 ml

As product descriptions generally utilize weight as a measure while most recipes use volumes, we converted the volume in the standardized unit to weight using density (d). For instance, the weight (w) for n cups of a grocery product in the recipe, where 1 cup is 225 ml, can be calculated as follows.

$$w = n \cdot 225 \cdot d \qquad (6)$$

Once the required weight is calculated, the system compares it against the weight of the recommended products in the product catalog database. The recommended number of units is then calculated for each matched product using Equation 7, where q is the recommended quantity, w is the ounces required in the recipe, and p is the ounces sold or packaged.

$$q = \left\lceil \frac{w}{p} \right\rceil \qquad (7)$$

For fresh produce items that use a count as a unit in a recipe, like two onions or three potatoes, the average weight of the given fruits and vegetables is used to convert the count to weight [14]. The reverse conversion is also applied if the unit for a product at an e-commerce company uses count and the recipe specifies weight instead.

Once the recommended quantity is known for the m matched ingredients, we sort the m ingredients by the recommended quantity and the price in ascending order. In addition, the system recommends lower-priced items if multiple packaging options are available for the same product. For instance, the system recommends one pack of 1 lb flour rather than two packs of 0.5 lb if the 1 lb pack is more cost-efficient.

If a user selects multiple recipes, the quantity is optimized such that the minimum number of units of the common products is recommended. For example, for two recipes using one tablespoon and two tablespoons of salt each, the recommended number of salt cans/bottles will be optimized to one to reduce waste and cost.
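Equations 6 and 7 reduce to a couple of arithmetic steps. The sketch below uses the Table 2 conversion factors; the gram-to-ounce factor (28.35 g/oz) and any density values passed in are assumptions added for the example, not values from the paper.

```python
from math import ceil

# Table 2: recipe volume units in milliliters.
ML_PER_UNIT = {"cup": 225, "teaspoon": 5, "tablespoon": 15, "fluid ounce": 30}
G_PER_OZ = 28.35  # standard gram-to-ounce factor, added for this example

def required_ounces(qty, unit, density_g_per_ml):
    """Equation 6: convert a recipe volume to weight via density, then to ounces."""
    grams = qty * ML_PER_UNIT[unit] * density_g_per_ml
    return grams / G_PER_OZ

def packs_needed(required_oz, pack_oz):
    """Equation 7: q = ceil(w / p), the minimum number of packs covering the recipe."""
    return ceil(required_oz / pack_oz)
```

For multiple recipes, the required ounces of a shared ingredient are summed before `packs_needed` is applied, which is what keeps one can of salt sufficient for two recipes.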
3.3. Kitchen Gadget Recommendation

The recommendations for kitchen gadgets follow a similar approach to the ingredient recommendations, using a combination of semantic search and information retrieval algorithms. The required kitchen gadget is implicitly mentioned in the unstructured recipe instruction text, whereas ingredients are explicitly listed in the ingredient section. To identify the kitchen tools and methods after pre-processing the recipe instructions, a custom NER model from spaCy [10] was trained on these entities. The NER model identifies the kitchen gadgets (nouns) used in the recipe and the methods (verbs) that can identify a kitchen gadget. For example, for "Chop the garlic and add to the pan", a pan will be identified as a gadget, whereas chop will be identified as a method associated with gadgets including a knife and a chopping board.

Similar to the ingredient recommendations, products are embedded into a vector space and stored separately. During search time, a gadget from the recipe instructions is also embedded into the same vector space, and the system searches for the most relevant products. These products would have a high semantic overlap with the kitchen gadgets. After computing the cosine similarity between the embedding of the kitchen gadget and the embeddings of all products, the top m products with the maximum similarity are retrieved. For these search tasks, we used embeddings from RoBERTa [12], an improved and robustly trained version of BERT with further tuned hyperparameters. RoBERTa has achieved state-of-the-art results on GLUE, RACE, and SQuAD.

For complex search tasks, the search is significantly improved by using a retrieve (Section 3.1.1) and re-rank (Section 3.1.2) framework, just like for the ingredients (Figure 5). For quantity optimization, if the user selects multiple recipes, the common kitchen gadgets are only recommended once, along with all the other gadgets used in each recipe.

Figure 5: Kitchen Gadget Recommendations

4. Result

For assessing the performance of the different algorithms, we identified the relevant products for the top 100 most common ingredients and kitchen gadgets (queries) and calculated the mean average precision (mAP) at different values of K, where K is the number of retrieved products.

$$mAP@K = \frac{\sum_{q=1}^{Q} AP@K(q)}{Q} \qquad (8)$$

where $Q$ is the number of queries, $K$ is the number of retrieved products, and $AP@K$ is the average precision at $K$.

$$AP@K = \frac{1}{\min(K, R)} \sum_{k=1}^{K} P(k) \cdot rel(k) \qquad (9)$$

where $R$ is the number of relevant products, $K$ is the number of retrieved products, $rel(k)$ is an indicator of whether the $k$-th item was relevant ($rel(k) = 1$) or not ($rel(k) = 0$), and $P(k)$ is the precision at $k$.

$$P(k) = \frac{\sum_{i=1}^{k} rel(i)}{k} \qquad (10)$$

4.1. Ingredient Search

For ingredient search, different algorithms, as well as an ensemble of these algorithms, were evaluated using the mAP@K metric. The BM25 algorithm was considered the baseline model against the more complex models. An interesting thing to note from Figure 6 is that the BM25 algorithm gives very good performance for K = 1, since lexical search algorithms generally have high precision due to exact keyword matching. However, for higher values of K, the drop in mean average precision is quite steep. The experiment results showed that transformer models such as MiniLM and MS MARCO are more general, with consistently high precision values across different K values. Using BM25 along with the MS MARCO transformer model gave the best performance for all K values.

Once the best performing model was identified, we further evaluated the final model with 100 recipes from the Recipe1M+ corpus, which is a large-scale, structured corpus of over one million cooking recipes and 13 million food images. The mAP@1 value for all the ingredients from these 100 recipes was 0.949, which is similar to the mAP@K values we see in Figure 6 for the top 100 ingredients.

Figure 6: Mean Average Precision of Ingredient Search for Different K
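Equations 8-10 can be implemented directly; the following sketch computes AP@K by evaluating P(k) at each relevant rank and then averages over the queries.

```python
def average_precision_at_k(retrieved, relevant, k):
    """Equation 9: AP@K = (1 / min(K, R)) * sum over k of P(k) * rel(k)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(retrieved[:k], start=1):
        if item in relevant:          # rel(k) = 1
            hits += 1
            score += hits / rank      # P(k), Equation 10, at a relevant position
    return score / min(k, len(relevant))

def mean_average_precision_at_k(results, relevant_sets, k):
    """Equation 8: mean of AP@K over all queries.

    results maps each query to its ranked product list; relevant_sets maps
    each query to the set of products judged relevant.
    """
    return sum(
        average_precision_at_k(results[q], relevant_sets[q], k) for q in results
    ) / len(results)
```

For example, retrieving ["a", "b", "c"] when {"a", "c"} are relevant gives AP@3 = (1/2)(1/1 + 2/3) = 5/6.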
Table 3
Quantity Recommendation Accuracy Measurement

Ingredient    Total recipes    Correct Qty    Percentage Correct
Salt          32,506           32,443         99.806%
Sugar         19,983           16,473         82.435%
Butter        18,958           14,875         78.463%

4.2. Quantity Normalization and Optimization

For evaluating quantity normalization, we measured its accuracy using the most commonly used ingredients in 100,000 randomly chosen recipes. The three most commonly used ingredients, salt, sugar, and butter, were tracked to measure whether the recommended quantity after unit normalization was accurate.

We found that salt is used in 32,506 out of the chosen 100,000 recipes (Table 3). Out of these 32,506 recipes, the quantity of salt is correctly recommended in 32,443 recipes, which is 99.806% of the total. Similarly, the accuracies of the recommended quantity were 82.435% and 78.463% for sugar and butter respectively. The relatively low accuracy in the quantity recommendations for sugar and butter is due to incorrect recipe text. For example, certain recipes say 34 cups of butter instead of 3/4 cups of butter. This is discussed further in Section 5.

Further, the quantity optimization process was also evaluated with 100 randomly chosen recipes from the Recipe1M+ corpus. The mAP@1 value for all the ingredients from these 100 recipes was 0.914. For combinations of recipes, the result was manually verified for 5 random sets of any two recipes.

4.3. Kitchen Gadget Recommendation

The NER model to identify kitchen gadgets was trained on 500 recipes and tested on 100 recipes. The custom entities, kitchen gadgets and methods, were marked with their positions in the text using regex for these 600 recipes. A manual review of the annotations was performed to update any incorrect or missed annotations, and we achieved a test F1 score of 99.627% (Equation 11).

$$F1 = \frac{2 \cdot precision \cdot recall}{precision + recall} \qquad (11)$$

where precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.

A custom mapping is used to convert cooking methods to kitchen gadgets, which, along with the other identified gadgets, form the search queries for the recipe. For evaluation, different algorithms were evaluated at mAP@K, similar to the ingredient search. We developed transformer models including MiniLM, MS MARCO, and RoBERTa for the semantic search tasks. The experiment results show that MS MARCO, a combination of a retriever and a re-ranker, performs consistently better than MiniLM and RoBERTa across all K (Figure 7).

Figure 7: Mean Average Precision of Kitchen Gadget Search for Different K

As an example, we present a recipe in Figure 8 that consists of two sections: ingredients and directions. The ingredients section is used for the ingredient and quantity recommendations, whereas the directions section is used for the kitchen gadget recommendations. The keywords used for kitchen gadget recommendations are highlighted in red.

Ingredients:
1/2 cup oats
2 cups water
2 cups pancake mix
1/2 cup apples
2 tablespoons sugar
1/2 teaspoon cinnamon

Directions:
1. In a medium bowl combine rolled oats and water, let stand 5 minutes.
2. Meanwhile, heat large nonstick skillet or griddle to medium high heat (375F).
3. Grease lightly with oil. Add remaining ingredients to rolled oats mixture; stir just until all ingredients are moistened.
4. For each pancake, pour 1/4 cup batter into hot skillet.
5. Cook 1 to 1.5 minutes, turning when edges look cooked and bubbles begin to break on surface.
6. Continue to cook 1 to 1.5 minutes or until golden brown.
7. Serve with syrup and butter, if desired.

Figure 8: Apple Oat Breakfast Pancakes

The product recommendations based on the recipe (Figure 8) are given in Table 4.

Table 4
Shopping Cart Recommendation

Product                                                           Qty  Price ($)
Red Delicious Apple                                                1     0.99
McCormick Ground Cinnamon - 2.37oz                                 1     1.99
Good & Gather Organic Oats - 18oz                                  1     2.39
Buttermilk Pancake Mix - 32oz                                      1     2.19
Imperial Granulated Pure Sugar - 4lb                               1     2.19
Good & Gather Alkaline Water - 1L                                  1     0.99
Squish 1.5qt Mixing Bowl - Green                                   1     8.99
Anchor 8oz Glass Measuring Cup                                     1     3.49
Nylon Ladle with Soft Grip - Made By Design                        1     3.00
Lakeside 10" Nonstick Aluminum Skillet with Faux Granite Finish    1    16.47

5. Conclusion

We presented a content-based recommendation system to recommend relevant retail products to a user or guest based on selected recipe contents. The recommended products are primarily based on the ingredients required by the recipe, which are extracted from the recipe text along with the associated quantities. More significantly, the system is also capable of recommending the relevant kitchen gadgets that might be required for making the recipe based on the instructions provided in the recipe. When a user selects multiple recipes, the recommender system optimizes the quantities for each product, where the quantities are adjusted according to the amounts of the common ingredients and kitchen gadgets present in the recipes.

We conducted experiments to evaluate the effectiveness of various algorithms for the recommendation system, such as BM25, MiniLM, MS MARCO, and RoBERTa. In our experiments, we compared these algorithms by assessing the top product recommendations for the 100 most frequently occurring ingredients in the 1M+ recipe corpus. For ingredient search, the ensemble approach of BM25 and MS MARCO gave the best performance, while for kitchen gadget search, the MiniLM model proved to be more accurate. The accuracy of the quantity recommendations was measured by evaluating certain high-frequency ingredients such as salt, sugar, and butter across more than 50,000 recipes from the one-million-plus (1M+) recipe corpus.

There were a few challenges that we faced while implementing the system that we hope to address in the near future. Firstly, the recipe text used in the 1M+ recipe corpus has some inconsistencies where the required quantity is not accurately defined. For example, a lot of recipes say 34 cups of an ingredient instead of 3/4 cups. In future work, we can parse the text with more scrutiny and rules to avoid such cases.
Alternatively, When a user selects multiple recipes, the recommender we could also build an entirely new recipe scraping al- system optimizes quantities for each product, where the gorithm to handle such cases. Secondly, the quantity quantities are adjusted according to the amounts of the optimization for multiple recipes is in place, but there is common ingredients and kitchen gadgets present in the no formal way to measure how well is the performance recipes. of the process. As a part of future work, we would devise We conducted experiments for evaluating the effec- a way to quantify the performance of this process. tiveness of various algorithms for the recommendation We also identified extra features for the current sys- system, such as BM25, MiniLM, MS Marco, and RoBERTa. tem. First, the dietary restrictions of a user can be ac- In our experiments, we compared these algorithms by counted for by considering the ingredients, nutritional information, and key allergens that might be present in [5] Lin Chen, Rui Li, Yige Liu, Ruixuan Zhang, and the product using natural language processing. Second, Diane Myung-kyung Woodbridge. 2017. Ma- there could be preference filters provided to users before chine learning-based product recommendation us- they add a recipe to the shopping cart that could consider ing Apache Spark. In 2017 IEEE SmartWorld, Ubiq- their dietary preferences such as whether they prefer uitous Intelligence Computing, Advanced Trusted non-GMO or organic products only. Third, instead of Computed, Scalable Computing Communications, asking for preferences explicitly, the preferences of the Cloud Big Data Computing, Internet of People users could be determined automatically by analyzing and Smart City Innovation (SmartWorld/SCAL- their past shopping behaviors such as clicks, views, or COM/UIC/ATC/CBDCom/IOP/SCI). 1–6. https: product purchases. 
Based on this implicit feedback, we //doi.org/10.1109/UIC-ATC.2017.8397470 could determine the user’s preferences, such as whether [6] Jaekeol Choi, Euna Jung, Jangwon Suh, and Won- the user is a vegetarian or if the user is currently pur- jong Rhee. 2021. Improving Bi-encoder Docu- chasing products specific to a particular diet, such as the ment Ranking Models with Two Rankers and Multi- ketogenic diet. Such information can then be used in the teacher Distillation. In Proceedings of the 44th In- recommendation system for personalizing recommended ternational ACM SIGIR Conference on Research and products for different users. Development in Information Retrieval. ACM. The content-based recommendation system proposed [7] Paul Covington, Jay Adams, and Emre Sargin. 2016. in this paper could potentially be used to improve the Deep neural networks for youtube recommenda- shopping experience of users by providing them options tions. In Proceedings of the 10th ACM conference on to add all the necessary products required by a recipe recommender systems. 191–198. automatically to their shopping cart. Kitchen gadget rec- [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and ommendations could be helpful not just for improving Kristina Toutanova. 2018. BERT: Pre-training of the shopping experiences of the users but also for their Deep Bidirectional Transformers for Language Un- cooking experiences. These recommendations also al- derstanding. low a business to increase the basket sizes of their users, [9] Grand View Research. 2021. Recommendation En- thereby increasing revenue. Moreover, the system could gine Market Size, Share & Trends Analysis. Recom- also aid in automated promotional emails that recom- mendation Engine Market Report (2021). mend products required by recipes that may be of interest [10] Matthew Honnibal and Ines Montani. 2017. spaCy to customers. 2: Natural language understanding with Bloom em- beddings, convolutional neural networks and incre- mental parsing. 
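The multi-recipe quantity optimization summarized above can be sketched as follows. This is a minimal illustration, not the deployed implementation: the pack sizes, ingredient keys, and recipe contents are assumptions for the example, whereas the real system derives pack sizes from the product catalog.

```python
from collections import defaultdict
from math import ceil

# Illustrative pack sizes (ingredient -> units per retail pack); assumed
# values, not catalog data.
PACK_SIZE = {"bread_slice": 20, "butter_tbsp": 32, "egg": 12}

def optimize_quantities(recipes):
    """Sum each ingredient's required amount over all selected recipes,
    then round up to whole retail packs."""
    totals = defaultdict(float)
    for recipe in recipes:
        for ingredient, amount in recipe.items():
            totals[ingredient] += amount
    return {ing: ceil(amt / PACK_SIZE[ing]) for ing, amt in totals.items()}

# Two hypothetical recipes sharing common ingredients.
french_toast = {"bread_slice": 6, "butter_tbsp": 4, "egg": 2}
bread_pudding = {"bread_slice": 8, "butter_tbsp": 2, "egg": 3}

cart = optimize_quantities([french_toast, bread_pudding])
# 6 + 8 = 14 slices still fit in one 20-slice loaf, so one pack suffices.
```

Aggregating before rounding up is what keeps the combined cart from double-counting ingredients shared across recipes.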
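The garbled-fraction issue noted above ("34 cups" where the source recipe meant "3/4 cups") could be tackled with a small normalization pass during parsing. The sketch below is a hedged illustration: the fraction table and unit list are assumptions, and a production rule set would need guards against legitimate two-digit quantities (e.g., "12 cups" in a batch recipe).

```python
import re

# Two-digit tokens that are plausibly concatenated cooking fractions
# (an assumed, illustrative mapping).
LIKELY_FRACTIONS = {"12": "1/2", "13": "1/3", "23": "2/3",
                    "14": "1/4", "34": "3/4", "18": "1/8", "38": "3/8"}

UNIT = r"(?:cups?|tablespoons?|teaspoons?)"

def repair_line(line: str) -> str:
    """Rewrite e.g. '34 cups' -> '3/4 cups' when a suspicious two-digit
    quantity directly precedes a measurement unit."""
    return re.sub(rf"\b(\d{{2}})\b(?=\s{UNIT}\b)",
                  lambda m: LIKELY_FRACTIONS.get(m.group(1), m.group(1)),
                  line)

fixed = repair_line("Cream 34 cups of butter with the sugar.")
# fixed == "Cream 3/4 cups of butter with the sugar."
```

Restricting the rewrite to quantities adjacent to a unit keeps unrelated numbers (oven temperatures, package sizes) untouched.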
References

[1] Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset.
[2] Shivam Bansal, Aman Srivastava, and Anuja Arora. 2017. Topic Modeling Driven Content Based Jobs Recommendation Engine for Recruitment Industry. Procedia Computer Science 122 (2017), 865–872. 5th International Conference on Information Technology and Quantitative Management, ITQM 2017.
[3] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.
[4] Dan Chang, Hao Gui, Rui Fan, Ze Fan, and Ji Tian. 2019. Application of Improved Collaborative Filtering in the Recommendation of E-commerce Commodities. International Journal of Computers Communications & Control 14, 4 (2019), 489–502.
[5] Lin Chen, Rui Li, Yige Liu, Ruixuan Zhang, and Diane Myung-kyung Woodbridge. 2017. Machine learning-based product recommendation using Apache Spark. In 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). 1–6. https://doi.org/10.1109/UIC-ATC.2017.8397470
[6] Jaekeol Choi, Euna Jung, Jangwon Suh, and Wonjong Rhee. 2021. Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.
[7] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. 191–198.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
[9] Grand View Research. 2021. Recommendation Engine Market Size, Share & Trends Analysis. Recommendation Engine Market Report (2021).
[10] Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. (2017). To appear.
[11] Anirudh Jagithyala. 2014. Recommending recipes based on ingredients and user reviews. Ph.D. Dissertation. Kansas State University.
[12] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.
[13] Kevin F. McCardle, Kumar Rajaram, and Christopher S. Tang. 2007. Bundling retail products: Models and analysis. European Journal of Operational Research 177, 2 (2007), 1197–1217.
[14] Niklas. 2022. Average Weight of All Fruits and Vegetables. https://weightofstuff.com/average-weight-of-all-fruits-and-vegetables/
[15] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2010), 1345–1359.
[16] Michael J. Pazzani and Daniel Billsus. 2007. Content-Based Recommendation Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 325–341.
[17] Chantal Pellegrini, Ege Özsoy, Monika Wintergerst, and Georg Groh. 2021. Exploiting Food Embeddings for Ingredient Substitution. In HEALTHINF.
[18] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
[19] Juan Ramos et al. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Vol. 242. Citeseer, 29–48.
[20] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
[21] Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Now Publishers Inc.
[22] Sarah Perez. 2015. Instacart And Allrecipes Now Let You Add A Meal's Ingredients To Your Grocery List With A Click. https://techcrunch.com/2015/10/12/instacart-and-allrecipes-now-let-you-add-a-meals-ingredients-to-your-grocery-list-with-a-click/
[23] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative Filtering Recommender Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 291–324.
[24] Chahak Sethi, Melvin Vellera, Diane Myung-kyung Woodbridge, and Joey Jonghoon Ahnn. 2022. Bundle Recommender from Recipes to Shopping Cart. https://github.com/dianewoodbridge/target_recipe_project/
[25] Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference. 298–307.
[26] Mayumi Ueda, Syungo Asanuma, Yusuke Miyawaki, and Shinsuke Nakajima. 2014. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. 1. 12–14.
[27] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. arXiv:2002.10957
[28] Chutian Wei, Xinyu Chen, Zhenning Tang, and Wen Cheng. 2021. Fully content-based IMDb movie recommendation engine with Pearson similarity. In International Conference on Green Communication, Network, and Internet of Things (GCNIoT 2021), Siting Chen and Jun Mou (Eds.), Vol. 12085. International Society for Optics and Photonics, SPIE, 132–137.
[29] Ruiliang Yan, Chris Myers, John Wang, and Sanjoy Ghose. 2014. Bundling products to success: The influence of complementarity and advertising. Journal of Retailing and Consumer Services 21, 1 (2014), 48–53.