Bundle Recommender from Recipes to Shopping Carts - Optimizing Ingredients, Kitchen Gadgets and their Quantities

Chahak Sethi1,†, Melvin Vellera1,†, Diane Myung-kyung Woodbridge1 and Joey Jonghoon Ahnn2
1 University of San Francisco, San Francisco, California, USA
2 Target, Sunnyvale, California, USA

Abstract
In this paper, we introduce a recommender system that automatically captures the context of what users or guests are looking for and recommends a bundle of products to be added to their shopping cart. The recommendation system takes a user's selected recipes as input and, using neural networks, recommends a shopping cart with ingredients in optimized quantities as well as any kitchen gadgets that might be necessary to prepare the recipes efficiently. We propose a system architecture, dive deep into the individual components, and evaluate the performance of information retrieval, semantic search, and quantity optimization algorithms. Using an ensemble methodology, we attained a mean average precision of over 0.9 for ingredient and quantity recommendations. The recipe-based bundle recommender system may be used not only to improve the user's shopping experience but also to enable and encourage healthier eating habits by providing personalized product recommendations.

1. Introduction

Recommendation systems provide personalized recommendations for customers using various data from customers and products to enhance the customer experience and maximize the conversion rate, contributing significantly to revenue growth in the retail industry. In 2020, the recommendation system market was valued at 1.77 billion US dollars globally, with a projected compound annual growth rate of 33.0% by 2028 [9].

Recommendation systems generally utilize two categories of algorithms: collaborative filtering-based recommendation [23][5] and content-based recommendation [16]. Collaborative filtering (CF) relies on the user's history data, matches a user A with a similar user B, and recommends to A what B liked. Today, most big e-commerce giants, with massive amounts of data, use collaborative filtering to recommend products to their customers [4]. Content-based recommendations create a profile to characterize each item. Industry applications of content-based recommendation systems include a service that suggests jobs to users by matching their interests and skills with the features of job postings [2]. IMDb uses information about movies or TV shows, including genre, language, cast, director, and popularity, to recommend them to its users [28].

With the COVID-19 pandemic, there has been an increased need for and interest in meal preparation, including shopping and cooking. This even helped some people rediscover their liking for meal preparation and discover new recipes. However, recommending items for cooking recipes poses unique challenges, including: (1) recommending in-stock grocery items in the correct quantity, as recipes use volume while products use weight as a measure. For example, in Figure 1, the recipe for classic French toast uses four tablespoons of unsalted butter, whereas butter is sold in a 16-ounce pack; (2) recommending kitchen gadgets from unstructured text data in the directions. For example, in Figure 1, the recipe for classic French toast requires the user to "whisk", which implicitly indicates that the user would need a whisk; (3) optimizing quantities for ingredients repeated across multiple recipes. For example, the classic French toast recipe (Figure 1, Table 1) requires six slices of bread, while another recipe requires eight slices, which indicates that a total of one pack of bread is needed to complete both recipes.
In this research, we developed a content-based recommendation system with natural language processing to solve the three aforementioned sub-problems: (1) the developed system parses ingredients and kitchen gadgets in the recipe corpus and recommends the most similar products from our product catalog database; (2) it parses unstructured text in the recipe direction section to recommend the kitchen gadgets required to cook a recipe; (3) finally, it optimizes the number of products and bundles the recommendations together as a set of items that customers can purchase individually or as a bundle, a group of complementary items that can be purchased together [13][29].

ORSUM@ACM RecSys 2022: 5th Workshop on Online Recommender Systems and User Modeling, jointly with the 16th ACM Conference on Recommender Systems, September 23rd, 2022, Seattle, WA, USA
† The authors contributed equally.
csethi2@usfca.edu (C. Sethi); mvellera@usfca.edu (M. Vellera); dwoodbridge@usfca.edu (D. M. Woodbridge); joey.ahnn@target.com (J. J. Ahnn)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Figure 1: Text of an Example Recipe (Classic French Toast) and Recommended Ingredients and Kitchen Gadgets

Table 1
Shopping Cart Recommendation - French Toast

Product                                            Qty  Price ($)
Hood Heavy Cream - 1pt                              1     3.49
McCormick Ground Cinnamon - 2.37oz                  1     1.99
Arnold Stone Ground Wheat Bread - 16oz              1     2.59
Grade A Large Eggs - 12ct - Good & Gather™          1     1.69
Iodized Salt - 26oz - Good & Gather™                1     0.49
Tillamook Unsalted Butter Spread - 16oz             1     4.19
Squish 1.5qt Mixing Bowl                            1     8.99
9" Whisk Stainless Steel                            1     6.00
Westinghouse Cast Iron Seasoned Skillet 6.5-inch    1    23.50

Our research offers the convenience of a personally curated shopping experience by reducing the effort users spend finding the right products as a bundle. The system helps users without any prior cooking experience easily purchase the required ingredients and kitchen gadgets. Automatic quantity optimization reduces wastage of resources and optimizes costs for users by suggesting the correct quantity of each product. We believe that this approach leads to increased user basket sizes, eventually raising business revenue. Furthermore, it can also aid in automated promotional emails to consumers containing all the ingredients and gadgets in a recipe, rather than hand-curated contents, which we find time-consuming.

In this paper, we discuss the related work already present in this field in Section 2. Next, in Section 3, we provide an overview of the developed system, including mapping kitchen gadgets and ingredients in recipes to relevant products from the product catalog database, optimizing the quantities to be recommended, and providing multiple candidates and their corresponding ranking to the user. The techniques we employed to measure the performance of the system are discussed in Section 4. We summarize our work and discuss future work in Section 5. The authors make the code used for the research available to the public [24].

2. Related Work

In late 2015, an American company that operates a grocery delivery and pick-up service in the United States and Canada integrated with AllRecipes, a top recipe site, to allow users to select a recipe and fill their cart with all the necessary ingredients [22]. Although the ingredient recommendations provided by an e-commerce company are accurate for a good number of recipes, we observed that the recommended quantities were not ideal for certain recipes. There were also cases where no matching product was found for a recipe ingredient, such as vegetable oil, even though there are other closely related products, including olive oil and canola oil.
In addition to these limitations, we also identified an opportunity to augment ingredient recommendations with kitchen gadget recommendations that could improve a user's cooking experience, especially for those new to cooking. Our work has been primarily motivated by these use cases, and in this paper, we propose a methodology that generates accurate ingredient recommendations as well as kitchen gadget recommendations.

To our knowledge, there has not been any notable research regarding the recommendation of an optimized shopping cart of ingredients and kitchen gadgets based on recipes. We find that most of the existing literature in the food domain is related to recommending recipes or ingredient substitutes [25, 26, 11, 17]. Anirudh Jagithyala [11] developed a recommendation system that recommends recipes based on recipe ratings, ingredients, and review text. A number of approaches, including memory-based collaborative filtering and TF-IDF, were tried along with similarity measures such as cosine and Pearson correlation. The research evaluated multiple approaches using the mean average precision (mAP) and showed that collaborative filtering on recipe ratings performed better. Chantal Pellegrini et al. [17] explored the use of text and image embeddings for identifying ingredient substitutes. They generated context-free embeddings using word2vec as well as context-based embeddings using transformer-based models. In the end, the research showed that the transformer-based multi-modal approach using text and image embeddings together gave the best results, with a precision of 0.84 for the top 1000 most common ingredients. Chun-Yuen Teng et al. [25] explored the recommendation of recipes and ingredient substitutes using network structures. Their system identifies ingredient substitutes using a graph structure where nodes represent ingredients and edges represent the degree of substitutability. To derive pairs of related recipes, they computed the cosine similarity between the ingredient lists of the two recipes, weighted by the inverse document frequency. Mayumi Ueda et al. [26] applied user preferences and ingredient quantity for recommending recipes. Their method breaks recipes down into their ingredients and scores them based on the ingredients' frequency of use and specificity.

Some of the existing algorithms for searching and ranking relevant items use classical information retrieval algorithms such as TF-IDF [19], BM25 [21], or GloVe [18], while others make use of deep learning models such as BERT [8]. BERT is a language representation model that can give accurate contextual embeddings for words in most cases. Unfortunately, BERT does not generally give accurate representations for sentences, and its construction makes it unsuitable for semantic similarity search. To overcome these issues, we applied the Sentence-BERT [20] model, which was trained using Siamese BERT networks.

The preferred recommendation system architecture was one based on a two-stage approach consisting of candidate generation and ranking stages. This two-stage approach allows for recommendations from a very large corpus (millions of items) while still ensuring that recommendations are personalized and engaging for the user. Our research employed a two-stage approach with candidate generation (retriever stage) and ranking (re-ranker stage) [7].

3. System Overview

The proposed system first takes a recipe as input from a guest shopping on an e-commerce company's website. The recipe then gets split into two sections: 1) cooking instructions and 2) ingredients. The cooking instructions are parsed in order to detect and extract any kitchen gadgets that might be required for the recipe, while the required ingredients and quantities are extracted from the recipe's ingredients section. The quantities and units of ingredients in the recipe text and the product catalog database are standardized for accurately matching varying units between recipes and products. The ingredients and kitchen gadgets required by the recipe are then fed into the recommender system to search for the best matching products from the product catalog database based on textual and quantity information. The system then adds these products to the shopping cart of the guest (Figure 2).

For advanced natural language processing (NLP), we utilized open-source software libraries, including spaCy [10] and NLTK [3], for extracting recipe ingredients and kitchen gadgets from the recipe text. The ingredients, along with the required quantities, were parsed from the ingredient section of the recipe (Figure 1) using regular expressions, after which these ingredients were preprocessed using the NLTK library for stop-word removal, stemming, and n-gram expansion. The spaCy library was used for extracting kitchen gadgets from the recipe instructions using named entity recognition (NER). Once the ingredients and kitchen gadgets of the recipe are identified, they are compared against the products in the product catalog database to get the most relevant product in stock for each ingredient and user. This process also involved the novel usage of a language representation model, Bidirectional Encoder Representations from Transformers (BERT) [8], which is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right contexts in all layers. The advantage of using a pre-trained architecture is that we can use transfer learning to transfer the already trained features to the current data without the complexity of training heavy machine learning models [15].
We also developed an algorithm to recommend the optimal number of products in case a guest chooses more than one recipe using the same products. The quantity or weight of the common ingredients is recommended based on the sum of the required amounts, which helps create optimal baskets for guests and leads to less wastage of resources.

Figure 2: System Overview

Figure 3: Ingredient Recommendation Workflow

3.1. Ingredient Recommendation

The developed system utilizes a combination of semantic search and information retrieval algorithms to recommend the most relevant products for the ingredients in a recipe. Semantic search can improve search accuracy by understanding the context and content of the search query. In contrast to conventional search algorithms, which only find documents based on lexical matches, semantic search can also find synonyms. Semantic search aims to embed all entries in the corpus into a vector space; the query is embedded into the same vector space in order to find the closest embeddings from the corpus. In our case, a recipe ingredient is the query, and the products are the entries in the corpus, where the products are embedded into the vector space and stored separately. During search time, the algorithm embeds a recipe ingredient into the same vector space and calculates the similarity between an ingredient (I) and a product (P) to find the most relevant products. These products would have a high semantic overlap with the ingredient. We used the cosine similarity in Equation 1 to find the closest embeddings from the corpus.

$$\mathrm{Sim}(I, P) = \frac{I \cdot P}{\|I\|\,\|P\|} = \frac{\sum_{j=1}^{e} I_j P_j}{\sqrt{\sum_{j=1}^{e} I_j^2}\,\sqrt{\sum_{j=1}^{e} P_j^2}} \qquad (1)$$

where $e$ is the embedding dimension of the ingredient and product vectors.

After computing the cosine similarity between the embedding of an ingredient and the embeddings of the products, the top m products are retrieved. For complex search tasks, the search can be significantly improved by using a retrieve and re-rank framework, where the top t products are retrieved efficiently, followed by a re-ranker that ranks these t products and recommends m products.
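Equation 1 and the top-m retrieval over pre-computed product embeddings can be sketched in a few lines of plain Python. The vectors below are toy values; in the real system the embeddings come from a sentence transformer model.

```python
from math import sqrt

def cosine_similarity(I, P):
    """Equation 1: dot(I, P) / (||I|| * ||P||) over the embedding dimension e."""
    dot = sum(i * p for i, p in zip(I, P))
    norm_i = sqrt(sum(i * i for i in I))
    norm_p = sqrt(sum(p * p for p in P))
    return dot / (norm_i * norm_p)

def top_m_products(ingredient_vec, product_vecs, m=3):
    """Rank pre-computed product embeddings against one ingredient embedding."""
    scored = sorted(
        ((cosine_similarity(ingredient_vec, vec), pid) for pid, vec in product_vecs.items()),
        reverse=True,
    )
    return [pid for _, pid in scored[:m]]
```

Because the product embeddings are computed offline, only the single ingredient embedding and the similarity scores are computed at query time.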
A bi-encoder model can encode an input text, such as a recipe ingredient or a product, and output a vector (embedding) that captures semantic information. If a recipe ingredient embedding and a product embedding are similar, then the cosine similarity (Equation 1) between these two embeddings will be high. Hence, by comparing an ingredient embedding with all the product embeddings using cosine similarity, we can identify the most similar products for an ingredient.

3.1.1. Retriever

Given an ingredient, we first use a retrieval system that quickly retrieves t products that are potentially relevant for the given ingredient. Then the t products are re-ranked, and the top m matches are sent through to the quantity recommendation module. For our retrieval system, we make use of the BM25 algorithm for lexical search [21]. For an ingredient query I with terms $i_1, \ldots, i_n$, the BM25 score for a product text P is given in Equation 2.

$$BM25(P, I) = \sum_{j=1}^{n} IDF(i_j) \cdot \frac{tf(i_j, P) \cdot (c + 1)}{tf(i_j, P) + c \cdot \left(1 - b + b \cdot \frac{|D|}{d_{avg}}\right)} \qquad (2)$$

where $tf(i_j, P)$ is the number of times term $i_j$ of the ingredient text occurs in P, $|D|$ is the number of words in P, $d_{avg}$ is the average number of words per product text, and $b$ and $c$ are the saturation parameters for document length and term frequency respectively. In general, values such as $0.5 < b < 0.8$ and $1.2 < c < 2$ are reasonably good in many circumstances [21]. Equation 3 describes the inverse document frequency ($IDF(i_j)$) for a corpus with N products containing term $i_j$.

$$IDF(i_j) = \log \frac{N - N(i_j) + 0.5}{N(i_j) + 0.5} \qquad (3)$$

where $N(i_j)$ is the number of product texts in the database that contain the term $i_j$ of the ingredient query.
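A minimal, self-contained version of Equations 2 and 3, keeping the paper's notation (c for term-frequency saturation, b for length normalization), might look like the following; the texts are assumed to be tokenized already.

```python
from math import log

def bm25_score(query_terms, doc_terms, corpus, b=0.75, c=1.2):
    """Score one tokenized product text against an ingredient query.

    corpus is the list of all tokenized product texts; b and c are the
    length-normalization and term-frequency saturation parameters of Equation 2.
    """
    N = len(corpus)
    d_avg = sum(len(d) for d in corpus) / N   # average product-text length
    D = len(doc_terms)                        # |D|: words in this product text
    score = 0.0
    for term in query_terms:
        n_t = sum(term in d for d in corpus)        # N(i_j): docs containing the term
        idf = log((N - n_t + 0.5) / (n_t + 0.5))    # Equation 3
        tf = doc_terms.count(term)                  # tf(i_j, P)
        score += idf * tf * (c + 1) / (tf + c * (1 - b + b * D / d_avg))
    return score
```

Products containing none of the query terms score zero, which is why BM25 alone cannot match "vegetable oil" to "canola oil" and must be complemented by the semantic models described next.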
For improving the accuracy of the retrieval stage, we combined the BM25 algorithm with a bi-encoder sentence transformer model that was fine-tuned from Microsoft's MiniLM model [27]. The MiniLM model is a compressed Transformer model that uses an approach termed deep self-attention distillation to reduce the number of parameters required by a transformer model. It is twice as fast as BERT while retaining more than 99% accuracy on SQuAD 2.0 and several GLUE benchmark tasks using only 50% of BERT's model parameters. The bi-encoder sentence transformer model [20] that uses the MiniLM model was trained on a dataset of 1 billion sentence pairs with a self-supervised contrastive learning objective: given a sentence from a sentence pair, the model should predict which one out of a set of randomly sampled other sentences was actually paired with it in the dataset. A bi-encoder model performs two independent self-attentions for the query and the document, and the document is mapped to a fixed BERT representation regardless of the choice of query. This makes it possible for bi-encoder models to pre-compute document representations offline, significantly reducing the computational load per query at the time of inference [6].

In order to reduce the search space efficiently, hierarchical classification models were created for the following levels: class, subclass, and item type. These models were trained using the preprocessed text of the products as feature vectors and the respective hierarchical level values as the target labels. A softmax activation function (Equation 4) was used in the final layer of the multi-class classification models, along with the cross-entropy loss function (Equation 5).

$$\mathrm{Softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{C} \exp(z_j)} \qquad (4)$$

where $C$ is the number of output classes and $z_i$ is an element of a vector $z$ of size $C$ corresponding to a particular class.

$$\text{Cross Entropy Loss} = -\frac{1}{N} \sum_{i=1}^{N} y_i \cdot \log(\hat{y}_i) \qquad (5)$$

where $N$ is the number of observations, $y_i$ is the true label vector, and $\hat{y}_i$ is the predicted label probability vector.

3.1.2. Re-ranker

After retrieving the top t products, the re-ranker stage ranks the products more accurately using a cross-encoder sentence transformer model that was fine-tuned using Microsoft's MS MARCO dataset [1], a large-scale information retrieval corpus created from real user search queries on the Bing search engine. In contrast to a bi-encoder model, which performs two independent self-attentions for the query and the document, a cross-encoder model performs full self-attention across the entire query-document pair. As a result, the cross-encoder can model the interaction between a query and a document, and the resulting representations contain contextualized embeddings [6]. In our use case, an ingredient and a product are passed simultaneously to the cross-encoder, which then outputs a score indicating how relevant the product is for the given ingredient. As the cross-encoder models the interaction between an ingredient and a product during inference time, it is slower than the bi-encoder, and hence it can only be used for a small subset of products. However, we can achieve a higher accuracy, as cross-encoders perform attention across the query and the document. After the re-ranker stage, the top m matched products are then optimized based on quantity.
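The two-stage pipeline can be sketched independently of any particular model by injecting the two scoring functions. In the actual system the retriever score would come from BM25 plus the MiniLM bi-encoder and the re-ranker score from the MS MARCO cross-encoder; the toy lexical scorers in the usage below are placeholders.

```python
def retrieve_and_rerank(ingredient, products, retriever_score, reranker_score, t=10, m=3):
    """Two-stage search: a fast retriever narrows the catalog to t candidates,
    then a slower, more accurate scorer re-ranks them and keeps the top m."""
    candidates = sorted(
        products, key=lambda p: retriever_score(ingredient, p), reverse=True
    )[:t]
    reranked = sorted(
        candidates, key=lambda p: reranker_score(ingredient, p), reverse=True
    )
    return reranked[:m]
```

This split is what makes the cross-encoder affordable: the expensive pairwise scoring runs only over the t retrieved candidates, never over the full catalog.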
3.2. Unit Normalization and Optimization

Once the algorithm selects the top m matched ingredients, the next important step is to recommend the optimal quantity of each product needed in the recipe (Figure 4). For this, we start by retrieving the ingredient quantity specified in the recipe and normalize the units to standard units (Table 2), including tablespoon (tbsp), teaspoon (tsp), milliliter (ml), cup, count, pound (lb), and ounce (oz). These standard quantities are either weights or volumes, which are handled differently from each other.

Figure 4: Quantity Recommendation Flow

Table 2
Commonly used units in recipes and corresponding conversion to the International System of Units (SI)

Common Unit in Recipe    Converted Unit (SI)
1 cup                    225 ml
1 teaspoon               5 ml
1 tablespoon             15 ml
1 fluid ounce            30 ml

As product descriptions generally utilize weight as a measure while most recipes use volumes, we converted the volume in the standardized unit to weight using density (d). For instance, the weight (w) for n cups of a grocery product in the recipe, where 1 cup is 225 ml, can be calculated as follows.

$$w = n \cdot 225 \cdot d \qquad (6)$$

Once the required weight is calculated, the system compares it against the weight of the recommended products in the product catalog database. The recommended number of units is then calculated for each matched product using Equation 7, where q is the recommended quantity, w is the ounces required in the recipe, and p is the ounces sold or packaged.

$$q = \left\lceil \frac{w}{p} \right\rceil \qquad (7)$$

For fresh produce items that use a count as a unit in a recipe, like two onions or three potatoes, the average weight of the given fruits and vegetables is used to convert the count to weight [14]. The reverse conversion is also applied if the unit for a product at an e-commerce company uses count and the recipe specifies weight instead.

Once the recommended quantity is known for the m matched ingredients, we sort the m ingredients by the recommended quantity and the price in ascending order. In addition, the system recommends lower-priced items if multiple packaging options are available for the same product. For instance, the system recommends one pack of 1 lb flour rather than two packs of 0.5 lb if the 1 lb pack is more cost-efficient.

If a user selects multiple recipes, the quantity is optimized such that the minimum number of units of the common products is recommended. For example, for two recipes using one tablespoon and two tablespoons of salt each, the recommended number of salt cans/bottles will be optimized to one to reduce waste and cost.
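Equations 6 and 7 reduce to a couple of arithmetic steps. The sketch below uses the Table 2 conversion factors; the gram-to-ounce factor (28.35 g/oz) and any density values passed in are assumptions added for the example, not values from the paper.

```python
from math import ceil

# Table 2: recipe volume units in milliliters.
ML_PER_UNIT = {"cup": 225, "teaspoon": 5, "tablespoon": 15, "fluid ounce": 30}
G_PER_OZ = 28.35  # standard gram-to-ounce factor, added for this example

def required_ounces(qty, unit, density_g_per_ml):
    """Equation 6: convert a recipe volume to weight via density, then to ounces."""
    grams = qty * ML_PER_UNIT[unit] * density_g_per_ml
    return grams / G_PER_OZ

def packs_needed(required_oz, pack_oz):
    """Equation 7: q = ceil(w / p), the minimum number of packs covering the recipe."""
    return ceil(required_oz / pack_oz)
```

For multiple recipes, the required ounces of a shared ingredient are summed before `packs_needed` is applied, which is what keeps one can of salt sufficient for two recipes.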
3.3. Kitchen Gadget Recommendation

The recommendations for kitchen gadgets follow a similar approach to the ingredient recommendations, using a combination of semantic search and information retrieval algorithms. The required kitchen gadget is implicitly mentioned in the unstructured recipe instruction text, whereas ingredients are explicitly listed in the ingredient section. To identify the kitchen tools and methods after pre-processing the recipe instructions, a custom NER model from spaCy [10] was trained on these entities. The NER model identifies the kitchen gadgets (nouns) used in the recipe and the methods (verbs) that can identify a kitchen gadget. For example, for "Chop the garlic and add to the pan", a pan will be identified as a gadget, whereas chop will be identified as a method associated with gadgets including a knife and a chopping board.

Similar to the ingredient recommendations, products are embedded into a vector space and stored separately. During search time, a gadget from the recipe instructions is also embedded into the same vector space, and the system searches for the most relevant products. These products would have a high semantic overlap with the kitchen gadgets. After computing the cosine similarity between the embedding of the kitchen gadget and the embeddings of all products, the top m products with the maximum similarity are retrieved. For these search tasks, we used embeddings from RoBERTa [12], an improved and robustly trained version of BERT with further tuned hyperparameters. RoBERTa has achieved state-of-the-art results on GLUE, RACE, and SQuAD.

For complex search tasks, the search is significantly improved by using a retrieve (Section 3.1.1) and re-rank (Section 3.1.2) framework, just like for the ingredients (Figure 5). For quantity optimization, if the user selects multiple recipes, the common kitchen gadgets are only recommended once, along with all the other gadgets used in each recipe.

Figure 5: Kitchen Gadget Recommendations

4. Result

For assessing the performance of the different algorithms, we identified the relevant products for the top 100 most common ingredients and kitchen gadgets (queries) and calculated the mean average precision (mAP) at different values of K, where K is the number of retrieved products.

$$mAP@K = \frac{\sum_{q=1}^{Q} AP@K(q)}{Q} \qquad (8)$$

where $Q$ is the number of queries, $K$ is the number of retrieved products, and $AP@K$ is the average precision at $K$.

$$AP@K = \frac{1}{\min(K, R)} \sum_{k=1}^{K} P(k) \cdot rel(k) \qquad (9)$$

where $R$ is the number of relevant products, $K$ is the number of retrieved products, $rel(k)$ is an indicator of whether the $k$-th item was relevant ($rel(k) = 1$) or not ($rel(k) = 0$), and $P(k)$ is the precision at $k$.

$$P(k) = \frac{\sum_{i=1}^{k} rel(i)}{k} \qquad (10)$$

4.1. Ingredient Search

For ingredient search, different algorithms, as well as an ensemble of these algorithms, were evaluated using the mAP@K metric. The BM25 algorithm was considered the baseline model against the more complex models. An interesting thing to note from Figure 6 is that the BM25 algorithm gives very good performance for K = 1, since lexical search algorithms generally have high precision due to exact keyword matching. However, for higher values of K, the drop in mean average precision is quite steep. The experiment results showed that transformer models such as MiniLM and MS MARCO are more general, with consistently high precision values across different K values. Using BM25 along with the MS MARCO transformer model gave the best performance for all K values.

Once the best performing model was identified, we further evaluated the final model with 100 recipes from the Recipe1M+ corpus, which is a large-scale, structured corpus of over one million cooking recipes and 13 million food images. The mAP@1 value for all the ingredients from these 100 recipes was 0.949, which is similar to the mAP@K values we see in Figure 6 for the top 100 ingredients.

Figure 6: Mean Average Precision of Ingredient Search for Different K
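Equations 8-10 can be implemented directly; the following sketch computes AP@K by evaluating P(k) at each relevant rank and then averages over the queries.

```python
def average_precision_at_k(retrieved, relevant, k):
    """Equation 9: AP@K = (1 / min(K, R)) * sum over k of P(k) * rel(k)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(retrieved[:k], start=1):
        if item in relevant:          # rel(k) = 1
            hits += 1
            score += hits / rank      # P(k), Equation 10, at a relevant position
    return score / min(k, len(relevant))

def mean_average_precision_at_k(results, relevant_sets, k):
    """Equation 8: mean of AP@K over all queries.

    results maps each query to its ranked product list; relevant_sets maps
    each query to the set of products judged relevant.
    """
    return sum(
        average_precision_at_k(results[q], relevant_sets[q], k) for q in results
    ) / len(results)
```

For example, retrieving ["a", "b", "c"] when {"a", "c"} are relevant gives AP@3 = (1/2)(1/1 + 2/3) = 5/6.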
Table 3
Quantity Recommendation Accuracy Measurement

Ingredient    Total recipes    Correct Qty    Percentage Correct
Salt          32,506           32,443         99.806%
Sugar         19,983           16,473         82.435%
Butter        18,958           14,875         78.463%

4.2. Quantity Normalization and Optimization

For evaluating quantity normalization, we measured its accuracy using the most commonly used ingredients in 100,000 randomly chosen recipes. The three most commonly used ingredients, salt, sugar, and butter, were tracked to measure whether the recommended quantity after unit normalization was accurate.

We found that salt is used in 32,506 out of the chosen 100,000 recipes (Table 3). Out of these 32,506 recipes, the quantity of salt is correctly recommended in 32,443 recipes, which is 99.806% of the total. Similarly, the accuracies of the recommended quantity were 82.435% and 78.463% for sugar and butter respectively. The relatively low accuracy in the quantity recommendations for sugar and butter is due to incorrect recipe text. For example, certain recipes say 34 cups of butter instead of 3/4 cups of butter. This is discussed further in Section 5.

Further, the quantity optimization process was also evaluated with 100 randomly chosen recipes from the Recipe1M+ corpus. The mAP@1 value for all the ingredients from these 100 recipes was 0.914. For combinations of recipes, the result was manually verified for 5 random sets of any two recipes.

4.3. Kitchen Gadget Recommendation

The NER model to identify kitchen gadgets was trained on 500 recipes and tested on 100 recipes. The custom entities, kitchen gadgets and methods, were marked with their positions in the text using regex for these 600 recipes. A manual review of the annotations was performed to update any incorrect or missed annotations, and we achieved a test F1 score of 99.627% (Equation 11).

$$F1 = \frac{2 \cdot precision \cdot recall}{precision + recall} \qquad (11)$$

where precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.

A custom mapping is used to convert cooking methods to kitchen gadgets, which, along with the other identified gadgets, form the search queries for the recipe. For evaluation, different algorithms were evaluated at mAP@K, similar to the ingredient search. We developed transformer models including MiniLM, MS MARCO, and RoBERTa for the semantic search tasks. The experiment results show that MS MARCO, a combination of a retriever and a re-ranker, performs consistently better than MiniLM and RoBERTa across all K (Figure 7).

Figure 7: Mean Average Precision of Kitchen Gadget Search for Different K

As an example, we present a recipe in Figure 8 that consists of two sections: ingredients and directions. The ingredients section is used for the ingredient and quantity recommendations, whereas the directions section is used for the kitchen gadget recommendations. The keywords used for kitchen gadget recommendations are highlighted in red.

Ingredients:
1/2 cup oats
2 cups water
2 cups pancake mix
1/2 cup apples
2 tablespoons sugar
1/2 teaspoon cinnamon

Directions:
1. In a medium bowl combine rolled oats and water, let stand 5 minutes.
2. Meanwhile, heat large nonstick skillet or griddle to medium high heat (375F).
3. Grease lightly with oil. Add remaining ingredients to rolled oats mixture; stir just until all ingredients are moistened.
4. For each pancake, pour 1/4 cup batter into hot skillet.
5. Cook 1 to 1.5 minutes, turning when edges look cooked and bubbles begin to break on surface.
6. Continue to cook 1 to 1.5 minutes or until golden brown.
7. Serve with syrup and butter, if desired.

Figure 8: Apple Oat Breakfast Pancakes

The product recommendations based on the recipe (Figure 8) are given in Table 4.

Table 4
Shopping Cart Recommendation

Product                                                           Qty  Price ($)
Red Delicious Apple                                                1     0.99
McCormick Ground Cinnamon - 2.37oz                                 1     1.99
Good & Gather Organic Oats - 18oz                                  1     2.39
Buttermilk Pancake Mix - 32oz                                      1     2.19
Imperial Granulated Pure Sugar - 4lb                               1     2.19
Good & Gather Alkaline Water - 1L                                  1     0.99
Squish 1.5qt Mixing Bowl - Green                                   1     8.99
Anchor 8oz Glass Measuring Cup                                     1     3.49
Nylon Ladle with Soft Grip - Made By Design                        1     3.00
Lakeside 10" Nonstick Aluminum Skillet with Faux Granite Finish    1    16.47

5. Conclusion

We presented a content-based recommendation system to recommend relevant retail products to a user or guest based on selected recipe contents. The recommended products are primarily based on the ingredients required by the recipe, which are extracted from the recipe text along with the associated quantities. More significantly, the system is also capable of recommending the relevant kitchen gadgets that might be required for making the recipe based on the instructions provided in the recipe. When a user selects multiple recipes, the recommender system optimizes the quantities for each product, where the quantities are adjusted according to the amounts of the common ingredients and kitchen gadgets present in the recipes.

We conducted experiments to evaluate the effectiveness of various algorithms for the recommendation system, such as BM25, MiniLM, MS MARCO, and RoBERTa. In our experiments, we compared these algorithms by assessing the top product recommendations for the 100 most frequently occurring ingredients in the 1M+ recipe corpus. For ingredient search, the ensemble approach of BM25 and MS MARCO gave the best performance, while for kitchen gadget search, the MiniLM model proved to be more accurate. The accuracy of the quantity recommendations was measured by evaluating certain high-frequency ingredients such as salt, sugar, and butter across more than 50,000 recipes from the one-million-plus (1M+) recipe corpus.

There were a few challenges that we faced while implementing the system that we hope to address in the near future. Firstly, the recipe text used in the 1M+ recipe corpus has some inconsistencies where the required quantity is not accurately defined. For example, a lot of recipes say 34 cups of an ingredient instead of 3/4 cups. In future work, we can parse the text with more scrutiny and rules to avoid such cases.
Alternatively, When a user selects multiple recipes, the recommender we could also build an entirely new recipe scraping al- system optimizes quantities for each product, where the gorithm to handle such cases. Secondly, the quantity quantities are adjusted according to the amounts of the optimization for multiple recipes is in place, but there is common ingredients and kitchen gadgets present in the no formal way to measure how well is the performance recipes. of the process. As a part of future work, we would devise We conducted experiments for evaluating the effec- a way to quantify the performance of this process. tiveness of various algorithms for the recommendation We also identified extra features for the current sys- system, such as BM25, MiniLM, MS Marco, and RoBERTa. tem. First, the dietary restrictions of a user can be ac- In our experiments, we compared these algorithms by counted for by considering the ingredients, nutritional information, and key allergens that might be present in [5] Lin Chen, Rui Li, Yige Liu, Ruixuan Zhang, and the product using natural language processing. Second, Diane Myung-kyung Woodbridge. 2017. Ma- there could be preference filters provided to users before chine learning-based product recommendation us- they add a recipe to the shopping cart that could consider ing Apache Spark. In 2017 IEEE SmartWorld, Ubiq- their dietary preferences such as whether they prefer uitous Intelligence Computing, Advanced Trusted non-GMO or organic products only. Third, instead of Computed, Scalable Computing Communications, asking for preferences explicitly, the preferences of the Cloud Big Data Computing, Internet of People users could be determined automatically by analyzing and Smart City Innovation (SmartWorld/SCAL- their past shopping behaviors such as clicks, views, or COM/UIC/ATC/CBDCom/IOP/SCI). 1–6. https: product purchases. 
Based on this implicit feedback, we //doi.org/10.1109/UIC-ATC.2017.8397470 could determine the user’s preferences, such as whether [6] Jaekeol Choi, Euna Jung, Jangwon Suh, and Won- the user is a vegetarian or if the user is currently pur- jong Rhee. 2021. Improving Bi-encoder Docu- chasing products specific to a particular diet, such as the ment Ranking Models with Two Rankers and Multi- ketogenic diet. Such information can then be used in the teacher Distillation. In Proceedings of the 44th In- recommendation system for personalizing recommended ternational ACM SIGIR Conference on Research and products for different users. Development in Information Retrieval. ACM. The content-based recommendation system proposed [7] Paul Covington, Jay Adams, and Emre Sargin. 2016. in this paper could potentially be used to improve the Deep neural networks for youtube recommenda- shopping experience of users by providing them options tions. In Proceedings of the 10th ACM conference on to add all the necessary products required by a recipe recommender systems. 191–198. automatically to their shopping cart. Kitchen gadget rec- [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and ommendations could be helpful not just for improving Kristina Toutanova. 2018. BERT: Pre-training of the shopping experiences of the users but also for their Deep Bidirectional Transformers for Language Un- cooking experiences. These recommendations also al- derstanding. low a business to increase the basket sizes of their users, [9] Grand View Research. 2021. Recommendation En- thereby increasing revenue. Moreover, the system could gine Market Size, Share & Trends Analysis. Recom- also aid in automated promotional emails that recom- mendation Engine Market Report (2021). mend products required by recipes that may be of interest [10] Matthew Honnibal and Ines Montani. 2017. spaCy to customers. 2: Natural language understanding with Bloom em- beddings, convolutional neural networks and incre- mental parsing. 
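The multi-recipe quantity optimization summarized above can be sketched as follows. This is a minimal illustration, not the deployed implementation: the pack sizes, ingredient keys, and recipe contents are assumptions for the example, whereas the real system derives pack sizes from the product catalog.

```python
from collections import defaultdict
from math import ceil

# Illustrative pack sizes (ingredient -> units per retail pack); assumed
# values, not catalog data.
PACK_SIZE = {"bread_slice": 20, "butter_tbsp": 32, "egg": 12}

def optimize_quantities(recipes):
    """Sum each ingredient's required amount over all selected recipes,
    then round up to whole retail packs."""
    totals = defaultdict(float)
    for recipe in recipes:
        for ingredient, amount in recipe.items():
            totals[ingredient] += amount
    return {ing: ceil(amt / PACK_SIZE[ing]) for ing, amt in totals.items()}

# Two hypothetical recipes sharing common ingredients.
french_toast = {"bread_slice": 6, "butter_tbsp": 4, "egg": 2}
bread_pudding = {"bread_slice": 8, "butter_tbsp": 2, "egg": 3}

cart = optimize_quantities([french_toast, bread_pudding])
# 6 + 8 = 14 slices still fit in one 20-slice loaf, so one pack suffices.
```

Aggregating before rounding up is what keeps the combined cart from double-counting ingredients shared across recipes.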
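The garbled-fraction issue noted above ("34 cups" where the source recipe meant "3/4 cups") could be tackled with a small normalization pass during parsing. The sketch below is a hedged illustration: the fraction table and unit list are assumptions, and a production rule set would need guards against legitimate two-digit quantities (e.g., "12 cups" in a batch recipe).

```python
import re

# Two-digit tokens that are plausibly concatenated cooking fractions
# (an assumed, illustrative mapping).
LIKELY_FRACTIONS = {"12": "1/2", "13": "1/3", "23": "2/3",
                    "14": "1/4", "34": "3/4", "18": "1/8", "38": "3/8"}

UNIT = r"(?:cups?|tablespoons?|teaspoons?)"

def repair_line(line: str) -> str:
    """Rewrite e.g. '34 cups' -> '3/4 cups' when a suspicious two-digit
    quantity directly precedes a measurement unit."""
    return re.sub(rf"\b(\d{{2}})\b(?=\s{UNIT}\b)",
                  lambda m: LIKELY_FRACTIONS.get(m.group(1), m.group(1)),
                  line)

fixed = repair_line("Cream 34 cups of butter with the sugar.")
# fixed == "Cream 3/4 cups of butter with the sugar."
```

Restricting the rewrite to quantities adjacent to a unit keeps unrelated numbers (oven temperatures, package sizes) untouched.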
References

[1] Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset.
[2] Shivam Bansal, Aman Srivastava, and Anuja Arora. 2017. Topic Modeling Driven Content Based Jobs Recommendation Engine for Recruitment Industry. Procedia Computer Science 122 (2017), 865–872. 5th International Conference on Information Technology and Quantitative Management, ITQM 2017.
[3] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O'Reilly Media, Inc.
[4] Dan Chang, Hao Gui, Rui Fan, Ze Fan, and Ji Tian. 2019. Application of Improved Collaborative Filtering in the Recommendation of E-commerce Commodities. International Journal of Computers Communications & Control 14, 4 (2019), 489–502.
[5] Lin Chen, Rui Li, Yige Liu, Ruixuan Zhang, and Diane Myung-kyung Woodbridge. 2017. Machine learning-based product recommendation using Apache Spark. In 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). 1–6. https://doi.org/10.1109/UIC-ATC.2017.8397470
[6] Jaekeol Choi, Euna Jung, Jangwon Suh, and Wonjong Rhee. 2021. Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.
[7] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. 191–198.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
[9] Grand View Research. 2021. Recommendation Engine Market Size, Share & Trends Analysis. Recommendation Engine Market Report (2021).
[10] Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. (2017). To appear.
[11] Anirudh Jagithyala. 2014. Recommending recipes based on ingredients and user reviews. Ph.D. Dissertation. Kansas State University.
[12] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach.
[13] Kevin F. McCardle, Kumar Rajaram, and Christopher S. Tang. 2007. Bundling retail products: Models and analysis. European Journal of Operational Research 177, 2 (2007), 1197–1217.
[14] Niklas. 2022. Average Weight of All Fruits and Vegetables. https://weightofstuff.com/average-weight-of-all-fruits-and-vegetables/
[15] Sinno Jialin Pan and Qiang Yang. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2010), 1345–1359.
[16] Michael J. Pazzani and Daniel Billsus. 2007. Content-Based Recommendation Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 325–341.
[17] Chantal Pellegrini, Ege Özsoy, Monika Wintergerst, and Georg Groh. 2021. Exploiting Food Embeddings for Ingredient Substitution. In HEALTHINF.
[18] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
[19] Juan Ramos et al. 2003. Using TF-IDF to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Vol. 242. Citeseer, 29–48.
[20] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
[21] Stephen Robertson and Hugo Zaragoza. 2009. The Probabilistic Relevance Framework: BM25 and Beyond. Now Publishers Inc.
[22] Sarah Perez. 2015. Instacart And Allrecipes Now Let You Add A Meal's Ingredients To Your Grocery List With A Click. https://techcrunch.com/2015/10/12/instacart-and-allrecipes-now-let-you-add-a-meals-ingredients-to-your-grocery-list-with-a-click/
[23] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative Filtering Recommender Systems. Springer Berlin Heidelberg, Berlin, Heidelberg, 291–324.
[24] Chahak Sethi, Melvin Vellera, Diane Myung-kyung Woodbridge, and Joey Jonghoon Ahnn. 2022. Bundle Recommender from Recipes to Shopping Cart. https://github.com/dianewoodbridge/target_recipe_project/
[25] Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference. 298–307.
[26] Mayumi Ueda, Syungo Asanuma, Yusuke Miyawaki, and Shinsuke Nakajima. 2014. Recipe recommendation method by considering the user's preference and ingredient quantity of target recipe. In Proceedings of the International MultiConference of Engineers and Computer Scientists, Vol. 1. 12–14.
[27] Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. arXiv:2002.10957
[28] Chutian Wei, Xinyu Chen, Zhenning Tang, and Wen Cheng. 2021. Fully content-based IMDb movie recommendation engine with Pearson similarity. In International Conference on Green Communication, Network, and Internet of Things (GCNIoT 2021), Siting Chen and Jun Mou (Eds.), Vol. 12085. International Society for Optics and Photonics, SPIE, 132–137.
[29] Ruiliang Yan, Chris Myers, John Wang, and Sanjoy Ghose. 2014. Bundling products to success: The influence of complementarity and advertising. Journal of Retailing and Consumer Services 21, 1 (2014), 48–53.