<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>S2SRec2: Set-to-Set Recommendation for Basket Completion with Recipe</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yanan Cao</string-name>
          <email>yanan.cao@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Omid Memarrast</string-name>
          <email>omid.memarrast@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shiqin Cai</string-name>
          <email>shiqin.cai@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sinduja Subramaniam</string-name>
          <email>sinduja.subramaniam@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evren Korpeoglu</string-name>
          <email>evren.korpeoglu@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kannan Achan</string-name>
          <email>kannan.achan@walmart.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Walmart Global Tech</institution>
          ,
          <addr-line>Sunnyvale, CA, 94086</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In grocery e-commerce, customers often build baskets of ingredients guided by dietary preference, but lack the recipe expertise to create complete meals. Consequently, utilizing recipe knowledge to recommend complementary ingredients based on given ingredients is key to filling these gaps for a successful culinary experience. The traditional method for completing a given ingredient set, also known as recipe completion, generally focuses on predicting a single missing ingredient using a leave-one-out strategy from recipe data; however, this approach falls short in two important aspects when applied to real-world scenarios. First, these methods do not fully capture the complexity of real-life culinary experiences, where customers routinely need to add multiple ingredients to complete a recipe. Second, they only consider the interaction between the existing ingredients and the missing ingredients but neglect the relationship among multiple missing ingredients. To overcome these limitations, we reformulate basket completion as a set-to-set (S2S) recommendation problem, where an incomplete basket is input into a system that predicts a set of complementary ingredients to form a coherent culinary experience. We introduce S2SRec2, a set-to-set ingredient recommendation framework utilizing a Set Transformer-based model trained in a multitask learning paradigm. S2SRec2 simultaneously learns to: (i) query missing ingredients based on the set representation of existing ingredients, and (ii) determine the completeness of the basket based on the union set of existing and predicted ingredients. These two tasks are jointly optimized, which enforces both accurate retrieval of complementary ingredients and coherent basket completeness prediction for multi-ingredient recommendation.
Experiments on large-scale culinary datasets, together with extensive qualitative analyses, demonstrate that S2SRec2 significantly outperforms traditional single-target recommendation methods, offering a promising solution to enhance grocery shopping experiences and foster culinary creativity.</p>
      </abstract>
      <kwd-group>
        <kwd>Item Recommendation</kwd>
        <kwd>Basket Completion</kwd>
        <kwd>Set Transformer</kwd>
        <kwd>Multi-task Learning</kwd>
        <kwd>Recipe Completion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The convergence of machine learning and culinary arts has sparked growing interest in understanding
recipes through textual data, such as ingredient lists, cooking instructions and visual content like recipe
images [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Although there are numerous recipes available, recipe completion – identifying the right
set of potential ingredients that complement those already present – remains a challenging yet crucial
problem. In grocery e-commerce, recipe completion aligns naturally with the task of basket completion.
Ingredient recommendations for completing recipes with partial ingredients in an e-commerce basket
can not only streamline a user’s meal planning process and reduce food waste but also improve the
grocery shopping experience, leading to higher customer satisfaction and conversion metrics for
business. Figure 1 demonstrates an example use case of recipe completion for a grocery basket: if a
customer has a set of ingredients whose variety is not sufficient for cooking, a recipe completion system
can suggest potential grocery items that can form a decent dish with existing ingredients. By predicting
missing ingredients and suggesting additions, these systems facilitate efficient meal preparation and
ensure informed and satisfactory choices during grocery shopping. There are a handful of studies
that have explored methods for completing partial recipes in different ways, but each has its own
limitations. Traditional methods such as item-based collaborative filtering, matrix factorization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and
content-based filtering [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are commonly used in the early works of recipe completion and recipe-driven
food recommendation. However, such methods neglect the rich patterns in the recipe data, such as the
compatibility among ingredients and the relationship between ingredients and recipes. To leverage
the rich information in the recipe data, recent studies in this field have started exploring Knowledge
Graph (KG) [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] and Set-Transformers [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ] for recipe representation learning to facilitate the
recipe completion task. However, for recipe completion, the Leave-One-Out (LOO) [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ] method is
commonly used to predict the single target ingredient excluded from the original set of ingredients.
Such approaches overlook the fact that most real-life scenarios require multiple ingredients to complete
a recipe.
      </p>
      <p>To address the limitations in previous works, we propose Set-to-Set Recommendation for Recipe
completion (S2SRec2), which takes a set of existing ingredients as input and predicts a set of potential
ingredients that complement the input set of ingredients to form a viable recipe prediction with a
higher level of confidence. S2SRec2 is a Set Transformer-based model trained in a multi-task learning
fashion. In S2SRec2, we leverage the inherent permutation invariance of the Set Transformer to
represent a recipe as an unordered set of ingredients, thereby capturing complex inter-dependencies
without being affected by the order of elements. Rather than predicting a single missing ingredient,
S2SRec2 is formulated as a set-to-set recommendation problem in which the system is tasked with
proposing the missing ingredients and predicting the completeness of the basket when combined with
the predicted ingredients. To achieve this, our model is trained under a multitask learning framework
that simultaneously optimizes two objectives: (i) a missing-ingredient query task, where a learnable
query attends to the encoded ingredient representations to predict which ingredient should be added
to the current basket, and (ii) a recipe completeness prediction task, which indicates whether the
basket with additional ingredients is complete. These tasks are jointly optimized, ensuring both precise
ingredient retrieval and reliable detection of basket completeness in multi-ingredient recommendations.
Our main contributions are highlighted as follows:
• Set-to-set formulation for recipe completion: To our knowledge, this is the first work to
formulate recipe basket completion as a set-to-set recommendation problem. Instead of predicting
a single missing item, the system recommends a set of ingredients to complement an existing
ingredient set, addressing the more realistic scenario of multi-ingredient additions.
• Set-Transformer model with multitask learning: We propose S2SRec2, a Set
Transformer–based model that simultaneously learns a missing-ingredient query function and a stop
signal indicating basket completeness. This design exploits the permutation invariance of sets to
model ingredient combinations and uses a multi-task objective to add ingredients one by one and
decide when no further items are needed.
• Empirical gains on large-scale datasets: Through extensive experiments on large-scale recipe
datasets, we demonstrate that S2SRec2 outperforms existing methods (including single-ingredient
recommendation approaches) in both recommending relevant ingredients and correctly predicting
when the recipe is complete. S2SRec2 delivers more coherent multi-ingredient recommendations,
highlighting its effectiveness for real-world recipe completion scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Previous work on recommendation in the food domain mainly focuses on Recipe Completion [
        <xref ref-type="bibr" rid="ref4 ref7 ref9">9, 4, 7</xref>
        ],
Food Recommendation [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ], and Recipe Recommendation [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ]. Since they all rely on the recipe
information, including ingredients and instructions, to derive hidden patterns among ingredients and
recipes, Recipe Representation Learning becomes a fundamental research field in the food domain. In
this section, we review related works in Recipe Representation Learning and Recipe Completion.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Recipe Representation Learning</title>
        <p>
          To effectively encode recipes into meaningful embeddings, various techniques such as Multi-modal
embeddings [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], Graph Neural Networks [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], Knowledge Graphs [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and Set Transformers [
          <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
          ]
have been explored to represent recipes in a continuous vector space. A recipe can contain textual
information including ingredients, cooking instructions, and tags, as well as visual information such
as food images. Tian et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] proposed a multi-modal recipe representation learning using a Graph
Neural Network model to incorporate textual, visual, and relational information into recipe embeddings.
Since the ingredients in a recipe can be represented as an unordered set of ingredients, the permutation
invariant property in Set Transformers [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] inspires the research in variant set transformer architectures.
Li &amp; Zaki’s work [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] proposes a Set Transformer-based joint model to learn recipe representations
through a recipe Knowledge Graph and optimize the learned embeddings using a triplet loss to ensure
similar recipes are closer in the latent semantic space.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Recipe Completion</title>
        <p>
          Although recipe representation learning has received considerable attention, studies on recipe
completion remain relatively sparse. A few pioneering works have explored tackling this problem using
collaborative filtering, knowledge graphs, and food pairing. In collaborative filtering-based methods,
Cueto et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] use an item-based recommender system for ingredient recommendation based on the
similarity between item vectors containing their ratings. Nevertheless, such approaches neglect the rich
patterns in food data, such as the interaction between ingredients and the relevance between ingredients
and recipes. Guo et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] constructed a collaborative knowledge graph that combines a food
knowledge graph of ingredients and recipes with user interaction information for food recommendation. Gim
et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] train a supervised learning model to predict the eliminated ingredient given a leave-one-out
ingredient set for each recipe, while Kim et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] propose a solution based on the theory of food pairing
by predicting the affinity scores that quantify the suitability of adding a single ingredient to the existing
set of ingredients. These works recommend only one additional ingredient, while ignoring the fact that
most recipe completion scenarios need multiple additions. They also overlook the relationships among
the potential ingredients, where an additional ingredient would impact the next ingredient prediction.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>
        In this section, we introduce S2SRec2, a set-to-set recommendation method for recipe completion that
can be applied to e-commerce complementary item recommendation for grocery baskets. In our setting,
only ingredients appear in both the basket data and the recipe data, so we leave out extra recipe details
such as tags, descriptions, and cooking steps used in prior work [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ].
      </p>
      <p>Based on the study of related work, we adopt the Set Transformer architecture in Li &amp; Zaki’s
work to learn the representations of a set of ingredients, considering the unordered nature of recipe
ingredients. The Set Transformer consists of an encoder and a decoder, where the encoder processes each
element independently by performing self-attention among the elements to generate each element’s
representation enriched by other elements in the set, and the decoder summarizes the set via pooling for
downstream tasks. The key MAB, ISAB, and PMA blocks we use are outlined in Appendix A. After the
Set Transformer learns the permutation-invariant relationships among the ingredients, the model is
jointly optimized on two tasks: missing ingredient prediction and completeness prediction. Figure 2
demonstrates the training procedure of S2SRec2.</p>
      <sec id="sec-3-1">
        <title>3.1. Predicting Missing Ingredients Using a Learnable Query</title>
        <p>The first task of S2SRec2 is to identify the missing ingredient. We reformulate the problem as a
missing-ingredient query task and introduce a learnable query that directly interrogates the encoded
set representation of the existing ingredients. Specifically, given an input basket represented as an
unordered set of ingredients, a Set Transformer encoder is applied to capture the rich inter-dependencies
among the ingredients without imposing any sequential order. A learnable query vector is then used to
attend to the set of ingredient embeddings, effectively asking, “which ingredient is missing to complete
this basket?” The result is a score assigned to every candidate ingredient.</p>
        <p>For each recipe r ∈ R, a subset of ingredients is randomly selected to form an incomplete basket,
denoted as B. Let the encoded ingredient set be denoted by E = {e_1, e_2, . . . , e_n} and let q be the
learnable query vector. For each ingredient i ∈ I, where I is the set of all ingredients available for
recommendation, an attention mechanism is used to compute scores between q and i. These scores are
then passed through a softmax layer to generate a probability distribution over all candidate ingredients:
P(i | B, q) = softmax(Score(q, i)), ∀ i ∈ I. (1)</p>
        <p>The ingredient with the highest probability is selected as the missing ingredient complement.</p>
        <p>This learnable query mechanism is optimized end-to-end with a cross-entropy loss between the
predicted probability distribution and the ground truth missing ingredient.</p>
        <p>L_CE = − (1/N) Σ_{n=1}^{N} log P(i_n | B_n, q), (2)
where i_n is the ground-truth missing ingredient of the n-th training sample and N is the number of
training samples.</p>
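The scoring, softmax, and cross-entropy steps above can be sketched in PyTorch. This is a minimal sketch under our own assumptions: the module name, the embedding dimension, and the additive way the query is conditioned on the pooled basket representation are hypothetical, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MissingIngredientHead(nn.Module):
    """Hypothetical query head: scores every candidate ingredient against a learnable query."""
    def __init__(self, num_ingredients: int, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))           # learnable query vector q
        self.candidates = nn.Embedding(num_ingredients, dim)  # candidate ingredient table

    def forward(self, basket_repr: torch.Tensor) -> torch.Tensor:
        # basket_repr: (batch, dim) pooled set representation of existing ingredients.
        q = self.query + basket_repr                # condition the query on the basket (assumption)
        logits = q @ self.candidates.weight.T       # (batch, num_ingredients) candidate scores
        return logits

head = MissingIngredientHead(num_ingredients=3804, dim=32)
basket = torch.randn(4, 32)                         # 4 encoded baskets
logits = head(basket)
target = torch.tensor([5, 17, 100, 2048])           # ground-truth missing ingredients
loss = F.cross_entropy(logits, target)              # softmax + log-loss in one call
```

`F.cross_entropy` fuses the softmax over all 3,804 candidates with the negative log-likelihood of the ground-truth ingredient.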
        <p>This design not only supports the recommendation of multiple ingredients sequentially as needed in
real-world scenarios but also inherently models the internal interactions among the predicted missing
ingredients, ensuring the recommended set forms a coherent culinary experience.</p>
        <p>While the query head can propose ingredients one by one, the model must also decide when further
additions are no longer beneficial—a capability provided by the completeness head introduced next.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Basket Completeness Prediction using Classification</title>
        <p>To further enhance the model’s ability to recognize a successful culinary completion, we define the
second task to predict the completeness of a basket formed by combining the existing ingredients with
a predicted missing ingredient. This task focuses on evaluating whether the integrated basket offers all
the necessary ingredients for a complete recipe. Unlike sequence-based approaches that use a ‘&lt;stop&gt;’
token to signal the end of generation, ingredient data are unordered; therefore, there is no inherent
position for a termination symbol to be correctly predicted.</p>
        <p>For each recipe r ∈ R, the full ingredient set is treated as a complete set with a positive label.
At the same time, a subset of ingredients is randomly dropped to create an incomplete basket, which
is labeled as the negative class. In S2SRec2, the representation of the recipe, obtained after the
set encoder and the subsequent prediction module, is passed through a fully connected layer that
outputs a predicted probability p representing the likelihood that the augmented basket is complete.
The training objective for this task is defined by the binary cross-entropy (BCE) loss:
L_BCE = − (1/N) Σ_{n=1}^{N} [ y_n log p_n + (1 − y_n) log(1 − p_n) ], (3)
where y_n is the ground truth label with y_n = 1 indicating a complete basket and y_n = 0 an incomplete
one, and N is the number of training samples.</p>
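The completeness head described above amounts to a single fully connected layer trained with a BCE objective. A minimal PyTorch sketch; the layer size and names are our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompletenessHead(nn.Module):
    """Hypothetical stop head: one linear layer on the pooled set representation."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, 1)

    def forward(self, basket_repr: torch.Tensor) -> torch.Tensor:
        return self.fc(basket_repr).squeeze(-1)     # logit of "basket is complete"

head = CompletenessHead(dim=32)
basket = torch.randn(8, 32)                         # 8 pooled basket representations
labels = torch.tensor([1., 0., 1., 0., 1., 0., 1., 0.])  # complete = 1, incomplete = 0
logits = head(basket)
bce = F.binary_cross_entropy_with_logits(logits, labels)  # the BCE objective
prob = torch.sigmoid(logits)                        # completeness probability p
```

At inference time, `prob` is the p compared against the 0.5 threshold in Section 3.4.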
        <p>This completeness prediction task enables the model to learn nuanced representations of recipe
completion, ensuring that the recommended ingredient complements the existing basket.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Joint Loss for Missing Ingredient Prediction and Set Completeness Prediction</title>
        <p>To integrate recipe knowledge into the completing task, we adopt a multi-task learning strategy, training
both the missing ingredients prediction and set completeness prediction tasks in parallel. Each task is
associated with its own specific output head, allowing the model to make complementary predictions
for the main task while benefiting from shared latent representations. The overall objective is optimized
using a combined loss function.</p>
        <p>L = α × L_CE + (1 − α) × L_BCE. (4)</p>
        <p>A weighting factor α is introduced to balance these two objectives.</p>
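The combined objective is a convex mixture of the two task losses. A sketch, using the α = 0.6 value the paper later selects on the validation set:

```python
import torch

# Joint objective: L = alpha * L_CE + (1 - alpha) * L_BCE.
def joint_loss(l_ce: torch.Tensor, l_bce: torch.Tensor, alpha: float = 0.6) -> torch.Tensor:
    return alpha * l_ce + (1.0 - alpha) * l_bce

# Example: 0.6 * 2.0 + 0.4 * 0.5 = 1.4
loss = joint_loss(torch.tensor(2.0), torch.tensor(0.5))
```

Because both heads share the set encoder, gradients from either term update the shared representation.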
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Inference Process</title>
        <p>During inference, the process begins by initializing an empty predicted ingredient set, P = {}. For
each inference round t, the current basket of ingredients B^(t) is passed through S2SRec2, which outputs
two predictions: a completeness probability p^(t) and a recommended ingredient i^(t). If p^(t) ≤ 0.5,
indicating that the basket is incomplete, the recommended ingredient i^(t) is added to the predicted set
P and the basket is updated as</p>
        <p>B^(t+1) = B^(t) ∪ {i^(t)}. (5)</p>
        <p>This iterative process continues until the completeness probability exceeds 0.5 (i.e., p^(t) &gt; 0.5), at
which point the basket is considered complete and the predicted ingredient set P is the final prediction.
Figure 3 demonstrates the inference procedure of S2SRec2.</p>
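The iterative procedure above can be sketched as follows. `model` is a hypothetical callable returning a completeness probability and per-candidate scores for the current basket; the toy stand-in below exists only to exercise the control flow and is not the paper's model:

```python
import torch

def complete_basket(model, basket: set, max_steps: int = 10) -> set:
    """Add ingredients one by one until the model declares the basket complete."""
    predicted = set()
    for _ in range(max_steps):
        p_complete, scores = model(basket | predicted)
        if p_complete > 0.5:                 # stop: basket considered complete
            break
        ranked = torch.argsort(scores, descending=True)
        # add the best-scoring ingredient not already present
        for idx in ranked.tolist():
            if idx not in basket and idx not in predicted:
                predicted.add(idx)
                break
    return predicted

# Toy stand-in: declares completeness at 5 ingredients, scores candidates by id.
def toy_model(basket):
    p = 1.0 if len(basket) >= 5 else 0.0
    return p, torch.arange(10, dtype=torch.float)

predicted = complete_basket(toy_model, {0, 1, 2})   # adds 9 then 8, then stops
```

The `max_steps` cap is our own safeguard against a stop head that never fires; the paper's loop terminates purely on the 0.5 threshold.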
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment and Results</title>
      <sec id="sec-4-0">
        <title>4.1. Data</title>
        <p>
Our recipes are sourced from an open-source dataset provided by food.com. To enhance data quality
and reduce noise, we filter out recipes with fewer than 5 ingredients, as they often lack sufficient
complexity for meaningful analysis. Additionally, we removed recipes with more than 15 ingredients
to focus on the core structure of common recipes, since 95% of recipes have fewer than 15 ingredients.
We also followed Kitchenette [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] to remove ingredients whose occurrence count does not exceed
20. We end up with a final dataset of 141,782 recipes with 3,804 unique ingredients, allowing us to
emphasize the most relevant and distinctive ingredients while minimizing the influence of common,
less informative ingredients. During training, each recipe is augmented twice by randomly dropping up to
three ingredients, which enlarges the data for learning. All ingredient embeddings are initialized
using a pretrained BERT model that projects each ingredient into a 768-dimensional vector. Code is available at:
https://github.com/ycao21/S2Srec2.
        </p>
      </sec>
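The random-drop augmentation described above can be sketched as follows. The function name, the fixed seed, and the choice to always drop at least one ingredient are our assumptions for illustration:

```python
import random

def augment(recipe: list, copies: int = 2, max_drop: int = 3, seed: int = 0):
    """Create `copies` (incomplete basket, dropped set) training pairs from one recipe."""
    rng = random.Random(seed)
    samples = []
    for _ in range(copies):
        n_drop = rng.randint(1, min(max_drop, len(recipe) - 1))   # drop up to 3 ingredients
        dropped = set(rng.sample(recipe, n_drop))
        basket = [ing for ing in recipe if ing not in dropped]    # incomplete basket input
        samples.append((basket, dropped))                         # dropped set is the target
    return samples

samples = augment(["flour", "egg", "milk", "sugar", "butter", "vanilla"])
```

Each pair gives the model an incomplete basket as input and the dropped ingredients as the retrieval target, while the full recipe serves as a positive example for the completeness head.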
      <sec id="sec-4-1">
        <title>4.2. Evaluation</title>
        <p>To evaluate set-to-set compatibility, we compare S2SRec2 against five baselines on the task of predicting
missing ingredient(s) from a pool of 3,804 candidates given a partial basket. All methods share the same
preprocessing and postprocessing pipeline to ensure fairness. The baselines are:
• Logistic Regression (LR): Bag-of-ingredients representation with pretrained embeddings,
followed by a multi-label logistic regression classifier.
• Vanilla Neural Network: Pretrained Word2Vec ingredient embeddings are passed through two
fully connected layers for multi-label ingredient prediction.
• Bi-LSTM: Imposes an arbitrary sequence structure on the unordered ingredient set to assess the impact of
explicitly modeled order information.
• Kitchenette: The Siamese-network ingredient pairing model, reconstructed to accept
multiple ingredients and output multi-label predictions.
• Reciptor: Set Transformer-based recipe representation learner with multi-task objectives,
modified to produce multi-label ingredient outputs.</p>
        <p>We use Precision, Recall, and F1 score to evaluate the quality of the predicted ingredient set against
the ground truth, and Mean Squared Error (MSE) to assess the stop-head prediction.</p>
        <p>Let P be the predicted ingredient set and G the ground-truth set. Then
Precision = |P ∩ G| / |P|,
Recall = |P ∩ G| / |G|,
F1 = 2 × Precision × Recall / (Precision + Recall),
MSE = (1/N) Σ_{n=1}^{N} (|P_n| − |G_n|)^2,
where N is the number of evaluation samples.</p>
        <p>These metrics are calculated using only the first k predictions made by the model. If the model
predicts more than k ingredients, only the top k are considered. This approach measures each model’s
ability to prioritize correct predictions early, focusing on the most relevant recommended ingredients.</p>
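The truncated-at-k evaluation can be sketched as follows; the helper name is ours, and the per-sample squared error would be averaged over the dataset to obtain the MSE:

```python
def topk_metrics(predicted: list, truth: set, k: int = 3):
    """Precision/Recall/F1 on the first k predictions, plus the set-size squared error."""
    topk = set(predicted[:k])                    # only the first k predictions count
    tp = len(topk & truth)
    precision = tp / len(topk) if topk else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    squared_error = (len(predicted) - len(truth)) ** 2   # per-sample contribution to MSE
    return precision, recall, f1, squared_error

# top-3 = {salt, pepper, basil}: 2 hits -> precision 2/3, recall 1.0, se = (4-2)^2 = 4
p, r, f1, se = topk_metrics(["salt", "pepper", "basil", "cream"], {"salt", "basil"}, k=3)
```

Note that the squared error penalizes over-generation even when the extra predictions fall outside the top k, which is why the fixed-length ablation variants show higher MSE.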
        <p>The experiments are conducted uniformly across all models with the Adam optimizer, an initial
learning rate of 10^−4, a batch size of 500, and 30 training epochs. For multi-task variants, we tuned the joint-loss
weight α over {0.2, 0.4, 0.6, 0.8} on the validation set, selecting α = 0.6 as the best trade-off between
ingredient retrieval and stop prediction. All neural network implementations are developed using
PyTorch 2.0. The experiments are carried out on a machine with two Nvidia T4 GPUs, providing a total
of 60 GB of memory.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Results</title>
        <p>To better understand the contribution of each component in the proposed S2SRec2 model, we
conduct an ablation study involving five variants, as shown in Table 2. Removing the stop head and
instead truncating the output at a fixed length (  = 3 or  = 5) results in a drop of precision and
F1. As the model is forced to output multiple ingredients regardless of its confidence in basket completeness,
it increases the likelihood of covering true positives, yielding the highest recall in the first 3 predictions, but at the
expense of precision and higher MSE due to over-generation. Replacing the stop head with a multi-label
classification mechanism using a high threshold also degrades precision and F1, due to the same reason.
When evaluating structural changes, we find that replacing the Set Transformer with a mean-pooling-based
set encoder while retaining the stop head leads to a significant drop in all missing ingredient
prediction metrics. Replacing the Set Transformer with a vanilla neural network shows the weakest
performance across precision, recall, and F1, confirming the importance of permutation-invariant set
modeling. Finally, the full S2SRec2 model achieves the best overall performance in precision, F1, and
MSE, demonstrating that both the decoder-based set aggregation and the dedicated stop prediction
mechanism are essential to accurate and controllable ingredient set completion.</p>
        <p>Table 3 presents qualitative examples comparing S2SRec2 with the strongest baseline (selected by
precision) on the ingredient basket completion task. S2SRec2 is able to predict key missing
ingredients that align closely with the ground truth, while baseline models tend to over-generate
unrelated ingredients.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we introduced S2SRec2, a set-based recommendation framework for ingredient basket
completion that bridges the gap between grocery baskets and culinary experiences. Our approach
leverages a Set Transformer to capture inter-dependencies among ingredients within the basket, followed
by two complementary tasks. The first task employs a learnable query combined with cross-entropy
loss to predict missing ingredients, ensuring precise retrieval of complementary items. The second
task uses binary cross-entropy loss to assess basket completeness, determining whether the union of
existing and predicted ingredients forms a complete recipe. By jointly optimizing both tasks, S2SRec2
not only enhances ingredient compatibility but also ensures coherent ingredient recommendations.</p>
      <p>While S2SRec2 focuses on recommending complementary ingredients to complete a basket, it can
naturally be extended to multiple real-world e-commerce settings. For example, the representation
of a completed basket can be obtained from S2SRec2 and used to identify and surface full recipes
that match the completed basket. Furthermore, integrating user-specific signals such as past purchase
history and dietary restrictions into the set representation could personalize both ingredient and recipe
recommendations. In future work, we will deploy S2SRec2 in a production environment at scale and
conduct online evaluations under live user traffic. Collectively, S2SRec2 can evolve from a standalone
complement recommender into a fully personalized, interactive recipe discovery and completion engine
for grocery e-commerce.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this paper, the author(s) used ChatGPT (GPT-4) for language-quality assistance
(grammar checking, rephrasing, and minor stylistic edits). The author(s) reviewed, edited, and approved
all content and take full responsibility for the final publication.</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix A. Set Transformer Blocks</title>
      <p>The Set Transformer consists of an encoder and a decoder, each serving a distinct purpose. The encoder
performs self-attention among the elements in the set. This
results in an output set of equal size, where each element’s representation is enriched by its relationship
with the other elements. The set encoder consists of two Induced Set Attention Blocks, which are a variation
of the Set Attention Block that reduces the computing complexity for large sets. Both blocks are based on
Multi-head Attention, which computes attention scores of different projections of the input vector.</p>
      <p>Specifically, given a set of n query vectors of dimension d, Q ∈ R^(n×d), and their corresponding
key-value pairs K ∈ R^(n×d) and V ∈ R^(n×d) as input, the attention function is defined as
Att(Q, K, V) = softmax(λ(QK^T))V, (6)
where λ = 1/√d is the scaling function. The attention function computes the dot product of Q and K and
applies the scaling factor to output a weighted sum of the V values, assigning a higher weight when the
dot product of the query and key is large.</p>
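The scaled dot-product attention defined above, written out as a short sketch:

```python
import torch

def attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Att(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    weights = torch.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)  # (n_q, n_kv)
    return weights @ V                                                   # weighted sum of values

Q = torch.randn(4, 8)   # 4 queries of dimension 8
K = torch.randn(6, 8)   # 6 key-value pairs
V = torch.randn(6, 8)
out = attention(Q, K, V)
```

Each output row is a convex combination of the value rows, with weights peaking where the query-key dot product is largest.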
      <p>
        Instead of single-head attention, which only captures inter-element relationships in a single space,
multi-head Attention [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is introduced to generate different outputs in multiple subspaces.
      </p>
      <p>The Multi-head Attention Blocks (MAB) in the Set Transformer are built using multi-head Attention
in combination with layer normalization:
MAB(X, Y) = LayerNorm(H + rFF(H)), (8)
where H = LayerNorm(X + Multihead(X, Y, Y)). (9)</p>
      <p>
        To improve training efficiency, the Set Transformer proposed the Induced Set Attention Blocks
(ISAB) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which employ a set of trainable inducing points I fed twice through a MAB, significantly reducing the
computing complexity, especially for large sets.
      </p>
      <p>The multi-head attention used in these blocks concatenates the outputs of h attention heads computed
in different projection subspaces:
Multihead(Q, K, V) = (O_1 ⊕ ... ⊕ O_h), (7)
where O_j = Att(QW_j^Q, KW_j^K, VW_j^V) for j = 1, . . . , h.</p>
      <p>The decoder summarizes the set via pooling for downstream tasks. Rather than using
dimension-wise mean pooling, the Set Transformer applies a row-wise feedforward layer (rFF) in conjunction with
a Multi-head Attention-based Pooling layer (PMA) on a set of k trainable seed vectors S. Given a set of
vectors Z ∈ R^(n×d), the entire process can be expressed as
PMA_k(Z) = MAB(S, rFF(Z)), (10)
Decoder(Z) = rFF(PMA_k(Z)). (11)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hynes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aytar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Marin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ofli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <article-title>Learning cross-modal embeddings for cooking recipes and food images</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3020</fpage>
          -
          <lpage>3028</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>De Clercq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>De Baets</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Waegeman</surname>
          </string-name>
          ,
          <article-title>Data-driven recipe completion using machine learning methods</article-title>
          ,
          <source>Trends in Food Science &amp; Technology</source>
          <volume>49</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Nilesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hazarika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <article-title>Recommendation of Indian cuisine recipes based on ingredients</article-title>
          ,
          <source>in: 2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Research on food recommendation method based on knowledge graph</article-title>
          ,
          <source>in: International Conference on Computer Science and Education</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>521</fpage>
          -
          <lpage>533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Metoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Recipe2vec: Multi-modal recipe representation learning with graph neural networks</article-title>
          ,
          <source>in: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3473</fpage>
          -
          <lpage>3479</lpage>
          . doi:10.24963/ijcai.2022/482.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <article-title>Reciptor: An effective pretrained model for recipe representation learning</article-title>
          ,
          <source>in: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1719</fpage>
          -
          <lpage>1727</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spranger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Maruyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Recipebowl: A cooking recommender for ingredients and recipes using set transformer</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>143623</fpage>
          -
          <lpage>143633</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Maruyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Recipemind: guiding ingredient choices from food pairing to recipe completion using cascaded set transformer</article-title>
          ,
          <source>in: Proceedings of the 31st ACM international conference on information &amp; knowledge management</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3092</fpage>
          -
          <lpage>3102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Cueto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Słowik</surname>
          </string-name>
          ,
          <article-title>Completing partial recipes using item-based collaborative filtering to recommend ingredients</article-title>
          ,
          <year>2020</year>
          . arXiv:1907.12380.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-Y.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <article-title>Market2dish: Health-aware food recommendation</article-title>
          ,
          <source>ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)</source>
          <volume>17</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Metoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <article-title>Reciperec: A heterogeneous graph learning model for recipe recommendation</article-title>
          ,
          <year>2022</year>
          . arXiv:2205.14005.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-h.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Health-guided recipe recommendation over knowledge graphs</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>75</volume>
          (
          <year>2023</year>
          )
          <fpage>100743</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Neelam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Veerella</surname>
          </string-name>
          ,
          <article-title>Enhancing personalized recipe recommendation through multi-class classification</article-title>
          ,
          <source>International Journal of Computer Science, Engineering and Information Technology</source>
          <volume>13</volume>
          (
          <year>2024</year>
          ). doi:10.5121/ijcseit.2024.14502.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kosiorek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. W.</given-names>
            <surname>Teh</surname>
          </string-name>
          ,
          <article-title>Set transformer: A framework for attention-based permutation-invariant neural networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3744</fpage>
          -
          <lpage>3753</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Kitchenette: Predicting and ranking food ingredient pairings using siamese neural network</article-title>
          ,
          <source>in: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-2019, International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>5930</fpage>
          -
          <lpage>5936</lpage>
          . doi:10.24963/ijcai.2019/822.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>