<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Machine
Learning Research 22 (2021) 1-6. URL: http://jmlr.org/papers/v22/20</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.ipm.2019.05.012</article-id>
      <title-group>
        <article-title>RecipeRAG: A Knowledge Graph-Driven Approach to Personalized Recipe Retrieval and Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julie Loesch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emin Durmuş</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Remzi Celebi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Advanced Computing Sciences, Maastricht University</institution>
          ,
          <addr-line>Paul-Henri Spaaklaan 1, Maastricht, 6229 EN</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kirklareli University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>26</volume>
      <fpage>94</fpage>
      <lpage>97</lpage>
      <abstract>
        <p>Recommending or generating a recipe that satisfies the growing diversity of dietary needs and preferences is a significant challenge for many people. Food choices are influenced by a mix of factors (taste, dietary restrictions, health, availability of ingredients), yet many algorithms are optimized for single-criterion decisions. In this work, we introduce RecipeRAG, a novel knowledge graph-based retrieval-augmented generation system for personalized recipe generation based on users' multiple criteria. We construct RecipeKG, a knowledge graph derived from Food.com and enriched with additional semantic tags, to capture complex relationships between recipes and their related concepts, and we use knowledge graph embedding models for retrieval. RecipeRAG performs multi-criteria information retrieval over RecipeKG based on user-defined constraints, followed by a large language model that generates personalized recipes. Our experiments demonstrate that RecipeRAG outperforms existing methods in both retrieval and generation tasks, producing high-quality personalized recipes that meet multiple constraints. RecipeRAG offers a promising way to connect traditional recipes with evolving nutritional and dietary needs, allowing for more flexible and personalized cooking options.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph (KG)</kwd>
        <kwd>Large Language Model (LLM)</kwd>
        <kwd>Retrieval-Augmented Generation (RAG)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Designing or adapting recipes under multiple constraints—such as dietary needs, taste, nutrition,
availability, and cultural norms—is a complex multi-objective combinatorial problem. Traditional
approaches, including rule-based systems and single-objective optimization, struggle to handle these
competing goals simultaneously [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Recent models like GISMo [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and SHARE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] support ingredient
substitution and recipe editing under dietary constraints. However, they face key limitations: low
substitution accuracy, limited modeling of taste and function, lack of multi-objective optimization, and
minimal user interaction or explainability.
      </p>
      <p>
        To address these gaps, intelligent recipe generation systems must move beyond surface-level keyword
matching to reason about the underlying semantics of food. This requires a deep understanding of
ingredient properties, culinary techniques, and their interrelationships. Food ontologies provide this
foundation by capturing rich, structured knowledge on nutrition, ingredient hierarchies, cooking
methods, and safety considerations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Leveraging such ontologies enables more informed,
context-aware recipe recommendations that better satisfy complex, user-defined constraints.
      </p>
      <p>
        Recent research explores the potential of generating recipes based on specific constraints, enabling
systems to create dishes that align with user requirements. However, most existing studies optimize
for a single constraint at a time. For instance, Kazama et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] focus on regional cuisine styles, while
Shirai et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] develop models that generate nutritionally optimized recipes. Morales-Garzón et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
concentrate on vegetarian recipe generation. Li et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] propose a system for editing recipes to
accommodate dietary restrictions; however, their approach considers only recipe titles and ingredients,
neglecting cooking instructions, which are critical for modeling preparation logic and ingredient
interactions.
      </p>
      <p>Beyond these limitations, recent LLM-based models introduce new challenges. They often hallucinate
unsafe or impractical recipes, such as incompatible ingredient substitutions or incomplete cooking
procedures. Furthermore, many systems lack the ability to handle multiple, user-defined constraints
simultaneously, and often operate without integrating structured knowledge like food ontologies. This
results in limited adaptability, poor interpretability, and unreliable outcomes, especially when recipes
must align with complex health or cultural considerations.</p>
      <p>Retrieval-Augmented Generation (RAG) systems can assist in generating recipes that meet specific
constraints while considering the full recipe context. These systems enhance text generation by
integrating external knowledge. Given a user’s input, RAG models retrieve the top-<italic>k</italic> most relevant knowledge
pieces to inform the generation process. While basic RAG systems typically rely on textual data, other
sources, such as ontologies or knowledge graphs, can also serve as their knowledge base. The use of
knowledge graph-based RAG systems has recently gained increasing attention.</p>
      <p>Consequently, this paper introduces RecipeRAG, a knowledge graph-based retrieval-augmented generation
framework for recipe generation. The proposed approach enables users to generate
recipes while incorporating constraints. The contributions of our work are as follows:
        <list list-type="bullet">
          <list-item><p>A new recipe knowledge graph, RecipeKG, is constructed from Food.com.</p></list-item>
          <list-item><p>A retrieval-augmented generation system is developed that leverages a knowledge graph to
generate personalized recipes.</p></list-item>
          <list-item><p>Instead of relying solely on direct similarity matching between user queries and recipes,
RecipeRAG introduces a novel retrieval method that frames recipe recommendation as a
multi-criteria link prediction task, enabling the system to optimize across user-defined constraints.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Various approaches have been proposed for recipe generation. In this section, three distinct subsets of
recipe generation methods will be examined.</p>
      <sec id="sec-2-1">
        <title>2.1. Generating Cooking Instructions</title>
        <p>Instruction generation for cooking recipes has advanced significantly with the integration of large
language models (LLMs) and traditional language models. These methods focus on producing coherent,
structured, and detailed cooking instructions.</p>
        <p>
          Fine-tuned GPT-2 models have demonstrated strong capabilities in this area. RecipeGPT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] supports
both ingredient and instruction generation, while Goel et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] trained GPT-2 on the RecipeDB dataset
to generate novel recipe instructions. Similarly, Hwang et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] employed a prompt-based framework
with LLMs to simplify recipe steps while retaining essential details.
        </p>
        <p>
          More recent advancements include multimodal approaches. LLaVA-Chef [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] fine-tunes the LLaVA
model to integrate visual and textual data, enhancing instruction accuracy and outperforming existing
models like GPT-2 and LLaMA in instruction quality.
        </p>
        <p>
          Beyond LLMs, traditional language models continue to contribute. Liu et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] proposed a
counterfactual recipe generation method that modifies a base recipe based on a change in an ingredient,
while logically adjusting the subsequent steps. Transformer-based models have also gained traction.
Majumder et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] expanded recipe instructions from incomplete ingredient lists, while Liu et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
introduced a structured, step-by-step planning approach for refining generated instructions.
        </p>
        <p>
          Li et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] combined recipe editing with instruction generation, using a copy attention mechanism
to align cooking steps with modified ingredient lists.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Generating Ingredient Selection</title>
        <p>Ingredient selection plays a crucial role in shaping the flavor profile and structure of a recipe.
Computational methods have been developed to suggest novel ingredient combinations, leveraging knowledge
graphs and embedding models.</p>
        <p>
          Pini et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] constructed a knowledge graph from food databases to recommend new ingredient
combinations based on similarity metrics. By capturing ingredient features such as flavor profiles
and functional attributes, their tool generates ranked ingredient suggestions tailored to user-defined
constraints.
        </p>
        <p>
          Embedding models have also been employed for ingredient substitutions. Chen et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] introduced
NutRec, a framework that modifies ingredient lists to create healthier recipes. NutRec first predicts
potential ingredients that could be added to the original ingredient list using an embedding-based model
and second estimates optimal ingredient quantities with a neural network, ensuring nutritional balance.
        </p>
        <p>Similarly, Pan et al. [17] proposed a two-step method for generating recipes with novel ingredients.
They vectorized ingredients using Doc2Vec, identified substitutions based on cosine similarity, and
generated coherent recipes using N-gram and LSTM-based models.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Constraint-Based Generation</title>
        <p>Generating recipes that align with user preferences (e.g., dietary restrictions, ingredient availability, or
nutritional needs) is a complex challenge. Various approaches leverage recipe embeddings, knowledge
graphs, and transformer-based models to address this.</p>
        <p>
          Kazama et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] introduced a framework that converts traditional recipes into regional variations.
This system includes a neural network that calculates the contribution of each ingredient to specific
regional cuisines and an extended Word2Vec model that recommends new ingredients matching the
target regional style while maintaining high similarity to the original recipe.
        </p>
        <p>
          Similarly, Morales-Garzón et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] proposed an unsupervised recipe editing method using Word2Vec
embeddings trained on cooking texts. Ingredients from original recipes are mapped to a food composition
database, and unsuitable ingredients are replaced with alternatives that meet specified constraints.
        </p>
        <p>
          Shirai et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] developed a constraint-based ingredient substitution method using FoodKG, leveraging
knowledge graphs to capture rich semantic relationships.
        </p>
        <p>
          Li et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] introduced SHARE (System for Hierarchical Assistive Recipe Editing), a transformer-based
model for recipe editing under dietary constraints. SHARE uses two encoder-decoder networks: the
first replaces ingredients based on user constraints, and the second generates new cooking instructions
tailored to the modified recipe.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>This section describes the construction of the new knowledge graph from Food.com, called RecipeKG,
as well as the generation of the ground truth datasets used to evaluate the retrieval and generation
modules of RecipeRAG.</p>
      <sec id="sec-3-1">
        <title>3.1. RecipeKG (Training Set)</title>
        <p>The dataset used in this paper is sourced from the Food.com Recipes and Reviews dataset
(https://www.kaggle.com/datasets/irkaal/foodcom-recipes-and-reviews). We removed recipes
containing null values from the original dataset. For the remaining 88,519 recipes, we generated labels
based on nutritional information and user-generated descriptions, categorizing them into 6 distinct
categories. The description of each category is presented in Table 1.</p>
        <p>However, these 88,519 recipes exhibit highly imbalanced label distributions
across categories. To balance the dataset, we grouped some of the underrepresented labels based on
similar attributes. After reducing the number of samples from overrepresented classes, a total of 6,754
recipes were used to construct RecipeKG.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset to Evaluate Multi-criteria Retrieval (Retrieval Test Set 1)</title>
        <p>To assess the retrieval performance of RecipeRAG, we created five separate ground truth subsets from our
collection of 6,754 recipes to evaluate different levels of user query complexity. Each subset represents
a specific number of combined user criteria, ranging from 1 to 5, and contains 100 unique combinations
of criteria. An example of a 2 criteria set is (hasDietType = Vegetarian and isFromRegion =
Asian), which means that only two criteria are taken into account simultaneously. This allows us to
measure how the system performs on various queries, from simple to more complex queries.</p>
        <p>To generate the combination sets, we used all possible options for each criterion, as the number of
options was relatively small, ranging from 3 to 10. For example, hasCarbLevel can take three values
(low_carb, medium_carb, high_carb), while Region can have up to 10 options (africa,
asia, europe, global, indian, latin_america_and_caribbean, mediterranean,
middle_east, north_america, oceania).</p>
        <p>For smaller criteria combinations (1 and 2), we exhaustively enumerated all possible combinations
and retained those for which at least one matching recipe was found. For larger set sizes (3 to 5), where
the combinatorial space increases exponentially, we employed a stratified sampling strategy to ensure
both feasibility and representativeness. The procedure was as follows:
          <list list-type="order">
            <list-item><p>All possible combinations were generated and filtered to retain only those with at least one
corresponding recipe.</p></list-item>
            <list-item><p>We computed dynamic bin sizes using the Freedman–Diaconis rule [18], which determines the
optimal bin width based on the interquartile range (IQR) and sample size.</p></list-item>
            <list-item><p>The combinations were then grouped into quantile-based bins according to their match count
distributions.</p></list-item>
            <list-item><p>For each bin, we randomly selected 50% of the data.</p></list-item>
          </list>
        </p>
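        <p>The sampling procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names <monospace>freedman_diaconis_bins</monospace> and <monospace>stratified_sample</monospace>, the toy match counts, the fixed random seed, and the way the bin count is turned into equal-frequency bins are all hypothetical. Only the Freedman–Diaconis rule, the quantile-based binning, and the 50% sampling rate come from the text.</p>
        <preformat><![CDATA[
```python
import math
import random
import statistics


def freedman_diaconis_bins(values):
    """Bin count implied by the Freedman-Diaconis width 2 * IQR / n^(1/3)."""
    n = len(values)
    if n < 2:
        return 1
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    if iqr == 0:
        return 1  # degenerate spread: fall back to a single bin
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))


def stratified_sample(match_counts, fraction=0.5, seed=0):
    """Sample `fraction` of the combinations from each quantile bin.

    `match_counts` maps a criteria combination to its number of matching recipes.
    """
    rng = random.Random(seed)
    # Step 1: keep only combinations with at least one matching recipe.
    kept = {combo: m for combo, m in match_counts.items() if m >= 1}
    if not kept:
        return []
    # Step 2: bin count from the Freedman-Diaconis rule (capped by the data size).
    n_bins = min(freedman_diaconis_bins(list(kept.values())), len(kept))
    # Step 3: equal-frequency bins over combinations ordered by match count.
    ordered = sorted(kept, key=kept.get)
    size = len(ordered)
    bins = [ordered[i * size // n_bins:(i + 1) * size // n_bins] for i in range(n_bins)]
    # Step 4: draw the requested fraction from every bin (at least one item).
    sample = []
    for members in bins:
        k = min(len(members), max(1, round(fraction * len(members))))
        sample.extend(rng.sample(members, k))
    return sample


# Toy match counts (hypothetical); combo_0 has no matching recipe.
match_counts = {f"combo_{i}": m for i, m in enumerate([0, 1, 2, 3, 5, 8, 40, 120])}
sample = stratified_sample(match_counts)
```
]]></preformat>
        <p>Sampling within each bin, rather than over the whole pool, keeps both rarely matched and frequently matched combinations in the test set.</p>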
        <p>This methodology produced a balanced set, referred to as Retrieval Test Set 1, which includes both
simple single-criterion queries and more complex multi-criteria combinations. The total number of
possible and sampled combinations for Retrieval Test Set 1 is presented in Table 3. Table 2 provides an
overview of all criteria and their corresponding values considered in this study.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Dataset to Evaluate Multi-ingredient Retrieval (Retrieval Test Set 2)</title>
        <p>In addition, we developed a dedicated test set focusing exclusively on ingredient-based recipe retrieval.
This test set aims to assess the system’s performance on a common real-world task: retrieving recipes
that contain specific combinations of ingredients.</p>
        <p>We began by conducting a comprehensive analysis of the ingredient distribution within the dataset.
From a total of 88,519 recipes, we identified 1,646 unique ingredients, exhibiting a characteristic
long-tail distribution. A small subset of staple ingredients (e.g., salt, butter, sugar) appeared in thousands of
recipes, while the majority of specialty ingredients occurred only infrequently.</p>
        <p>To construct a representative yet computationally manageable sample of ingredients, we adopted the
following methodology:
          <list list-type="order">
            <list-item><p>We generated a frequency distribution of all ingredients and confirmed a power-law distribution
via log-log plotting.</p></list-item>
            <list-item><p>We applied equal-frequency (quantile) binning, using a dynamically determined bin count of 16,
computed via the Freedman–Diaconis rule to accommodate data dispersion and sample size.</p></list-item>
            <list-item><p>Within each bin, we performed a stratified sampling of 25%, resulting in a total of 412
representative ingredients covering the full frequency spectrum, from ubiquitous staples to rare
ingredients.</p></list-item>
            <list-item><p>An inverted index was then constructed, mapping each selected ingredient to the corresponding
set of recipes, enabling efficient intersection-based queries.</p></list-item>
          </list>
        </p>
        <p>Using these 412 representative ingredients, we generated combinations of sizes 1 through 4 and
evaluated their retrieval potential:
          <list list-type="order">
            <list-item><p>All possible combinations were enumerated; their number is given by the binomial coefficient.</p></list-item>
            <list-item><p>For each combination, the set of matching recipes was determined via intersection operations on
the inverted index.</p></list-item>
            <list-item><p>Only combinations with at least one matching recipe were retained for evaluation.</p></list-item>
          </list>
        </p>
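        <p>The inverted index and the intersection-based filtering described above might look like the following sketch. The function names, recipe IDs, and ingredient sets are hypothetical and chosen only for illustration; the actual pipeline operates on the 412 sampled ingredients.</p>
        <preformat><![CDATA[
```python
import math
from itertools import combinations


def build_inverted_index(recipes, selected_ingredients):
    """Map each selected ingredient to the set of recipe IDs that contain it."""
    index = {ingredient: set() for ingredient in selected_ingredients}
    for recipe_id, ingredients in recipes.items():
        for ingredient in ingredients:
            if ingredient in index:
                index[ingredient].add(recipe_id)
    return index


def matching_combinations(index, size):
    """Return {combination: matching recipe IDs}, keeping only non-empty matches."""
    retained = {}
    for combo in combinations(sorted(index), size):
        # Recipes containing every ingredient of the combination.
        matches = set.intersection(*(index[i] for i in combo))
        if matches:
            retained[combo] = matches
    return retained


# Toy data (hypothetical recipe IDs and ingredients).
recipes = {
    "r1": {"salt", "butter", "flour"},
    "r2": {"salt", "sugar", "butter"},
    "r3": {"saffron", "rice", "salt"},
}
index = build_inverted_index(recipes, {"salt", "butter", "saffron"})
n_possible = math.comb(len(index), 2)  # number of candidate pairs
pairs = matching_combinations(index, 2)
```
]]></preformat>
        <p>In this toy example, three pairs are possible, but the pair (butter, saffron) matches no recipe and is dropped, so two pairs are retained.</p>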
        <p>Table 3 presents the number of all possible and retained (sampled) ingredient combinations for
Retrieval Test Set 2.</p>
        <p>This systematic approach enables rigorous evaluation of ingredient-based retrieval capabilities,
encompassing a wide range of query complexities from single-ingredient lookups to intricate
multi-ingredient constraints.</p>
        <p>To evaluate the generation capabilities of RecipeRAG, we randomly selected criteria from either Retrieval
Test Set 1 or Retrieval Test Set 2. Subsequently, a random integer between 2 and 5 was drawn to
determine the number of criteria to include. For Retrieval Test Set 2, which consists solely of ingredients
and is relatively large, we focused on frequently occurring ingredients. Ingredient frequencies were
calculated and categorized into low, medium, and high-frequency groups, with the selection limited to
those in the high-frequency category.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>This section details the KG-based RAG system we developed to generate personalized recipes based on
given user criteria. Figure 1 illustrates the overall structure and workflow of RecipeRAG.</p>
      <p>[Figure 1: Overview of the RecipeRAG workflow. RecipeKG triples are used to train KG embedding models (TransE, RotatE, QuatE). User criteria are converted into query triples, scored with the embedding model's scoring function, MinMax-normalized, and aggregated by summation. The top-N recipes and the criteria are passed to the LLM (DeepSeek-R1-0528), which produces the customized recipe output.]</p>
      <p>RecipeRAG is composed of two main components: a knowledge graph-based retrieval module and a
recipe generation module. Figure 1 highlights the retrieval module, which uses RecipeKG (depicted in
Figure 2), a knowledge graph constructed from Food.com recipe data. This knowledge graph represents
both recipes and user criteria as entities, along with the relationships between them.</p>
      <p>Unlike RAG systems that leverage the graph structure directly, RecipeRAG employs a KG
embedding-based scoring mechanism to retrieve relevant recipes. This process consists of the following
steps:
        <list list-type="order">
          <list-item><p>First, KG embedding models (i.e., TransE [19], RotatE [20] and QuatE [21]) were trained on
RecipeKG to learn the embeddings of recipes, user criteria, and their relationships.</p></list-item>
          <list-item><p>The system then applies multi-criteria optimization to calculate a plausibility score for each recipe
with respect to each criterion. This score is derived from the embedding model’s scoring function and
reflects how plausible a given connection is. Let <inline-formula><tex-math>R</tex-math></inline-formula> be the set of candidate recipes,
<inline-formula><tex-math>C = \{c_1, \dots, c_m\}</tex-math></inline-formula> be the set of user-specified criteria, and
<inline-formula><tex-math>f</tex-math></inline-formula> be the scoring function from a trained KG embedding model,
which evaluates the plausibility of a triple <inline-formula><tex-math>(r, p_i, o_i)</tex-math></inline-formula>,
where <inline-formula><tex-math>r \in R</tex-math></inline-formula> is a recipe, <inline-formula><tex-math>p_i</tex-math></inline-formula> is the relation for criterion <inline-formula><tex-math>c_i</tex-math></inline-formula>,
and <inline-formula><tex-math>o_i</tex-math></inline-formula> is the entity for criterion <inline-formula><tex-math>c_i</tex-math></inline-formula>.</p></list-item>
          <list-item><p>Each score for a criterion is then normalized to the [0, 1] range using MinMaxScaler to ensure
that the scores are on the same scale.</p></list-item>
          <list-item><p>After normalization, the scores for each criterion are aggregated by recipe ID and
summed. The aggregated score for recipe <inline-formula><tex-math>r</tex-math></inline-formula> is:</p>
            <disp-formula><tex-math>S_r = \sum_{i=1}^{m} \hat{f}(r, p_i, o_i),</tex-math></disp-formula>
            <p>where <inline-formula><tex-math>\hat{f}(r, p_i, o_i)</tex-math></inline-formula> is the score normalized with MinMax scaling across recipes
for each criterion <inline-formula><tex-math>c_i</tex-math></inline-formula>.</p>
            <p>This aggregation produces a single overall score for each recipe that reflects how well the recipe
meets all of the user’s specified criteria. Because the scores are normalized first, no single criterion dominates, and
multiple criteria are evaluated in a balanced way. By taking into account the contribution
of each criterion, the scoring mechanism allows us to select the most appropriate recipes even in
cases of incomplete or conflicting labels. Instead of relying solely on exact matches, the aggregated
score offers a flexible and generalizable evaluation that considers all aspects of the user’s input.</p></list-item>
          <list-item><p>The retrieval process then selects the top-ranking recipes based on the aggregated scores. After
retrieving the top <inline-formula><tex-math>N</tex-math></inline-formula> most relevant recipes (<inline-formula><tex-math>N = 5</tex-math></inline-formula> in the experiments), the system uses
DeepSeek-R1-0528 to generate new recipes. This model incorporates the retrieved recipes as
context, along with the original user criteria, to produce customized recipe suggestions. If an
insufficient number of recipes were retrieved, a fallback mechanism was applied.</p></list-item>
        </list>
      </p>
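      <p>Steps 3 to 5 of the retrieval process can be illustrated with the short sketch below. It is a plain-Python stand-in for the MinMaxScaler-based normalization mentioned above, not the authors' code: the helper names <monospace>min_max</monospace> and <monospace>rank_recipes</monospace>, the raw scores, and the handling of a constant score column are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def min_max(scores):
    """Rescale a {recipe_id: raw_score} mapping to the [0, 1] range."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # Assumption: a criterion with identical scores contributes nothing.
        return {recipe_id: 0.0 for recipe_id in scores}
    return {recipe_id: (s - low) / (high - low) for recipe_id, s in scores.items()}


def rank_recipes(raw_scores, top_n=5):
    """Sum the per-criterion normalized scores and return the top_n recipe IDs.

    `raw_scores[criterion][recipe_id]` holds the plausibility score that a
    trained KG embedding model assigns to the corresponding query triple.
    """
    totals = {}
    for per_recipe in raw_scores.values():
        for recipe_id, score in min_max(per_recipe).items():
            totals[recipe_id] = totals.get(recipe_id, 0.0) + score
    return sorted(totals, key=totals.get, reverse=True)[:top_n]


# Made-up raw scores for two criteria and three recipes.
raw_scores = {
    ("hasDietType", "vegetarian"): {"r1": -1.0, "r2": -3.0, "r3": -5.0},
    ("isFromRegion", "asia"): {"r1": -8.0, "r2": -2.0, "r3": -4.0},
}
top = rank_recipes(raw_scores, top_n=2)
```
]]></preformat>
      <p>Here r1 is the best match for the diet criterion but the worst for the region criterion, so r2, which scores well on both, ranks first after aggregation.</p>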
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>This section provides an overview of the experimental design used to evaluate the retrieval performance
of RecipeRAG with different knowledge graph embedding models, as well as the generation performance
for recipe recommendation and creation.</p>
      <sec id="sec-5-1">
        <title>5.1. Knowledge Graph Embedding Evaluation</title>
        <p>We trained three knowledge graph embedding (KGE) models to learn the vector representations of
entities and relations within our knowledge graph:
          <list list-type="order">
            <list-item><p>TransE: A translational distance model that interprets relations as translations in the embedding
space.</p></list-item>
            <list-item><p>RotatE: A rotational model that represents relations through rotations in a complex vector space.</p></list-item>
            <list-item><p>QuatE: A quaternion-based model that utilizes quaternions to encode entities and relations.</p></list-item>
          </list>
        </p>
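        <p>To make the difference between translation and rotation concrete, the sketch below implements the TransE and RotatE scoring functions for a single triple in plain Python. The two-dimensional embeddings and variable names are hypothetical, the L2 norm is one common choice of distance, and QuatE (which relies on the Hamilton product of quaternions) is omitted for brevity. In the experiments the models are provided by PyKEEN rather than written by hand.</p>
        <preformat><![CDATA[
```python
import cmath
import math


def transe_score(head, relation, tail):
    """TransE: negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rotate_score(head, phases, tail):
    """RotatE: rotate each complex head coordinate by a phase, then compare to tail."""
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return -math.sqrt(sum(abs(x - t) ** 2 for x, t in zip(rotated, tail)))


# Hypothetical real-valued embeddings: recipe + relation lands exactly on the tail.
recipe = [1.0, 0.0]
has_diet_type = [0.0, 1.0]
vegetarian = [1.0, 1.0]
s_transe = transe_score(recipe, has_diet_type, vegetarian)

# Hypothetical complex embeddings: a 90-degree rotation of the first coordinate
# maps the recipe onto the tail entity.
recipe_c = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, 0.0]
vegetarian_c = [0 + 1j, 0 + 1j]
s_rotate = rotate_score(recipe_c, phases, vegetarian_c)
```
]]></preformat>
        <p>In both cases a score close to zero indicates a highly plausible triple, and more negative scores indicate less plausible ones.</p>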
        <p>These models were implemented using the PyKEEN framework, a Python library designed for training
and evaluating KGE models [22]. The training was conducted on our recipe knowledge graph, RecipeKG,
which encompasses entities such as recipes, ingredients, diet types, and meal types, along with their
corresponding relations.</p>
        <p>Since text embedding is a widely used method in traditional RAG systems, we incorporated two text
embedding baselines for comparison. The first baseline, GTE-Large, is a general-purpose text embedding
model trained with multi-stage contrastive learning [23]. The second baseline, all-MiniLM-L6-v2 (MiniLM) from
sentence-transformers, has been optimized for clustering and semantic search tasks. These models
were chosen for their effectiveness across various applications: GTE-Large represents advancements in
text embedding techniques, while MiniLM is well-suited for semantic search, aligning closely with our
recipe retrieval objectives.</p>
        <p>To evaluate retrieval performance, we used the two test datasets described in Subsections 3.2 and
3.3. We then evaluated the retrieved recipes by comparing them to a predefined list of relevant recipes.
Specifically, we compute:
          <list list-type="bullet">
            <list-item><p>Hits@<italic>k</italic>: measures the percentage of positive examples that appear in the top-<italic>k</italic> ranked predictions.</p></list-item>
            <list-item><p>Mean Reciprocal Rank (MRR): represents the average reciprocal rank, calculated by taking
the reciprocal of the rank (1/rank) of the first relevant item retrieved.</p></list-item>
          </list>
        </p>
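        <p>The two metrics can be computed as in the sketch below. It assumes one ranked list and one set of relevant recipes per query, and it counts a query as a hit when at least one relevant recipe appears in the top <italic>k</italic>; this per-query reading of Hits@<italic>k</italic>, the function names, and the toy rankings are assumptions for illustration. Scores are reported in percent.</p>
        <preformat><![CDATA[
```python
def hits_at_k(ranked, relevant, k):
    """1.0 if any relevant recipe appears among the top-k results, else 0.0."""
    return float(any(recipe_id in relevant for recipe_id in ranked[:k]))


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant recipe, or 0.0 if none is retrieved."""
    for rank, recipe_id in enumerate(ranked, start=1):
        if recipe_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(queries, k):
    """Average Hits@k and MRR (both in percent) over (ranked, relevant) pairs."""
    n = len(queries)
    hits = sum(hits_at_k(ranked, relevant, k) for ranked, relevant in queries) / n
    mrr = sum(reciprocal_rank(ranked, relevant) for ranked, relevant in queries) / n
    return 100 * hits, 100 * mrr


# Toy rankings (hypothetical recipe IDs).
queries = [
    (["r3", "r1", "r7"], {"r1"}),        # first relevant recipe at rank 2
    (["r2", "r5", "r9"], {"r2", "r9"}),  # first relevant recipe at rank 1
    (["r4", "r6", "r8"], {"r0"}),        # no relevant recipe retrieved
]
hits1, mrr = evaluate(queries, k=1)
hits3, _ = evaluate(queries, k=3)
```
]]></preformat>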
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Recipe Generation Evaluation</title>
        <p>We chose RotatE, the strongest-performing knowledge graph embedding (KGE) model, to bootstrap
the recipe generation process. The generation pipeline operates as follows: first, the system retrieves
the top-ranking recipes based on aggregated relevance scores. From these, the top 5 most relevant
recipes are selected. These retrieved recipes, along with the original user criteria, are then fed into
DeepSeek-R1-0528, which generates new, tailored recipe suggestions. By leveraging both the user’s
input and the 5 retrieved examples as context, the model produces a customized recipe.</p>
        <p>For comparison, we repeated the process using two alternatives: (1) retrieving the top 5 examples
using MiniLM text embeddings, and (2) generating recipes with no example-based retrieval at all.</p>
        <p>To evaluate the results, we utilized a separate large language model, Mistral Small 3.2 24B, as an
LLM-as-a-judge [24, 25] to evaluate the generated recipes according to two criteria: Satisfiability
(i.e., whether all specified criteria are met) and Feasibility (i.e., whether the recipe can be realistically
prepared at home). Each criterion was rated on a scale from 1 (poor) to 5 (excellent), with no additional
explanation provided.</p>
        <p>We then computed the mean and standard deviation of these evaluation scores and compared the
results against baseline RAG methods—specifically, those using only text embeddings and those without
either knowledge graph or text embeddings.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>In this section, we present the results of both recipe retrieval and recipe generation.</p>
      <sec id="sec-6-1">
        <title>6.1. Knowledge Graph Embedding Evaluation</title>
        <p>We conducted a comprehensive evaluation of our knowledge graph embedding models’ ability to
retrieve relevant recipes that satisfy constraints and compared their performance to two state-of-the-art
text embedding models. Specifically, to evaluate retrieval performance, we used the two test datasets
described in Subsections 3.2 and 3.3: (1) standard queries without ingredient constraints (criteria-based)
and (2) ingredient-only queries (ingredient-based). The results of these experiments are summarized in
Tables 4 and 5, which report the Hits@<italic>k</italic> and mean reciprocal rank (MRR) scores.</p>
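        <p>[Table 4: Retrieval results (Hits@k and MRR) on Retrieval Test Set 1; panel (b) reports results for 4 to 5 user criteria (columns: 4 criteria, 5 criteria). Surviving column values, five per column, with the MRR header labels in their original positions: 97.0, 97.6, 83.0, 59.0, 34.0 | 98.9, 99.1, 93.6, 77.7, 64.4 | MRR | 57.7, 52.9, 34.6, 9.4, 1.8 | MRR | 91.0, 89.2, 66.2, 33.3, 12.7 | 95.0, 95.9, 83.5, 54.2, 32.7 | MRR | 66.8, 64.1, 45.8, 15.8, 4.5 | MRR | 92.9, 92.3, 74.0, 43.3, 22.4]</p>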
        <p>The results in Table 4 show that knowledge graph embedding models, particularly RotatE and QuatE,
significantly outperform both classical (TransE) and language-based baselines (GTE-Large, MiniLM)
across all levels of retrieval difficulty. For 1 and 2 criteria, RotatE achieves nearly perfect performance
(MRR = 100 and 98.2), indicating that the model has effectively captured semantic structure. QuatE
closely follows and even outperforms RotatE on 3-criteria retrieval (MRR = 92.9 vs. 92.3). As the
number of criteria increases (4–5), the performance of all models declines, but RotatE and QuatE remain
robust, demonstrating strong generalization. In contrast, transformer-based models like MiniLM and
GTE-Large show steep performance degradation.</p>
        <p>Retrieval under ingredient constraints is more challenging for all models
than retrieval on Retrieval Test Set 1. Despite this, RotatE consistently outperforms all baselines, achieving nearly
perfect results under 1 criterion (MRR = 99.6) and maintaining robust performance even as constraints
increase. QuatE follows closely, with good results for 1–2 criteria but showing more degradation beyond
3 criteria (MRR = 61.7 with 4 constraints). TransE and the language-based models (GTE-Large, MiniLM)
perform significantly worse under increasing constraint complexity.</p>
        <p>[Table 5: Retrieval results (Hits@k and MRR) on Retrieval Test Set 2. Surviving column values, five per column: 88.6, 99.3, 10.0, 48.1, 30.1 | 51.5, 79.4, 7.6, 7.0, 1.4]</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Recipe Generation Evaluation</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>Retrieval. Across all models, performance declined as the number of criteria increased from one to
five, indicating that satisfying multiple user constraints significantly increases the complexity of the
retrieval task. Nevertheless, the results underscore the effectiveness of knowledge graph embeddings
(KGEs) in supporting multi-criteria recipe retrieval, while simultaneously revealing the limitations
of purely text-based embeddings in such scenarios. Transformer-based models such as MiniLM and
GTE-Large exhibited sharp performance drops, suggesting that they struggle to represent structured,
combinatorial constraints effectively without domain-specific fine-tuning. These findings highlight the
critical role of structured knowledge in enhancing retrieval performance, particularly when multiple
constraints must be jointly satisfied.</p>
      <p>Generation. In the generation task, the RAG model augmented with RotatE embeddings demonstrated
the best balance between quality and consistency. It achieved the highest satisfiability scores while
also exhibiting the lowest variance across both feasibility and satisfiability metrics. This suggests that
incorporating structured retrieval through KGEs enhances the generation of coherent and dependable
recipes. While the "No RAG" baseline showed slightly higher feasibility, it also displayed greater
variability, indicating less stable performance. Taken together, these results emphasize the advantage of
leveraging structured knowledge representations to improve both the robustness and overall quality of
recipe generation under complex conditions.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusion</title>
      <p>In this work, we introduced RecipeRAG, a novel RAG system that combines knowledge graph
embeddings (KGE) with LLMs to enhance recipe generation. We constructed RecipeKG from Food.com and
showed that KGE-based retrieval significantly outperforms traditional text-based methods in identifying
relevant recipes. Our experiments highlight that RotatE and QuatE embeddings offer superior retrieval
performance compared to both classical models like TransE and language-based baselines such as
GTE-Large and MiniLM, especially as the number of user constraints increases.</p>
      <p>RecipeRAG leverages the structured nature of RecipeKG to retrieve recipes that closely match user
preferences and uses LLMs to generate coherent, personalized recipe texts. This integration enables
more accurate retrieval and better aligns the generated content with user-defined needs.</p>
      <p>Supplemental Material Statement. The dataset, code, and relevant material are available at
https://github.com/jloe2911/Recipe_RAG.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>We used ChatGPT to correct grammatical errors and rephrase sentences.</p>
    </sec>
  </body>
  <back>
  </back>
</article>