<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Approaching LLM Alignment using Agents with RAG</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladyslav Fliahin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Turuta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksii Turuta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Nauky Ave. 14, Kharkiv, 61000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>V. N. Karazin Kharkiv National University</institution>
          ,
          <addr-line>Svobody Square, 4, Kharkiv, 61022</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>This paper contributes scalable LLM alignment approaches that generate outputs close to human goals and values. As AI models become more advanced, alignment becomes increasingly critical. This article explores a novel approach using agents and Retrieval-Augmented Generation (RAG) for alignment. We create a custom knowledge graph based on a Flickr30k subset and leverage a Neo4j database to store predefined entities and their relationships, which serve as constraints for model outputs. We use RAG to guide the model's generation by focusing only on the relevant entities and relationships detected in an image, ensuring alignment with structured knowledge while ignoring irrelevant details.</p>
      </abstract>
      <kwd-group>
        <kwd>LLM</kwd>
        <kwd>Alignment</kwd>
        <kwd>RAG</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Multimodal Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Problem statement</title>
        <p>
          As large language models (LLMs) become more capable [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the challenge of aligning their outputs
with human intent, ethical constraints, and factual accuracy becomes increasingly important.
While traditional alignment methods focus on reinforcement learning with human feedback
(RLHF) or prompt engineering, these approaches often lack fine-grained control over specific
aspects of model behavior. In particular, ensuring that an LLM only considers predefined entities
and relationships, especially when processing complex multimodal data like images, remains an
open challenge.
        </p>
        <p>This paper focuses on scalable LLM alignment approaches that generate outputs close to human goals and values.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Overview of our method</title>
        <p>In this work, we propose a novel approach to LLM alignment that leverages agents,
Retrieval-Augmented Generation (RAG), and knowledge graphs to enforce controlled generation. We store a
structured representation of allowed entities and their relationships in a Neo4j knowledge graph,
which acts as a constraint system. When analyzing an image, our method first detects the entities
present, retrieves only the corresponding allowed relationships from the database, and then guides
the LLM’s generation using RAG. This ensures that the model adheres strictly to the predefined
knowledge constraints, avoiding irrelevant or undesired outputs.</p>
        <p>Our contributions are as follows:</p>
        <p>Empirical validation demonstrates how this approach improves alignment precision while reducing hallucination.</p>
        <p>The rest of this paper is structured as follows: Section 2 discusses related work on LLM
alignment, knowledge-grounded generation systems, and agents. Section 3 describes our
methodology, including the role of LLMs, agents, RAG, and Neo4j. Section 4 presents our
experiments and results. Section 5 concludes with key insights and future research directions.
Section 6 outlines the limitations of the developed framework.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Relevant work</title>
      <p>
        Large Language Models (LLMs) have advanced in structured reasoning through techniques like
Chain-of-Thought (the authors showed how such reasoning abilities emerge naturally in
sufficiently large language models via simple prompting [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), Self-Consistency (the authors
propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in
chain-of-thought prompting; it first samples a diverse set of reasoning paths instead of only taking
the greedy one, and then selects the most consistent answer by marginalizing out the sampled
reasoning paths [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]), and Tree-of-Thought (the authors introduced a new framework for language
model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought
approach to prompting language models, and enables exploration over coherent units of text
(thoughts) that serve as intermediate steps toward problem solving [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), improving inference by
generating intermediate steps rather than relying on greedy decoding [
        <xref ref-type="bibr" rid="ref10 ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9, 10</xref>
        ]. In case the
amount of information is too big to fit into a prompt, prior works have used knowledge storage,
such as knowledge graphs.
      </p>
      <p>
        Knowledge Graphs (KGs) are structured repositories of interconnected entities and
relationships, offering efficient graph-based knowledge representation and retrieval [
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ].
      </p>
      <p>
        Prior work combining KGs with LLMs has primarily focused on tasks such as knowledge-based
question answering [
        <xref ref-type="bibr" rid="ref14">14, 15, 16, 17</xref>
        ], entity-centric retrieval [18, 19, 20], and fact-checking [21, 22,
23].
      </p>
      <p>However, in the described applications, the obtained data was mainly used to extract the correct
answer or infer the answer from it. Our work focuses on utilizing the extracted data as supporting
information.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Agents with RAG (ARAG)</title>
      <p>Our approach combines Neo4j knowledge graphs, Retrieval-Augmented Generation (RAG), and
agent-based processing to enforce alignment constraints in large language models (LLMs) (see Fig.
1). This section details our framework, outlining how entities and relationships are stored,
retrieved, and used to control LLM-generated descriptions of images.</p>
      <sec id="sec-3-1">
        <title>3.1. Overview of our method</title>
        <sec id="sec-3-1-1">
          <title>System components</title>
          <p>The proposed system consists of three main components:</p>
          <p>Knowledge Graph: stores the predefined entities and relationships that define the allowed knowledge constraints.</p>
          <p>Image Processing Module: extracts entities from the image using a VLM.</p>
          <p>RAG-Enhanced LLM Agent: retrieves relevant entities and relationships from Neo4j and conditions the model's output within those constraints.</p>
        </sec>
        <sec id="sec-3-2">
          <title>3.2. Knowledge graph</title>
          <p>Representation. To build the alignment graph, we used the Flickr30k dataset [24], which contains 29,000 train, 1,014 validation, and 1,000 test samples. Each sample consists of an image and five English captions. Due to resource limitations, we built the graph on top of the first 100 training samples. We evaluated two different setups:</p>
          <p>Extracting entities and relations from the image captions (this yielded a small number of entities and relations and was not used for the full-size experiments).</p>
          <p>Extracting entities and relations directly from the images (used for the full-size experiments).</p>
          <p>The first approach produced 184 entities and 226 relations, while the second produced the 231 entities and 404 relations present in the DB. In both cases, GPT-4o was used as the base LLM to generate the DB.</p>
          <p>We structure our Neo4j database as a directed graph.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Graph structure</title>
          <p>Nodes represent entities (e.g., "Person," "Vehicle"). Edges define relationships between entities (e.g., "drives," "owns").</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>A sample knowledge graph structure:</title>
          <p>(:Person)-[:OWNS]-&gt;(:Vehicle)
(:Vehicle)-[:IS_LOCATED_AT]-&gt;(:Building)</p>
          <p>In the schema above, there are different relations: OWNS and IS_LOCATED_AT. We decided to store all relations under a single RELATES relationship and to keep the exact relation as a type attribute. The same procedure was applied to Person, Vehicle, and Building, which are combined into a single Entity label with a name attribute. An actual knowledge graph snapshot is depicted in Fig. 2.</p>
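          <p>Under this simplified schema, storing a triple reduces to a single Cypher MERGE statement. The helper below only builds the statement as a string (an illustrative sketch; the paper's actual Cypher and ingestion code may differ):</p>
          <preformat>
```python
# Sketch: build the MERGE statement that stores one triple under the single
# RELATES relationship type, with generic Entity nodes carrying a name
# attribute (illustrative only; the paper's exact Cypher may differ).
def merge_relation_cypher(head, rel_type, tail):
    return (
        f"MERGE (h:Entity {{name: '{head}'}}) "
        f"MERGE (t:Entity {{name: '{tail}'}}) "
        f"MERGE (h)-[:RELATES {{type: '{rel_type}'}}]->(t)"
    )

print(merge_relation_cypher("Person", "OWNS", "Vehicle"))
```
          </preformat>
          <p>In practice, such a statement would be executed through the Neo4j Python driver once per extracted triple.</p>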
          <p>Such a structure ensures that if a "Person" and a "Vehicle" appear in an image, only the predefined "OWNS" relationship is considered during LLM generation. Many additional attributes could be introduced for each entity and relation, but for the sake of simplicity, we did not incorporate much metadata.</p>
          <p>Querying. The system queries Neo4j to retrieve only the relations that match the list of extracted entities. We implemented an optional strict mode to customize the retrieval process. In the strict case, if we extract two entities, cat and dog, we retrieve only the relations between them, like dog-&gt;bark-&gt;cat. The non-strict case also returns all relations that match a single entity, like dog-&gt;eat-&gt;food or dog-&gt;is playing-&gt;ball.</p>
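          <p>The two retrieval modes can be illustrated with an in-memory triple store (a toy stand-in for the actual Neo4j query):</p>
          <preformat>
```python
# In-memory toy version of the strict and non-strict retrieval modes
# (the real system issues an equivalent query to Neo4j).
def retrieve(triples, entities, strict=False):
    entities = set(entities)
    if strict:
        # Strict: keep only relations whose both endpoints were extracted.
        return [t for t in triples if t[0] in entities and t[2] in entities]
    # Non-strict: a single matching endpoint is enough.
    return [t for t in triples if t[0] in entities or t[2] in entities]

kg = [("dog", "bark", "cat"), ("dog", "eat", "food"), ("dog", "is playing", "ball")]
print(retrieve(kg, ["dog", "cat"], strict=True))   # only the dog-cat relation
print(retrieve(kg, ["dog", "cat"], strict=False))  # all three relations
```
          </preformat>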
          <p>Extracted-entity matching. This step is optional and is used only during inference. Using vector embeddings, we match the entities extracted from a particular image against the list of entities present in the DB. This step is required because some entities are not extracted consistently every time (e.g., windows/window, man/human, building/house); without handling such cases, the results may be much worse than expected. We use vectors obtained from the OpenAI Embeddings API, which provides 64-dimensional normalized embeddings that are compared using cosine similarity with a 0.7 threshold.</p>
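          <p>A minimal sketch of this matching step, with toy two-dimensional vectors standing in for the OpenAI embeddings and a simple dictionary cache mirroring our pre-computation of embeddings for new strings:</p>
          <preformat>
```python
# Toy sketch of the entity-matching step. The vectors are stand-ins for
# the OpenAI Embeddings API output; embed_cached mirrors the caching of
# embeddings for new strings described in the text.
import math

_cache = {}

def embed_cached(text, embed_fn):
    # Embed each distinct string once; reuse the cached vector afterwards.
    if text not in _cache:
        _cache[text] = embed_fn(text)
    return _cache[text]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_entity(extracted_vec, db_entities, threshold=0.7):
    # Return the DB entity most similar to the extracted one,
    # provided the similarity clears the 0.7 threshold.
    best_name, best_score = None, -1.0
    for name, vec in db_entities.items():
        score = cosine(extracted_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

db = {"window": [1.0, 0.0], "man": [0.0, 1.0]}
print(match_entity([0.9, 0.1], db))  # window
print(match_entity([0.1, 0.2], db))  # man
```
          </preformat>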
          <p>In addition, we incorporated a caching procedure that pre-computes embeddings for every new string and reuses them whenever the same entity appears again. Caching significantly reduces the actual cost of this operation. This is a local version of the semantic search that could be performed on the Neo4j side; in a production scenario, it should be performed as a single operation in Neo4j.</p>
        </sec>
        <sec id="sec-3-3">
          <title>3.3. Image Processing</title>
          <p>The image processing module detects objects and extracts their textual representations (e.g., "car," "building"). We currently use a VLM for this task, providing it with the input image and a prompt. This works well for now but could be substituted with a zero-shot model like Grounding DINO in the future if cost reduction is needed for production applications.</p>
        </sec>
        <sec id="sec-3-4">
          <title>3.4. Vanilla LLM</title>
          <p>In our setup, vanilla LLM calls do not use the DB or any other tools to perform the task.</p>
        </sec>
        <sec id="sec-3-5">
          <title>3.5. Agent Processing</title>
          <p>There are two types of agents we could use: tool-calling agents and code agents. In preliminary experiments, we found that code agents perform better at planning and at following the step-by-step nature of the instructions.</p>
          <p>During the implementations, we utilized the smolagents framework as a core of the agent
backend. ReAct was used as the agent’s planning strategy.</p>
          <p>RAG-Enhanced Generation. The agent conditions the LLM on the retrieved knowledge, ensuring that it describes only the allowed entities and relationships. Furthermore, we prompt the model to identify the exact position of each entity in the image. There are nine available position values: top-left, top-center, top-right, center-left, center, center-right, bottom-left, bottom-center, and bottom-right. After the initial description is generated, the agent determines the position of each entity and verifies its answer against the DB to minimize possible hallucinations.</p>
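          <p>A minimal helper for checking the agent's position labels against the nine allowed values (the exact prompt and verification logic in our implementation are more involved; this is a simplified sketch):</p>
          <preformat>
```python
# The nine position labels the agent may assign to an entity,
# plus a simple validation helper (illustrative sketch only).
POSITIONS = {
    "top-left", "top-center", "top-right",
    "center-left", "center", "center-right",
    "bottom-left", "bottom-center", "bottom-right",
}

def validate_positions(entity_positions):
    # Return the entities whose predicted position is not one
    # of the nine allowed labels.
    return {e: p for e, p in entity_positions.items() if p not in POSITIONS}

print(validate_positions({"dog": "center", "ball": "middle"}))  # {'ball': 'middle'}
```
          </preformat>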
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>To accept or reject the alignment improvements, we compared 10 approaches head-to-head using 50 samples and three different metrics.</p>
      <sec id="sec-4-0">
        <title>4.1. Data</title>
        <p>The evaluation uses 50 samples from the Flickr30k subset described in Section 3.2.</p>
      </sec>
      <sec id="sec-4-1">
        <title>4.2. Models</title>
        <p>We utilized two different API providers, OpenAI and Google, selecting GPT-4o-mini and Gemini-2.0-flash-lite as candidates. On top of the vanilla LLMs, we introduced different modifications. The ARAG suffix means that the model uses the agentic flow with RAG, the EM suffix means that the setup uses the entity matching option, and the SR suffix means that we included the strict relations option.</p>
        <p>Regular LLM calls (without agentic flow) did not have access to the tools/DB. All the agentic
flow setups were equipped with three tools (load_initial_image, extract_entities, and
get_data_from_neo4j), a code interpreter, and a maximum of 7 steps to accomplish the task.</p>
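        <p>The three tools given to the agent can be sketched as pure-Python stand-ins (simplified stubs for illustration; in our implementation they are registered through the smolagents framework and backed by the real VLM and Neo4j instance):</p>
        <preformat>
```python
# Simplified stand-ins for the three agent tools (illustrative stubs;
# the real tools call the VLM and Neo4j).
def load_initial_image(path):
    # Would load and return the image; stubbed here.
    return f"image-bytes-from:{path}"

def extract_entities(image):
    # Would call the VLM on the image; stubbed with a fixed answer.
    return ["dog", "ball"]

def get_data_from_neo4j(entities, strict=False):
    # Would query RELATES edges in Neo4j; stubbed with a tiny triple store.
    kg = [("dog", "is playing", "ball"), ("dog", "eat", "food")]
    hits = [t for t in kg if t[0] in entities or t[2] in entities]
    if strict:
        hits = [t for t in hits if t[0] in entities and t[2] in entities]
    return hits

# A vanilla run never touches these tools; the agentic flow chains them:
img = load_initial_image("sample.jpg")
ents = extract_entities(img)
print(get_data_from_neo4j(ents, strict=True))  # [('dog', 'is playing', 'ball')]
```
        </preformat>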
        <p>For the LLM-as-a-judge purpose, we use the same model that we used during knowledge graph creation, gpt-4o. Because it differs from the models under evaluation, this mitigates the bias of performing inference and evaluation with the same model. To calculate the BERTScore, we use bert-base-uncased.</p>
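        <p>The judge step boils down to prompting gpt-4o for a numeric score and parsing its reply. The prompt wording and reply format below are illustrative assumptions, not the paper's exact prompt:</p>
        <preformat>
```python
# Hypothetical sketch of the LLM-as-a-judge scoring step: the judge model
# returns a 0-10 score that we parse from its reply. Prompt text and reply
# format are assumptions for illustration.
import re

JUDGE_PROMPT = (
    "Rate the following image description for {criterion} on a 0-10 scale.\n"
    "Description: {description}\n"
    "Reply with 'Score: N'."
)

def parse_score(reply):
    m = re.search(r"Score:\s*(\d+)", reply)
    if m is None:
        raise ValueError("judge reply did not contain a score")
    return int(m.group(1))

prompt = JUDGE_PROMPT.format(criterion="groundedness",
                             description="A dog plays with a ball.")
print(parse_score("Score: 8"))  # 8
```
        </preformat>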
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Metrics</title>
        <p>To evaluate the experiment results, we need to understand how well the model performs at describing the image and to what extent it aligns with the DB reference. We focused on the following metrics:</p>
        <p>Answer Relevance [0–10] – Measures how well the generated response aligns with the expected content of the image. It is assessed via LLM judgment of the generated image descriptions. A higher score indicates that the response remains applicable and contextually appropriate despite the alignment constraints.</p>
        <p>Groundedness [0–10] – Evaluates the extent to which the model's output is based on retrieved knowledge rather than hallucinated information. A higher groundedness score indicates better alignment with structured knowledge and reduced model hallucination.</p>
        <p>BERTScore [0–1] – A widely used text similarity metric based on contextual embeddings from a pre-trained BERT model. It compares the generated description with reference captions by computing cosine similarity between token embeddings.</p>
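        <p>The core idea behind BERTScore can be illustrated with a toy greedy-matching computation. Real BERTScore uses contextual BERT embeddings (and optional IDF weighting); the static vectors below are assumptions for illustration only:</p>
        <preformat>
```python
# Toy illustration of the BERTScore idea: each candidate token is greedily
# matched to its most similar reference token by cosine similarity.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def precision_like(candidate, reference, emb):
    # Average, over candidate tokens, of the best cosine match in the reference.
    best = [max(cosine(emb[c], emb[r]) for r in reference) for c in candidate]
    return sum(best) / len(best)

emb = {"dog": [1.0, 0.0], "puppy": [0.9, 0.4], "car": [0.0, 1.0]}
print(round(precision_like(["puppy"], ["dog", "car"], emb), 3))  # 0.914
```
        </preformat>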
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Prediction took ~4.5 hours overall. Evaluation took ~14.5 hours due to GPT-4o performance degradation reported by OpenAI at the time. We highlight the best results per base model in bold, since each aligned setup must be evaluated against its own vanilla LLM baseline. The final results are depicted in Table 1 below.</p>
      <p>As we can see, all aligned setups obtained higher groundedness scores than the vanilla LLM approach. The final groundedness score also depends significantly on the capabilities of the base model: our results show a significant difference between the alignment capabilities of Gemini-2.0-flash-lite and gpt-4o-mini.</p>
      <p>While answer relevance dropped significantly under alignment, the answers provided by the aligned models are still valid (though, of course, less detailed). Most importantly, we observe a correlation between the extent of alignment and the answer relevance.</p>
      <p>BERTScore is almost the same across setups, meaning both models provide captions that partially align with the labels. This can be explained by the low-detail captions of the Flickr30k dataset: they are terse and to the point, and were originally written with much less capable models in mind.</p>
      <p>From the above experiments, we conclude that for optimal performance, entity matching should be set to True while strict relations should be set to False.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The scientific novelty of the presented system lies in contributing scalable LLM alignment
approaches to generate outputs close to human goals and values. We introduced an approach for
improving LLM alignment using Neo4j knowledge graphs, Retrieval-Augmented Generation
(RAG), and agent-based processing. Our method ensures that an LLM describes only predefined
entities and relationships extracted from an image, enforcing structured alignment constraints.
Using vector-based entity matching, strict relationship retrieval, and agentic execution, we
successfully constrained the model’s output while maintaining relevant and coherent responses.</p>
      <p>Our experimental results demonstrate that the aligned agent-based approach significantly
improves groundedness compared to a vanilla LLM with prompting. The trade-off is a slight drop
in answer relevance, but overall, the method remains effective for structured and controlled
generation. Importantly, BERTScore results suggest that despite these constraints, the aligned
model’s responses still align with human-annotated captions at approximately the same level.</p>
      <p>While our approach has proven effective, there are several avenues for future work:</p>
      <p>Scaling predictions to larger models such as GPT-4o or fine-tuned open-weight LLMs to reduce hallucinations.</p>
      <p>Expanding the dataset beyond the first 100 samples of Flickr30k to increase generalization.</p>
      <p>Verifying the performance using the Ukrainian Multi30k dataset [25].</p>
      <p>Verifying the framework's performance in specific domains, such as autonomous driving.</p>
      <p>Integrating Neo4j-side semantic search for more efficient and scalable entity matching.</p>
      <p>Exploring techniques to enhance the alignment of vision-language models with structured knowledge.</p>
      <p>Researching the capabilities of image-based alignment instead of the entities-and-relations DB [26, 27, 28].</p>
      <p>By combining custom constraints from knowledge graphs with retrieval-enhanced generation,
our work demonstrates a promising pathway for more controllable, aligned, and factually
grounded LLMs. This methodology can be extended to other alignment-sensitive applications, such
as medical AI, legal document analysis, and explainable AI systems.</p>
      <p>Source code is available on GitHub [29].</p>
      <sec id="sec-6-1">
        <title>6.1. Limitations</title>
        <p>This research was conducted as part of a master's thesis at the Kharkiv National University of Radio Electronics, which is why the KG and evaluation sizes are modest. The overall experiment budget was around $30. We tested only the GPT-4o, GPT-4o-mini, and Gemini-2.0-flash-lite models because of their affordability. We expect our approach to scale even better with stronger models such as Gemini-2.0-pro and Claude 3.7 Sonnet.</p>
        <p>The framework sometimes hits output token limits during evaluation; such calls are currently just retried.</p>
        <p>We conducted full-size experiments only with the Code Agent, since in preliminary tests the Tool-Calling Agent almost always showed worse results.</p>
        <p>We did not evaluate our framework on new data samples that were not used to build the KG.</p>
        <p>Image size is limited to 20MB (current OpenAI limitation).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This publication is based upon work from COST Action GOBLIN - Global Network on Large-Scale,
Cross-domain and Multilingual Open Knowledge Graphs (CA23147), supported by COST
(European Cooperation in Science and Technology).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Gemini-2.5 for grammar and spelling checking.</p>
    </sec>
    <sec id="sec-9">
      <title>References (continued)</title>
      <p>[15] Y. Wang, N. Lipka, R. A. Rossi, A. Siu, R. Zhang, and T. Derr, Knowledge graph prompting for multi-document question answering. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), pp. 19206–19214 (2024).</p>
      <p>[16] X. He, Y. Tian, Y. Sun, N. V. Chawla, T. Laurent, Y. LeCun, X. Bresson, and B. Hooi, G-Retriever: Retrieval-augmented generation for textual graph understanding and question answering. In: Proceedings of the Advances in 37th Neural Information Processing Systems (NeurIPS), Curran Associates, Inc., pp. 132876–132907 (2024).</p>
      <p>[17] X. Li, R. Zhao, Y. K. Chia, B. Ding, S. Joty, S. Poria, and L. Bing, Chain-of-knowledge: Grounding large language models via dynamic knowledge adapting over heterogeneous sources. In: Proceedings of the 12th International Conference on Learning Representations (ICLR) (2024).</p>
      <p>[18] L. Luo, Y.-F. Li, G. Haffari, and S. Pan, Reasoning on graphs: Faithful and interpretable large language model reasoning. In: Proceedings of the 12th International Conference on Learning Representations (ICLR) (2024).</p>
      <p>[19] J. Sun, C. Xu, L. Tang, S. Wang, C. Lin, Y. Gong, H.-Y. Shum, and J. Guo, Think-on-graph: Deep and responsible reasoning of large language model with knowledge graph. In: Proceedings of the 12th International Conference on Learning Representations (ICLR) (2024).</p>
      <p>[20] H. Liu, S. Wang, Y. Zhu, Y. Dong, and J. Li, Knowledge graph-enhanced large language models via path selection. In: Findings of the 62nd Association for Computational Linguistics (ACL), Bangkok, Thailand, pp. 6311–6321 (2024).</p>
      <p>[21] R.-C. Chang and J. Zhang, CommunityKG-RAG: Leveraging community structures in knowledge graphs for advanced retrieval-augmented generation in fact-checking, arXiv preprint (2024).</p>
      <p>[22] Y. Mu, P. Niu, K. Bontcheva, and N. Aletras, Predicting and analyzing the popularity of false rumors in Weibo, Expert Systems with Applications, vol. 243, p. 122791 (2024).</p>
      <p>[23] A. Kau, X. He, A. Nambissan, A. Astudillo, H. Yin, and A. Aryani, Combining knowledge graphs and large language models, arXiv preprint (2024).</p>
      <p>[24] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, Transactions of the Association for Computational Linguistics, pp. 67–78 (2014).</p>
      <p>[25] N. Saichyshyna, D. Maksymenko, O. Turuta, A. Yerokhin, A. Babii, and O. Turuta, Extension Multi30K: Multimodal dataset for integrated vision and language research in Ukrainian. In: Proceedings of the 2nd Ukrainian Natural Language Processing Workshop (UNLP), pp. 54–61 (2023).</p>
      <p>[26] I. Kyrychenko, G. Tereshchenko, and K. Smelyakov, Optimized indexing method in a hybrid image storage model for efficient storage and access in big data environments. In: Proceedings of the 17th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), pp. 1–4 (2024).</p>
      <p>[27] V. Gorokhovatskyi, Y. Chmutov, I. Tvoroshenko, and O. Kobylin, Reducing computational costs by compressing the structural description in image classification methods, Advanced Information Systems 9, pp. 5–12 (2025).</p>
      <p>[28] V. Gorokhovatskyi et al., Search for visual objects by request in the form of a cluster representation for the structural image description, Advances in Electrical and Electronic Engineering 21.1 (2023).</p>
      <p>[29] Implementation of the paper, https://github.com/Vlad-Fliahin/LLM-alignment-with-ARAG, last accessed 2025/04/20.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Erkut</given-names>
            <surname>Erdem</surname>
          </string-name>
          , Menekse Kuyu, Semih Yagcioglu, Anette Frank, Letitia Parcalabescu, Barbara Plank, Andrii Babii, Oleksii Turuta, Aykut Erdem, Iacer Calixto, Elena Lloret,
          <string-name>
            <surname>Elena-Simona</surname>
            <given-names>Apostol</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciprian-Octavian</surname>
            <given-names>Truică</given-names>
          </string-name>
          , Branislava Šandrih, Sanda Martinčić-Ipšić,
          <article-title>Gábor Berend, Albert Gatt, and Grăzina Korvel, Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning</article-title>
          .
          <source>J. Artif. Int. Res</source>
          .
          <volume>73</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          et al.,
          <article-title>Chain-ofthought prompting elicits reasoning in large language models</article-title>
          .
          <source>In: Advances of the 36th Neural Information Processing Systems (NeurIPS)</source>
          , Curran Associates, Inc., pp.
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Selfconsistency improves chain of thought reasoning in language models</article-title>
          .
          <source>In: Proceedings of the 11th International Conference on Learning Representations (ICLR)</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shafran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Griffiths</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <article-title>Tree of thoughts: Deliberate problem solving with large language models</article-title>
          .
          <source>In: Advances of the 37th Neural Information Processing Systems (NeurIPS)</source>
          , pp.
          <fpage>11809</fpage>
          -
          <lpage>11822</lpage>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Alon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , and G. Neubig, Docprompting:
          <article-title>Generating code by retrieving the docs</article-title>
          .
          <source>In: Proceedings of the 11th Conference on Learning Representations (ICLR)</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kojima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iwasawa</surname>
          </string-name>
          ,
          <article-title>Large language models are zero-shot reasoners</article-title>
          .
          <source>In: Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          , Curran Associates, Inc., pp.
          <fpage>22199</fpage>
          -
          <lpage>22213</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Creswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shanahan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Higgins</surname>
          </string-name>
          ,
          <article-title>Selection-inference: Exploiting large language models for interpretable logical reasoning</article-title>
          .
          <source>In: Proceedings of the 11th International Conference on Learning Representations (ICLR)</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Labash</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Gopinath</surname>
          </string-name>
          ,
          <article-title>Reflexion: an autonomous agent with dynamic memory and self-reflection</article-title>
          .
          <source>In: Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          , Curran Associates Inc., Red Hook, NY, USA, Article
          <volume>377</volume>
          , pp.
          <fpage>8634</fpage>
          -
          <lpage>8652</lpage>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Besta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kubicek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gerstenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gianinazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gajda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Podstawski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Niewiadomski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nyczyk</surname>
          </string-name>
          et al.,
          <article-title>Graph of thoughts: Solving elaborate problems with large language models</article-title>
          .
          <source>In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zelikman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <article-title>STaR: Bootstrapping Reasoning With Reasoning</article-title>
          .
          <source>In: Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          , Curran Associates, Inc., pp.
          <fpage>15476</fpage>
          -
          <lpage>15488</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          .
          <source>Semantic Web</source>
          , IOS Press, vol.
          <volume>8</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>489</fpage>
          -
          <lpage>508</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          , vol.
          <volume>29</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Meta-aggregator: Learning to aggregate for 1-bit graph neural networks</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV)</source>
          , pp.
          <fpage>5281</fpage>
          -
          <lpage>5290</lpage>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sanmartin</surname>
          </string-name>
          ,
          <article-title>KG-RAG: Bridging the gap between knowledge and creativity</article-title>
          ,
          <source>arXiv preprint</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>