<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hybrid Multi-Source RAG for Context-Aware Conversational Data Access</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgia Di Nicola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Iannucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Torlone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Roma Tre University</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Retrieval-Augmented Generation (RAG) augments large language models (LLMs) with external knowledge, yet most RAG systems rely solely on vector-based semantic search, limiting their handling of structured or relational information. We propose a hybrid multi-source RAG architecture that unifies graph-based and vectorbased retrieval within a single conversational framework. A prompt-driven orchestrator dynamically selects and fuses the appropriate source at runtime, leveraging graph databases for symbolic reasoning and vector stores for semantic similarity. Experimental evaluation with LLM-as-a-Judge scoring shows that the hybrid configuration consistently outperforms single-source baselines, achieving higher relevance, fluency, faithfulness, and interpretability without any additional training overhead.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Large language models (LLMs) have revolutionized NLP, showing strong performance on many linguistic
and reasoning tasks. Their knowledge, however, is limited by static, incomplete training data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], leading
to outdated or inaccurate answers on complex, domain-specific queries [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Retrieval-Augmented
Generation (RAG) mitigates this issue by coupling generative models with external retrieval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Conventional
RAG relies on dense-vector retrieval, which excels at semantic similarity search over unstructured
text [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] but lacks the explicit relational reasoning that graph databases provide [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Real-world knowledge, however, is rarely monolithic. It spans heterogeneous modalities, from
unstructured to structured data, each providing distinct and complementary advantages for representing
information. Within this landscape, vector databases are widely used today to capture semantic meaning
and conceptual similarity among textual contents, enabling retrieval based on related ideas rather than
exact wording. Conversely, graph databases are used to model explicit relationships and hierarchical
connections between entities, supporting structured reasoning and interpretability. A truly
contextaware conversational agent should be able to leverage both sources, integrating semantic and symbolic
retrieval to deliver responses that are both fluent and interpretable.</p>
      <p>To this end, we propose a hybrid multi-source RAG architecture that maintains a clear separation
between retrieval modalities while allowing dynamic, transparent source selection at runtime. Unlike
joint-embedding or reranking-based approaches, it keeps graph and vector pipelines independent,
delegating their integration to a prompt-driven orchestration layer where the LLM acts as a reasoning
controller. This late-fusion strategy preserves the complementary strengths of each modality and
ensures modularity, adaptability, and interpretability, making the system easily extensible and auditable
in heterogeneous production environments. The resulting architecture has been implemented in a
system that dynamically routes queries through graph-based, vector-based, or hybrid retrieval paths
and evaluates retrieved contexts through prompt-engineered selection logic. This allows the
conversational agent to generate responses that are semantically rich, structurally consistent, and explainable
simultaneously.</p>
      <p>The contributions of this work are threefold: (i) the design and implementation of a modular RAG
architecture integrating graph and vector databases under a unified agentic framework; (ii) a
promptbased source selection strategy enabling dynamic and transparent fusion of heterogeneous retrieval
outputs without additional model training; (iii) a quantitative evaluation demonstrating that the hybrid
approach achieves higher response quality measured by relevance, fluency, and faithfulness, compared
to graph-only and vector-only baselines.</p>
      <p>The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the
design of the hybrid multi-source RAG architecture. Section 4 presents the evaluation methodology,
and Section 5 reports and discusses the experimental results. Section 6 concludes the paper and outlines
future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Retrieval-Augmented Generation (RAG) has become a principal technique for granting large language
models access to external knowledge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Conventional RAG pipelines rely on vector databases to
perform semantic similarity search over unstructured text; this yields broad coverage but falls short on
tasks that demand structural reasoning, precise entity resolution, or interpretability. To address these
shortcomings, recent work has explored graph-based retrieval (GraphRAG), in which information is
accessed through graph traversal or query execution rather than dense-embedding search [6]. GraphRAG
provides advantages for multi-hop reasoning, disambiguation, and factual reliability, especially when
coupled with graph-neural-network retrievers [7] or heterogeneous node designs such as NodeRAG [8]
and multi-partite frameworks like Clue-RAG [9], which improve precision and cost-eficiency.
      </p>
      <p>Hybrid retrieval systems aim to combine the semantic richness of vector stores with the explicit
structure of graph stores, but in doing so they must balance two objectives that are complementary yet
potentially in conflict. PruningRAG [ 10] embeds a static knowledge-filtering step into the retrieval phase
to reduce hallucinations and enhance factual consistency, but its fixed heuristics limit adaptability to the
specific informational demands of a query. Multi-Source Prompt Retrieval (MSPR) [ 11] learns a dynamic
selection strategy that decides, on a per-query basis, which sources to consult, achieving strong empirical
performance at the cost of substantial training and maintenance overhead and reduced transparency.
Joint-embedding approaches attempt to fuse modalities into a shared latent space, enabling multimodal
retrieval but inevitably blurring semantic precision and compromising interpretability [12]. KG-infused
RAG augments conventional text-based retrieval with knowledge-graph facts to curb hallucination, yet
it relies on static integration rules that hinder flexibility and obscure source-priority decisions [ 13]. The
more elaborate multi-partite design of Clue-RAG [9] explicitly models cross-source dependencies and
retrieves compact, semantically coherent sub-graphs for combination with textual passages, preserving
graph semantics but introducing significant computational overhead and orchestration complexity that
limit runtime eficiency and modular extensibility.</p>
      <p>These collective eforts illustrate the core dilemma in hybrid retrieval: the trade-of between
maximizing semantic breadth and maintaining structural clarity. While some methods improve recall, they can
reduce interpretability. Conversely, those that prioritize graph semantics often sacrifice flexibility and
eficiency. To address these challenges and unlike previous work, our solution maintains a clear
separation between retrieval modalities while allowing dynamic, transparent source selection at runtime. The
design ensures modularity and adaptability across heterogeneous domains by deferring integration to a
prompt-driven orchestration layer rather than relying on joint embeddings or reranking mechanisms.
This late-fusion strategy preserves the complementary strengths of graph and vector pipelines and
supports extensible, auditable operation in real-world production settings.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Design of the Hybrid Multi-Source RAG Agent</title>
      <sec id="sec-3-1">
        <title>3.1. Architecture Overview</title>
        <p>The agent is designed around a modular architecture that separates retrieval modalities while enabling
dynamic coordination at runtime. Integration is achieved through a late-fusion orchestration layer,
enabling dynamic selection and combination of results at runtime while maintaining
interpretability, modularity, and adaptability. The modular design facilitates extension to additional sources or
modalities with minimal modification to existing components. The architecture is composed of three
complementary components: (1) a Vector RAG pipeline for semantic retrieval over unstructured
documents, (2) a Graph RAG pipeline for schema-aware reasoning over structured knowledge, and (3)
an Entity Resolution subroutine, an auxiliary chain that resolves raw identifiers into canonical labels
through a Graph Lookup process.</p>
        <p>Vector pipeline. The vector pipeline embeds the input query using a transformer-based model
and retrieves top- chunks from a vector store through approximate nearest neighbor search. The
retrieved documents are concatenated with the original query and forwarded to an LLM for answer
generation. The pipeline leverages dense embeddings to identify relevant content beyond strict term
matching, making it efective for lexically distant queries from the source corpus. This approach enables
retrieval across heterogeneous documents and varying linguistic formulations, making it well-suited
for exploratory or underspecified information needs. In summary, the Vector RAG pipeline provides
schema-indepentent retrieval, ensuring that when the knowledge graph yields limited or no relevant
results, the system can still rely on vector-based evidence to construct the final response.
Graph pipeline. The graph pipeline leverages a knowledge graph to perform structured retrieval
and multi-hop reasoning over domain knowledge. It begins by translating the input query into a
Cypher query via prompt-guided LLM reasoning (see Section 3.2), which is then executed against
the knowledge graph to retrieve relevant entities and relationships. Finally, the retrieved data are
reformulated in natural language using a dedicated prompt, producing a human-readable answer. By
operating over explicitly structured data, the Graph RAG pipeline ensures that answers are grounded in
version-controlled, schema-compliant information. This separation between symbolic query execution
and natural-language rendering improves transparency, facilitates debugging, and grounds the answers
in version-controlled structured data. As a result, the Graph RAG component provides precise,
schemaaware retrieval that complements the Vector RAG pipeline.</p>
        <p>Entity Resolution (Graph Lookup). A distinctive feature of the proposed design is the Graph Lookup
subroutine, which addresses the frequent occurrence of identifier-based outputs in graph databases (e.g.,
ns/m.07ssc in the WikiGraphs [14] dataset). Upon detection of unresolved identifiers in the Graph
RAG output, the subroutine is triggered to execute a narrowly scoped Cypher query to retrieve the
corresponding canonical names or human-readable values for the returned nodes. The subroutine then
produces these results as key-value mappings, which are applied by the main pipeline to replace the
original identifiers in the graph-derived output, producing fully interpretable results. This refinement
step ensures that all structured information retrieved from the knowledge graph is interpretable and
unambiguous, improving downstream usability and clarity in production contexts.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Automatic Cypher Query Generation</title>
        <p>A central component of the Graph RAG pipeline is the automatic generation of Cypher queries from
natural language questions. To achieve this, a large language model with an extended context window
is leveraged to handle detailed prompts that describe the full knowledge graph schema, entity types,
and relationships.</p>
        <p>The methodology is structured as follows:
1. Expert Role Declaration. The prompt begins by instructing the LLM to act as an expert Cypher
query generator for Freebase-like knowledge graphs. This sets the reasoning context and ensures the
model produces valid, schema-compliant queries.
2. Schema Inclusion. The complete knowledge graph schema is provided to the LLM, defining the
structure of nodes, their properties, and allowed relationships. The schema is presented in a simplified
form in Listing 1. However, the full schema conceptually includes:
• Nodes: Entity with properties (center: STRING, identifier; title: STRING, name).
• Relationships: REL between Entity nodes with property (type: STRING).</p>
        <p>• Allowed edges: Entity --[REL]--&gt; Entity.</p>
        <p>In addition, all relations are described in natural language and grouped by category, such as:
• ns/type.object.name: primary human-readable name of an entity.
• ns/common.topic.description: textual description providing context for a topic.
• ns/people.person.date_of_birth: birth date of an individual (YYYY-MM-DD).
• ns/people.person.profession: occupation of a person.
3. Natural Language Question. The user’s query is inserted into the prompt.
4. Formatting and Term Transformation Rules. Instructions specify basic text formatting and
term adjustments, such as replacing spaces with underscores, quoting string literals, and simplifying
expressions.
5. Cypher Pattern Instructions. The prompt enforces a MATCH-WHERE-RETURN construct and limits
r.type to the set of relations present in the provided schema.
6. Identifier Handling and Output Formatting. If retrieved entities contain machine-readable
identifiers, the above described entity resolution mechanism is engaged to map these to human-readable
labels before the final output is returned to the user.</p>
        <p>Gaining from a large-context LLM, the pipeline can encode the full schema without truncation,
producing accurate Cypher queries that form the basis for structured graph retrieval. Listing 1 presents
a simplified prompt omitting relation semantics for conciseness.</p>
        <p>Role. You are an expert Cypher query generator for Freebase-like knowledge graphs.</p>
        <p>Input schema.</p>
        <p>Instructions.</p>
        <p>Allowed relationships.</p>
        <p>• Node: Entity {center: STRING, title: STRING}
• Relationship: REL connecting Entity nodes {type: STRING}
• Allowed edges: Entity -[REL]-&gt; Entity
• Input question: {question}
• Generate a valid Cypher query using MATCH--WHERE--RETURN.
• Use only the relationships listed below.
• General topic information: ns/common.topic.alias, key/wikipedia.en
• People and biographical information: ns/people.person.date_of_birth,</p>
        <p>ns/people.person.profession</p>
        <p>Output. Return only the Cypher query.</p>
        <p>Listing 1: Prompt used to generate Cypher queries constrained by a predefined schema and relationship
whitelist.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Coordination and Flow Activation</title>
        <p>The coordination mechanism constitutes a central aspect of the system architecture, governing the
sequential execution of multiple retrieval pipelines in response to user queries. Each pipeline is executed
in turn, beginning with the Vector RAG pipeline, followed by the Graph RAG pipeline. The Graph
Lookup subroutine is conditionally triggered when unresolved identifiers are detected in the Graph
RAG output, refining the results with canonical labels.</p>
        <p>Intermediate outputs from all pipelines are aggregated by the final LLM node, which orchestrates
the integration of retrieved information based on contextual relevance and informativeness. In cases
where one source provides limited or non-informative content—-i.e., when the retrieved context does
not add relevant or precise information to the user’s question as defined in the prompt instructions
(see Listing 2)–the system relies on the alternative source; when both sources provide complementary
information, their contents are fused to generate a coherent response that is both contextually grounded
and comprehensive.</p>
        <p>This adaptive orchestration preserves modularity, transparency, and interpretability, while balancing
semantic coverage and structural precision to handle ambiguous or underspecified queries efectively.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Hybrid Integration Strategy</title>
        <p>The integration of heterogeneous retrieval modalities is deferred to a late-fusion stage. Instead of relying
on static merging rules, the outputs of the retrieval pipelines (optionally refined through identifier
resolution) are provided to the final LLM. An excerpt of the prompt guiding the final LLM is shown in
Listing 2.</p>
        <p>Prompt used for hybrid orchestration
You are given two independent context sources: GraphRAG and VectorRAG, and the user’s question. Your
task is to generate the most accurate, complete, and fluent answer based strictly on the provided contexts
and the user’s question.</p>
        <p>Instructions:
• If both sources provide relevant and compatible information, merge them.
• If only one source provides useful content, use that source.
• If the sources conflict, select the more specific or factual one.
• If neither source is relevant, reply: ”The information is not available in the provided context.”
• Output only one unified answer.</p>
        <p>Listing 2: Prompt guiding the final LLM in the hybrid RAG architecture.</p>
        <p>This prompt concretely operationalizes the integration strategy, ensuring context-aware, accurate,
and coherent answers without hardcoded merging rules. Through prompt-driven orchestration, the
model dynamically assesses the GraphRAG and VectorRAG contexts, integrating them when both
contribute relevant information, favoring the more precise or specific source in case of conflict, and
ifltering out irrelevant or uninformative content.</p>
        <p>This strategy preserves the strengths of each modality while avoiding the semantic overlap observed
in joint embedding approaches and the rigidity of pruning- or reranking-based methods.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation Methodology</title>
      <p>The evaluation systematically compared Vector, Graph, and Hybrid RAG. The methodology comprised
three components: a manual curated dataset, standardized evaluation metrics, and a semi-automated
annotation protocol, each described in the following subsections.</p>
      <sec id="sec-4-1">
        <title>4.1. Dataset Construction</title>
        <p>The evaluation dataset builds upon WikiGraphs [14], which pairs full Wikipedia articles from the
WikiText-103 corpus with automatically extracted subgraphs of Freebase. The dataset is released in
three structured variants—-max256, max512, and max1024—-that difer in the maximum number of
nodes per subgraph (256, 512, and 1024 respectively). These variants provide progressively larger and
more connected graph representations of the same underlying knowledge, enabling systematic analysis
of retrieval and reasoning performance across increasing structural complexity. In all configurations,
each graph remains aligned to its corresponding Wikipedia article, preserving semantic consistency
and factual grounding across scales.</p>
        <p>A set of approximately 100 query–answer pairs, applicable to all three dataset variants, was manually
created to expose complementary strengths and weaknesses of the retrieval modalities. They include
general knowledge questions, fine-grained factual queries, relation-centric queries, and adversarial or
underspecified cases. Ground-truth answers were consolidated from both textual and graph sources to
ensure benchmark reliability and prevent biases.</p>
        <p>To further extend the evaluation to large-scale conditions, an additional dataset was specifically
constructed for the max1024 configuration. This supplementary set comprises approximately 80
query–answer pairs associated specifically with the largest and most structurally complex graphs.
The questions related to the max1024 variant were generated with the assistance of ChatGPT and
subsequently verified manually to ensure factual correctness and coverage.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Metrics</title>
        <p>The evaluation of system outputs was conducted along three principal dimensions, each capturing a
distinct aspect of answer quality:
1. Relevance: the extent to which the generated answer addresses the user query and efectively
incorporates the retrieved context. High relevance indicates strong alignment with the
information need and meaningful integration of supporting content, reflecting the efectiveness of both
retrieval and grounding.
2. Faithfulness: the degree to which the generated answer is factually consistent with the retrieved
knowledge. A faithful response does not introduce information that is unsupported or
misrepresents the source material. This dimension is critical in retrieval-augmented generation, where
the trustworthiness of the output depends on its grounding in the retrieved context.
3. Fluency: the naturalness, grammatical correctness, and structural coherence of the response.</p>
        <p>Evaluated independently from factual accuracy, this dimension ensures linguistic quality,
acknowledging that a fluent output may still be factually incorrect or irrelevant.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Annotation Protocol</title>
        <p>
          Evaluation was conducted using a semi-automated protocol following the LLM-as-a-Judge paradigm, in
which a large language model (Llama-4-Scout-17B-16E-Instruct-FP8a) was employed as the annotator.
Each instance of the dataset, comprising query, generated answer, retrieved context, ground-truth
reference, and retrieval method, was processed automatically to generate scores for all evaluation
dimensions. The workflow involved structured input preparation, automated evaluation, and parsing
and aggregation of outputs. Queries, answers, contexts, and ground-truth references were serialized
into structured records to ensure reproducibility. The LLM evaluated each instance according to the
defined criteria, returning both numeric scores (on a 1–-10 scale, subsequently normalized to [
          <xref ref-type="bibr" rid="ref1">0,1</xref>
          ]) and
textual justifications. Outputs were normalized and stored alongside the original dataset.
        </p>
        <p>A minimal input record is shown below:
"question": "What gameplay modes are available</p>
        <p>in Maniac Mansion?",
"method": "Graph RAG",
"context": [
"The gameplay modes available
in Maniac_Mansion are ns/m.02hjn4",
{"id":"ns/m.02hjn4","name":"Single-player"}
],
"answer": "The gameplay modes available are</p>
        <p>Single-player",
"ground_truth": "The gameplay modes available
are Single-player video game."</p>
        <p>These records form the basis for LLM-based scoring and enable stratified analyses across retrieval
strategies, supporting both aggregate comparisons and the identification of method-specific strengths
and weaknesses.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>We conducted experiments aimed at evaluating both, the efectiveness and the performance of the
proposed approach. All of them have been run on a machine equipped with an Intel Xeon Gold
5512U@2.1 Ghz and 128GB RAM. The graph database employed is Neo4j 5.19.0 (with APOC 5.19.0),
while the vector store is Qdrant 1.11.0 (running in a Docker container). Flowise 3.0.0, also running
in a Docker container, was used for pipeline orchestration and execution, and LangSmith SDK 0.4.31
for trace logging and debugging. Ollama 0.12.2, running in a Docker container, was employed for
embedding query vectors within the Vector RAG pipeline. The Groq Python SDK version 0.29.0 was
utilized for integrating the inference capabilities of Groq’s Language Processing Unit (LPU).</p>
      <sec id="sec-5-1">
        <title>5.1. Efectiveness Evaluation</title>
        <p>The empirical evaluation provides a comparative analysis of Vector RAG, Graph RAG, and the proposed
Hybrid RAG agent along the three assessment dimensions defined in Section 4.2: relevance, faithfulness,
and fluency. Table 1 reports the average normalized scores (0–1 scale) for each retrieval strategy.</p>
        <p>The Cypher queries used by the Graph RAG pipeline were generated automatically by an LLM
(Llama-4-Scout-17B-16E-Instruct-FP8a) with a large context window to accommodate the full graph
schema. This allowed prompt-driven, step-by-step query construction covering all entity types and
relationships.</p>
        <sec id="sec-5-1-1">
          <title>Method</title>
          <p>Vector RAG
Graph RAG
Hybrid RAG</p>
          <p>Relevance</p>
          <p>The results show that each retrieval modality exhibits a characteristic trade-of among the
diferent dimensions. Vector RAG achieves relatively balanced outcomes, with moderate relevance and
faithfulness and the highest fluency score among the baselines. Its reliance on semantic similarity
search allows it to capture diverse contexts, but at the cost of occasional factual imprecision. Graph
RAG demonstrates the strongest faithfulness, confirming the reliability of schema-compliant retrieval,
yet its relevance and fluency are lower, reflecting dificulties in handling underspecified queries and
the rigidity of graph-derived outputs. The Hybrid RAG agent clearly outperforms both baselines,
achieving the highest scores in relevance and faithfulness while maintaining fluency comparable to the
best-performing method.</p>
          <p>By coupling graph and vector pipelines, the Hybrid approach mitigates each method’s weaknesses:
structured graph evidence limits hallucinations, while dense-vector retrieval fills coverage gaps.
Automatic Cypher generation guarantees correct, reproducible graph queries, and a Graph-Lookup
subroutine replaces raw IDs with canonical labels, boosting interpretability and perceived answer quality.
Performance diferences stem from the underlying retrieval mechanisms: Graph RAG exploits explicit
KG structure, Vector RAG leverages semantic embeddings for broad context, and Hybrid RAG uses
prompt-driven late-fusion to pick and merge the most pertinent information from both sources.
Consequently, Vector RAG shows larger relevance/faithfulness fluctuations, whereas Hybrid RAG attains the
highest average scores and the lowest variance, indicating more stable behavior across query types.
These findings validate the central claim: a modular, multi-source RAG architecture that preserves
modality separation yet integrates them via late-fusion delivers superior reliability, adaptability, and
interpretability compared to single-modality systems. The Hybrid RAG agent thus emerges as a
promising design for real-world deployment scenarios where both accuracy and transparency are essential,
while its modular implementation in a low-code framework (e.g., Flowise) ensures seamless integration
without reliance on hardcoded logic.</p>
          <p>Performance by Query Type. To further examine the behavior of the retrieval pipelines, the evaluation
was stratified by question category-general knowledge, fine-grained factual, relation-centric, and
adversarial or underspecified cases, as defined in Section 4.1.</p>
          <p>Table 2 reports average normalized scores (0–1) per method and category. The results highlight the
complementary strengths and limitations of each retrieval strategy. Hybrid RAG consistently achieves
the highest relevance and maintains strong faithfulness, efectively combining graph precision with
vector flexibility. Graph RAG delivers exceptional faithfulness on schema-anchored queries (general
and fine-grained) but sufers in relevance and fluency when broader or less structured context is needed.
Vector RAG ensures fluent, semantically rich responses, particularly for relation-centric queries, yet
its moderate faithfulness exposes a risk of factual errors. Overall, the hybrid approach balances these
trade-ofs, leveraging structured grounding and semantic coverage to achieve the strongest overall
performance.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Performance Evaluation</title>
        <p>For scalability evaluation, all graphs from the three datastore configurations ( max256, max512, and
max1024) were included in the experimental pool. To enable controlled cross-scale comparison, we
identified the subset of graphs that are structurally identical across variants—-hereafter denoted as
 common. These matched graphs share the same number of nodes and relations within Neo4j, and
identical semantic content in the corresponding Qdrant chunks. By issuing identical query sets over
 common across configurations, any observed performance diferences can be attributed exclusively to
datastore growth rather than graph-specific structural variations. For each configuration, the same
queries were executed on the corresponding | common| graphs using identical prompting and LLM
configurations, in order to isolate the contribution of datastore size and composition.</p>
        <p>Dataset expansion between configurations is quantified as follows. From max256 to max512, nodes
increased by approximately 1.17× (407,218 / 347,949), relations by 1.16× (714,582 / 613,705), and vector
chunks by 1.02× (969,993 / 946,423), yielding an average structural increase of ≈ 1.12×. From max512
to max1024, nodes increased by 1.05× (426,344 / 407,218), relations by 1.04× (742,265 / 714,582), and
vector chunks by 1.00× (973,719 / 969,993), resulting in an average structural increase of ≈ 1.03×.</p>
        <p>To verify Groq’s stability as an execution environment, each query was run three times per
configuration, measuring latency mean and standard deviation. The resulting low variability confirmed Groq’s
consistent inference times, allowing its use for the scalability evaluation.</p>
        <p>Figure 3 presents a stacked visualization of the latency contributions from the main pipeline
components: Vector RAG, Graph RAG, Graph Lookup, selection and final answer generation (including
coordination and flow activation), and Groq execution.</p>
        <p>Notably, the selection and generation stage remains largely stable across scales, measuring values of
1105 ms (max256), 1065 ms (max512), and 1097 ms (max1024), despite growth in dataset size and the
latencies of upstream components. This stability reflects the efectiveness of the dynamic orchestration
and prompt-driven source selection , which integrates only the most informative outputs and filters
uninformative content.</p>
        <p>Groq execution is also highly stable, with standard deviations ranging from  ≈ 0.13 s to  ≈ 0.30 s.
Vector RAG is consistently the fastest component, followed by Graph Lookup and selection/generation,
while Graph RAG exhibits the highest latencies due to Cypher generation, traversal, and reformulation.
Overall, standard deviations remain moderate ( ≈ 25 –30 ms for Vector and Graph RAG, 15–20 ms for
Lookup and selection), confirming reproducible performance and sub-linear scalability across dataset
scales.</p>
        <sec id="sec-5-2-1">
          <title>5.2.1. Full-scale descriptive runs (max1024)</title>
          <p>Whereas the matched-subset analysis provided a controlled comparison by holding per-graph structure
constant, the full-scale descriptive runs aim to characterize system behaviour under high-load conditions.
This evaluation employs the largest graphs available in the max1024 variant. The results confirm that
all pipelines remain within practical latency ranges even under these demanding conditions.</p>
          <p>Vector RAG exhibits the lowest mean latency (≈0.8s), followed by Graph RAG (≈1.9s). Hybrid RAG,
as expected, shows higher latency (≈4.9s) due to the additional orchestration, selection, and generation
step via the LLM; however, this overhead remains stable across scales and does not compound with
datastore size. The full-scale evaluation indicates that all pipelines sustain eficient latency even when
processing the most structurally complex graphs. Notably, the Hybrid RAG—-despite incorporating
an LLM-driven orchestration layer—-retains stable and robust scalability, demonstrating the system’s
ability to handle high-load conditions efectively without critical bottlenecks.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and future work</title>
      <p>This work presented a hybrid multi-source RAG architecture that integrates graph-based and
vectorbased retrieval through a prompt-orchestrated late-fusion strategy. By preserving modality separation
and enabling dynamic source selection, the proposed system achieves a balanced trade-of between
semantic coverage and structural precision. Experimental results confirm that the hybrid approach
improves relevance and faithfulness over single-modality baselines while maintaining low latency and
high scalability.</p>
      <p>Future work will explore adaptive weighting of retrieval sources, integration of additional modalities,
and automated evaluation frameworks to further enhance transparency and robustness in data-centric
conversational AI.</p>
    </sec>
    <sec id="sec-7">
      <title>Artifacts</title>
      <p>The source code is available at:
https://github.com/Giorgia-DNicola/rag-multisource-agent</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the authors used ChatGPT-5.2 for grammar and spelling
review. After using this service, they carefully reviewed and edited the manuscript as needed and take
full responsibility for the content of the publication.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgements References</title>
      <p>This work was partially supported by the PRIN-MUR project 2022XERWK9 S-PIC4CHU.
[6] H. H. et al.and Yu Wang, H. Shomer, K. Guo, J. Ding, Y. Lei, M. Halappanavar, R. A. Rossi,
S. Mukherjee, X. Tang, Q. He, Z. Hua, B. Long, T. Zhao, N. Shah, A. Javari, Y. Xia, J. Tang,
Retrieval-augmented generation with graphs (graphrag), 2025. URL: https://arxiv.org/abs/2501.
00309. arXiv:2501.00309.
[7] C. Mavromatis, G. Karypis, GNN-RAG: Graph neural retrieval for eficient large language model
reasoning on knowledge graphs, in: W. Che, J. Nabende, E. Shutova, M. T. Pilehvar (Eds.), Findings of
the Association for Computational Linguistics: ACL 2025, Association for Computational
Linguistics, Vienna, Austria, 2025, pp. 16682–16699. URL: https://aclanthology.org/2025.findings-acl.856/.
doi:10.18653/v1/2025.findings-acl.856.
[8] T. Xu, H. Zheng, C. Li, H. Chen, Y. Liu, R. Chen, L. Sun, Noderag: Structuring graph-based rag
with heterogeneous nodes, arXiv preprint arXiv:2504.11544 (2025). URL: https://arxiv.org/abs/
2504.11544. doi:10.48550/arXiv.2504.11544, submitted 15 Apr 2025.
[9] Y. Su, Y. Fang, Y. Zhou, Q. Xu, C. Yang, Clue-rag: Towards accurate and cost-eficient graph-based
rag via multi-partite graph and query-driven iterative retrieval, arXiv preprint arXiv:2507.08445
(2025). URL: https://arxiv.org/abs/2507.08445. doi:10.48550/arXiv.2507.08445, versions
v1v3.
[10] S. Yu, M. Cheng, J. Yang, J. Ouyang, Y. Luo, C. Lei, Q. Liu, E. Chen, Multi-source knowledge
pruning for retrieval-augmented generation: A benchmark and empirical study, arXiv preprint
arXiv:2409.13694 (2024). URL: https://arxiv.org/abs/2409.13694.
[11] Q. Zhao, R. Wang, X. Wang, D. Zha, N. Mu, Prefrag: Preference-driven multi-source retrieval
augmented generation, arXiv preprint arXiv:2411.00689 (2024). URL: https://arxiv.org/abs/2411.
00689.
[12] X. Yin, X. Li, Z. Zhu, W. Wang, J. Zhou, H. Guan, Tri-modal motion retrieval by learning a joint
embedding space, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2024, pp. 1234–1243. URL: https://arxiv.org/pdf/2403.00691.
[13] D. Wu, Y. Yan, Z. Liu, Z. Liu, M. Sun, Kg-infused rag: Augmenting corpus-based rag with external
knowledge graphs, arXiv preprint arXiv:2506.09542 (2025). URL: https://arxiv.org/abs/2506.09542.
[14] L. Wang, Y. Li, O. Aslan, O. Vinyals, WikiGraphs: A Wikipedia text - knowledge graph paired
dataset, in: Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural
Language Processing (TextGraphs-15), 2021, pp. 67–82. URL: https://arxiv.org/pdf/2107.09556.
doi:10.18653/v1/2021.textgraphs-1.7.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yazdani</surname>
          </string-name>
          , N. De Cao,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          , S. Riedel,
          <article-title>KILT: a benchmark for knowledge intensive language tasks</article-title>
          , in: K.
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rumshisky</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Hakkani-Tur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Cotterell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Chakraborty</surname>
          </string-name>
          , Y. Zhou (Eds.),
          <source>Proceedings of the</source>
          <year>2021</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>2523</fpage>
          -
          <lpage>2544</lpage>
          . URL: https: //aclanthology.org/
          <year>2021</year>
          .naacl-main.
          <volume>200</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .naacl-main.
          <volume>200</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2023</year>
          ). URL: https: //doi.org/10.1145/3571730. doi:
          <volume>10</volume>
          .1145/3571730.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Piktus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Küttler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , W.-t. Yih,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <article-title>Retrieval-augmented generation for knowledge-intensive nlp tasks</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Neural Information Processing Systems</source>
          , NIPS '20, Curran Associates Inc.,
          <string-name>
            <surname>Red</surname>
            <given-names>Hook</given-names>
          </string-name>
          ,
          <string-name>
            <surname>NY</surname>
          </string-name>
          , USA,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          , E. Grave,
          <article-title>Leveraging passage retrieval with generative models for open domain question answering</article-title>
          , in: P. Merlo,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          , R. Tsarfaty (Eds.),
          <source>Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics:</source>
          Main Volume,
          <article-title>Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>874</fpage>
          -
          <lpage>880</lpage>
          . URL: https://aclanthology.org/
          <year>2021</year>
          . eacl-main.
          <volume>74</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .eacl-main.
          <volume>74</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naseriparsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs: Opportunities and challenges</article-title>
          ,
          <source>Artif. Intell. Rev</source>
          .
          <volume>56</volume>
          (
          <year>2023</year>
          )
          <fpage>13071</fpage>
          -
          <lpage>13102</lpage>
          . URL: https://doi.org/10.1007/s10462-023-10465-9. doi:
          <volume>10</volume>
          . 1007/s10462-023-10465-9.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>