<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Leveraging Knowledge Graphs and Generative AI for Augmented Research Paper Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rima Dessí</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erick Mendez Guzman</string-name>
          <email>eguzman@hct.ac.ae</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alya Alshaami</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amna Alowais</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hamda Alhammadi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nada Alzarooni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weam N.A Jarbou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zarak Khan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Legal Data, Portoroz, Slovenia</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Higher Colleges of Technology (HCT) - University City</institution>
          ,
          <addr-line>Sharjah, UAE</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent advances in Large Language Models (LLMs) have significantly enhanced Natural Language Processing and improved human-machine interaction. Despite these benefits, LLMs still face challenges, particularly in generating accurate and reliable answers. To overcome these limitations, Retrieval-Augmented Generation (RAG) systems have been proposed, which incorporate external knowledge to improve the responses generated by LLMs. In this paper, we argue that integrating a scientific Knowledge Graph (KG) into RAG systems can further enhance the effectiveness of LLMs in academic research paper retrieval. We further discuss the benefits and limitations of such approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The advancement of Large Language Models (LLMs) has transformed the field of Natural Language
Processing and fostered significant improvements in capabilities such as human-machine interaction [
        <xref ref-type="bibr" rid="ref1 ref2">1,
2</xref>
        ]. These models, such as the GPT series, are based on the transformer architecture and trained on extensive
and diverse datasets. They have demonstrated advanced capabilities in understanding
human language and providing coherent answers. However, they face critical limitations, including (1)
the tendency to produce inaccurate or hallucinated answers, and (2) a lack of domain-specific knowledge.
These challenges arise primarily because their knowledge relies on static training data,
meaning that they have limited access to up-to-date information after training. Further, the training
data often lack highly domain-specific information in areas such as scientific research, technology, finance, and
the humanities, which limits the models’ effectiveness in such areas.
      </p>
      <p>To overcome the aforementioned challenges, several Retrieval-Augmented Generation (RAG)
approaches have been proposed to enhance the performance of LLMs by
supplying relevant external knowledge. RAG methods combine a retrieval component with a
generative model, giving LLMs access to relevant and up-to-date data beyond their training data.
When a query is submitted, the retrieval model identifies and provides
the most relevant documents, and the LLM then exploits this external knowledge to generate more
accurate responses. In the case of research papers, this process is particularly beneficial because it
enables models to identify studies, methodologies, and experimental results relevant to the given query and
to provide more reliable and precise scientific results.</p>
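      <p>The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not the system proposed in this paper: the toy corpus, the bag-of-words cosine scoring, and the names PAPERS, retrieve, and build_prompt are assumptions made for illustration, and the call to an actual LLM is left out.</p>
      <preformat>
```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (title, abstract) pairs invented for illustration.
PAPERS = [
    ("Graph retrieval for question answering",
     "We retrieve subgraphs from a knowledge graph to ground generated answers."),
    ("Transformer language models",
     "We study transformer architectures trained on large text corpora."),
    ("Crop yield prediction",
     "We predict crop yield from satellite images and weather data."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def retrieve(query, papers, k=1):
    """Return the k papers whose title plus abstract best match the query."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(title + " " + abstract))), title, abstract)
        for title, abstract in papers
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(title, abstract) for _, title, abstract in scored[:k]]


def build_prompt(query, retrieved):
    """Place the retrieved papers as context ahead of the user question."""
    context = "\n".join(f"- {title}: {abstract}" for title, abstract in retrieved)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "knowledge graph retrieval"
    print(build_prompt(question, retrieve(question, PAPERS)))
```
      </preformat>
      <p>In a real deployment the term-count scoring would likely be replaced by a stronger retriever, and the returned prompt would be passed to the generative model.</p>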
      <p>Although these approaches have achieved impressive performance by integrating relevant external
knowledge, they fail to capture the semantic and structural relations among the documents provided as
external knowledge to LLMs. To address this limitation, Knowledge Graphs (KGs) have been integrated
into RAG approaches. KGs such as Wikidata provide an excellent source of machine-understandable
structured knowledge, enabling RAG systems to retrieve documents that are more semantically relevant
and interconnected.</p>
      <p>
        Overall, this paper argues that Scientific KGs such as CS-KG [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] can significantly improve the quality
and factual accuracy of content generated by Large Language Models (LLMs) through their effective
integration into Retrieval-Augmented Generation (RAG) systems, particularly in the context of retrieving
research papers and specialized information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Towards Integration of KG in RAG Systems for Research Paper Retrieval</title>
      <p>Integrating KGs into RAG systems provides significant benefits to the generative capabilities of LLMs.
Because KGs provide factual, structured knowledge in the form of entities and their relations, they enable
RAG systems to retrieve more relevant data effectively. To illustrate this integration, we propose the
development of a Knowledge Graph-Enhanced Academic Research Publication Retrieval System, which
can later be easily utilized by LLMs.</p>
      <p>The primary goal is to design and implement a prototype tool that integrates a Scientific
Knowledge Graph with RAG to retrieve relevant research publications. This system will utilize structured
knowledge to generate search results that are more contextually accurate and relevant to researcher
queries, specifically targeting abstracts and titles.</p>
      <p>Additionally, the system will improve search relevance through the integration of the KG and LLMs.
By leveraging the knowledge graph, the system will connect research
papers through semantic relationships, such as shared topics, data, and methodologies. This approach
will enable the generation of more meaningful recommendations that extend beyond traditional
keyword-based searches.</p>
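      <p>Connecting papers through shared topics, data, and methodologies can be sketched as a lookup over KG triples. The triples, the predicate names (hasTopic, usesMethod, usesDataset), and the helper functions below are hypothetical and chosen only for illustration; they do not reflect the schema of any particular scientific KG.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); the paper identifiers,
# predicates, and topics are invented for illustration.
TRIPLES = [
    ("paper:A", "hasTopic", "knowledge graphs"),
    ("paper:A", "usesMethod", "entity linking"),
    ("paper:B", "hasTopic", "knowledge graphs"),
    ("paper:B", "usesDataset", "dataset:X"),
    ("paper:C", "usesMethod", "entity linking"),
    ("paper:C", "hasTopic", "question answering"),
    ("paper:D", "hasTopic", "image segmentation"),
]


def build_index(triples):
    """Build two lookups: paper to its entities, and entity to its papers."""
    entities_of = defaultdict(set)
    papers_of = defaultdict(set)
    for paper, _predicate, entity in triples:
        entities_of[paper].add(entity)
        papers_of[entity].add(paper)
    return entities_of, papers_of


def related_papers(seed, entities_of, papers_of):
    """Rank other papers by how many entities they share with the seed paper."""
    shared = defaultdict(set)
    for entity in entities_of[seed]:
        for paper in papers_of[entity]:
            if paper != seed:
                shared[paper].add(entity)
    # Most shared entities first; ties are broken by paper identifier.
    return sorted(shared.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    entities_of, papers_of = build_index(TRIPLES)
    for paper, entities in related_papers("paper:A", entities_of, papers_of):
        print(paper, sorted(entities))
```
      </preformat>
      <p>Here a paper found by keyword search acts as the seed, and the graph surfaces papers that share a topic or method with it even when their titles and abstracts use different wording.</p>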
      <p>Finally, we aim to lay the foundation for future knowledge graph-based research tools. This includes
documenting the technical challenges and solutions encountered during the project, thereby
creating a roadmap for future research and development of knowledge graph-based scholarly tools. The
documentation will identify additional functionalities that could further support researchers, such as
personalized recommendations and trend analysis.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Challenges and Limitations</title>
      <p>The integration of Scientific Knowledge Graphs (KGs) into Retrieval-Augmented Generation (RAG)
systems can provide numerous benefits across different domains. In the context of academic research in particular,
KG-enhanced RAG can significantly improve retrieval by enabling researchers to find relevant
information more efficiently and with greater accuracy. However, several challenges and limitations
need to be considered when integrating a KG into a RAG system for enhanced academic research
retrieval. The most notable ones include:</p>
      <p>Data Quality and Completeness: The effectiveness of a RAG system relies heavily on the
completeness of the KG. The number of published research papers and findings is increasing every day. Therefore, it is
crucial that the KG be regularly updated.</p>
      <p>Complexity of Integration: Integrating KGs into RAG systems poses significant technical
challenges, such as different data formats, which need to be addressed.</p>
      <p>Scalability Issues: As mentioned before, the number of available research papers and findings is
increasing every day. To provide up-to-date results, the KG should be continuously augmented with
them. This poses several scalability issues due to the volume of the data.</p>
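      <p>Two of the challenges above, differing data formats and continuous augmentation of the KG, can be illustrated together with a small sketch. It assumes two hypothetical input formats (a JSON array and a CSV file with semicolon-separated keywords) whose field names are invented for illustration, and it uses an in-memory set in place of a real graph store.</p>
      <preformat>
```python
import csv
import io
import json

# Hypothetical inputs: the field names and records are invented for illustration.
JSON_RECORDS = (
    '[{"id": "paper:A", "title": "KGs for RAG", '
    '"topics": ["knowledge graphs", "RAG"]}]'
)
CSV_RECORDS = "paper_id,title,keywords\npaper:B,Scholarly search,RAG;retrieval\n"


def triples_from_json(text):
    """Yield (paper, predicate, object) triples from a JSON array of records."""
    for record in json.loads(text):
        yield (record["id"], "hasTitle", record["title"])
        for topic in record["topics"]:
            yield (record["id"], "hasTopic", topic)


def triples_from_csv(text):
    """Yield triples from CSV rows whose keywords are separated by semicolons."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (row["paper_id"], "hasTitle", row["title"])
        for keyword in row["keywords"].split(";"):
            yield (row["paper_id"], "hasTopic", keyword.strip())


class IncrementalGraph:
    """Set-backed triple store that accepts new batches without a rebuild."""

    def __init__(self):
        self.triples = set()

    def add_batch(self, triples):
        """Insert a batch and return how many triples were actually new."""
        before = len(self.triples)
        self.triples.update(triples)
        return len(self.triples) - before


if __name__ == "__main__":
    graph = IncrementalGraph()
    print(graph.add_batch(triples_from_json(JSON_RECORDS)))
    print(graph.add_batch(triples_from_csv(CSV_RECORDS)))
```
      </preformat>
      <p>Mapping every source into the same triple shape keeps format-specific logic at the boundary, and counting only newly added triples makes repeated ingestion of the same batch harmless. A production system would need a persistent graph store and entity resolution, which this sketch does not attempt.</p>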
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion and Future Directions</title>
      <p>In this paper, we argue that KGs provide substantial benefits when integrated into RAG systems,
particularly in the field of academic research. Scientific KGs can help generate search results that are
more contextually accurate and relevant to researcher queries, specifically targeting abstracts and titles.
While challenges such as scalability and complexity remain, scientific KGs are excellent resources to
integrate into RAG systems, enabling LLMs to generate more accurate responses.</p>
      <p>During the preparation of this work, the authors used GPT-4 and Grammarly for grammar and spelling
checks. After using these tools, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Graph retrieval-augmented generation: A survey</article-title>
          ,
          <source>CoRR</source>
          abs/2408.08921 (
          <year>2024</year>
          ). URL: https://doi.org/10.48550/arXiv.2408.08921. doi:10.48550/arXiv.2408.08921. arXiv:2408.08921.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bahr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wehner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wewerka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bittencourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Daub</surname>
          </string-name>
          ,
          <article-title>Knowledge graph enhanced retrieval-augmented generation for failure mode and effects analysis</article-title>
          ,
          <source>CoRR</source>
          abs/2406.18114 (
          <year>2024</year>
          ). URL: https://doi.org/10.48550/arXiv.2406.18114. doi:10.48550/arXiv.2406.18114. arXiv:2406.18114.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <article-title>CS-KG: A large-scale knowledge graph of research entities and claims in computer science</article-title>
          , in:
          <source>The Semantic Web - ISWC 2022 - 21st International Semantic Web Conference, Virtual Event, October 23-27, 2022, Proceedings</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>