<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semantic Similarity Analysis of Scientific Papers in Scholarly Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thu Huong Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cédric Pruski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Da Silveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Luxembourg Institute of Science and Technology</institution>
          ,
          <addr-line>5 avenue des hauts-fourneaux, L-4362 Esch sur Alzette</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>0</volume>
      <fpage>1</fpage>
      <lpage>02</lpage>
      <abstract>
        <p>Scholarly Knowledge Graphs (SKGs) structure academic information, enabling research discovery and knowledge synthesis. However, detecting their structural and semantic evolution remains challenging due to contextual variations and implicit knowledge shifts. In this paper, we introduce MOKA, a framework that integrates bibliometric data, advanced Natural Language Processing, and Agentic Retrieval-Augmented Generation to characterize SKG evolution. As part of this process, we propose a method to measure semantic similarity between papers by refining the representation of documents and applying four pretrained language models. Our approach assesses semantic similarity across different levels of textual granularity, including full papers, abstracts, and the ORKG metadata. Our experimental results demonstrate that the similarity scores depend heavily on the combination of language model and the dimensions of the papers being evaluated.</p>
      </abstract>
      <kwd-group>
        <kwd>Scholarly Knowledge Graphs</kwd>
        <kwd>Semantic Similarity</kwd>
        <kwd>Knowledge Graph Evolution</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, numerous Scholarly Knowledge Graphs (SKGs) have emerged, employing diverse
approaches to structuring research and scientific data. Bibliographic and citation-centric SKGs [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]
primarily align publications through citations and metadata, such as authors, institutions, and domains.
In contrast, content-oriented SKGs, such as CS-KG [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the ORKG (Open Research Knowledge
Graph [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), focus on capturing scientific knowledge by detailing research problems, methods, and
findings.
      </p>
      <p>As scientific output expands and new disciplines emerge, SKGs must evolve both structurally and
semantically to keep pace with academic research. Structural evolution refers to changes in the SKG’s
network over time, observed through the continuous addition of nodes (representing newly published
articles) and edges (capturing relationships like citations, co-authorship, and venues). This evolution is
particularly prominent in bibliographic and citation-focused SKGs, reflecting the ever-growing body
of scholarly work. Tracking these changes offers valuable insights into emerging research trends,
influential studies, academic collaborations, and funding patterns, all of which shape the trajectory of
scientific progress.</p>
      <p>Semantic evolution, on the other hand, pertains to shifts in academic ideas within SKGs. In
content-focused SKGs such as the ORKG, comparison tables bring together studies that
introduce methods, findings, and concepts that redefine existing knowledge. Unlike structural changes,
detecting semantic evolution is more complex, as it requires not only identifying modifications in
relationships but also interpreting the evolving meanings of concepts within a dynamic research
landscape. This demands sophisticated Natural Language Processing (NLP) techniques capable of
capturing nuances in context, terminology, and evolving discourse.</p>
      <p>Recent advancements in Artificial Intelligence (AI), particularly in generative AI (GenAI) and the
development of Large Language Models (LLMs), have substantially improved the ability of computer
systems to analyze and interpret the semantic evolution of approaches, methodologies, or goals.
Trained on vast amounts of textual data, these models enable more precise detection of works that are
related but differ in some respects, offering deeper insights into knowledge evolution within a domain over a
period of time. However, detecting semantic evolution in SKGs remains particularly difficult due to
challenges such as contextual understanding, concept disambiguation, and knowledge synthesis.
Effectively identifying how approaches evolve requires not only tracking explicit changes in terminology
but also capturing implicit shifts in meaning driven by new discoveries, interdisciplinary interactions,
and emerging research trends.</p>
      <p>Addressing these challenges necessitates the integration of advanced AI-driven techniques with
domain-specific expertise to enhance the accuracy and reliability of semantic change detection in SKGs.
By refining methods for analyzing knowledge evolution, researchers can benefit from effective tools
for tracking scientific progress, fostering interdisciplinary collaboration, and supporting data-driven
decision-making in academia. In this paper, we address the challenge of SKG
evolution, a complex problem that requires the integration of multiple AI-driven techniques.
Managing this evolution effectively demands a comprehensive approach that goes beyond structural
updates, incorporating advanced methods for detecting semantic evolution in scholarly content.</p>
      <p>
        To tackle this issue, we introduce MOKA, our mid-term vision of a framework designed to capture
both structural and semantic evolution in SKGs. MOKA integrates three complementary techniques:
Natural Language Processing (NLP) for in-depth content analysis; Agentic
Retrieval-Augmented Generation (Agentic-RAG) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to dynamically formulate queries and plans, extract
and complete missing data, and detect novelties; and Knowledge Graphs (KGs) to represent the evolution of
domain knowledge. MOKA enables a more efficient and automated interrogation, analysis, and synthesis
of diverse information sources. In this paper, we mainly focus on the Semantic Similarity Analysis
component of the MOKA framework, providing a detailed description of how to identify whether two papers are ‘related’.
Our approach involves harmonizing document representations and employing language models [8] to
assess their content similarity [9]. To validate our approach, we conduct an extensive experimental
evaluation, comparing semantic similarity measurements across different levels of textual granularity,
including full papers, titles and abstracts, and the ORKG metadata. A key component of our approach is
its integration with the comparison table feature of the ORKG infrastructure [10]. This feature provides
structured semantic descriptions of research articles, systematically organizing key attributes such as
objectives, methodologies, and findings. By leveraging this structured representation, MOKA enhances
the discovery of related work and facilitates meaningful comparisons between research contributions.
Our mid-term goal is to automate the identification of ‘evolutionary relations’ between papers, a process
that currently requires laborious manual effort.
      </p>
      <p>The remainder of the paper is structured as follows: Section 2 presents related work on
semantic similarity between scientific papers. Section 3 presents our general approach for identifying
evolution relationships between scientific articles. More details of the Semantic Similarity Analysis
approach are given in Section 4. Section 5 proposes an experimental assessment of our method. Section 6
wraps up with concluding remarks and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Several recent studies have focused on automatically detecting relationships between scientific
publications. One prominent approach is to use natural language generation techniques to produce citation
texts that summarize and contextualize these relationships. Luu et al. [11] operationalize this task by
using citing sentences as a proxy for scientific relationships and train a large language model (LLM) to
generate relationship-explaining text. Their approach involves pre-training a large language model as a
foundation for autoregressive approaches and explores different perspectives on the documents, using
dense representations extracted with scientific information extraction systems. Li and Ouyang [12, 13]
extend this concept by leveraging feature-based, LLM-prompting approaches to generate richer
citation texts that capture complex interconnections among multiple papers. Their method extracts features
from the local citation network, incorporates them into a prompt, and generates citation paragraphs enriched
with transition sentences, making citation texts more cohesive and interpretable. Similarly, Xing et
al. [14] propose a multi-source pointer-generator network with cross-attention to automatically
generate citation texts, demonstrating that such models can synthesize meaningful scholarly relationships.
In addition, researchers have explored graph-based methods for analyzing the evolution of scientific
knowledge. Dalle Lucca Tosi &amp; Dos Reis [15] introduce a concept-based evolution tracking approach,
using knowledge graphs to identify and compare scientific subfields over time. Similarly, Rossanez
et al. [16] employ temporal knowledge graphs to model and analyze knowledge evolution within
unstructured scientific corpora. Aparicio et al. [17] leverage dynamic knowledge graphs to extract
emerging research trends by analyzing evolving knowledge communities, while [18] applies knowledge
graph embedding techniques to enhance citation recommendation systems, addressing limitations in
static models.</p>
      <p>Other works focus on forecasting the impact of scientific research. Gu &amp; Krenn [19] construct
an evolving knowledge graph of over 21 million scientific papers, integrating semantic and citation
networks to predict the future impact of emerging research ideas. Salatino et al. [20] take a
bibliometric-driven approach, using ontology-based research topic modeling to analyze and forecast trends in
research dynamics. Meanwhile, [21] introduces the AIDA knowledge graph, which characterizes
research topics and industrial sectors, providing detailed insights into academia-industry knowledge
transfer.</p>
      <p>Beyond citation text generation, various approaches have been explored to measure semantic
similarity between scientific papers. These methods range from traditional vector-space models to deep
learning-based techniques, each offering distinct advantages and limitations. Early approaches to
measuring document similarity rely on vector-space models such as TF-IDF [22] and Latent Semantic Analysis
(LSA) [23]. TF-IDF ranks documents based on term frequency while down-weighting common words,
making it effective for lexical matching but inadequate for capturing deeper semantic relationships.
LSA, on the other hand, applies singular value decomposition (SVD) to uncover latent relationships
between words and documents, improving synonym recognition but suffering from interpretability
issues and high computational cost. These models work well for keyword-based retrieval but struggle
with polysemy, paraphrasing, and contextual understanding, limiting their applicability in semantic
similarity tasks.</p>
      <p>To address the limitations of traditional models, word embeddings such as Doc2Vec [24], fastText [25]
or GloVe [26] have been introduced. These techniques represent words as dense, continuous vectors in
a high-dimensional space, capturing semantic relationships through contextual co-occurrence patterns.
While effective in preserving word-level semantics, these models lack sentence-level understanding
and struggle with out-of-vocabulary words, particularly in the ever-evolving landscape of scientific
terminology. Recent advances in deep learning have led to the development of sentence embeddings,
which extend word embeddings to entire sentences or documents. One of the most prominent models in
this category is SBERT (Sentence-BERT) [27], which fine-tunes BERT-based models using Siamese and
triplet network architectures to generate meaningful sentence-level embeddings. SBERT significantly
outperforms traditional word embeddings by capturing contextual meaning and sentence semantics,
making it well-suited for scientific text similarity tasks. However, while SBERT excels at capturing
semantic nuances, it requires computationally expensive fine-tuning and may struggle with
domain-specific jargon if not properly trained on scholarly corpora.</p>
      <p>An emerging trend in document similarity assessment is the integration of generative AI, including
retrieval-augmented generation (RAG) [28] and large language models (LLMs) [29], to enhance scholarly
knowledge extraction and relationship modeling. RAG combines information retrieval with generative
capabilities, retrieving contextually relevant scientific text from large corpora and incorporating it
into the generation or analysis of scholarly relationships. This approach helps bridge gaps in missing
contextual information, particularly for sparsely cited or newly published papers. However, these
techniques are computationally intensive, prone to hallucination, and rely on well-curated external
knowledge sources to ensure accuracy and reliability. Despite these challenges, generative AI-driven
approaches hold great promise in advancing automated literature review, citation recommendation,
and knowledge graph enrichment, making them valuable tools for scientific research discovery.</p>
    </sec>
    <sec id="sec-3">
      <title>3. The MOKA framework</title>
      <p>
        We propose the MOKA framework, illustrated in Fig. 1, based on an Agentic-RAG architecture [
        <xref ref-type="bibr" rid="ref7">30, 7</xref>
        ]
to dynamically identify and explicitly represent the evolutionary relationships between scientific
publications described in the ORKG. This approach leverages LLM advancements, incorporating sequential
processes such as temporal analysis, citation tracking, and semantic analysis of abstracts and full texts.
These steps provide context augmentation, improving the discovery of evolutionary links between
research contributions within the ORKG.
      </p>
      <p>In this framework, Agentic-RAG acts as an orchestrator, planning complex information flows and
enabling flexible task execution across multiple data sources, including the ORKG, citation databases,
and abstract/full-text repositories. The model capitalizes on the reasoning capabilities of ReAct Agents
[31], embedded within the Agentic-RAG framework, to execute retrieval and analysis tasks efficiently.</p>
      <p>The process is initiated whenever a new comparison table related to a specific research topic is added
or updated in the ORKG. This triggers an agent-based reasoning process, structured as follows:
• Query Planning (QP): The agent, assisted by a Large Language Model (LLM), formulates an
exploration strategy using the existing comparison table as input. The exploration task is decomposed
into specific retrieval objectives, which are assigned to Query Engines (QE) within the Tool Use
framework for efficient data acquisition.
• Citation Tracking &amp; Data Retrieval: The Citation Tracking QE connects to citation databases,
identifying citation relationships between research papers. This step enriches the dataset with
explicit citation links, uncovering potential evolutionary relationships and preparing for semantic
analysis.
• Semantic Analysis: The agent dynamically adapts its reasoning to infer semantic relationships,
independent of direct citations. The Semantic Analysis QE processes targeted text segments,
including: Contextual snippets surrounding citations in full texts, abstracts of related studies, and
the ORKG metadata, capturing both explicit and inferred connections. The agent performs paired
analyses between the new contribution and each related study using:
– Semantic Similarity Analysis [32, 33] to quantify content alignment, and
– Textual Entailment Analysis [34] to detect conceptual dependencies and knowledge transfer.
This phase operates iteratively, refining retrieval strategies within QP through a continuous
feedback loop. The agent adjusts retrieval parameters to enhance relevance, ensuring that only
the most contextually aligned studies are retained for synthesis.
• Synthesis &amp; Evolution Mapping: In the final phase, the agent, aided by the LLM, aggregates and
synthesizes findings from the semantic analysis phase to construct a coherent view of research
evolution. Studies are arranged chronologically, allowing the LLM to map the progression of
research over time. This results in a refined set of evolutionary relationships, outlining how the
new contribution aligns with, diverges from, or builds upon prior studies in the ORKG.
By automating the identification of research evolution and semantic transformation, this framework
will enhance the discovery of implicit relations within scholarly literature. These connections can
either be updated directly in the ORKG or stored in a separate historical knowledge graph (HKG) [35]
to effectively describe evolutionary relationships while distinguishing them from factual research data,
enabling in-depth analysis, preventing overload on the ORKG, and allowing for independent refinement
of complex temporal data within the HKG.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Semantic Similarity Analysis</title>
      <p>In this section, we delve into semantic similarity analysis, one of the key components of the semantic
analysis step described in Section 3. Evaluating the semantic relatedness between publications yields a
comprehensive and detailed view of the interrelationships among documents, which can
serve as a foundation for subsequent semantic analyses exploring the evolutionary relations
between them. Specifically, we developed a system to measure semantic similarity between scientific
publications across three key dimensions:
• the properties extracted from the papers, which consist of metadata available in the ORKG
comparison tables
• the combined title and abstract of each paper
• the main content of the paper (excluding the abstract)
To quantify similarity across these dimensions, we employ four sentence
embedding models: the SBERT models all-MiniLM-L6-v2 and all-MiniLM-L12-v2, known for their efficiency and robust
general language representations that capture broad semantic features, and allenai-specter and
allenai/specter2_classification, which, being specifically trained on scientific literature, provide deeper
domain-specific insights. Cosine similarity is then used as the metric to compute the similarity between the
resulting sentence embeddings.</p>
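      <p>As a reminder of the metric, the cosine similarity of two embedding vectors u and v is (u · v) / (‖u‖ ‖v‖). The following is a minimal sketch in plain Python, not the implementation used in MOKA; the function name and the convention of returning 0.0 for an all-zero vector are assumptions made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat an all-zero embedding as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```
]]></preformat>
      <p>Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and the score does not depend on vector length, which is why it is a common choice for comparing sentence embeddings.</p>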
      <sec id="sec-4-1">
        <title>Analyzing Properties Similarity:</title>
        <p>In this dimension, the system focuses on evaluating technical and contextual metadata associated
with publications. These properties, provided by the ORKG users, capture details about the research aim,
employed methods, outcome, and other relevant information (e.g., author, publisher). Initially, a set
of heuristic rules is applied to group these properties into four main categories: goal, method, result
and other. For example, properties containing phrases like “aim” or “objective” are grouped under
“goal” while those mentioning “technique” or “algorithm” are classified as “method”. When
the heuristic rules cannot assign a property to a category, a zero-shot classification model is used by
default to assign it to the most appropriate category. This systematic grouping standardizes
the diverse metadata provided by various paper templates, which enables fine-grained comparison of
their core characteristics. The text corresponding to each category is converted into a vector embedding,
and pairwise cosine similarity is computed between the corresponding groups of different papers to
quantify their semantic alignment.</p>
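        <p>The heuristic grouping can be sketched as a keyword lookup with a pluggable fallback. This is an illustration under stated assumptions: the “goal” and “method” keywords are the examples given above, the “result” keywords and all function names are hypothetical, and the zero-shot classifier is represented only by an optional callable rather than by a specific model.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Dict, List, Optional

# "goal" and "method" keywords are the examples quoted in the text;
# the "result" keywords are assumptions added for illustration.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "goal": ["aim", "objective"],
    "method": ["technique", "algorithm"],
    "result": ["result", "outcome"],
}


def categorize_property(
    label: str, fallback: Optional[Callable[[str], str]] = None
) -> str:
    """Assign a property label to goal, method, result or other."""
    text = label.lower()
    # Crude substring matching: "aim" would also match "claim".
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    if fallback is not None:
        # Stand-in for the zero-shot classification model (hypothetical hook).
        return fallback(label)
    return "other"


def group_properties(
    properties: Dict[str, str],
    fallback: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Concatenate property values per category, ready to be embedded."""
    grouped: Dict[str, List[str]] = {
        "goal": [], "method": [], "result": [], "other": []
    }
    for label, value in properties.items():
        category = categorize_property(label, fallback)
        grouped.setdefault(category, []).append(value)
    return {category: " ".join(values) for category, values in grouped.items()}
```
]]></preformat>
        <p>For instance, a property labelled “Research objective” lands in “goal”, while “Publisher” falls through to “other” unless a fallback classifier is supplied.</p>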
      </sec>
      <sec id="sec-4-2">
        <title>Analyzing Title and Abstract Similarity:</title>
        <p>The semantic relatedness of each paper’s core identifying information is evaluated by merging its title
and abstract into a single composite text. This combined text is then transformed into a high-dimensional
vector representation using SBERT embedding models, capturing the key themes and
contributions of the paper. The cosine similarity between these vectors is computed to measure how
closely the core ideas of different papers align.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Analyzing Main Content Similarity:</title>
        <p>Main content similarity is assessed using an extension of the IMRaD standard (with
the addition of a Related Work section) [36, 37], which organizes scientific publications into five main
sections: Introduction, Methodology, Discussion, Results, and Related Work. Specifically, the similarity
between each pair of papers is determined by comparing the corresponding sections in each document,
thereby capturing the semantic relatedness across the publications. This content similarity analysis
begins with the extraction of the main content of the paper. Input documents in PDF format are preprocessed
using GROBID [38, 39], a robust tool for extracting the core content of scientific publications as an
XML file. Following this, a comprehensive cleaning procedure is applied to the raw XML text to remove
extraneous header blocks (e.g., Title, DOI, Abstract) and non-essential formatting, and to convert table and
figure elements into plain text, with section titles normalized to title case. This step ensures that only the
core, standardized content remains for accurate section classification and semantic similarity
analysis. Next, the content sections are classified according to the IMRaD framework. For
each publication, if explicit section headers are present, the corresponding content is directly extracted,
ensuring the original structure of the document is preserved. When the paper lacks clear
section markers or does not adhere to the IMRaD format, a fine-tuned classification model built on the
available dataset of [37] is employed to analyze each paragraph and automatically assign it to the most
appropriate IMRaD category, thereby standardizing the content regardless of its original format.</p>
        <p>After the sections of each paper are identified and grouped, each section is processed to compute
semantic similarity. For each section, if the content is extensive, it is segmented into smaller chunks
based on a fixed number of sentences. Each chunk is then converted into a vector representation using
an SBERT model. The resulting chunk vectors are averaged to generate a single vector that represents
the entire section. Finally, the cosine similarity between the averaged vectors of corresponding sections
from two papers is calculated to assess their semantic relatedness.</p>
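        <p>The chunk-and-average step can be sketched as follows. This is an illustration, not the authors' code: the embedding model is passed in as a hypothetical <monospace>embed</monospace> callable (in practice it would wrap an SBERT model), the regular-expression sentence splitter is a naive stand-in, and the default of five sentences per chunk is an assumption since the chunk size is not stated.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import re
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_chunks(text: str, sentences_per_chunk: int = 5) -> List[str]:
    """Split text into chunks holding a fixed number of sentences."""
    # Naive splitter (assumption): a sentence ends at ., ! or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise average of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def section_embedding(
    text: str, embed: Callable[[str], Vector], sentences_per_chunk: int = 5
) -> Vector:
    """Embed each chunk, then average the chunk vectors into one section vector."""
    chunks = split_into_chunks(text, sentences_per_chunk)
    if not chunks:
        raise ValueError("section is empty")
    return mean_vector([embed(chunk) for chunk in chunks])


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def section_similarity(
    text_a: str, text_b: str, embed: Callable[[str], Vector],
    sentences_per_chunk: int = 5,
) -> float:
    """Cosine similarity between the averaged vectors of two sections."""
    vector_a = section_embedding(text_a, embed, sentences_per_chunk)
    vector_b = section_embedding(text_b, embed, sentences_per_chunk)
    return cosine_similarity(vector_a, vector_b)
```
]]></preformat>
        <p>Averaging gives every chunk the same weight regardless of its length; weighting chunks by sentence count would be a possible variation, which the description above does not specify.</p>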
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In this section, we describe the experimental assessment of our method for measuring semantic similarity
between papers of an SKG. We start by introducing the experimental protocol and material,
then present the results and discuss the limitations of the approach.</p>
      <sec id="sec-5-1">
        <title>5.1. Experimental Protocol</title>
        <p>Our objective is to evaluate the approach we have described in Section 4. To do so, we evaluated the
agreement between our algorithm and reference datasets as depicted in Fig. 2. Our reference datasets
consist of two sets of scientific papers selected from two different comparison tables from the ORKG.</p>
        <p>The Paper selection and preprocessing task consists in preparing the papers used in our
experiments. Papers group 1 is a set of 66 pairs of related papers. We obtained this group by
considering the 12 papers contained in the ORKG CT1 comparison table. As the papers come from the
same comparison table, we can assume that they were identified as related by the user who
created the table. Papers group 2 contains 72 pairs of papers (p, q) where p ∈ ORKG CT1 and q ∈ ORKG
CT2. ORKG CT2 is another comparison table, whose topic differs from that of ORKG CT1
and which contains 6 papers. Each pair (p, q) therefore forms a pair of unrelated papers.</p>
        <p>[Figure 2: Papers selection and preprocessing yields Papers group 1 and Papers group 2; each group goes through Semantic similarity measurement and classification, with results in Table 1 and Table 2, respectively.]</p>
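        <p>The two group sizes follow from simple counting: 12 papers yield 12 × 11 / 2 = 66 unordered pairs, and pairing each of the 12 papers of ORKG CT1 with each of the 6 papers of ORKG CT2 yields 12 × 6 = 72 pairs. The short sketch below reproduces this count; the paper identifiers are placeholders, not actual ORKG identifiers.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations, product

# Placeholder identifiers standing in for the papers of each comparison table.
ct1_papers = [f"ct1_paper_{i}" for i in range(1, 13)]  # 12 papers
ct2_papers = [f"ct2_paper_{i}" for i in range(1, 7)]   # 6 papers

# Papers group 1: every unordered pair of papers within ORKG CT1.
related_pairs = list(combinations(ct1_papers, 2))

# Papers group 2: every (CT1 paper, CT2 paper) combination.
unrelated_pairs = list(product(ct1_papers, ct2_papers))

print(len(related_pairs), len(unrelated_pairs))  # 66 72
```
]]></preformat>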
        <p>The Semantic similarity measurement and classification task consists in taking the pairs
(p<sub>1</sub>, q<sub>1</sub>) ∈ Papers group 1 and the pairs (p<sub>2</sub>, q<sub>2</sub>) ∈ Papers group 2 and computing the semantic similarity
between p<sub>1</sub> and q<sub>1</sub> on the one hand and between p<sub>2</sub> and q<sub>2</sub> on the other hand. To do so, we consider the three
cases described in Section 4 and generate embeddings using a dedicated model (in our experiments we
tested all-MiniLM-L6-v2, allenai-specter, all-MiniLM-L12-v2 and allenai/specter2_classification).
Once we have the required embeddings, we use the well-known cosine similarity to compare
them.</p>
        <p>The classification is then based on the interpretation of the consolidated value of the cosine
similarity we obtained. This consolidation is done as follows:
• Regarding the ORKG properties: We compute the similarity between the Goal property of each
paper, the Method property of each paper and the Result property of each paper. The resulting
similarity is the average of these 3 values.
• Regarding title and abstract: The similarity is obtained by considering the full abstract and title.
• Regarding main content: After harmonizing the structure of each paper according to the IMRaD
structure (see Section 4), we compute the similarity between the Introduction section of each paper,
the Method section of each paper and the Result section of each paper. The resulting similarity is
the average of these 3 values.</p>
        <p>Then, if the consolidated similarity value is below 0.4, we consider the papers as “Not related”;
if the value is between 0.4 and 0.7, the relationship is “Unclear”; and if the value is greater than 0.7, we
consider the papers as “Related”.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results</title>
        <p>In this section, we present the results obtained from executing our experimental protocol on two groups
of data. Table 1 and Table 2 provide a comparative analysis of the models, with columns representing
the models and sub-columns distinguishing between the types of information used in the experiments:
title &amp; abstract, full harmonized text, and structured data from the ORKG. The rows in these tables show
the percentage of agreement between the model predictions and the manual classification of papers
performed by the ORKG experts, which serves as the baseline for evaluation.</p>
        <p>In Table 1, where all pairs are expected to be related, the row “Related” indicates the true positive rate, while the row “Not Related” represents
the false negative rate. Conversely, in Table 2, where all pairs are expected to be unrelated, the “Not Related” row
corresponds to the true negative rate, and the “Related” row corresponds to the false positive rate. In
both tables, the “Unclear” row captures cases where the algorithm was unable to make a definitive
classification (i.e., when the similarity score falls between 0.4 and 0.7).</p>
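        <p>Each row of Tables 1 and 2 can be read as the share of pairs in a group that receive a given label. A minimal sketch of that computation follows; the function name and label spellings are illustrative and the input labels in the usage note are made up, not experimental results.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("Related", "Unclear", "Not related")


def label_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of pairs per label within one group of paper pairs."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return {label: 100.0 * counts[label] / total for label in LABELS}
```
]]></preformat>
        <p>With three pairs labelled “Related” and one labelled “Unclear”, the function returns 75.0 for “Related”, 25.0 for “Unclear” and 0.0 for “Not related”.</p>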
        <p>The results from Table 1 highlight the exceptional performance of the allenai-specter and allenai-specter2
models in detecting similar papers. Even when using only title and abstract, these models achieved 100%
precision. This outcome aligns with the fact that allenai-specter and allenai-specter2 are specifically
trained to detect co-citation similarity, meaning they excel at identifying pairs of papers that cite the
same cluster of references. However, while highly effective at classifying related papers, Table 2 reveals
that both models are less effective at identifying non-related pairs. A possible explanation is that the
models primarily focus on signals that establish connections between papers but lack mechanisms to
detect dissimilarities effectively. Another possibility is that the decision threshold should
differ from the one adopted here.</p>
        <p>A different approach is taken by all-MiniLM-L6-v2 and all-MiniLM-L12-v2. Both models are based on
SBERT and trained to assess sentence-level similarity. As observed in Table 1, for all-MiniLM-L6-v2,
using only title &amp; abstract does not provide sufficient context to precisely determine whether two
papers are related, often leading to misclassifications. However, the structured ORKG metadata appears
to capture essential similarities in related papers, while the main content provides even more reliable
results, though it requires preprocessing PDF files. The model all-MiniLM-L12-v2 shows a significant
improvement in the results for title &amp; abstract, as well as for the ORKG, but the main content seems
to be less important for decisions. Table 2 also shows the improvement of the newer version of MiniLM
for all three data sources. We would like to highlight that the ORKG presents a balanced compromise
between these data sources. It provides high-quality, structured information while maintaining low
execution time, as the data in comparison tables is both concise and highly relevant. In contrast,
processing full-text papers requires significantly more preprocessing effort, generates a large volume of
data proportional to the document size, and results in longer execution times.</p>
        <p>A noteworthy observation is that false negatives are minimal for title &amp; abstract corpora and
completely absent when using full text or the ORKG metadata. However, the rate of unclear classifications
remains relatively high. A deeper analysis of these cases reveals that in over 70% of unclear instances,
the similarity score falls between 0.6 and 0.7.</p>
        <p>In future work, we plan to expand our experiments to all comparison tables of the ORKG. We will also
further refine our classification thresholds. In particular, we will investigate how to systematically define,
independently for each model, the threshold for determining relatedness between papers. Additionally,
we aim to analyze the impact of different IMRaD (Introduction, Methods, Results, and Discussion)
categories on classification accuracy. This will allow us to explore whether assigning different weights to
these categories (e.g., computing a weighted average over categories) or focusing solely on specific categories
could enhance decision-making in scholarly knowledge graph evolution.</p>
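        <p>As an illustration of the weighting idea, a per-category similarity score could be combined into a single value as follows. This is only a sketch under our own assumptions: the function name, the category keys, and the renormalisation of weights over the available categories are hypothetical choices, not a description of an implemented MOKA component.</p>
        <preformat>
```python
IMRAD_CATEGORIES = ("introduction", "methods", "results", "discussion")


def weighted_imrad_similarity(scores, weights):
    """Weighted average of per-category similarity scores.

    Categories missing from `scores` are skipped, and the weights of the
    remaining categories are renormalised. A weight of 0 (or an absent
    weight) excludes a category from the average.
    """
    total = 0.0
    weight_sum = 0.0
    for category in IMRAD_CATEGORIES:
        if category not in scores:
            continue
        weight = weights.get(category, 0.0)
        total += weight * scores[category]
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no scored category has a non-zero weight")
    return total / weight_sum
```
        </preformat>
        <p>Giving a non-zero weight to a single category corresponds to focusing solely on that category, while equal weights reduce to a plain average.</p>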
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we present the mid-term vision of our agentic-RAG framework, with a particular focus
on one key feature: semantic similarity analysis. We describe our approach to select and structure
information from scientific papers and how these data sources are utilized within our MOKA framework.
Additionally, we detail the integration of various semantic similarity tools to assess whether pairs
of papers are related. Our extensive evaluation demonstrates that the choice of similarity model
significantly impacts precision, with results ranging from 12% to 100%. These findings highlight the
advantages of agentic approaches, which dynamically adjust the framework’s configuration based on
the specific objectives and intentions of the end-user.</p>
      <p>The preliminary results show a promising path for MOKA. This framework for analyzing how new
research contributions align with, diverge from, or build upon prior studies can potentially facilitate a deeper
understanding of the evolving scholarly landscape. By leveraging scholarly knowledge graphs (SKGs)
such as the ORKG, MOKA enhances the discovery of semantic evolutionary relationships and offers
a novel approach to tracking research progression. However, several limitations must be acknowledged,
highlighting key areas for future improvements.</p>
      <p>One major limitation lies in the restricted exploration scope within the ORKG. While the ORKG serves
as an effective prototype SKG, the model is designed to generalize across different scholarly knowledge
graphs. Adapting the framework to other SKGs with unique data structures and ontologies would
enhance its applicability across various research domains. Future work should explore fine-tuning the
model for diverse SKGs to maximize its impact on a broader scholarly landscape.</p>
      <p>Additionally, the model currently lacks a dedicated mechanism to represent discovered evolutionary
relationships. Directly updating the ORKG with inferred connections may lead to data overload and
ambiguity. A promising alternative is to store these relationships in a separate structure, such as a
historical knowledge graph (HKG) [35], allowing for more refined temporal analysis while preserving
the integrity of factual research data.</p>
      <p>The access restrictions to full-text publications pose another significant limitation. Many scholarly
works remain locked behind paywalls, restricting the model’s ability to perform comprehensive text
analysis. A practical mitigation strategy is to rely on open-access repositories (e.g., arXiv<sup>5</sup>, HAL<sup>6</sup>),
which provide unrestricted access to valuable research content.</p>
      <p>Finally, processing lengthy scholarly texts with large language models (LLMs) remains a challenge,
particularly in maintaining contextual accuracy across multi-page documents. However, with rapid
advancements in LLM architectures, this limitation is expected to diminish over time. In the short
term, a strategic approach is to focus on key sections of publications, such as abstracts, methods,
discussions, and literature reviews, where crucial research contributions and evolutionary relationships
are most explicitly stated.</p>
      <p>Despite these challenges, the MOKA framework represents a significant step toward automating the
identification of research evolution in scholarly literature. Future work should focus on refining data
retrieval strategies, expanding applicability to multiple SKGs, and integrating advanced validation
frameworks to enhance reliability and scalability.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This project is funded by the European Union’s Horizon 2022 research and innovation programme
under the Marie Skłodowska-Curie Grant Agreement No. 101102337.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <fn-group>
        <fn>
          <label>5</label>
          <p>https://arxiv.org/</p>
        </fn>
        <fn>
          <label>6</label>
          <p>https://hal.science/</p>
        </fn>
      </fn-group>
      <ref-list>
        <ref id="ref8">
          <mixed-citation>[8] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, in: EMNLP/IJCNLP (1), Association for Computational Linguistics, 2019, pp. 3980–3990.</mixed-citation>
        </ref>
        <ref id="ref9">
          <mixed-citation>[9] J. Gómez, P.-P. Vázquez, An empirical evaluation of document embeddings and similarity metrics for scientific articles, Applied Sciences 12 (2022) 5664.</mixed-citation>
        </ref>
        <ref id="ref10">
          <mixed-citation>[10] A. Oelen, M. Y. Jaradeh, K. E. Farfar, M. Stocker, S. Auer, Comparing research contributions in a scholarly knowledge graph, in: SciKnow@K-CAP, volume 2526 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 21–26.</mixed-citation>
        </ref>
        <ref id="ref11">
          <mixed-citation>[11] K. Luu, X. Wu, R. Koncel-Kedziorski, K. Lo, I. Cachola, N. A. Smith, Explaining relationships between scientific documents, in: ACL/IJCNLP (1), Association for Computational Linguistics, 2021, pp. 2130–2144.</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>[12] X. Li, J. Ouyang, Explaining relationships among research papers, CoRR abs/2402.13426 (2024).</mixed-citation>
        </ref>
        <ref id="ref13">
          <mixed-citation>[13] X. Li, J. Ouyang, Explaining relationships among research papers, in: COLING, Association for Computational Linguistics, 2025, pp. 1080–1105.</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>[14] X. Xing, X. Fan, X. Wan, Automatic generation of citation texts in scholarly papers: A pilot study, in: Annual Meeting of the Association for Computational Linguistics, 2020. URL: https://api.semanticscholar.org/CorpusID:220045125.</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>[15] M. Dalle Lucca Tosi, J. C. dos Reis, Understanding the evolution of a scientific field by clustering and visualizing knowledge graphs, Journal of Information Science 48 (2022) 71–89.</mixed-citation>
        </ref>
        <ref id="ref16">
          <mixed-citation>[16] A. Rossanez, J. C. dos Reis, R. da Silva Torres, Representing scientific literature evolution via temporal knowledge graphs, in: MEPDaW@ISWC, 2020. URL: https://api.semanticscholar.org/CorpusID:233433129.</mixed-citation>
        </ref>
        <ref id="ref17">
          <mixed-citation>[17] J. T. Aparicio, E. Arsenio, F. Santos, R. Henriques, Using dynamic knowledge graphs to detect emerging communities of knowledge, Knowledge-Based Systems 294 (2024) 111671.</mixed-citation>
        </ref>
        <ref id="ref18">
          <mixed-citation>[18] J.-C. Liu, C.-T. Chen, C. Lee, S.-H. Huang, Evolving knowledge graph representation learning with multiple attention strategies for citation recommendation system, ACM Transactions on Intelligent Systems and Technology 15 (2024) 1–26.</mixed-citation>
        </ref>
        <ref id="ref19">
          <mixed-citation>[19] X. Gu, M. Krenn, Impact4cast: Forecasting high-impact research topics via machine learning on evolving knowledge graphs, in: ICML 2024 AI for Science Workshop, 2024.</mixed-citation>
        </ref>
        <ref id="ref20">
          <mixed-citation>[20] A. A. Salatino, A. Mannocci, F. Osborne, Detection, Analysis, and Prediction of Research Topics with Scientific Knowledge Graphs, Springer International Publishing, Cham, 2021, pp. 225–252. doi:10.1007/978-3-030-86668-6_11.</mixed-citation>
        </ref>
        <ref id="ref21">
          <mixed-citation>[21] S. Angioni, A. Salatino, F. Osborne, D. R. Recupero, E. Motta, Aida: A knowledge graph about research dynamics in academia and industry, Quantitative Science Studies 2 (2021) 1356–1398.</mixed-citation>
        </ref>
        <ref id="ref22">
          <mixed-citation>[22] A. Widianto, E. Pebriyanto, F. Fitriyanti, M. Marna, Document similarity using term frequency-inverse document frequency representation and cosine similarity, Journal of Dinda: Data Science, Information Technology, and Data Analytics 4 (2024) 149–153.</mixed-citation>
        </ref>
        <ref id="ref23">
          <mixed-citation>[23] H. Xu, W. Zeng, J. Gui, P. Qu, X. Zhu, L. Wang, Exploring similarity between academic paper and patent based on latent semantic analysis and vector space model, in: 2015 12th international conference on fuzzy systems and knowledge discovery (FSKD), IEEE, 2015, pp. 801–805.</mixed-citation>
        </ref>
        <ref id="ref24">
          <mixed-citation>[24] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: International conference on machine learning, PMLR, 2014, pp. 1188–1196.</mixed-citation>
        </ref>
        <ref id="ref25">
          <mixed-citation>[25] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the association for computational linguistics 5 (2017) 135–146.</mixed-citation>
        </ref>
        <ref id="ref26">
          <mixed-citation>[26] J. Pennington, R. Socher, C. D. Manning, Glove: Global vectors for word representation, in: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.</mixed-citation>
        </ref>
        <ref id="ref27">
          <mixed-citation>[27] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, arXiv preprint arXiv:1908.10084 (2019).</mixed-citation>
        </ref>
        <ref id="ref28">
          <mixed-citation>[28] J. Jeong, D. Jin, Automatic classification of scientific and technical papers using large language models and retrieval-augmented generation (2024).</mixed-citation>
        </ref>
        <ref id="ref29">
          <mixed-citation>[29] G. Mitrov, B. Stanoev, S. Gievska, G. Mirceva, E. Zdravevski, Combining semantic matching, word embeddings, transformers, and llms for enhanced document ranking: Application in systematic reviews, Big Data and Cognitive Computing 8 (2024) 110.</mixed-citation>
        </ref>
        <ref id="ref30">
          <mixed-citation>[30] C. Ravuru, S. S. Srinivas, V. Runkana, Agentic retrieval-augmented generation for time series analysis, CoRR abs/2408.14484 (2024).</mixed-citation>
        </ref>
        <ref id="ref31">
          <mixed-citation>[31] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, Y. Cao, React: Synergizing reasoning and acting in language models, in: The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023, OpenReview.net, 2023. URL: https://openreview.net/forum?id=WE_vluYUL-X.</mixed-citation>
        </ref>
        <ref id="ref32">
          <mixed-citation>[32] S. Xu, Z. Wu, H. Zhao, P. Shu, Z. Liu, W. Liao, S. Li, A. Sikora, T. Liu, X. Li, Reasoning before comparison: Llm-enhanced semantic similarity metrics for domain specialized text analysis, CoRR abs/2402.11398 (2024).</mixed-citation>
        </ref>
        <ref id="ref33">
          <mixed-citation>[33] Y. Feng, Semantic textual similarity analysis of clinical text in the era of llm, 2024 IEEE Conference on Artificial Intelligence (CAI) (2024) 1284–1289. URL: https://api.semanticscholar.org/CorpusID:271040424.</mixed-citation>
        </ref>
        <ref id="ref34">
          <mixed-citation>[34] I. Dagan, W. B. Dolan, B. Magnini, D. Roth, Recognizing textual entailment: Rational, evaluation and approaches – erratum, Natural Language Engineering 16 (2010) 105–105. URL: https://api.semanticscholar.org/CorpusID:8336653.</mixed-citation>
        </ref>
        <ref id="ref35">
          <mixed-citation>[35] S. D. Cardoso, M. D. Silveira, C. Pruski, Construction and exploitation of an historical knowledge graph to deal with the evolution of ontologies, Knowl. Based Syst. 194 (2020) 105508.</mixed-citation>
        </ref>
        <ref id="ref36">
          <mixed-citation>[36] I. Ahmed, M. T. Afzal, A systematic approach to map the research articles’ sections to imrad, IEEE Access 8 (2020) 129359–129371. URL: https://api.semanticscholar.org/CorpusID:220733920.</mixed-citation>
        </ref>
        <ref id="ref37">
          <mixed-citation>[37] T. Saier, J. Krause, M. Färber, unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network, in: Proceedings of the 23rd ACM/IEEE Joint Conference on Digital Libraries, JCDL ’23, 2023.</mixed-citation>
        </ref>
        <ref id="ref38">
          <mixed-citation>[38] Grobid, https://github.com/kermitt2/grobid, 2008–2025.</mixed-citation>
        </ref>
        <ref id="ref39">
          <mixed-citation>[39] P. Lopez, L. Romary, Grobid - information extraction from scientific publications, ERCIM News 2015 (2015). URL: https://api.semanticscholar.org/CorpusID:36526770.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Acekg: A large-scale knowledge graph for academic data mining</article-title>
          , in: CIKM, ACM,
          <year>2018</year>
          , pp.
          <fpage>1487</fpage>
          -
          <lpage>1490</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <article-title>The microsoft academic knowledge graph: A linked data source with 8 billion triples of scholarly data</article-title>
          ,
          <source>in: ISWC (2)</source>
          , volume
          <volume>11779</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2019</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Giambattista</surname>
          </string-name>
          , I. Heibi,
          <string-name>
            <given-names>S.</given-names>
            <surname>Peroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Shotton</surname>
          </string-name>
          ,
          <article-title>Opencitations: an open e-infrastructure to foster maximum reuse of citation data</article-title>
          ,
          <source>Int. J. Digit. Curation</source>
          <volume>17</volume>
          (
          <year>2022</year>
          )
          <fpage>5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Färber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lamprecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <article-title>Semopenalex: The scientific landscape in 26 billion RDF triples</article-title>
          ,
          <source>in: ISWC</source>
          , volume
          <volume>14266</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2023</year>
          , pp.
          <fpage>94</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          , E. Motta,
          <article-title>CS-KG: A large-scale knowledge graph of research entities and claims in computer science</article-title>
          ,
          <source>in: ISWC</source>
          , volume
          <volume>13489</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>678</fpage>
          -
          <lpage>696</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Farfar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vogt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Prinz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Wiens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Jaradeh</surname>
          </string-name>
          ,
          <article-title>Improving access to scientific literature with knowledge graphs</article-title>
          ,
          <source>Bibliothek Forschung und Praxis</source>
          <volume>44</volume>
          (
          <year>2020</year>
          )
          <fpage>516</fpage>
          -
          <lpage>529</lpage>
          . URL: https://doi.org/10.1515/bfp-2020-2042. doi:10.1515/bfp-2020-2042.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ehtesham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , T. T. Khoei,
          <article-title>Agentic retrieval-augmented generation: A survey on agentic rag</article-title>
          ,
          <source>ArXiv abs/2501.09136</source>
          (
          <year>2025</year>
          ). URL: https://api.semanticscholar.org/CorpusID:275570331.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>