<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Scaling Scientific Knowledge Discovery with Neuro-Symbolic AI and Large Language Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Wilma</forename><forename type="middle">Johanna</forename><surname>Schmidt</surname></persName>
							<email>wilma.schmidt@de.bosch.com</email>
							<affiliation key="aff0">
								<orgName type="department">Bosch Center for AI</orgName>
								<orgName type="institution">Robert Bosch GmbH</orgName>
								<address>
									<settlement>Renningen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="laboratory" key="lab1">SIRIUS</orgName>
								<orgName type="laboratory" key="lab2">Centre for Scalable Data Access</orgName>
								<orgName type="institution">University of Oslo</orgName>
								<address>
									<settlement>Oslo</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Diego</forename><surname>Rincon-Yanez</surname></persName>
							<email>drinconyanez@unisa.it</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Salerno</orgName>
								<address>
									<settlement>Fisciano</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
							<affiliation key="aff5">
								<orgName type="department">Facultad de Ingenierías y Tecnologías</orgName>
								<orgName type="institution">Universidad de Santander</orgName>
								<address>
									<settlement>Cucuta</settlement>
									<country key="CO">Colombia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Evgeny</forename><surname>Kharlamov</surname></persName>
							<email>evgeny.kharlamov@de.bosch.com</email>
							<affiliation key="aff0">
								<orgName type="department">Bosch Center for AI</orgName>
								<orgName type="institution">Robert Bosch GmbH</orgName>
								<address>
									<settlement>Renningen</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="laboratory" key="lab1">SIRIUS</orgName>
								<orgName type="laboratory" key="lab2">Centre for Scalable Data Access</orgName>
								<orgName type="institution">University of Oslo</orgName>
								<address>
									<settlement>Oslo</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adrian</forename><surname>Paschke</surname></persName>
							<email>adrian.paschke@fu-berlin.de</email>
							<affiliation key="aff2">
								<orgName type="department">AG Corporate Semantic Web</orgName>
								<orgName type="institution">Freie Universität Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff4">
								<orgName type="department">Data Analytics Center</orgName>
								<orgName type="institution">Fraunhofer FOKUS</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Scaling Scientific Knowledge Discovery with Neuro-Symbolic AI and Large Language Models</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7527C57876DE35F96CB32A521590ADCF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Neuro-Symbolic AI</term>
					<term>Knowledge Graph</term>
					<term>Large Language Model</term>
					<term>Retrieval-Augmented Generation (RAG)</term>
					<term>Systematic Literature Review</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The increasing amount of available research data creates a need to scale scientific knowledge discovery, e.g., the conduct of systematic literature reviews (SLRs), to keep up with fast developments in research and to further support decision-making in industry. AI-based methods are gaining importance in these tasks and have been integrated into many SLR tools. Yet several challenges remain open when applying neural methods in particular to scientific knowledge discovery tasks. To address this, we evaluate various neural and neuro-symbolic scenarios on a specific generative writing task. While confirming existing concerns about pure Large Language Model (LLM) approaches for these tasks, we obtain a heterogeneous picture of Retrieval-Augmented Generation (RAG) approaches. The most promising candidate is a Knowledge Graph (KG) based context-enhanced LLM approach for knowledge discovery.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Recent AI approaches are drastically changing solutions and ways of working in several industries, at a rapidly increasing development pace. Yet at least two trends are expected to remain predictable: (i) the high and even increasing need for fast decision-making and (ii) the continuously growing amount of data available to base decisions on. This is clearly reflected in the growing research field of data-driven decision-making.</p><p>Large Language Models (LLMs) are a novel generative AI approach that shows promising results on various industrial challenges, yet LLMs tend to encounter limitations in reliability <ref type="bibr" target="#b0">[1]</ref> and interpretability <ref type="bibr" target="#b1">[2]</ref> <ref type="bibr" target="#b0">[1]</ref>. Fortunately, smart prompting techniques, for example, may "enhance the model's ability to explain their reasoning and justify their decision" <ref type="bibr" target="#b1">[2]</ref>. With context-enhanced prompts, LLMs can be guided more strongly toward suitable responses. The versatility and capability of LLMs mark a paradigm shift in how we interact with machines, making these interactions more intuitive and resembling human-like conversations. However, a notable challenge with LLMs is their occasional tendency to produce information not rooted in reality or their training data, a phenomenon often termed "hallucinations" <ref type="bibr" target="#b2">[3]</ref> <ref type="bibr" target="#b0">[1]</ref>. 
To mitigate these hallucinations, the concept of Retrieval-Augmented Generation (RAG) has emerged: it combines the LLM's ability to analyze text with the capacity to retrieve relevant information from selected external sources, which enhances the accuracy and reliability of the produced answers.</p><p>On the other hand, neuro-symbolic AI, as a combination of neural and symbolic methods <ref type="bibr" target="#b3">[4]</ref>, positions itself as a promising candidate for industrial applications <ref type="bibr" target="#b4">[5]</ref>. One benefit of neuro-symbolic solutions is the integration of domain knowledge <ref type="bibr" target="#b5">[6]</ref>, e.g., in the form of Knowledge Graphs (KGs) <ref type="bibr" target="#b0">[1]</ref>. Integrating KGs as a structured, symbolic knowledge representation into RAG-type applications offers a powerful approach to reducing hallucinations <ref type="bibr" target="#b6">[7]</ref>, as it combines the ability of language models to analyze text with the capability to retrieve relevant information from external sources such as knowledge bases.</p><p>Nowadays, it is virtually impossible to keep track of new research, considering the worldwide overload of scientific publications <ref type="bibr" target="#b7">[8]</ref>. Research needs to support decision-making at an industrial scale, which means that the engineering and discovery of scientific knowledge come with the necessity of analyzing a massive corpus of data. There are established research methods for systematically analyzing a large landscape of publications, such as the Systematic Literature Review (SLR) <ref type="bibr" target="#b8">[9]</ref>. Yet SLRs are time-consuming if conducted manually. 
AI methods have been shown to be effective for increasing efficiency, e.g., in paper selection <ref type="bibr" target="#b9">[10]</ref>, yet recent research has not fully exploited these capabilities <ref type="bibr" target="#b9">[10]</ref>. Specifically, LLMs open up new opportunities to further automate SLRs with knowledge representation and smart prompting. While some open challenges in scientific knowledge discovery are addressed by AI-based techniques <ref type="bibr" target="#b9">[10]</ref>, neuro-symbolic approaches have not been explicitly assessed with respect to their potential and limitations in this field.</p><p>Considering this potential, this paper identifies the benefits and limitations of different approaches for scientific knowledge discovery, specifically for answering the research questions of an SLR. We evaluate LLM-based and neuro-symbolic approaches, specifically document-based RAG and RDF-KG-based context-enhanced LLM approaches. Additionally, a prompt engineering process was conducted based on the different neuro-symbolic approaches drafted as systematic experimentation scenarios.</p><p>Moreover, this work tackles the missing transparency of proprietary SLR tools with AI support ([2] [10]). For this reason, and to address unpredictability concerns, a GitHub repository<ref type="foot" target="#foot_0">1</ref> with the used system and user prompts was prepared, including the different scientific knowledge discovery questions and the respective answers.</p><p>The remainder of this paper is structured as follows: we analyze and discuss research on the status and open challenges of AI-supported SLRs in Section 2. We present our approach in Section 3. In Section 4, we first describe the different scenarios of our experiment; second, we show the obtained results and analyze the benefits and limitations. 
After discussing open challenges in scaling scientific knowledge discovery with neuro-symbolic AI in Section 5, we conclude in Section 6 and point out the limitations of our work and future steps.</p></div>
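The RAG idea sketched above can be illustrated with a minimal, hypothetical retriever (this is an illustrative sketch, not the authors' implementation): documents relevant to a question are selected from an external corpus and prepended to the prompt as context.

```python
def relevance(question: str, document: str) -> float:
    """Crude lexical relevance: share of question terms occurring in the document."""
    q_terms = set(question.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def build_rag_prompt(question: str, corpus: list[str], k: int = 2) -> str:
    """Select the k most relevant documents and prepend them as prompt context."""
    ranked = sorted(corpus, key=lambda d: relevance(question, d), reverse=True)
    context = "\n---\n".join(ranked[:k])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

A production retriever would use dense embeddings rather than term overlap, but the prompt-assembly step is the same.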
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>This section discusses relevant related work on scaling scientific knowledge discovery, with a focus on neuro-symbolic AI.</p><p>One of the most prominent LLM challenges is hallucination reduction. An ML-oriented method to address this is fine-tuning, but it comes at a high cost in terms of time and effort <ref type="bibr" target="#b10">[11]</ref>. It is possible to develop a model that predicts multiple tail or head entities for a given relation and entity by leveraging the relevant neighbors of those entities <ref type="bibr" target="#b11">[12]</ref>. This has improved the efficiency and effectiveness of LLMs in utilizing KG information in specialized or personalized domains. However, both cases generate new challenges. One is the increased cost of fine-tuning LLMs, although this cost is significantly lower than for other methods, since very specific, compressed, and previously validated information is mapped. An additional challenge is the risk of information loss in the graphs: because of the large number of connections a node can have, it is difficult to leverage the most relevant neighbors.</p><p>As one example of scientific knowledge discovery, SLRs have proven valuable. An SLR consists of three main phases: planning, conduction, and reporting. De la Torre-López et al. <ref type="bibr" target="#b9">[10]</ref> show in an SLR that most AI-based support in automating SLRs targets the conduction phase, specifically the task of paper selection. The planning phase is semi-automated with traditional methods (see, e.g., <ref type="bibr" target="#b12">[13]</ref> on duplicate identification), and the reporting phase is commonly done manually. The authors accordingly see a gap calling for more research on AI-driven writing tasks <ref type="bibr" target="#b9">[10]</ref>.</p><p>Bolaños et al. 
reviewed AI opportunities and challenges for literature reviews <ref type="bibr" target="#b1">[2]</ref> by surveying existing SLR tools. The authors stress the importance of the research direction of integrating advanced NLP technologies to replace possibly outdated methodologies in available SLR tools, and the "promising research direction" of "the use of semantic technologies [...]", particularly knowledge graphs, to enhance the characterization and classification of research papers <ref type="bibr" target="#b1">[2]</ref>. An interesting work on integrating advanced NLP technologies by Jansen et al. employs LLMs in survey research <ref type="bibr" target="#b13">[14]</ref>. The authors see "potential advantages to using LLMs like ChatGPT for survey research to generate survey responses" and discuss potential issues such as bias and lack of contextual understanding of LLMs. Our work addresses the latter by evaluating neuro-symbolic approaches to knowledge injection.</p><p>Further work (e.g., <ref type="bibr" target="#b14">[15]</ref> [16]) shows research interest in this field, yet research on neuro-symbolic approaches, e.g., RAG, Graph RAG, and memory-based methods, to improve the reporting phase in scientific knowledge discovery is still lacking.</p><p>Focused on the medical domain, Yun et al. <ref type="bibr" target="#b16">[17]</ref> conclude that "further research is warranted for using LLMs for literature reviews in other domains as our study only focused on the task of writing medical systematic reviews." While van Dinter et al. <ref type="bibr" target="#b17">[18]</ref> broaden the domain view in their work, the focus still remains on the medical and computer science domains, so no SLRs from the manufacturing domain were evaluated.</p><p>In summary, the related work shows interest in AI support for scientific knowledge discovery. 
The exploration focuses on SLRs as a method and on medicine or computer science in general as a domain.</p><p>To the best of our knowledge, no SLR has been conducted manually and then challenged against LLM capabilities in any way. Further, no AI-based support for SLRs has started from a KG; existing work operates only on publication metadata or on texts containing the respective content of a publication. With our work, we address the previously mentioned gaps.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Building Neuro-Symbolic AI Frameworks for Scientific Knowledge Discovery</head><p>In this section, we describe the underlying neuro-symbolic approaches and the architectural pattern employed in our work's neuro-symbolic scenarios.</p><p>To address scalability in the realm of scientific knowledge discovery, we evaluate different approaches on the example of an SLR's generative writing task. In addition to the human and LLM-based responses to specific research questions, an evaluation of neuro-symbolic potentials and challenges is needed. In this section, we describe a document-based RAG approach and a framework for an RDF-KG-based context-enhanced LLM; these are the basis of the selected neuro-symbolic scenarios in our experiments. Lewis et al. <ref type="bibr" target="#b6">[7]</ref> introduce Retrieval-Augmented Generation (RAG) as the combination of "pre-trained, parametric-memory generation models with a non-parametric memory through a general-purpose fine-tuning approach". In our work, the RAG approach uses an LLM as the parametric-memory model and a folder of text files as the non-parametric memory, see Figure <ref type="figure" target="#fig_0">1</ref>. The LLM is executed in scenarios with different GPT models from OpenAI<ref type="foot" target="#foot_1">2</ref>.</p><p>The document base contains 49 text files of the final search corpus from a recently conducted SLR <ref type="bibr" target="#b19">[20]</ref>. Each text file was scraped, and the text was extracted from the main publication website. 
With the selected document base, a Knowledge Graph construction process was performed using the extracted paper content and the paper metadata, and a schema was assembled by leveraging existing ontologies such as BIBO<ref type="foot" target="#foot_2">3</ref>, SWRC<ref type="foot" target="#foot_3">4</ref>, ORKG<ref type="foot" target="#foot_4">5</ref>, and others.</p><p>To test the RDF-KG-based context-enhanced LLM, see Figure <ref type="figure" target="#fig_0">1</ref>, the public API of OpenAI was employed, specifically the GPT-4-turbo model. The KG includes entities for the 49 assessed publications, their authors, venues, and identified research fields. The complete list of the 49 publications can be found in the GitHub repository<ref type="foot" target="#foot_5">6</ref>, as well as the assembled schema and the fully populated KG.</p></div>
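As a hypothetical illustration of the KG construction step described above (not the authors' actual pipeline), paper metadata can be mapped to RDF-style triples whose property names loosely follow BIBO/SWRC terms, and the triples can then be serialized into a textual context block for the LLM prompt. The `ex:` prefix and the exact properties are illustrative assumptions.

```python
def paper_to_triples(paper: dict) -> list[tuple[str, str, str]]:
    """Map one paper's metadata to subject-predicate-object triples.
    Prefixes like bibo:/swrc: are illustrative, not a validated schema."""
    s = f"ex:paper_{paper['id']}"
    triples = [
        (s, "rdf:type", "bibo:AcademicArticle"),
        (s, "dcterms:title", paper["title"]),
    ]
    triples += [(s, "dcterms:creator", author) for author in paper["authors"]]
    triples += [(s, "swrc:researchField", field) for field in paper.get("fields", [])]
    return triples

def triples_to_context(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples as Turtle-like lines to be injected as LLM context."""
    return "\n".join(f'{s} {p} "{o}" .' for s, p, o in triples)
```

In practice a library such as rdflib would manage namespaces and serialization, but the principle of flattening the KG into prompt text is the same.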
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Scaling Knowledge Discovery with Knowledge Graphs and Neuro-Symbolic AI</head><p>In this section, we describe the experimental framework (Section 4.1) for the conducted RAG and RDF-KG-based context-enhanced LLM scenarios. We conclude with the results of our experiments (Section 4.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Configuration</head><p>The evaluation centered on two LLM-based approaches: RAG (1) and RDF-KG-based context enhancement (2). The main goal is to scale scientific knowledge discovery, as detailed in Figure <ref type="figure" target="#fig_1">2a</ref>. The evaluation was based on the results for five research questions (a main research question and four additional ones) drafted for the selected document base, with the scope of assessing generative writing capabilities and knowledge discovery by leveraging the research questions of an SLR. To increase comparability between the LLM with no knowledge and the RAG-based approach, GPT-3.5-turbo and GPT-4-turbo were employed in both scenarios; the scenario details are listed in Table <ref type="table" target="#tab_0">1</ref>. The RDF-KG-based context-enhanced LLM approach is conducted only on GPT-4-turbo, which serves as the basis for the evaluation across the neural and neuro-symbolic approaches. In each scenario, five steps are undertaken, each addressing one of the research questions (RQ) from <ref type="bibr" target="#b19">[20]</ref>: (1) Which role do neuro-symbolic AI approaches play in knowledge graph construction for Smart Manufacturing? (Main RQ), (2) What are publication characteristics on neuro-symbolic AI in knowledge graph construction for Smart Manufacturing? (RQ1), (3) In which steps of the knowledge graph construction process are neuro-symbolic AI methods applied in Smart Manufacturing? (RQ2), (4) What are common neuro-symbolic AI architectures in knowledge graph construction? (RQ3), and (5) For which manufacturing use cases are knowledge graphs constructed with neuro-symbolic AI? (RQ4).</p><p>In addition, scenario S5 is subject to the model's token constraint. 
Hence, the KG covering the 49 documents is split into five SubKGs, each provided as a separate context, and the LLM is then asked to merge the five responses.</p></div>
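The SubKG workaround for the token constraint can be sketched as follows. This is a hypothetical outline; `ask_llm` stands in for the actual GPT-4-turbo call and is an assumption, not the authors' code.

```python
from typing import Callable, Sequence

def split_into_subkgs(triples: Sequence, n_parts: int = 5) -> list:
    """Split the KG triples into n_parts SubKGs so each context fits the token limit."""
    size = -(-len(triples) // n_parts)  # ceiling division
    return [list(triples[i:i + size]) for i in range(0, len(triples), size)]

def merged_answer(question: str, subkgs: list, ask_llm: Callable[[str, list], str]) -> str:
    """Query the LLM once per SubKG, then ask it to merge the partial answers."""
    partials = [ask_llm(question, subkg) for subkg in subkgs]
    merge_prompt = "Merge the following partial answers into one:\n" + "\n".join(partials)
    return ask_llm(merge_prompt, [])
```

The merge step costs one extra LLM call but keeps every individual context under the model's token limit.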
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Evaluation</head><p>In this section, we present the evaluation approach and the analysis of the results. The underlying framework of all scenarios is shown in Figure <ref type="figure" target="#fig_1">2</ref>. The conducted LLM-based and neuro-symbolic scenarios are as follows:</p><p>1. LLM only: No further data provided. Scenarios: S1 and S3 2. Document-based RAG: Files contain the text retrieved from the 49 manually selected publications. Scenarios: S2 and S4 3. RDF-KG-based context-enhanced LLM: An RDF KG is provided as context to the LLM, in addition to a system prompt and a user prompt. Scenario: S5</p><p>Considering the lack of a gold standard for evaluating an LLM response, an evaluation model was selected that reflects the known weaknesses of LLMs, although it might not cover all requirements for answering a scientific research question. The selected evaluation criteria were adapted from <ref type="bibr" target="#b13">[14]</ref>, each scored from 1 to 5 per scenario, see Table <ref type="table" target="#tab_1">2</ref>. We show our results in Figure <ref type="figure" target="#fig_2">3</ref>. Based on the results, we see that scaling scientific knowledge discovery with LLMs and improving this approach with RAG is at an interesting yet not applicable level. On the one hand, the responses vary significantly across scenarios and research questions. On the other hand, scientific criteria are not met, as hallucinations occur and references are handled unreliably. In contrast, we obtain promising results from the RDF-KG-based context-enhanced LLM. We discuss these points in Section 5.</p></div>
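The per-scenario scoring described above (criteria C1-C4, each rated 1 to 5) can be aggregated with a small helper for comparison across scenarios; the scores in the usage example below are invented placeholders, not the paper's actual results.

```python
def scenario_scores(ratings: dict[str, dict[str, int]]) -> dict[str, float]:
    """Average the 1-5 criterion scores (e.g., C1-C4) per scenario."""
    for scenario, criteria in ratings.items():
        # Guard against out-of-range ratings before aggregating.
        assert all(1 <= v <= 5 for v in criteria.values()), f"invalid score in {scenario}"
    return {s: sum(c.values()) / len(c) for s, c in ratings.items()}
```

Example: `scenario_scores({"S1": {"C1": 4, "C2": 3, "C3": 5, "C4": 2}})` yields an average of 3.5 for S1.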
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>Overall, the responses across the different scenarios range from disappointing to promising. Some responses (e.g., S2-RQ1, S2-RQ2) do not attempt an answer, although the relevant context is provided via text chunks and the LLM is trained on enough general knowledge to at least return a more substantial answer. On the contrary, one of the best answers (S4-RQ3) includes an outlook on evolutionary knowledge, which is not explicitly requested by the prompts. Beneath this variety, at least two common flaws that apply to all scenarios can be identified: (i) missing (references to) definitions and (ii) missing tables, charts, or figures to illustrate the statements.</p><p>We see severe challenges in the LLM-based and RAG-based scenarios. With consistent system prompts and varying research questions, the responses vary unexpectedly in several respects: (i) the reference list (S1-RQ1 and S1-RQ4 contain no references at all), (ii) in-text citations (none provided by, e.g., S4-RQ4), and (iii) whether the provided references are genuine (e.g., S1-RQ2 returns a template for references with no actual values included). S2-mainRQ quotes directly from a provided source, yet omits quotation marks and a citation. The RDF-KG-based context-enhanced LLM is a promising direction, yet it also needs further improvement to ensure responses at a scientific level.</p><p>Neuro-symbolic approaches are one way of reducing hallucinations in LLMs. Our results show a good performance of S1 and S4, yet a disappointing performance of S2. The RAG-based approach with a GPT-3.5-turbo model (S2) describes neuro-symbolic AI as a combination of the "merits of statistical learning with semantical knowledge and reasoning", omitting the neural perspective, which is crucial.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In summary, our work shows a promising neuro-symbolic approach, an RDF-KG-based context-enhanced LLM, for scaling scientific knowledge discovery. A further benefit of this approach is that it lays the foundation for handling evolutionary knowledge: via the KG, the knowledge can be updated and made available for future scientific queries to the LLM with minimal effort.</p><p>Our results show a need for caution when working with RAG-based approaches. Based on the overall results, we see that scaling scientific knowledge discovery with LLMs and improving this approach with simple RAG is not at an applicable level. Moreover, scientific criteria are not met, as hallucinations occur and references are treated unreliably. RDF-KG-based context-enhanced LLMs appear better suited for this task based on our results, yet they also require further improvements before being applicable.</p><p>Our experiment sheds light on scientific knowledge discovery from research data in the manufacturing domain, yet the approach is applicable to SLRs across industries.</p><p>Our work does not cover the whole area of scientific knowledge discovery, omitting, e.g., paper selection tasks in SLRs or expert interviews as approaches.</p><p>Lastly, token processing is a costly parameter. As a research paper may contain about ten thousand tokens, processing a large data corpus quickly runs into token limits. 
Smart prompting and suitable neuro-symbolic architectures are needed to address this.</p><p>In future work, we plan to evaluate different parameter configurations, especially the temperature and the number of message sources, on RDF-KG-based context-enhanced LLMs.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Neuro-Symbolic AI Enhancement approach for ingesting Knowledge Graphs into the LLM; Neuro-Symbolic AI Architecture {d-K-s-M-d}, using the boxology notation <ref type="bibr" target="#b18">[19]</ref></figDesc><graphic coords="3,117.13,570.50,361.02,75.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: High-Level Architecture View</figDesc><graphic coords="4,72.00,373.44,284.30,79.07" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Results on scenarios for scaling scientific knowledge discovery</figDesc><graphic coords="6,162.25,65.61,270.77,164.18" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell cols="2">Model Information</cell><cell></cell></row><row><cell>Scenario</cell><cell>Model</cell><cell>Setup</cell></row><row><cell>S1</cell><cell cols="2">gpt-3.5-turbo temperature 0.5; zero shot</cell></row><row><cell>S2</cell><cell cols="2">gpt-3.5-turbo temperature 0.5; zero shot; 10 message sources</cell></row><row><cell>S3</cell><cell>gpt-4-turbo</cell><cell>temperature 0.5; zero shot</cell></row><row><cell>S4</cell><cell>gpt-4</cell><cell>temperature 0.5; zero shot; 10 message sources</cell></row><row><cell>S5</cell><cell>gpt-4-turbo</cell><cell>temperature 0.5; zero shot; five times 9-10 message sources in context, then</cell></row><row><cell></cell><cell></cell><cell>the summary of 5 responses in additional prompt</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Evaluation Criteria.</figDesc><table><row><cell>Id</cell><cell>Criterion</cell><cell>Description</cell><cell>Score</cell></row><row><cell></cell><cell>Name</cell><cell></cell><cell></cell></row><row><cell cols="2">C1 Domain-specific</cell><cell>Use of neuro-symbolic-and manufac-</cell><cell>1 (specific vocabulary not used or used</cell></row><row><cell></cell><cell>vocabulary</cell><cell>turing domain-specific vocabulary</cell><cell>in the wrong context) to 5 (specific vo-</cell></row><row><cell></cell><cell></cell><cell></cell><cell>cabulary correctly employed)</cell></row><row><cell cols="2">C2 Contextual</cell><cell>Degree of "nonsensical or inappropriate</cell><cell>1 (completely inappropriate response)</cell></row><row><cell></cell><cell>understanding</cell><cell>responses"</cell><cell>to 5 (appropriate response)</cell></row><row><cell></cell><cell>(hallucination)</cell><cell></cell><cell></cell></row><row><cell cols="2">C3 Compelling mis-</cell><cell>Share of "highly convincing text that is</cell><cell>1 (at least 50% of response is factu-</cell></row><row><cell></cell><cell>information</cell><cell>factually wrong"</cell><cell>ally wrong) to 5 (response is completely</cell></row><row><cell></cell><cell></cell><cell></cell><cell>true)</cell></row><row><cell cols="2">C4 Lack of trans-</cell><cell>Degree of increasing transparency</cell><cell>1 (no or ineligible sources provided) to</cell></row><row><cell></cell><cell>parency</cell><cell>caused by "disclosing LLM participation</cell><cell>5 (all relevant sources provided and all</cell></row><row><cell></cell><cell></cell><cell>and intractability of LLM training and</cell><cell>cited in-text)</cell></row><row><cell></cell><cell></cell><cell>the text-generation process"</cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">GitHub Repository -https://github.com/d1egoprog/KG-SLR4LLM</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://platform.openai.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Namespace: http://purl.org/ontology/bibo/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Namespace: http://swrc.ontoware.org/ontology#</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">Namespace: http://orkg.org/core</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://github.com/wAIlma/SLR-NeSyAI-KGC-I40/data</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We want to thank Valentin Knappich and Cem Akdag for their helpful support and insights during our work.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Unifying large language models and knowledge graphs: A roadmap</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<idno type="DOI">10.1109/TKDE.2024.3352100</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="3580" to="3599" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Bolanos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Salatino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Osborne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.08565</idno>
		<title level="m">Artificial Intelligence for Literature Reviews: Opportunities and Challenges</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">GPT-4 is here: what scientists think</title>
		<author>
			<persName><forename type="first">K</forename><surname>Sanderson</surname></persName>
		</author>
		<idno type="DOI">10.1038/d41586-023-00816-5</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">615</biblScope>
			<biblScope unit="page">773</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Neuro-symbolic approaches in artificial intelligence</title>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Eberhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ebrahimi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Sarker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.1093/nsr/nwac035</idno>
	</analytic>
	<monogr>
		<title level="j">National Science Review</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">35</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Addressing the Scalability Bottleneck of Semantic Technologies at Bosch</title>
		<author>
			<persName><forename type="first">D</forename><surname>Rincon-Yanez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Gad-Elrab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stepanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">T</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chu Xuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kharlamov</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-43458-7_33</idno>
	</analytic>
	<monogr>
		<title level="j">Lecture Notes in Computer Science (LNCS)</title>
		<imprint>
			<biblScope unit="volume">13998</biblScope>
			<biblScope unit="page" from="177" to="181" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A survey on neural-symbolic learning systems</title>
		<author>
			<persName><forename type="first">D</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pan</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.neunet.2023.06.028</idno>
	</analytic>
	<monogr>
		<title level="j">Neural Networks</title>
		<imprint>
			<biblScope unit="volume">166</biblScope>
			<biblScope unit="page" from="105" to="126" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Küttler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Riedel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<date type="published" when="2020-12">December 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Scientific literature: Information overload</title>
		<author>
			<persName><forename type="first">E</forename><surname>Landhuis</surname></persName>
		</author>
		<idno type="DOI">10.1038/nj7612-457a</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">535</biblScope>
			<biblScope unit="page" from="457" to="458" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Systematic literature reviews in software engineering - a systematic literature review</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kitchenham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pearl Brereton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Budgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Turner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bailey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Linkman</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.infsof.2008.09.009</idno>
	</analytic>
	<monogr>
		<title level="j">Information and Software Technology</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="7" to="15" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Artificial intelligence to automate the systematic review of scientific literature</title>
		<author>
			<persName><forename type="first">J</forename><surname>De La Torre-López</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramírez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Romero</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00607-023-01181-x</idno>
	</analytic>
	<monogr>
		<title level="j">Computing</title>
		<imprint>
			<biblScope unit="volume">105</biblScope>
			<biblScope unit="page" from="2171" to="2194" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?</title>
		<author>
			<persName><forename type="first">N</forename><surname>Dziri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Milton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Zaiane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Reddy</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.naacl-main.387</idno>
	</analytic>
	<monogr>
		<title level="m">NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference</title>
				<meeting><address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="5271" to="5285" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Shieh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-T</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-C</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-S</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.12656</idno>
		<title level="m">Retrieval-Augmented Language Model for Extreme Multi-Label Knowledge Graph Link Prediction</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">How-to conduct a systematic literature review: A quick guide for computer science research</title>
		<author>
			<persName><forename type="first">A</forename><surname>Carrera-Rivera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ochoa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Larrinaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lasa</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.mex.2022.101895</idno>
	</analytic>
	<monogr>
		<title level="j">MethodsX</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">101895</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Employing large language models in survey research</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">J</forename><surname>Jansen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-G</forename><surname>Jung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Salminen</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.nlp.2023.100020</idno>
	</analytic>
	<monogr>
		<title level="j">Natural Language Processing Journal</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page">100020</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Sami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Rasheed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-K</forename><surname>Kemell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Waseem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kilamo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Saari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Duc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Systä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Abrahamsson</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.08399</idno>
		<title level="m">System for systematic literature review using multiple AI agents: Concept and an empirical evaluation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">ChatGPT and a new academic reality: Artificial Intelligence-written research papers and the ethics of the large language models in scholarly publishing</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">D</forename><surname>Lund</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">R</forename><surname>Mannuru</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shimray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.1002/asi.24750</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">74</biblScope>
			<biblScope unit="page" from="570" to="581" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Appraising the Potential Uses and Harms of Large Language Models for Medical Systematic Reviews</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Yun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Trikalinos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Marshall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>Wallace</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.emnlp-main.626</idno>
	</analytic>
	<monogr>
		<title level="m">EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings</title>
				<meeting><address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="10122" to="10139" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Automation of systematic literature reviews: A systematic literature review</title>
		<author>
			<persName><forename type="first">R</forename><surname>Van Dinter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tekinerdogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Catal</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.infsof.2021.106589</idno>
	</analytic>
	<monogr>
		<title level="j">Information and Software Technology</title>
		<imprint>
			<biblScope unit="volume">136</biblScope>
			<biblScope unit="page">106589</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A Boxology of Design Patterns for Hybrid Learning and Reasoning Systems</title>
		<author>
			<persName><forename type="first">F</forename><surname>Van Harmelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ten Teije</surname></persName>
		</author>
		<idno type="DOI">10.13052/jwe1540-9589.18133</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Engineering</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="97" to="124" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Systematic Literature Review on Neuro-Symbolic AI in Knowledge Graph Construction for Manufacturing</title>
		<author>
			<persName><forename type="first">W</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rincon-Yanez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kharlamov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Paschke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web Journal</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
