<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ORKG ASK: a Neuro-symbolic Scholarly Search and Exploration System</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Allard</forename><surname>Oelen</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">TIB – Leibniz Information Centre for Science and Technology</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mohamad</forename><forename type="middle">Yaser</forename><surname>Jaradeh</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">L3S Research Center</orgName>
								<orgName type="institution">Leibniz University of Hannover</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sören</forename><surname>Auer</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">TIB – Leibniz Information Centre for Science and Technology</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">L3S Research Center</orgName>
								<orgName type="institution">Leibniz University of Hannover</orgName>
								<address>
									<settlement>Hannover</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ORKG ASK: a Neuro-symbolic Scholarly Search and Exploration System</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C6A0E0A5106E0752552415399A5C1B01</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Neuro-symbolic AI</term>
					<term>Large Language Models</term>
					<term>Scholarly Knowledge Graphs</term>
					<term>Scholarly Search System</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Purpose: Finding scholarly articles is a time-consuming and cumbersome activity, yet crucial for conducting science. Due to the growing number of scholarly articles, new scholarly search systems are needed to effectively assist researchers in finding relevant literature. Methodology: We take a neuro-symbolic approach to scholarly search and exploration by leveraging state-of-the-art components, including semantic search, Large Language Models (LLMs), and Knowledge Graphs (KGs). The semantic search component retrieves a set of relevant articles, from which information is extracted and presented to the user. Findings: The presented system, called ORKG ASK (Assistant for Scientific Knowledge), is a production-ready search and exploration system. Our preliminary evaluation indicates that the proposed approach is suitable for the task of scholarly information retrieval. Value: With ORKG ASK, we present a next-generation scholarly search and exploration system and make it available online. Additionally, the system components are open source with a permissive license.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Finding scholarly articles and exploring the body of scholarly literature consumes a significant share of a researcher's time, and the growing number of scholarly articles makes this issue ever more apparent <ref type="bibr" target="#b0">[1]</ref>. Current scholarly search systems assist users only passively, by providing a list of relevant articles. With active assistance, the system would instead address the user's information need directly, for example by answering a research question. We present ORKG ASK (Assistant for Scientific Knowledge), a new-generation scholarly search and exploration system<ref type="foot">1</ref>. ORKG ASK helps researchers find relevant literature and automatically extracts knowledge from the retrieved articles, thereby actively supporting researchers with their information needs. The approach consists of three main components: 1) semantic search, 2) a Large Language Model (LLM), and 3) Knowledge Graphs (KGs). First, semantic search addresses the challenge of retrieving articles based on their relevance to a specific information need. In ORKG ASK, users formulate their information need as a research question, which is entered as the search query. Second, an LLM answers the research question, using the set of relevant articles as context in its prompt. In addition to the answer, a set of properties is extracted from the articles, including a summary and the materials, methods, and results of the contributions they describe. Third, KGs support more fine-grained information extraction and the curation of extraction results, including filtering results by the concepts mentioned in the articles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>There are various established, large, multidisciplinary scholarly search systems, including Google Scholar, Semantic Scholar, and Scopus <ref type="bibr" target="#b1">[2]</ref>. Other systems, such as PubMed and the ACM Digital Library, are domain-specific. These systems follow a similar approach: articles are ranked by relevance, but users have to extract the relevant information from the articles manually. A newer approach, taken by systems such as Elicit, Consensus, and Scispace, provides active support through automatic information extraction <ref type="bibr" target="#b2">[3]</ref>. These systems are not open source, so details of their approach, such as the model and dataset used, remain unknown, which makes their results harder to reproduce. This makes such systems less suitable for systematic literature reviews, where reproducibility is a key aspect of the methodology <ref type="bibr" target="#b3">[4]</ref>.</p><p>To extract knowledge from a large set of scholarly documents, a Retrieval-Augmented Generation (RAG) approach can be used to provide the LLM with relevant context <ref type="bibr" target="#b4">[5]</ref>. The retrieval step selects a set of relevant documents, commonly using a vector database. The augmentation step adds the retrieved context to the user query. Finally, the generation step produces the response from this augmented input. To our knowledge, the AI-supported scholarly search systems mentioned above also use this approach and are thus similar to the approach we propose with ORKG ASK. </p></div>
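<div xmlns="http://www.tei-c.org/ns/1.0"><p>The augmentation step of RAG can be illustrated with a minimal Python sketch. It rests on stated assumptions and is not taken from ORKG ASK. The function name build_rag_prompt, the prompt wording, and the title and abstract fields of each article are hypothetical. Retrieval is assumed to have already produced the article list. Generation, meaning passing the returned string to an LLM, is not shown.</p><p>
```python
from typing import Dict, List


def build_rag_prompt(question: str, articles: List[Dict[str, str]]) -> str:
    """Augment a research question with retrieved articles (hypothetical prompt format)."""
    # Retrieval is assumed to have happened already: `articles` holds the
    # top-ranked documents, each with a title and an abstract.
    context_blocks = [
        f"[{i}] {article['title']}\n{article['abstract']}"
        for i, article in enumerate(articles, start=1)
    ]
    context = "\n\n".join(context_blocks)
    # Augmentation: the retrieved context is placed before the user query.
    # Generation would pass the returned string to an LLM (not shown).
    return (
        "Answer the research question using only the articles below.\n\n"
        f"Articles:\n{context}\n\n"
        f"Research question: {question}\nAnswer:"
    )


articles = [
    {"title": "Paper A", "abstract": "We study X."},
    {"title": "Paper B", "abstract": "We evaluate Y."},
]
print(build_rag_prompt("How is X evaluated?", articles))
```
</p><p>The numbered context blocks are one possible design choice. They let a generated answer refer back to individual articles, and other prompt layouts are equally valid.</p></div>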
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">System Overview</head><p>We now discuss the ORKG ASK system in more detail. The system is designed and developed to provide a solid foundation for a sustainable service. Additionally, we focus on accessibility by providing a dark mode (for low-light conditions), a responsive interface (for mobile usage or high zoom levels for visually impaired users), and ARIA accessibility attributes where applicable. The ORKG ASK code base is published as open source under an MIT license and is available online. <ref type="foot" target="#foot_0">2</ref> Figure <ref type="figure" target="#fig_0">1</ref> depicts the design of the search result page for a specific research question.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">User-Oriented Features</head><p>For the Question Answering feature, node 1 in Figure <ref type="figure" target="#fig_0">1</ref> shows the input field for the research question, and nodes 2 and 4 present the answers to the question. Node 2 shows an answer synthesized from the first five displayed results. The Information Extraction feature extracts additional information from each article (node 4). Several columns are displayed by default, and users can customize the extracted information to their needs (node 3). The Filtering feature enables users to filter articles based on user-provided criteria (node 5), such as publication year, language, words that appear in the title or abstract, number of citations, and author names. The Bibliography Management feature, called "My Library", provides a bibliography manager where users can store and curate lists of articles.</p><p>Articles are added by clicking the bookmark icon in the interface (node 8) or manually via the My Library page (by DOI, title, or BibTeX). Articles from My Library can also be added to a search query manually, in which case they are prepended to the search results. The Data Reuse feature supports citing articles in the APA, Vancouver, and Harvard citation styles and exporting them to BibTeX, RIS, and CSL-JSON (node 7). The export button is displayed in node 8. Furthermore, the entire search result table can be exported to CSV and ORKG <ref type="bibr" target="#b5">[6]</ref> CSV (node 6). Finally, the Entity Linking feature links entities in article abstracts to their respective DBpedia entries. This makes it possible to filter articles by semantically identical concepts, which provides an additional means for more targeted information retrieval.</p></div>
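<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Filtering and Entity Linking features can be illustrated with a small in-memory Python sketch. The Article record, its field names, and the filter_articles function are hypothetical. The DBpedia URI serves only as an example. A production system would more likely apply such filters inside the vector store than in application code. The sketch shows that an entity filter matches the linked concept rather than the surface word. Differently worded abstracts that link to the same DBpedia resource are therefore treated alike.</p><p>
```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Article:
    # Hypothetical metadata record; field names are illustrative only.
    title: str
    abstract: str
    year: int
    citations: int
    # DBpedia resource URIs linked to the abstract during offline entity linking.
    entities: Set[str] = field(default_factory=set)


def filter_articles(
    articles: List[Article],
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    min_citations: int = 0,
    keyword: Optional[str] = None,
    entity: Optional[str] = None,
) -> List[Article]:
    """Keep only the articles that satisfy every supplied criterion."""
    kept = []
    for a in articles:
        if year_from is not None and year_from > a.year:
            continue
        if year_to is not None and a.year > year_to:
            continue
        if min_citations > a.citations:
            continue
        if keyword is not None and keyword.lower() not in (a.title + " " + a.abstract).lower():
            continue
        # Entity filtering compares linked concepts, not surface words.
        if entity is not None and entity not in a.entities:
            continue
        kept.append(a)
    return kept


CAR = "http://dbpedia.org/resource/Car"  # example URI
articles = [
    Article("Car safety", "Crash tests for cars.", 2021, 12, {CAR}),
    Article("Automobile emissions", "Exhaust analysis.", 2018, 40, {CAR}),
    Article("Bicycle lanes", "Urban cycling.", 2022, 3),
]
print([a.title for a in filter_articles(articles, entity=CAR, year_from=2020)])  # ['Car safety']
```
</p><p>Both "Car safety" and "Automobile emissions" carry the same linked entity, so an entity filter alone returns both. Adding the year criterion then narrows the result to one article.</p></div>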
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">System Workflow</head><p>Figure <ref type="figure" target="#fig_1">2</ref> depicts the system workflow. It starts with a user asking a research question, for which a set of relevant documents is retrieved. The neural search component represents the query and the articles as vectors using the Nomic embeddings model and retrieves the articles whose vectors are most similar to the query vector. Optionally, the search space can be narrowed down by filtering on specific metadata or linked entities. Qdrant<ref type="foot" target="#foot_1">3</ref> is used as the vector and data store. The symbolic component processes article abstracts offline and stores the entities linked in them in the data store. Entity linking is performed with DBpedia Spotlight <ref type="bibr" target="#b6">[7]</ref>. The CORE dataset <ref type="bibr" target="#b7">[8]</ref>, containing article metadata, abstracts, and full texts (for open-access articles), serves as the data source for the vector store. Next, an LLM extracts knowledge from the top n articles, following the RAG approach. Currently, we use the Mistral 7B Instruct v0.2 model for information extraction. To reduce resource usage, the LLM is prompted only if the requested answer does not yet exist in the cache. Finally, the extracted information is presented to the user.</p></div>
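<div xmlns="http://www.tei-c.org/ns/1.0"><p>The retrieval and caching steps of the workflow can be sketched in Python. This is a minimal illustration under explicit assumptions, not the ORKG ASK implementation. A brute-force cosine-similarity ranking over toy two-dimensional vectors stands in for Qdrant's vector search over Nomic embeddings. The ExtractionCache class, its (article, property) cache key, and the stand-in llm callable are all hypothetical.</p><p>
```python
import math
from typing import Callable, Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n(query_vec: List[float], doc_vecs: Dict[str, List[float]], n: int) -> List[str]:
    """Brute-force stand-in for a vector store query: the n most similar article ids."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:n]


class ExtractionCache:
    """Hypothetical cache that prompts the LLM only on a cache miss."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.store: Dict[Tuple[str, str], str] = {}
        self.llm_calls = 0

    def extract(self, article_id: str, prop: str, prompt: str) -> str:
        key = (article_id, prop)
        if key not in self.store:
            # Cache miss: call the (expensive) LLM once and remember the result.
            self.llm_calls += 1
            self.store[key] = self.llm(prompt)
        return self.store[key]


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
docs = {"a1": [1.0, 0.0], "a2": [0.6, 0.8], "a3": [0.0, 1.0]}
print(top_n([0.9, 0.1], docs, n=2))  # ['a1', 'a2']

cache = ExtractionCache(llm=lambda prompt: "summary of " + prompt)
cache.extract("a1", "summary", "a1 abstract")
cache.extract("a1", "summary", "a1 abstract")  # served from the cache
print(cache.llm_calls)  # 1
```
</p><p>In this sketch, the cache is keyed on the article and the requested property. A column extracted once for an article can then be reused by any later search that returns the same article. The text does not state which cache key ORKG ASK actually uses.</p></div>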
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation</head><p>As a preliminary system evaluation, we assessed the usability of the system using a 5-point user satisfaction rating and the Usability Metric for User Experience Lite (UMUX-Lite) questionnaire <ref type="bibr" target="#b8">[9]</ref>. Participants were recruited in the ORKG ASK production system through a non-intrusive tooltip asking real-world users for their opinion. To keep the participation effort as low as possible, no demographic information was requested. In total, 30 participants took part in this evaluation. As Figure <ref type="figure" target="#fig_2">3</ref> shows, users are relatively satisfied with ORKG ASK. The UMUX-Lite evaluation displayed in Figure <ref type="figure" target="#fig_3">4</ref> results in an overall score of 65.2. The individual item results show that most participants agree that ORKG ASK is easy to use, but that the system does not always meet their requirements. This could be explained by users' search behavior: logs of the submitted questions revealed that not all questions are valid and answerable, leaving the users' specific search requirements unmet. Further evaluation is needed to better understand users' expectations and needs. </p></div>
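<div xmlns="http://www.tei-c.org/ns/1.0"><p>UMUX-Lite condenses two questionnaire items into a single score from 0 to 100. The Python sketch below shows one common way to compute it, following the 7-point scoring and the regression adjustment described by Lewis et al. <ref type="bibr" target="#b8">[9]</ref>. The number of scale points is a parameter. The text does not state which rating scale was used or whether the reported 65.2 is a raw or adjusted score. The function names are hypothetical, and the responses are invented for illustration; they are not the study's data.</p><p>
```python
from statistics import mean
from typing import List, Tuple


def umux_lite(capabilities: int, ease: int, scale_points: int = 7) -> float:
    """Raw UMUX-Lite score on a 0-100 scale for one participant.

    capabilities: rating for "This system's capabilities meet my requirements"
    ease: rating for "This system is easy to use"
    scale_points: number of points on the rating scale (7 in Lewis et al.)
    """
    for rating in (capabilities, ease):
        if rating not in range(1, scale_points + 1):
            raise ValueError(f"rating {rating} is outside 1..{scale_points}")
    # Shift each item to start at 0, then rescale the summed range to 0-100.
    max_sum = 2 * (scale_points - 1)
    return (capabilities - 1 + ease - 1) * 100 / max_sum


def umux_lite_adjusted(raw: float) -> float:
    """Regression adjustment from Lewis et al. (2013) to align with SUS scores."""
    return 0.65 * raw + 22.9


# Hypothetical (capabilities, ease) responses on a 7-point scale.
responses: List[Tuple[int, int]] = [(5, 7), (4, 6), (6, 6)]
raw_scores = [umux_lite(c, e) for c, e in responses]
print(round(mean(raw_scores), 1))  # 77.8
```
</p><p>The overall score is the mean of the per-participant scores. With two separate items, a system can score high on ease of use and lower on meeting requirements, which is the pattern reported above.</p></div>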
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>ORKG ASK serves as a starting point for a neuro-symbolic approach to finding and exploring scholarly articles. The preliminary evaluation indicates that the system is easy to use. In the future, we plan to extend the system with provenance information that highlights the source of extracted information. Furthermore, we plan to significantly extend the KG component, growing the KG automatically while the system is being used.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Design of the search result page of the ORKG ASK application.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: ORKG ASK system workflow integrating neuro-symbolic components.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Results for user satisfaction evaluation indicating relatively satisfied users.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Results for UMUX-Lite evaluation with a total score of 65.2.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://gitlab.com/TIBHannover/orkg/orkg-ask</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://qdrant.tech</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Scientific literature: Information overload</title>
		<author>
			<persName><forename type="first">E</forename><surname>Landhuis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">535</biblScope>
			<biblScope unit="page" from="457" to="458" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gusenbauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">R</forename><surname>Haddaway</surname></persName>
		</author>
		<idno type="DOI">10.1002/jrsm.1378</idno>
	</analytic>
	<monogr>
		<title level="j">Research Synthesis Methods</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="181" to="217" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Bolanos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Salatino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Osborne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.08565</idno>
		<title level="m">Artificial Intelligence for Literature Reviews: Opportunities and Challenges</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Search strategy formulation for systematic reviews: Issues, challenges and opportunities</title>
		<author>
			<persName><forename type="first">A</forename><surname>Macfarlane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Russell-Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Shokraneh</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.iswa.2022.200091</idno>
	</analytic>
	<monogr>
		<title level="j">Intelligent Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page">200091</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Küttler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W-T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="9459" to="9474" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Improving access to scientific literature with knowledge graphs</title>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Haris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stocker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Souza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">E</forename><surname>Farfar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vogt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Prinz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Wiens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Y</forename><surname>Jaradeh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bibliothek Forschung und Praxis</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="page" from="516" to="529" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">DBpedia spotlight: Shedding light on the web of documents</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>García-Silva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">7th International Conference on Semantic Systems</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">CORE: A global aggregation service for open access papers</title>
		<author>
			<persName><forename type="first">P</forename><surname>Knoth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Herrmannova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cancellieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Anastasiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Pontika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pearce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gyawali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pride</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Scientific Data</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">366</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">UMUX-LITE: When there&apos;s no time for the SUS</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Utesch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Maher</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGCHI Conference on Human Factors in Computing Systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2099" to="2102" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
