<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ORKG ASK: a Neuro-symbolic Scholarly Search and Exploration System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Allard Oelen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamad Yaser Jaradeh</string-name>
          <email>jaradeh@l3s.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sören Auer</string-name>
          <email>auer@tib.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>TIB - Leibniz Information Centre for Science</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Technology</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hannover</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Germany</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Available online via https://ask.orkg.org</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>L3S Research Center, Leibniz University of Hannover</institution>
          ,
          <addr-line>Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SEMANTiCS 2024 EU: 20th International Conference on Semantic Systems</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Purpose: Finding scholarly articles is a time-consuming and cumbersome activity, yet crucial for conducting science. Due to the growing number of scholarly articles, new scholarly search systems are needed to efectively assist researchers in finding relevant literature. Methodology: We take a neuro-symbolic approach to scholarly search and exploration by leveraging state-of-the-art components, including semantic search, Large Language Models (LLMs), and Knowledge Graphs (KGs). The semantic search component composes a set of relevant articles. From this set of articles, information is extracted and presented to the user. Findings: The presented system, called ORKG ASK (Assistant for Scientific Knowledge), provides a production-ready search and exploration system. Our preliminary evaluation indicates that our proposed approach is indeed suitable for the task of scholarly information retrieval. Value: With ORKG ASK, we present a next-generation scholarly search and exploration system and make it available online. Additionally, the system components are open source with a permissive license.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>research question, which is entered as a search query. Second, an LLM is leveraged to answer
the research question by prompting with the context of the set of relevant articles. In addition
to answers to the research question, a set of properties is extracted, among others, a summary,
materials, methods, and results of the contributions described in the articles. Third, KGs are
used to provide more fine-grained information extraction as well as for curating extraction
results. This includes results filtering based on mentioned concepts in scholarly articles.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>There are various established, large, and multidisciplinary scholarly search systems, among
others, Google Scholar, Semantic Scholar, and Scopus [2]. Other systems, such as PubMed
and ACM Digital Library, are domain-specific. These search systems take a similar approach
where articles are ranked based on relevance, but where users have to manually extract relevant
information from articles. A new approach provides active support via automatic information
extraction by systems such as Elicit, Consensus, and Scispace [3]. These systems are not
opensource, leaving details about their approach, such as the model and dataset, to be unknown, in
turn making results harder to reproduce. This makes such systems less suitable for systematic
literature reviews where reproducibility is a key aspect of the approach [4].</p>
      <p>To extract knowledge from a large set of scholarly documents, a Retrieval-Augmented
Generation (RAG) approach can be used to provide the LLM with relevant context [5]. The Retrieval
aspect retrieves a set of documents, commonly done using vector databases. The Augmented
aspect, augments the user query with the found context. Finally, the Generation aspect creates
the response. To our knowledge, the previously mentioned AI-supported scholarly search
systems use this approach and are thus similar to the approach we propose with ORKG ASK.
The Data Reuse feature supports citing articles in APA, Vancouver, Harvard, citation styles,
and exports to BibTeX, RIS, and CLS-JSON (node 7). The export button is displayed in node 8.
Furthermore, there is an option to export the entire search result table to CSV and ORKG [6]
CSV (node 6). Finally, the Entity Linking feature links entities in article abstracts to their
respective DBpedia entries. This provides the ability to filter articles based on semantically
identical concepts, providing an additional means to more targeted information retrieval.
3.2. System Workflow
Figure 2 depicts the system workflow. It starts with a user asking a research question. A set of
relevant documents for this question is retrieved. The neural search component uses vectorized
representations of the query and articles via the Nomic embeddings model to retrieve a set of
relevant documents. Optionally, the search space can be narrowed down by filtering specific
metadata or linked entities. Qdrant3 is used as a vector and data store. The symbolic component
processed article abstracts ofline and stored these linked entities in the data store. The entity
linking is conducted using DBpedia Spotlight [7]. The CORE dataset [8], containing article
metadata, abstracts, and full-text (in the case of open-access articles), is used as a data source for
the vector store. Next, knowledge is extracted using an LLM from the top n articles, resembling
the RAG approach. Currently, we use the Mistral Instruct 7B v0.2 model for the information
extraction. To reduce system resource usage, the LLM is only prompted if the answer does not
yet exist in the cache. Finally, the information is presented to the user.</p>
    </sec>
    <sec id="sec-3">
      <title>4. Evaluation</title>
      <p>As a preliminary system evaluation, we aim to assess the usability of the system. We did
this using a 5-point user satisfaction assessment and the Usability Metric for User Experience
lite (UMUX-lite) [9] evaluation. Participants were recruited via the ORKG ASK production
system, via a non-intrusive tooltip asking real-world system users for their opinion. To keep
participation eforts as low as possible, no participant demographics were requested from users.
In total, 30 participants took part in this evaluation. As Figure 3 shows, users are relatively
satisfied with ORKG ASK. The UMUX-Lite evaluation displayed in Figure 4 results in an overall
score of 65.2. As the individual results show, most participants agree that ORKG ASK is easy
to use, but the system does not always meet their requirements. This could be explained by
users’ search behavior, as logs of asked questions revealed that not all questions are valid and
answerable, leaving the user’s specific search requirement unmet. Further evaluation is needed
to determine what is needed to understand the user’s expectations and needs better.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>The introduction of ORKG ASK serves as a starting point for a neuro-symbolic approach to
ifnding and exploring scholarly articles. The preliminary evaluation indicates that our approach
is easy to use. In the future, we plan to extend the system by providing provenance information
to highlight the source of extracted information. Furthermore, we plan to extend the KG part
significantly, growing the KG automatically while the system is being used.
[1] E. Landhuis, Scientific literature: Information overload, Nature 535 (2016) 457–458.
[2] M. Gusenbauer, N. R. Haddaway, Which academic search systems are suitable for systematic
reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26
other resources, Research Synthesis Methods 11 (2020) 181–217. doi:10.1002/jrsm.1378.
[3] F. Bolanos, A. Salatino, F. Osborne, E. Motta, Artificial Intelligence for
Literature Reviews: Opportunities and Challenges, arXiv preprint arXiv:2402.08565 (2024).
arXiv:2402.08565.
[4] A. MacFarlane, T. Russell-Rose, F. Shokraneh, Search strategy formulation for systematic
reviews: Issues, challenges and opportunities, Intelligent Systems with Applications 15
(2022) 200091. doi:10.1016/j.iswa.2022.200091.
[5] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t.</p>
      <p>Yih, T. Rocktäschel, et al., Retrieval-augmented generation for knowledge-intensive nlp
tasks, Advances in Neural Information Processing Systems 33 (2020) 9459–9474.
[6] S. Auer, A. Oelen, M. Haris, M. Stocker, J. D’Souza, K. E. Farfar, L. Vogt, M. Prinz, V. Wiens,
M. Y. Jaradeh, Improving access to scientific literature with knowledge graphs, Bibliothek
Forschung und Praxis 44 (2020) 516–529.
[7] P. N. Mendes, M. Jakob, A. García-Silva, C. Bizer, DBpedia spotlight: Shedding light on the
web of documents, in: 7th International Conference on Semantic Systems, 2011, pp. 1–8.
[8] P. Knoth, D. Herrmannova, M. Cancellieri, L. Anastasiou, N. Pontika, S. Pearce, B. Gyawali,
D. Pride, CORE: A global aggregation service for open access papers, Nature Scientific
Data 10 (2023) 366.
[9] J. R. Lewis, B. S. Utesch, D. E. Maher, UMUX-LITE: When there’s no time for the SUS, in:
SIGCHI Conference on Human Factors in Computing Systems, 2013, pp. 2099–2102.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>