<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Proceedings of BIR</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhancing Research Information Systems with Identification of Domain Experts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gautam Kishore Shahi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oliver Hummel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Applied Sciences</institution>
          ,
          <addr-line>Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>14</volume>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Research organisations and their research outputs have been growing considerably in the past decades. This large body of knowledge attracts various stakeholders, e.g., for knowledge sharing, technology transfer, or potential collaborations. However, due to the large amount of complex knowledge created, traditional methods of manually curating catalogues are often out of date, imprecise, and cumbersome. Finding domain experts and knowledge within any larger organisation, scientific and also industrial, has thus become a serious challenge. Hence, exploring an institution's domain knowledge and finding its experts can only be solved by an automated solution. This work presents the scheme of an automated approach for identifying (scholarly) experts based on their publications and, prospectively, their teaching materials. Based on a search engine, this approach is currently being implemented for two universities, for which some examples are presented. The proposed system will be helpful for finding peer researchers as well as starting points for knowledge exploitation and technology transfer. As the system is designed in a scalable manner, it can easily include additional institutions and hence provide a broader coverage of research facilities in the future.</p>
      </abstract>
      <kwd-group>
        <kwd>Research area classification</kwd>
        <kwd>Scholarly Dataset</kwd>
        <kwd>Search Engine</kwd>
        <kwd>Large language model</kwd>
        <kwd>Domain Experts Search</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, research institutions and their research output have become more visible
due to the advancement of scholarly databases, data-sharing policies, and willingness to collaborate
amongst research institutions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nevertheless, most research institutions still follow the traditional
approach of merely listing metadata (such as titles and author names) of research results on their
websites alongside hand-curated profiles containing usually rather coarse-grained areas of expertise.
Hence, these websites usually only provide vague and often outdated information about researchers
and especially their specific expertise. This poses a major challenge for stakeholders interested in
understanding the research landscape or looking for domain experts or knowledge in, e.g. a nearby
institution.
      </p>
      <p>
        The number of research institutions has roughly doubled in every decade, and the number of
researchers has increased in a similar fashion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Most institutions, however, are lagging in updating
publication metadata for their researchers, which leads to reduced visibility for researchers as well
as existing knowledge and hence limits the value of research institutions as nuclei for innovation,
especially for the surrounding regional industrial ecosystem. This situation is especially unpleasant
for small and medium-sized companies that cannot afford dedicated scientific staff who are able to
screen and penetrate scientific literature. Moreover, current research information management systems
(RIMS) are not capable of automatically keeping track of research areas that are tackled by publications.
Consider, as an example, a researcher who started in the field of Natural Language Processing and Information Retrieval
and later also began working on chatbots and large language models (LLMs). Due to
the manual work involved, RIMSs or websites are often not updated in a timely manner with such
evolving research domains. If, in this example, someone were looking for an expert in artificial
intelligence language technology, a RIMS might not give accurate results due to its rather static
content. Moreover, RIMSs might not even be widely adopted, as is often the case for
smaller Universities of Applied Sciences in Germany, where research has only slowly been gaining
importance in recent years. Hence, interested stakeholders usually have to rely on open-domain search
engines to find helpful experts from nearby research institutions, which in turn have to rely on the
rather static web content of these institutions.
      </p>
      <p>
        With the advance of Generative Artificial Intelligence [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], various AI systems, such as ChatGPT or
Gemini, have become publicly available and, according to previous research, might be a promising
solution for this challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, after asking ChatGPT the question "Can you please provide a
list of domain experts in the field of big data at Hochschule Mannheim?", it merely replied the following:
"As of my last update in January 2022, I don’t have access to specific lists of domain experts at Hochschule
Mannheim (...); I recommend visiting the university’s website, department pages, or contacting the relevant
faculty or research centres directly. They can provide you with information about faculty members,
researchers, and experts in the field of big data at Hochschule Mannheim."
      </p>
      <p>This example anecdotally illustrates that even the most advanced open-domain chatbots are currently
overwhelmed by this specific task and merely refer the user to a standard web search.
To overcome such time-consuming manual searches for domain experts based on static and potentially
outdated information, we propose a knowledge-based search engine that is able to automatically
extract the research field(s) of scientists based on their publications and other information published
on the Web and, prospectively, also on materials from internal learning management systems, such as
Moodle. The key contributions of this paper are as follows. It presents:
• ideas for extracting the field of research from scientific articles,
• a search engine for finding domain experts based on the field of research, and
• a prototypical version of the proposed system.</p>
      <p>In the remainder of this paper, we briefly discuss the state of the art in section 2 and the proposed
approach itself in section 3. After that, we discuss its implementation in section 4 and some preliminary
results in section 5. Finally, we present important ideas for future work in section 6 and conclude our
work with section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <p>
        As the literature illustrates, the topic of classifying scientific articles into their respective field of research
is still emerging. To date, academic institutions have mostly used a manual approach for collecting
and analysing scholarly data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For example, while reviewing the research data management at his
institution, the author of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was confronted with the fact that data is still collected manually to deliver
simple services such as a list of publications per researcher. Hence, it is not possible to search for
researchers based on a given topic. The research data in the university of the author of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was also
curated manually; overall, this is time-consuming and delays the collection and publication of the
data. The same holds true for the authors' university, where publication lists are still managed
with the help of Excel tables and not centrally published at all.
      </p>
      <p>
        Another study of the information-seeking behaviour of users of
17 search systems for academics found that these search systems basically use very simple keyword
searches and hence bear great potential for improvement through more advanced search functionalities
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In the study reported by [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], an exploratory search using semantic technologies is used to provide
better access to domain experts. However, it is merely based on the research area provided by the
researchers and manually fed into the system. In another study, the author proposes REDI, a Linked
Data-powered framework for managing and storing academic data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, they also still use
static data that is provided manually by researchers for this purpose.
      </p>
      <p>Beyond purely theoretical research, some workshops and challenges are now also emerging that
aim at building and comparing models able to classify research contained in scientific publications,
such as NSLP 2024<sup>1</sup>. This exchange and such challenges are likely to attract different
approaches and models for classification in the future and, hence, help with the advancement of the
scientific community in this important field.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>
        Given the insights from, e.g., [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], it is clear that a purely search-based solution for research information
systems will be as imperfect as other manually curated catalogues, such as in libraries. Since the
development of a complex research system is a highly dynamic process that – according to experience
from various fields such as software engineering, design thinking, or entrepreneurship – needs to be
user-centric, we adopt a highly agile approach with rapid prototyping and brief feedback cycles mainly
inspired by the Lean Startup method [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>For the current early stage of our prototype, we envisage a person interested in research who is searching for
domain experts as our central persona and attribute two use cases to it, namely directly finding domain
experts based on a classic keyword search and identifying the domains of expertise that are actually
represented at an institution. The main idea is to implement a prototypical solution according to the
following scheme and, once this is accomplished, evaluate it with researchers and other colleagues
involved in research management and technology transfer at our university.</p>
      <p>In the approach designed so far (cf. Figure 1 for an illustration), we aim to ingest a given list of
researchers from a university or a similar organisation in order to avoid noisy data usually coming from
a general web crawl. With an official list provided by the university, one can crawl for publications, e.g.,
via the university’s homepage, or scrape them from another source, such as ResearchGate or Google Scholar.
Another advantage of using an officially provided list is that, in our case (and probably most other
cases as well), it contains at least some helpful metadata, such as the department or the broader subject
area. For the actual crawling, we apply a heuristic approach, for which we, e.g., take the university
name or the email’s domain into account to get the best possible matches. Once a researcher’s name
is identified with a given degree of certainty, we crawl their information, such as citations over past
years, co-authors, and lists of publications, and try to find the PDFs of the publications on the Web
where possible.</p>
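      <p>A minimal sketch of how such a matching heuristic could look is given below. It is not the prototype's code: the Candidate structure, the weights, and the threshold are hypothetical and only illustrate how name similarity could be combined with the university name and the email domain.</p>
      <preformat>
```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Candidate:
    """A profile returned by a scholarly source (hypothetical structure)."""
    name: str
    affiliation: str
    email_domain: str


def match_score(researcher_name: str, university: str, domain: str,
                candidate: Candidate) -> float:
    """Combine name similarity with affiliation and email-domain hints.

    The weights are illustrative assumptions, not tuned values.
    """
    name_sim = SequenceMatcher(
        None, researcher_name.lower(), candidate.name.lower()
    ).ratio()
    affiliation_hit = university.lower() in candidate.affiliation.lower()
    domain_hit = candidate.email_domain.lower().endswith(domain.lower())
    return 0.6 * name_sim + 0.2 * affiliation_hit + 0.2 * domain_hit


def best_match(researcher_name: str, university: str, domain: str,
               candidates: List[Candidate],
               threshold: float = 0.8) -> Optional[Candidate]:
    """Return the best candidate at or above the threshold, else None."""
    scored = [
        (match_score(researcher_name, university, domain, c), c)
        for c in candidates
    ]
    if not scored:
        return None
    score, candidate = max(scored, key=lambda pair: pair[0])
    return candidate if score >= threshold else None
```
      </preformat>
      <p>With such a threshold, a namesake at another institution would be rejected rather than silently attached to the wrong researcher, which is why a manual check of the remaining matches stays feasible.</p>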
      <p>
        Once the papers are retrieved, we extract their content from the PDFs [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For the time being, we
use ChatGPT’s API to identify a research area for the extracted content [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which is then added to
the corresponding researcher’s profile. Both the texts of the crawled materials, i.e., the papers, and
the researcher profile are then stored in a search engine as described in the next section,
where we also describe the other steps involved in this process in more detail. In a later
section, we elaborate on how we plan to further enhance classic search technology to
achieve better results in the quest for domain experts in the future.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <p>This section discusses a prototypical implementation of our proposed approach with data from
Hochschule Mannheim University of Applied Sciences<sup>2</sup>. Currently, the proposed system is
implemented based on a list of professors provided by the university. Below, we highlight the most important insights
from implementing the scheme sketched in Figure 1:</p>
      <p>Gathering Professors. First, we parsed a list of professors from the university website. In total,
it contained 188 professors that are listed together with some metadata like department, email, and
telephone number.</p>
      <p>Crawling Publication Data. Based on this initial list of professor names, we used a crawling script
in Python using an open-source<sup>3</sup> library to search for the name of each professor on Google Scholar. If
there was a match, we extracted metadata of professors, such as given research areas, citation counts,
and list of publications for a total of 28 professors, most of them coming from the departments of
computer science and biotechnology. These 28 matches were manually verified for correctness.</p>
      <p>
        Collecting Publications. For each professor, we scraped a list of publications using Beautiful Soup
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to gather additional publication information like title, author, and link to the PDF of the paper.
Currently, we have links for 420 publications and were able to download 268 of them for our analysis.
The remaining papers were not available to us, mainly because they were behind a paywall or otherwise
not accessible. Once the PDFs were downloaded, we used another Python script to parse the PDFs using
Tika<sup>4</sup> and GROBID<sup>5</sup> to extract the textual content of each paper, excluding references as these might
add unnecessary noise to searches later. After cleaning out further unwanted information, like email
addresses or URLs, we indexed the texts in our search engine.
      </p>
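      <p>The cleaning step described above could be sketched as follows. This is an illustration on already extracted plain text, not the script used in the prototype: the regular expressions and the assumption that the reference list starts at a line holding only a heading such as "References" are ours.</p>
      <preformat>
```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Assumption: the reference list starts at a line holding only a heading
# such as "References" or "Bibliography".
REFERENCES_RE = re.compile(
    r"^\s*(?:references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE
)


def clean_paper_text(text: str) -> str:
    """Drop the reference list, then strip email addresses and URLs."""
    heading = REFERENCES_RE.search(text)
    if heading:
        text = text[:heading.start()]
    text = EMAIL_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```
      </preformat>
      <p>A heading-based cut is deliberately simple; papers without such a heading keep their references, so a structure-aware extraction would be the more robust choice where it is available.</p>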
      <p>
        Identifying Research Areas. To identify the research area of each professor, we are currently
evaluating three approaches. First, we used the metadata from the university homepage; second, we
scraped the data entered by the researchers themselves in Google Scholar. Third, we aim to extend
these by extracting more fine-grained information from each downloaded paper with the help of a
Large Language Model (LLM), as indicated before. Currently, we do this by simply calling the ChatGPT
API, providing it with the research area classification from the Library of Congress [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and asking
it to classify the field of research for each given paper accordingly. For each author, we merged the
research areas delivered by ChatGPT for their papers with those provided by the university, as well as
with those retrieved from Google Scholar. The result gives a relatively broad overview of the research
expertise of each professor.
      </p>
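      <p>The merging of the three sources can be illustrated with a short sketch. The function name, the source labels, and the case-insensitive comparison are assumptions made for illustration; the prototype may merge the areas differently.</p>
      <preformat>
```python
from typing import Dict, List


def merge_research_areas(sources: Dict[str, List[str]]) -> List[dict]:
    """Union the areas of all sources, remembering where each came from.

    `sources` maps a source label to the list of areas it reported.
    Areas are compared case-insensitively; the first spelling seen is kept.
    """
    merged = {}
    for source, areas in sources.items():
        for area in areas:
            key = area.strip().lower()
            if not key:
                continue
            entry = merged.setdefault(
                key, {"area": area.strip(), "sources": []}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    return list(merged.values())
```
      </preformat>
      <p>Keeping the provenance of each area makes it possible to distinguish later between self-declared areas and areas that were only derived from publications.</p>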
      <p>Illustrating Research Areas. From the union of all extracted research areas, we derived a word
cloud using bi-gram tokens. An example is shown in Figure 2 (b).</p>
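      <p>As an illustration of the bi-gram step, the following sketch counts word bi-grams per research-area label; the resulting frequencies could then be handed to any word cloud renderer. The whitespace tokenisation and the function name are assumptions, not details of the prototype.</p>
      <preformat>
```python
from collections import Counter
from typing import Iterable


def bigram_counts(research_areas: Iterable[str]) -> Counter:
    """Count word bi-grams within each research-area label.

    Labels consisting of a single word yield no bi-gram at all.
    """
    counts = Counter()
    for label in research_areas:
        tokens = label.lower().split()
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts
```
      </preformat>
      <p>Under this tokenisation, a one-word area such as "Robotics" is dropped and a three-word area is split into two overlapping bi-grams.</p>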
      <p>This word cloud is intended to give stakeholders interested in a university an up-to-date overview of
research topics that are currently addressed at an institution.</p>
      <p><sup>3</sup> https://pypi.org/project/scholarly/
<sup>4</sup> https://pypi.org/project/tika/
<sup>5</sup> https://github.com/kermitt2/grobid
<sup>6</sup> https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-8.7.0.html</p>
      <sec id="sec-4-1">
        <title>4.1. Search Engine</title>
        <p>We are currently using Elasticsearch 8.7.0<sup>6</sup> as our core search engine since it provides out-of-the-box
text search functionality as well as advanced text analysis features for the data we collect. The overall
architecture, data model, and search &amp; browse interface are discussed in the following.</p>
        <p>
          Architecture. We propose a proof of concept for our system based on a classic client-server
architecture. The backend consists of a Flask application [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], which operates as a server and is
connected to an Elasticsearch index containing all collected data. The front end is a client-side
single-page application also based on Flask with a "thin server" architecture, i.e., most business logic is
moved from the server to the client, which requests data when needed, thereby allowing for a seamless
user experience. This architecture is shown in Figure 2 (a), which also explains the flow of data triggered
by a user request.
        </p>
        <p>Data Model. The data model of the application is based on the information required about a researcher.
Hence, it consists of the data extracted from the university website, Google Scholar, and the research
areas extracted from scholarly publications. The data model can later be extended as needed to
support further entities and their associated data.</p>
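        <p>One way to picture such a data model is the following sketch. All class and field names are hypothetical; they merely group the three kinds of information named above into a document that could be indexed.</p>
        <preformat>
```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Publication:
    """A single paper with the areas extracted from its text."""
    title: str
    pdf_url: str = ""
    research_areas: List[str] = field(default_factory=list)


@dataclass
class ResearcherProfile:
    """Hypothetical profile combining the three data sources."""
    name: str
    department: str = ""
    email: str = ""
    university_areas: List[str] = field(default_factory=list)
    scholar_areas: List[str] = field(default_factory=list)
    publications: List[Publication] = field(default_factory=list)

    def to_document(self) -> dict:
        """Return a plain dict that could be indexed as one JSON document."""
        return asdict(self)
```
        </preformat>
        <p>Further entities, such as teaching materials, could be added as additional list fields without changing the existing ones.</p>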
        <p>Search &amp; Browse. The homepage of the search engine initially shows the word cloud of the research
areas available in the institution to provide an overview of the fields of expertise that are present there.
An information-seeking stakeholder can then start looking for the desired experts by entering a search
term (i.e., a desired research area) in the text box, or they can browse a sortable list of research fields
that serve as a starting point to get to a domain expert. Once the search button is clicked, the user gets
a basic definition of the search term extracted from Wikipedia, as well as a list of available domain
experts. The user can further click on the domain expert to get more detailed profile information, such
as on the research area, publications, and potential links to other bibliographic sources. A glimpse of
the search interface is shown in Figure 3.</p>
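        <p>The keyword search behind this interface could be expressed with a query body like the one sketched below. The field names and the boost factor are hypothetical; only the multi_match query type itself is standard Elasticsearch query DSL.</p>
        <preformat>
```python
def build_expert_query(search_term: str, size: int = 10) -> dict:
    """Build an Elasticsearch query body for an expert search.

    The field names are hypothetical; the boost of 3 on `research_areas`
    is an illustrative choice, not a tuned value.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": search_term,
                "fields": ["research_areas^3", "publication_text", "name"],
            }
        },
    }
```
        </preformat>
        <p>Such a body could be sent to the search endpoint of the index; boosting the research-area field is one way to rank experts with a matching area above those who merely mention the term in a paper.</p>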
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Preliminary Results and Lessons Learned</title>
      <p>We have implemented the approach as described before, and since search has become a fairly
well-understood topic in recent years, we have experienced no unexpected issues with its basic functionality.
What we have learned so far is that manually attributed research areas are typically more coarse-grained
than those that are extracted by ChatGPT, so it seems like a good complement at first glance. Even
after a closer look, the research areas extracted from ChatGPT make sense and nicely illustrate how
this approach is able to highlight even recent trends in a personal publication history. Consider the
following research areas extracted for one professor at our university:
• University Website: Big Data, Data Science, Information Retrieval, Software Engineering
• Google Scholar: Big Data, Software Engineering, Information Retrieval
• Extracted from publications by ChatGPT: Cognitive Neuroscience, Software Design Patterns, Object-Oriented Programming</p>
      <p>The ChatGPT data is apparently fine-grained and also mirrors a very recent collaborative work of
this colleague in an unusual field. However, it is, of course, reasonable to question whether one
joint publication in an area like "Cognitive Neuroscience" turns a computer scientist into an expert in
psychology. Thus, it might make sense to consider further metrics, such as the number of papers in an
area or something similar, to obtain even better results in the future.</p>
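      <p>A metric of this kind could be sketched as follows: an area is only attributed to a researcher if it occurs in a minimum number of their papers. The function name and the default threshold of two papers are assumptions for illustration, not evaluated settings.</p>
      <preformat>
```python
from collections import Counter
from typing import Dict, List


def areas_with_support(paper_areas: List[List[str]],
                       min_papers: int = 2) -> Dict[str, int]:
    """Keep only areas that occur in at least `min_papers` papers.

    `paper_areas` holds one list of extracted areas per paper.
    """
    counts = Counter()
    for areas in paper_areas:
        # Count each area once per paper, even if it was extracted twice.
        counts.update(set(areas))
    return {area: n for area, n in counts.items() if n >= min_papers}
```
      </preformat>
      <p>With the default threshold, an area that appears in a single joint publication would not be listed as expertise, while it could still be shown separately as a recent interest.</p>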
      <p>Another lesson learned is that although a word cloud seems to be a nice visual aid at first glance,
our current implementation reveals some weaknesses at second glance. First and foremost, it is visible
that not all research areas are well represented by bi-grams. Moreover, a mixture of languages (such as
English and German in our case) in non-English speaking countries might be somewhat confusing for
prospective users and needs to be fixed in future versions.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>We plan to extend the search database to a partner university to add more researchers and demonstrate
that our approach is generalisable to multiple institutions in the future. We have also planned to
integrate additional data sources, like other academic search engines or platforms such as
ResearchGate or DBLP, to increase the coverage of our approach. Moreover, it is necessary to improve the
quality of the word cloud since, obviously, not all research areas are well represented by bi-grams. One
way to handle this better might be to use a positive list of research areas as a filter for retrieved results.
As mentioned before, it also seems necessary to add some basic language detection and translation
capabilities so that a word cloud does not contain a mixture of various languages, such as English and
German, as in our example. In the short term, we are also aiming to improve the design of the search
page by adding more functionality, such as a chatbot for answering questions about the domain and
providing contact details.</p>
      <p>In the spirit of the Lean Startup approach, we also plan to gather feedback from potential users of
our system to make sure that we are actually developing a useful piece of technology. We also plan to
use semantic web technologies for the mapping of research areas within ontologies [17, 18].</p>
      <p>Another possible, more long-term extension of our work is to test different (and locally hosted) LLM
implementations for the extraction of subject areas from papers and to evaluate the results obtained
from them. As the preliminary results from ChatGPT illustrated, it still seems necessary to better
understand the accuracy of the search results in general and the applicability of LLMs for such tasks in
particular.</p>
      <p>The knowledge embodied in research publications is certainly important and valuable; however, it is
probably only one side of the coin as it mostly covers the latest research results. As most scientists also
have teaching obligations, it is probably safe to assume that a large part of their more fundamental
knowledge is embodied in teaching materials. This is obviously less interesting for research transfer, but
nevertheless, it might be interesting for finding potential teachers for advanced training, e.g., in technical
domains, and, of course, for broadening and sharpening the recognised knowledge areas of domain
experts.</p>
      <p>However, teaching materials are usually not published and, hence, not freely available. With access
to the course management platform (CMP) of an institution, which should be possible for our system
once it becomes officially used there, downloading these materials is probably less a technical issue
and more a question of copyright and willingness of the affected colleagues to at least share their
materials for analysis with our system, if they do not want their scripts and slide decks to become
publicly available. Hence, integrating a "stealth" mode for files that should be analysed but not indexed,
as well as a crawler for our institution’s CMP (which is Moodle) into our system, is another future task
we are about to tackle soon.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>Finding domain experts at research institutions is a challenging task that current search- or even
catalogue-based approaches are not able to achieve. Hence, in this work, we presented the core of a
modern research information system that is able to extract the research field from scientific publications
and can be searched using a web interface. Built upon an open-source search engine, it can provide
a list of domain experts for a given topic as well as links to their profile page or personal website for
users in need of further information. Hence, it will be useful for stakeholders to easily identify available
expertise at an institution, e.g., to initiate collaborations, research transfer, or advanced training.</p>
      <p>One of the current limitations of our approach is that not all researchers have a profile on a scholarly
database like Google Scholar, which might make it difficult to retrieve their publications and derive the
research areas in which they are active. Another limitation is that getting a PDF version of a publication
is often not possible due to the paywalls used by many publishers. Hence, in conclusion, although our
system already delivers promising results in its early stages of development, there is plenty of room
and need for future work.</p>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENT</title>
      <p>The work has been carried out under the TransforMA project. The authors disclose receipt of the following
financial support for the research, authorship, and/or publication of this article: this project has received
funding from the federal-state initiative "Innovative Hochschule" of the Federal Ministry of Education
and Research (BMBF) in Germany.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <sec id="sec-9-1">
        <p>The present study does not use any AI tool for text generation or rephrasing.</p>
        <p>[17] D. Nandini, G. K. Shahi, An ontology for transportation system, Kalpa Publications in Computing
10 (2019) 32–37.
[18] B. Dutta, D. Nandini, G. K. Shahi, Mod: metadata for ontology description and publication, in:
International Conference on Dublin Core and Metadata Applications, 2015, pp. 1–9.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Farooq</surname>
          </string-name>
          ,
          <article-title>Knowledge management and performance: a bibliometric analysis based on scopus and wos data (1988-2021)</article-title>
          ,
          <source>Journal of Knowledge Management</source>
          <volume>27</volume>
          (
          <year>2023</year>
          )
          <fpage>1948</fpage>
          -
          <lpage>1991</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A century of science: Globalization of scientific collaborations, citations, and innovations</article-title>
          ,
          <source>in: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1437</fpage>
          -
          <lpage>1446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Strobel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Banh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Möller</surname>
          </string-name>
          , T. Schoormann,
          <article-title>Exploring generative artificial intelligence: A taxonomy and types</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Askari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Verberne</surname>
          </string-name>
          ,
          <article-title>A test collection of synthetic documents for training rankers: Chatgpt vs. human experts</article-title>
          ,
          <source>in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>5311</fpage>
          -
          <lpage>5315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Namey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <article-title>Collecting qualitative data: A field manual for applied research</article-title>
          , Sage,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Perrier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Blondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Ayala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dearborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kenny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Trimble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>MacDonald</surname>
          </string-name>
          ,
          <article-title>Research data management in academic institutions: A scoping review</article-title>
          ,
          <source>PLoS One</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>e0178261</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schuetzenmeister</surname>
          </string-name>
          ,
          <article-title>University research management: An exploratory literature review</article-title>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y. R.</given-names>
            <surname>Nedumov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <article-title>Exploratory search for scientific articles</article-title>
          ,
          <source>Programming and Computer Software</source>
          <volume>45</volume>
          (
          <year>2019</year>
          )
          <fpage>405</fpage>
          -
          <lpage>416</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Machner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>A knowledge graph approach for exploratory search in research institutions</article-title>
          ,
          <source>in: KMIS</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ortiz Vivar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Segarra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Villazón-Terrazas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Saquicela</surname>
          </string-name>
          ,
          <article-title>REDI: Towards knowledge graph-powered scholarly information management and research networking</article-title>
          ,
          <source>Journal of Information Science</source>
          <volume>48</volume>
          (
          <year>2022</year>
          )
          <fpage>167</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ries</surname>
          </string-name>
          ,
          <article-title>The lean startup: How today's entrepreneurs use continuous innovation to create radically successful businesses</article-title>
          ,
          <publisher-name>Currency</publisher-name>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Revolutionizing retrieval-augmented generation with enhanced PDF structure recognition</article-title>
          ,
          <source>arXiv preprint arXiv:2401.12599</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>O. D.</given-names>
            <surname>Okey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. U.</given-names>
            <surname>Udo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Z.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Kleinschmidt</surname>
          </string-name>
          ,
          <article-title>Investigating ChatGPT and cybersecurity: A perspective on topic modeling and sentiment analysis</article-title>
          ,
          <source>Computers &amp; Security</source>
          <volume>135</volume>
          (
          <year>2023</year>
          )
          <fpage>103476</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <article-title>Beautiful Soup documentation</article-title>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Salaba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <article-title>Cataloging and classification: an introduction</article-title>
          ,
          <publisher-name>Rowman &amp; Littlefield</publisher-name>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kana Tsoplefack</surname>
          </string-name>
          ,
          <article-title>Mitigating harmful content on social media using an interactive user interface</article-title>
          ,
          <source>in: International Conference on Social Informatics</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>490</fpage>
          -
          <lpage>505</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>