<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Named Entity Recognition in Scientific Domain with Fine-Tuning and Few-Shot Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Buscaldi</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Dessì</string-name>
          <email>ddessi@sharjah.ac.ae</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Osborne</string-name>
          <email>francesco.osborne@open.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Piras</string-name>
          <email>d.piras38@studenti.unica.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Reforgiato Recupero</string-name>
          <email>diego.reforgiato@unica.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Business and Law, University of Milano Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, College of Computing and Informatics, University of Sharjah</institution>
          ,
          <addr-line>Sharjah, UAE</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Mathematics and Computer Science</institution>
          ,
          <addr-line>Via Ospedale 62, Cagliari, 09121</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Knowledge Media Institute, The Open University</institution>
          ,
          <addr-line>Walton Hall, Kents Hill, Milton Keynes, MK76AA</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Laboratoire d'Informatique de Paris Nord, Sorbonne Paris Nord University</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Entity extraction is a crucial step in constructing Knowledge Graphs (KGs) from natural language text. In the scientific domain, Named Entity Recognition (NER) is widely used to analyze research papers and facilitate the generation of knowledge graphs that capture research concepts. Given the vast scale of contemporary research output, this task necessitates automated pipelines to maintain efficiency while ensuring the quality of the extracted knowledge. Large Language Models (LLMs) present a promising solution to this challenge. As such, this paper explores the effectiveness of LLMs for NER in scientific texts, using the SciERC dataset as a benchmark. Specifically, it evaluates different LLM architectures, including encoder-only, decoder-only, and encoder-decoder models, to identify the most effective approach for NER in the computer science domain. By examining the strengths and limitations of each model type, this study aims to provide deeper insights into the applicability of LLMs for entity extraction, ultimately improving the construction of domain-specific KGs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Entity extraction is a key step in constructing knowledge graphs (KGs) from natural language text. A
fundamental technique for this task is named entity recognition (NER), a natural language processing
(NLP) method that identifies text spans referring to real-world entities [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and assigns them to specific
categories. In the scientific domain, NER plays a key role in processing research papers and facilitating
the generation of KGs that encapsulate research concepts. As a result, NER is an essential component
in scientific KG construction pipelines [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ] and is widely employed for the semantic indexing of
documents [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Large Language Models (LLMs) have been achieving significant success across a wide
range of tasks. Their ability to understand general-purpose language is attributed to their large number of
parameters, trained on vast amounts of data. The rise of models such as BERT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], GPT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
and T5 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] has revolutionized NLP by providing robust tools that can be fine-tuned for specific tasks,
including NER. These models leverage deep learning techniques to capture nuanced patterns in text,
thus improving the performance of various NLP applications.
      </p>
      <p>
        In this work, we study different types of LLMs on a NER task within the scholarly domain using the
SciERC benchmark dataset [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Specifically, we investigate the performance of encoder-only models,
encoder-decoder models, and decoder-only models in recognizing and classifying entities in scientific
texts. The primary objectives of this comparison are to 1) determine which model type performs best
on NER tasks within specialized domains and 2) understand the strengths and weaknesses of each
model type in handling domain-specific language. To achieve these goals, we focused on three different
strategies: fine-tuning, zero-shot, and few-shot learning. The reason for incorporating zero-shot
and few-shot approaches alongside fine-tuning is to test the generalization capabilities of certain models.
Generalization capabilities make it possible to apply LLMs to tasks that lack sufficient training data,
making LLMs a valuable tool in several domains. Achieving good results with these techniques indicates
that a model can solve the task by leveraging its pre-existing knowledge without the need for extensive
additional training. Our analysis aims to provide insights into the suitability of different LLM
architectures for NER tasks for the automatic detection of scientific entities from natural language text.</p>
      <p>The contribution of this paper is twofold. First, we evaluate the performance of three LLMs on a
NER task in the scientific domain. Second, we provide insights into the architecture of these models
and their effectiveness in addressing NER tasks within this domain. All the source code used for the
analysis reported in this paper can be found at https://github.com/dpiras38/scierc_notebooks_ner.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>LLMs have been recently explored for NER applications in scientific writing. However, the task has
proven to be difficult due to the writing style, nuances, and technical vocabulary used in academic texts.</p>
      <p>
        Luan et al. (2018) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] introduced the SciERC dataset, comprising 500 annotated scientific abstracts
with entities, relationships, and coreference clusters. They proposed a multi-task model for joint NER,
relation extraction, and coreference resolution. This approach proved effective for datasets like
SciERC, where entities and relationships are closely linked. A later variation [10] replaced the multi-task
strategy with entity-type prediction and incorporated cross-sentence context to enrich the input for the
model. BERT (Bidirectional Encoder Representations from Transformers) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] revolutionized NLP by
leveraging a new architecture [11] that interprets word semantics based on their context. SciBERT [12]
was one of the first adaptations for scientific texts, trained on research papers with a specialized
vocabulary, SciVocab. In particular, only 42% of its terms overlap with BERT's, highlighting the
differences between common and scientific language. SciBERT has since outperformed general models
in entity and key term recognition on datasets like SciERC and BC5CDR [12].
      </p>
      <p>Over the past five years, LLMs have significantly advanced the state of the art in NLP and information
extraction across various domains [13, 14, 15, 16, 17]. In particular, several decoder-only models
have demonstrated exceptional performance in tasks related to academic text, including research
paper classification [18], citation recommendation [19, 20], automatic construction of research topics
ontologies [21, 22], and literature review generation [23]. Conversely, encoder-decoder models, such as
T5, have shown excellent performance in tasks such as molecule captioning [24] and scientific question
answering via natural language to SPARQL translation [25]. In this paper, we compare these models on
the task of NER applied to research papers for knowledge graph generation.</p>
      <p>Indeed, the scientific and academic communities, which traditionally relied only on relatively simple
knowledge organisation systems [26] for structuring research topics and indexing papers, have witnessed
a substantial expansion in the use of KGs, which can accommodate a wider range of concepts and
connections.</p>
      <p>
        Several KGs have been developed using language models and transformer-based architectures. These
efforts span fully automated pipelines [27, 28] as well as hybrid methodologies that incorporate human
expertise into the construction and curation process [29, 30]. The resulting graphs are often further
enhanced through link prediction techniques [31], which improve their completeness and semantic
coherence by inferring missing relationships. Notable examples of these KGs include SemOpenAlex [32],
AIDA-KG [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], OpenCitations [33], ORKG [34], AI-KG [35], CS-KG [36, 37], and Nano-publications [38].
These KGs enable a wide range of domain-specific applications, such as academic writing support [39],
verification and completion of research claims [40], automated generation of research hypotheses [41],
enriching metadata of scientific books [42], and the creation of specialised conversational agents [43, 44],
among others. Developing robust and accurate models for NER in scientific papers is crucial for
constructing and refining these knowledge bases.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The Used Dataset: SciERC</title>
      <p>
        The SciERC dataset, introduced in 2018 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], stands as a pivotal resource for generating models aimed at
the extraction of entities, relationships, and coreference resolution within scientific texts. Models based
on this dataset have been employed in KG construction with remarkable performances [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. SciERC is
manually crafted by annotating research articles from 12 major workshops and conferences, including
the Annual Meeting of the Association for Computational Linguistics (ACL) and the Empirical Methods in
Natural Language Processing conference (EMNLP). SciERC diverged from earlier datasets by concentrating
exclusively on computer science research articles, filling a significant gap in the field. The dataset
comprises annotations covering: 6 types of entities (Task, Metric, Method, Generic, OtherScientificTerm,
and Material), relationships between entities (used-for, feature-of, hyponym-of, part-of, compare,
and conjunction), and coreference (identification of different text spans that refer to the same entity).
      </p>
      <p>SciERC comes pre-divided into training, validation, and test sets. Its data contains
both entities and relationships. It can be downloaded in both raw and processed formats (tokenized and
JSON) from http://nlp.cs.washington.edu/sciIE/. For the analysis presented in this paper, we only use
the entities and their types, and leave the analysis on relationship extraction and coreference resolution
tasks to future endeavors. Furthermore, since in SciERC an entity span can include other sub-entities
(e.g., the entity natural language processing contains the entity natural language), we pre-processed
the dataset so that each entity is associated with the largest corresponding span, and all other
entities within that span are removed.</p>
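      <p>The pre-processing step above can be sketched as follows; the (start, end, type) tuple representation is an assumption mirroring SciERC's processed JSON annotations.</p>

```python
def keep_largest_spans(entities):
    """Keep only entities whose token span is not strictly contained in the
    span of another entity; nested sub-entities are dropped, as described in
    the text (e.g., 'natural language' inside 'natural language processing').

    `entities` is a list of (start, end, type) tuples with token offsets;
    this representation is an assumption, not SciERC's exact schema.
    """
    kept = []
    for start, end, etype in entities:
        contained = any(
            s2 <= start and end <= e2 and (s2, e2) != (start, end)
            for s2, e2, _ in entities
        )
        if not contained:
            kept.append((start, end, etype))
    return kept
```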
    </sec>
    <sec id="sec-4">
      <title>4. Large Language Models</title>
      <p>In this section, we will illustrate the LLMs that we have used in this paper.</p>
      <p>
        SciBERT. SciBERT [12] is built upon the BERT architecture [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and uses an encoder-only architecture.
It is pre-trained on a corpus of 1.14 million papers randomly sampled from Semantic Scholar
(https://www.semanticscholar.org/). This corpus consists of 82% biomedical domain data and 18% computer
science domain data. SciBERT leverages domain-specific knowledge well, making it a valuable tool for
researchers for NLP tasks [12] in specialized domains.
      </p>
      <p>
        Mistral. Mistral, introduced in [45], is a decoder-only model that balances high performance and
efficiency in LLMs. Mistral builds on top of the transformer architecture and introduces key innovations
when compared to other models such as LLaMA, including: i) Sliding Window Attention, which
improves long-sequence processing by utilizing a window that allows the model to better exploit
contextual information; ii) Rolling Buffer Cache, which optimizes the model's memory by storing recently
encountered tokens; and iii) Pre-fill and Chunking, which splits an input prompt into smaller segments
and pre-fills the cache, thus enabling fast processing of small chunks from larger inputs. Mistral's
performance has been tested against leading competitors such as LLaMA on various tasks, including
commonsense reasoning, reading comprehension, code understanding, world knowledge, and math [45].
T5. T5, which stands for “Text-To-Text Transfer Transformer”, is a model introduced by Google Research
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. It is based on the encoder-decoder architecture of Transformers and tackles tasks where the
input and the output are text strings. T5 uses a sequence-to-sequence framework, which consists of an
encoder that reads the input text and a decoder that generates the output text. Each task is formulated
as a text transformation problem. T5 was pre-trained on the C4 corpus, which has been commonly
used in the pre-training of other models like LLaMA.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. LLMs Learning Methodologies</title>
      <p>This section outlines the learning methodologies that we have employed within the LLMs.</p>
      <sec id="sec-5-1">
        <title>5.1. Zero-Shot Learning</title>
        <p>The zero-shot setting was employed with the Mistral 7B model. Fig. 3 (in the Appendix) shows the prompt used
for the zero-shot setting. The prompt instructs the LLM about the task and the expected entity types,
and requires the LLM to provide the answer in a structured format that can be processed automatically.
This approach is not possible with the T5 and SciBERT models due to their architectural differences.</p>
        <p>After collecting all responses, a post-processing step was applied to remove errors unrelated to
classification that could impact the results. These errors primarily involved inconsistencies in response
formatting and, in some cases, word repetition. Additionally, after an empirical analysis, we limited the
generated answers to 70 tokens to reduce the possibility of hallucination.</p>
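        <p>A minimal sketch of this post-processing, assuming the model answers in a "Type: Entity;" structured format as requested by the prompt; the helper drops malformed segments and repeated entities.</p>

```python
VALID_TYPES = {"Task", "Method", "Metric", "Material", "Generic", "OtherScientificTerm"}


def parse_llm_entities(answer):
    """Parse 'Type: Entity;' pairs from a model answer, skipping segments with
    formatting inconsistencies and de-duplicating word repetitions (the two
    error classes described above). The exact answer format is an assumption."""
    results = []
    seen = set()
    for segment in answer.split(";"):
        if ":" not in segment:
            continue  # malformed segment: no 'Type: Entity' structure
        etype, _, entity = segment.partition(":")
        etype, entity = etype.strip(), entity.strip()
        if etype in VALID_TYPES and entity and (etype, entity) not in seen:
            seen.add((etype, entity))
            results.append((etype, entity))
    return results
```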
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Few-Shot Learning</title>
        <p>Few-shot learning provides the LLMs with a few examples to learn how to better solve a task. In
this work, we focused on the variants with one example, referred to as one-shot, and three examples,
referred to as three-shot. An example of a one-shot prompt is shown in Fig. 4 (in the Appendix). The
process for developing this task was very similar to the zero-shot process, with the only difference
being in the prompt used. For the one-shot task, we chose an example that represented the maximum
number of diverse entities possible, while for the three-shot, three examples were randomly selected.</p>
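        <p>A sketch of how such a k-shot prompt can be assembled; the layout and wording are illustrative assumptions, since the paper's actual one-shot prompt is the one shown in Fig. 4.</p>

```python
def build_few_shot_prompt(instruction, examples, abstract):
    """Assemble a k-shot prompt: the task instruction, k worked examples
    (abstract plus gold entities), then the target abstract left for the model
    to complete. The layout is an assumption, not the paper's exact prompt."""
    parts = [instruction]
    for ex_abstract, ex_entities in examples:
        gold = " ".join(f"{etype}: {entity};" for etype, entity in ex_entities)
        parts.append(f"Abstract: {ex_abstract}\nEntities: {gold}")
    parts.append(f"Abstract: {abstract}\nEntities:")
    return "\n\n".join(parts)
```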
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Fine Tuning</title>
        <p>In this work, we fine-tuned Mistral 7B, T5, and SciBERT for the proposed task using the SciERC dataset.
The parameters used for each model are detailed in the paragraphs below.</p>
        <p>Mistral 7B. We used a quantized 4-bit version provided by TheBloke on the Hugging Face Hub1 and
loaded it using the Hugging Face AutoModelForCausalLM class. Each record in the training and validation
set was incorporated into a predefined prompt similar to a one-shot setup. However, for fine-tuning,
we added the list of entities in the format "Type of Entity: Entity;" after the model response, as illustrated
in Fig. 5 (in the Appendix). Fine-tuning was conducted using the Trainer library2 with the following
parameters: learning_rate = 2.5e-5, gradient_checkpointing = True, optim = 'paged_adamw_32bit', and
num_train_epochs = 2. As for the previous prompts, we limited the answer to 70 tokens.</p>
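        <p>A minimal sketch of this fine-tuning setup. The instruction wording, dataset handling, and function names are illustrative assumptions; the hyperparameters and model checkpoint are those reported above.</p>

```python
def build_finetuning_example(abstract, entities):
    """Build one training record: a one-shot-style instruction plus the gold
    entities appended in the "Type of Entity: Entity;" format described above.
    The instruction wording is an assumption (the paper's prompt is in Fig. 5)."""
    gold = " ".join(f"{etype}: {entity};" for etype, entity in entities)
    return (
        "Extract the scientific entities (Task, Method, Metric, Material, "
        "Generic, OtherScientificTerm) from the abstract below.\n"
        f"Abstract: {abstract}\nEntities: {gold}"
    )


def finetune_mistral(train_dataset, eval_dataset):
    """Sketch of the Trainer setup with the reported hyperparameters.
    Requires the `transformers` library; datasets are assumed pre-tokenized."""
    from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

    model = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Mistral-7B-v0.1-GPTQ", device_map="auto"
    )
    args = TrainingArguments(
        output_dir="mistral7b-scierc",
        learning_rate=2.5e-5,
        gradient_checkpointing=True,
        optim="paged_adamw_32bit",
        num_train_epochs=2,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()
    return model
```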
        <p>T5. For T53, we utilized the models directly provided by Google via the Hugging Face Hub, selecting
the Base version to balance performance and hardware eficiency. This model was lightweight enough
to run on our available hardware without requiring quantization while still being powerful enough for
our tasks. We loaded the model using the AutoModelForSeq2SeqLM4 class and applied a pre-configured
prompt for each sample in the training and validation sets. Compared to the one used for Mistral, this
prompt was significantly simpler. Figures 1 and 2 respectively illustrate a prompt used for inference
and the corresponding response generated by T5. Additionally, the contents illustrated in both figures
have been used together for the fine-tuning of T5.
1. https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ
2. https://huggingface.co/docs/transformers/main_classes/trainer
3. https://huggingface.co/google-t5/t5-base
4. https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSeq2SeqLM</p>
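        <p>A sketch of the T5 inference flow described above, assuming a simple task-prefix prompt; the paper's actual prompt is the one shown in Figure 1.</p>

```python
def build_t5_prompt(abstract):
    """Build the simple task-prefix prompt used with T5. The prefix text is an
    illustrative assumption; the paper's actual prompt appears in its Figure 1."""
    return f"extract entities: {abstract}"


def t5_generate(abstract, model_name="google-t5/t5-base"):
    """Sketch of text-to-text inference with T5 Base (requires `transformers`),
    capping generation consistently with the 70-token limit used for Mistral."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_t5_prompt(abstract), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=70)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```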
        <p>Table: Precision, Recall, and F1 Score of Mistral 7B (Fine Tuning), T5 Base (Fine Tuning), SciBERT (Fine Tuning), Mistral 7B (Zero Shot), Mistral 7B (One Shot), and Mistral 7B (Three Shot).</p>
        <p>SciBERT. Regarding SciBERT5, we used the uncased version. The fine-tuning process was developed
as follows. First, the model was loaded through the class AutoModelForTokenClassification; then the
corresponding tokenizer was created using the class AutoTokenizer, both from the Transformers library.
For training, we used learning_rate = 5e-5 and num_train_epochs = 6. Finally, we instantiated a trainer
using the class Trainer from the Transformers library, and we proceeded with the training process.</p>
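        <p>A sketch of this token-classification setup, assuming a BIO tagging scheme (the paper does not state its exact label encoding); the learning rate, epoch count, and entity classes are those reported in the text.</p>

```python
ENTITY_TYPES = ["Task", "Method", "Metric", "Material", "Generic", "OtherScientificTerm"]


def bio_labels(entity_types):
    """Build a BIO label set for token classification: one 'O' label plus
    B-/I- labels per entity type. BIO tagging is a standard choice here,
    assumed rather than taken from the paper."""
    labels = ["O"]
    for etype in entity_types:
        labels += [f"B-{etype}", f"I-{etype}"]
    return labels


def finetune_scibert(train_dataset, eval_dataset):
    """Sketch of the SciBERT fine-tuning described above (requires `transformers`);
    datasets are assumed pre-tokenized with per-token label ids."""
    from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    name = "allenai/scibert_scivocab_uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForTokenClassification.from_pretrained(
        name, num_labels=len(bio_labels(ENTITY_TYPES))
    )
    args = TrainingArguments(output_dir="scibert-scierc",
                             learning_rate=5e-5, num_train_epochs=6)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset)
    trainer.train()
    return model
```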
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <sec id="sec-6-1">
        <title>6.1. Evaluation</title>
        <p>This section describes the evaluation setting as well as the outcome of our analysis.
To evaluate the different models, an entity is deemed correct if both its span boundaries and category
are accurately identified. We defined: True Positives (TP): elements present both in the model's
predictions and in the test set. False Positives (FP): elements present in the model's predictions but
absent or different in the test set (in either span boundaries or category). False Negatives (FN):
elements present in the test set but absent or different in the model's predictions.</p>
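        <p>Under these definitions, precision, recall, and F1 can be computed directly from sets of entities; the (start, end, type) tuple representation is an assumption.</p>

```python
def ner_scores(predicted, gold):
    """Precision, recall, and F1 under the strict criterion above: a predicted
    entity counts as a true positive only if both its span boundaries and its
    category exactly match a gold entity. Entities are (start, end, type) tuples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # correct span and category
    fp = len(predicted - gold)  # predicted but absent/different in the test set
    fn = len(gold - predicted)  # in the test set but missed/different in predictions
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```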
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Outcome Analysis</title>
        <p>to 17% with three examples, again achieved by Mistral. Finally, it is evident from the results that LLMs
cannot be used directly off the shelf and that fine-tuning is necessary to capture domain peculiarities
and perform satisfactory NER.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>
        This work presented a comprehensive evaluation of various LLMs on the task of NER using the SciERC
dataset as a benchmark. The results demonstrate that fine-tuning LLMs significantly enhances their
performance on NER tasks. Among the models tested, SciBERT achieved the highest performance
with an F1 score of 69.72%. These results highlight the importance of domain-specific pre-training in
achieving better performance in scientific NER tasks. Moreover, the decoder-only models performed worse
than any other model, even with fine-tuning, demonstrating that model architecture and pre-training
are critical performance factors. The results of the zero-shot and few-shot learning approaches suggest
that these models should not be employed for entity detection, confirming insights already observed in
similar scenarios for KG construction [46]. Even the best few-shot approaches could not match the
performance of fine-tuning, highlighting the challenges these models face when applied to scientific
NER tasks without extensive training. In conclusion, our analysis suggests that i) SciBERT is still a
reliable and valid option for constructing KGs in the computer science domain, and ii) specialized
models can still be a better option for niche tasks. The insights of this work will be leveraged to improve
the SCICERO construction pipeline [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and to generate newer versions of the CS-KG [36].
      </p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT and Grammarly in order to: grammar
and spelling check, paraphrase and reword. After using these tools/services, the author(s) reviewed and
edited the content as needed and take(s) full responsibility for the publication's content.</p>
    </sec>
    <sec id="sec-13">
      <title>References</title>
      <p>[10] Z. Zhong, D. Chen, A Frustratingly Easy Approach for Entity and Relation Extraction, North American Chapter of the Association for Computational Linguistics (NAACL) (2021).</p>
      <p>[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention Is All You Need, Neural Information Processing Systems (NeurIPS) (2017).</p>
      <p>[12] I. Beltagy, K. Lo, A. Cohan, SciBERT: A Pretrained Language Model for Scientific Text, International Joint Conference on Natural Language Processing (IJCNLP) (2019).</p>
      <p>[13] J. A. Omiye, H. Gui, S. J. Rezaei, J. Zou, R. Daneshjou, Large language models in medicine: the potentials and pitfalls: a narrative review, Annals of Internal Medicine 177 (2024) 210–220.</p>
      <p>[14] E. Motta, F. Osborne, M. M. Pulici, A. Salatino, I. Naja, Capturing the viewpoint dynamics in the news domain, in: International Conference on Knowledge Engineering and Knowledge Management, Springer, 2024, pp. 18–34.</p>
      <p>[15] C. W. Kosonocky, C. O. Wilke, E. M. Marcotte, A. D. Ellington, Mining patents with large language models elucidates the chemical function landscape, Digital Discovery 3 (2024) 1150–1159.</p>
      <p>[16] K. Yang, T. Zhang, Z. Kuang, Q. Xie, J. Huang, S. Ananiadou, MentaLLaMA: interpretable mental health analysis on social media with large language models, in: Proceedings of the ACM Web Conference 2024, 2024, pp. 4489–4500.</p>
      <p>[17] A. Chessa, G. Fenu, E. Motta, F. Osborne, D. R. Recupero, A. Salatino, L. Secchi, Data-driven methodology for knowledge graph generation within the tourism domain, IEEE Access 11 (2023) 67567–67599.</p>
      <p>[18] A. Cadeddu, A. Chessa, V. D. Leo, G. Fenu, E. Motta, F. Osborne, D. R. Recupero, A. Salatino, L. Secchi, Optimizing tourism accommodation offers by integrating language models and knowledge graph technologies, Information 15 (2024) 398.</p>
      <p>[19] D. Buscaldi, D. Dessí, E. Motta, M. Murgia, F. Osborne, D. R. Recupero, Citation prediction by leveraging transformers and natural language processing heuristics, Information Processing &amp; Management 61 (2024) 103583.</p>
      <p>[20] Y. Zhang, Y. Wang, K. Wang, Q. Z. Sheng, L. Yao, A. Mahmood, W. E. Zhang, R. Zhao, When large language models meet citation: A survey, arXiv preprint arXiv:2309.09727 (2023).</p>
      <p>[21] H. Babaei Giglou, J. D'Souza, S. Auer, LLMs4OL: Large language models for ontology learning, in: International Semantic Web Conference, Springer, 2023, pp. 408–427.</p>
      <p>[22] T. Aggarwal, A. Salatino, F. Osborne, E. Motta, Large language models for scholarly ontology generation: An extensive analysis in the engineering field, arXiv preprint arXiv:2412.08258 (2024).</p>
      <p>[23] F. Bolanos, A. Salatino, F. Osborne, E. Motta, Artificial intelligence for literature reviews: Opportunities and challenges, Artificial Intelligence Review 57 (2024).</p>
      <p>[24] C. Edwards, T. Lai, K. Ros, G. Honke, K. Cho, H. Ji, Translation between molecules and natural language, arXiv preprint arXiv:2204.11817 (2022).</p>
      <p>[25] J. Lehmann, A. Meloni, E. Motta, F. Osborne, D. R. Recupero, A. A. Salatino, S. Vahdati, Large language models for scientific question answering: An extensive analysis of the SciQA benchmark, in: European Semantic Web Conference, Springer, 2024, pp. 199–217.</p>
      <p>[26] A. Salatino, T. Aggarwal, A. Mannocci, F. Osborne, E. Motta, A survey on knowledge organization systems of research fields: Resources and challenges, Quantitative Science Studies (2025) 1–37.</p>
      <p>[27] D. Dessì, F. Osborne, D. R. Recupero, D. Buscaldi, E. Motta, Generating knowledge graphs by employing natural language processing and machine learning techniques within the scholarly domain, Future Generation Computer Systems 116 (2021) 253–264.</p>
      <p>[28] L. Zhong, J. Wu, Q. Li, H. Peng, X. Wu, A comprehensive survey on automatic knowledge graph construction, ACM Computing Surveys 56 (2023) 1–62.</p>
      <p>[29] S. Tsaneva, D. Dessì, F. Osborne, M. Sabou, Knowledge graph validation by integrating LLMs and human-in-the-loop, Information Processing &amp; Management 62 (2025) 104145.</p>
      <p>[30] A. Brack, A. Hoppe, M. Stocker, S. Auer, R. Ewerth, Analysing the requirements for an open research knowledge graph: use cases, quality requirements, and construction strategies, International Journal on Digital Libraries 23 (2022) 33–55.</p>
      <p>[31] M. Nayyeri, G. M. Cil, S. Vahdati, F. Osborne, M. Rahman, S. Angioni, A. Salatino, D. R. Recupero, N. Vassilyeva, E. Motta, et al., Trans4E: Link prediction on scholarly knowledge graphs, Neurocomputing 461 (2021) 530–542.</p>
      <p>[32] M. Färber, D. Lamprecht, J. Krause, L. Aung, P. Haase, SemOpenAlex: The scientific landscape in 26 billion RDF triples, in: International Semantic Web Conference, Springer, 2023, pp. 94–112.</p>
      <p>[33] M. Daquino, S. Peroni, D. Shotton, G. Colavizza, B. Ghavimi, A. Lauscher, P. Mayr, M. Romanello, P. Zumstein, The OpenCitations data model, in: International Semantic Web Conference, Springer, 2020, pp. 447–463.</p>
      <p>[34] M. Y. Jaradeh, A. Oelen, K. E. Farfar, M. Prinz, J. D'Souza, G. Kismihók, M. Stocker, S. Auer, Open research knowledge graph: next generation infrastructure for semantic scholarly knowledge, in: Proceedings of the 10th International Conference on Knowledge Capture, 2019, pp. 243–246.</p>
      <p>[35] D. Dessì, F. Osborne, D. Reforgiato Recupero, D. Buscaldi, E. Motta, H. Sack, AI-KG: an automatically generated knowledge graph of artificial intelligence, in: The Semantic Web–ISWC 2020: 19th International Semantic Web Conference, Springer, 2020, pp. 127–143.</p>
      <p>[36] D. Dessí, F. Osborne, D. Reforgiato Recupero, D. Buscaldi, E. Motta, CS-KG: A large-scale knowledge graph of research entities and claims in computer science, in: International Semantic Web Conference, Springer, 2022, pp. 678–696.</p>
      <p>[37] D. Dessí, F. Osborne, D. Buscaldi, D. Reforgiato Recupero, E. Motta, CS-KG 2.0: A large-scale knowledge graph of computer science, Scientific Data 12 (2025) 1–16.</p>
      <p>[38] T. Kuhn, C. Chichester, M. Krauthammer, N. Queralt-Rosinach, R. Verborgh, G. Giannakopoulos, A.-C. N. Ngomo, R. Viglianti, M. Dumontier, Decentralized provenance-aware publishing with nanopublications, PeerJ Computer Science 2 (2016) e78.</p>
      <p>[39] S. Brody, Scite, Journal of the Medical Library Association: JMLA 109 (2021) 707.</p>
      <p>[40] A. Borrego, D. Dessi, I. Hernández, et al., Completing scientific facts in knowledge graphs of research concepts, IEEE Access 10 (2022) 125867–125880.</p>
      <p>[41] A. Borrego, D. Dessì, D. Ayala, I. Hernández, F. Osborne, D. R. Recupero, D. Buscaldi, D. Ruiz, E. Motta, Research hypothesis generation over scientific knowledge graphs, Knowledge-Based Systems 315 (2025) 113280.</p>
      <p>[42] A. A. Salatino, F. Osborne, A. Birukou, E. Motta, Improving editorial workflow and metadata quality at Springer Nature, in: The Semantic Web–ISWC 2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019, Proceedings, Part II 18, Springer, 2019, pp. 507–525.</p>
      <p>[43] R. Alonso, D. Dessí, A. Meloni, M. Murgia, D. R. Recupero, G. Scarpi, A seamless ChatGPT knowledge plug-in for the labour market, IEEE Access (2024).</p>
      <p>[44] L. Laranjo, A. G. Dunn, H. L. Tong, A. B. Kocaballi, J. Chen, R. Bashir, D. Surian, B. Gallego, F. Magrabi, A. Y. Lau, et al., Conversational agents in healthcare: a systematic review, Journal of the American Medical Informatics Association 25 (2018) 1248–1258.</p>
      <p>[45] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de Las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, arXiv (2023).</p>
      <p>[46] L. Gan, M. Blum, D. Dessì, B. Mathiak, R. Schenkel, S. Dietze, Hidden entity detection from GitHub leveraging large language models, volume 3894 of CEUR Workshop Proceedings, CEUR-WS.org, 2024. URL: https://ceur-ws.org/Vol-3894/dl4kg_paper4.pdf.</p>
    </sec>
    <sec id="sec-9">
      <title>Appendix</title>
    </sec>
    <sec id="sec-10">
      <title>A. Zero-Shot Learning</title>
    </sec>
    <sec id="sec-11">
      <title>B. Few-Shot Learning</title>
    </sec>
    <sec id="sec-12">
      <title>C. Fine Tuning</title>
      <p>Figure 5: Example prompt for the fine-tuning of Mistral 7B.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Jehangir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Radhakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>A survey on named entity recognition-datasets, tools, and methodologies</article-title>
          ,
          <source>Natural Language Processing Journal</source>
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <fpage>100017</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          , E. Motta,
          <article-title>Scicero: A deep learning and nlp approach for generating scientific knowledge graphs in the computer science domain</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>258</volume>
          (
          <year>2022</year>
          )
          <fpage>109945</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Angioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salatino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          , E. Motta,
          <article-title>Aida: A knowledge graph about research dynamics in academia and industry</article-title>
          ,
          <source>Quantitative Science Studies</source>
          <volume>2</volume>
          (
          <year>2021</year>
          )
          <fpage>1356</fpage>
          -
          <lpage>1398</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessì</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          , et al.,
          <article-title>Research knowledge graphs: the shifting paradigm of scholarly information representation</article-title>
          ,
          <source>in: The Semantic Web - 22nd International Conference, ESWC 2025</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Pontes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romanello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
          ,
          <article-title>Named entity recognition and classification in historical documents: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>North American Chapter of the Association for Computational Linguistics (NAACL)</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Improving Language Understanding by Generative Pre-Training</article-title>
          ,
          <source>Preprint</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction</article-title>
          ,
          <source>Empirical Methods in Natural Language Processing (EMNLP)</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>