<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Constructing Biomedical Text-to-Graph Systems with Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Bertolini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roel Hulsman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Consoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Puertas-Gallardo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Ceresa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>European Commission, Joint Research Centre (JRC)</institution>
          ,
          <addr-line>Ispra</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge graphs and ontologies represent symbolic and factual information that can offer structured and interpretable knowledge. Extracting and manipulating this type of information is a crucial step in complex processes such as human reasoning. While Large Language Models (LLMs) are known to be useful for extracting and enriching knowledge graphs and ontologies, previous work has largely focused on comparing architecture-specific models (e.g. encoder-decoder only) across benchmarks from similar domains. In this work, we provide a large-scale comparison of how certain LLM features (e.g. model architecture and size) and task learning methods (fine-tuning vs. in-context learning (iCL)) perform on text-to-graph benchmarks in the biomedical domain. Our experiments suggest that, while a simple truncation-based heuristic can notably boost the performance of decoder-only models used with iCL, small fine-tuned encoder-decoder models produce the most stable and strong performance. Moreover, we found that massive out-of-domain text-graph pre-training has a positive impact on fine-tuned models, while we observed only a marginal impact of pre-training and size for decoder-only iCL models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Acquiring structured knowledge from text is a fundamental step in complex processes like
reasoning and answering questions, whether such processes are carried out by a human or
an artificial intelligence (AI) system [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In natural language processing (NLP), structured
knowledge is often handled via ontologies or knowledge graphs [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. Knowledge graphs
are typically organised as collections of [(head # relation # tail)] triplets, such as
[(dog # isA # animal)], or [(Rome # CapitalOf # Italy)]. Knowledge graphs and
ontologies play a pivotal role in representing knowledge across various domains, facilitating
intelligent applications such as chatbots [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], recommendation systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] question answering
systems [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] and more [
        <xref ref-type="bibr" rid="ref1 ref9">9, 1</xref>
        ].
      </p>
      <p>
        Knowledge graphs have seen a surge in their application in recent years [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. However,
building them can be laborious and costly [
        <xref ref-type="bibr" rid="ref4 ref8">8, 4</xref>
        ]. This has led to the development of numerous
methods aimed at auto-generation of these graphs from text sources in various fields [
        <xref ref-type="bibr" rid="ref11 ref12 ref4 ref9">12, 9, 11, 4</xref>
        ].
Until recently, extracting and manipulating knowledge graphs and other forms of graphs has
been largely dealt with by small knowledge graph embedding models (KGEs) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which are
lightweight but limited in capabilities, or different types of graph neural networks (GNNs)
[
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], such as convolutional graph neural networks (CGNNs) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], or graph attention
networks (GATs) [17]. Recently, many of these architectures have been replaced by
transformer-based large language models (LLMs) [18], which have shown great potential in
modelling graph-based data.
      </p>
      <p>
        Despite these advancements, current techniques still suffer from significant limitations
concerning accuracy, completeness, privacy, bias, and scalability [
        <xref ref-type="bibr" rid="ref4">19, 20, 4</xref>
        ]. Therefore, generating a
large-scale knowledge graph automatically from text corpora remains an open challenge [
        <xref ref-type="bibr" rid="ref3 ref4 ref9">3, 9, 4</xref>
        ].
As shown by a consistent body of evidence [21, 22, 23], LLMs can be adapted both to extract
knowledge graphs from a reference text (text-to-graph task) and to convert knowledge
graphs into natural language while maintaining the semantic meaning (graph-to-text task). We
are interested in the former.
      </p>
      <p>
        To adapt an LLM to a particular task, two popular task learning methods are fine-tuning and
in-context learning (iCL) [24]. Given a training dataset pertaining to the new task at hand,
fine-tuning an LLM amounts to an additional training phase to update a subset of learnable
model parameters to adapt to the new task. In-context learning, on the other hand, consists of
including a few task examples in the model prompt at inference time - a special case of few-shot
learning. Typically, iCL provides weaker performance than fine-tuning and is computationally
more expensive at inference time [25, 24], yet it is highly flexible as it does not require any
parameter updates. Both options involve a vast number of design choices, from the quality and
quantity of available training data to the number of in-context examples to include in iCL.
While most work on knowledge graph extraction has focused on pushing the state-of-the-art
in terms of performance [26, 27] or summarising the field in terms of different applications
[21, 22, 23] and formulations of scenarios and tasks [
        <xref ref-type="bibr" rid="ref17 ref18">28, 29, 30</xref>
        ], it remains unclear to the general
AI practitioner what would be, given a specific dataset and computational resources, the best
solution to approach a text-to-graph task, formulated as an end-to-end LLM-based solution.
This work is directed to the general AI practitioner in the biomedical domain aiming to develop
an end-to-end LLM-based knowledge graph extraction system from textual sources. We
investigate how to best approach such a task by examining various combinations of model design
choices, assuming a fixed and accessible computational resource of a single RTX 8000 GPU.
The main variables under investigation are model architecture (encoder-decoder, decoder-only),
model family (T5, BART, Mistral-v0.1, Llama-2), model size (small (60M) to mid (13B learnable
parameters)), task learning method (fine-tuning, iCL) and additional pre-training data
(relation extraction data, conversation data, instruction data, (bio)medical data). In brief, the main
insights of this paper encompass the following:
1. We provide tentative evidence that biomedical knowledge graphs can be hard to model:
mid-sized decoder-only models adopting iCL show weak performance, while the performance
of small fine-tuned encoder-decoder models is robust compared to the general domain.
2. For small fine-tuned encoder-decoder models we observe power-law scaling in model size,
while for mid-sized decoder-only models adopting iCL we instead observe power-law
scaling in the number of in-context examples. This is in line with known results [24].
3. Only additional pre-training data on relation extraction tasks boosts model performance,
while neither observing conversation data, instruction data nor (bio)medical data during
pre-training makes a notable difference.
4. We propose and experimentally demonstrate the effectiveness of a simple truncation-based
heuristic on model output to control for a specific type of hallucination under in-context
learning, avoiding expensive prompt tuning and prompt design.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Material and Methods</title>
      <p>
        Knowledge graph structure. To ensure a stable and fair comparison across models with
different pre-training, we pre-process the selected dataset to match the following linearised
text-graph structure. Formally, a dataset consists of two sets of strings 𝒯 and 𝒢, where each
reference text t ∈ 𝒯 and knowledge graph g ∈ 𝒢 are assumed to be semantically identical
representations that differ syntactically. For example, given the reference text “The pencil is on the
table.”, we represent the corresponding knowledge graph as containing the single linearised triplet
“[(pencil # IsOn # table)]”. In the coming paragraphs, we present a detailed example
of the proposed linearisation in the context of the prompt used for the in-context learning set-up.
      </p>
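      <p>To make the linearised structure concrete, the following minimal sketch shows how a list of triplets can be serialised into, and recovered from, this string format; the function names and parsing details are illustrative choices rather than the benchmark's reference implementation:</p>
      <preformat>
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)

def linearise(triplets: List[Triplet]) -> str:
    """Serialise a knowledge graph as "[(h # r # t) | (h # r # t) | ...]"."""
    return "[" + " | ".join(f"({h} # {r} # {t})" for h, r, t in triplets) + "]"

def delinearise(graph: str) -> List[Triplet]:
    """Recover (head, relation, tail) tuples from a linearised graph string."""
    triplets = []
    for chunk in graph.strip().strip("[]").split("|"):
        parts = [p.strip() for p in chunk.strip().strip("()").split("#")]
        if len(parts) == 3:
            triplets.append((parts[0], parts[1], parts[2]))
    return triplets

print(linearise([("pencil", "IsOn", "table")]))    # [(pencil # IsOn # table)]
print(delinearise("[(Rome # CapitalOf # Italy)]")) # [('Rome', 'CapitalOf', 'Italy')]
</preformat>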
      <p>
        Dataset. We use BioEvent [
        <xref ref-type="bibr" rid="ref18">30</xref>
        ], a benchmark that aggregates 10 popular biomedical datasets. We adopt a simple strategy
to clean up the data: we remove duplicate pairs, breaking
ties in favour of the text-graph pair with the longest linearised knowledge graph,
assuming that the longest knowledge graph is the most complete description of the entities and
relations described, and finally obtain a train/validation/test set using an 80/10/10% split.
      </p>
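      <p>A minimal sketch of this clean-up and split, assuming the benchmark is available as a list of (text, linearised graph) pairs; the random seed and the exact deduplication key are illustrative assumptions:</p>
      <preformat>
import random

def clean_and_split(pairs, seed=0):
    """Deduplicate (text, graph) pairs and build an 80/10/10 split.

    Duplicates are resolved by keeping, for each reference text, the longest
    linearised graph, assumed to be the most complete description.
    """
    longest = {}
    for text, graph in pairs:
        if text not in longest or len(graph) > len(longest[text]):
            longest[text] = graph
    data = sorted(longest.items())
    random.Random(seed).shuffle(data)
    n_train = int(0.8 * len(data))
    n_val = int(0.1 * len(data))
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test
</preformat>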
      <p>
        Metrics. We evaluate performance with Rouge scores [
        <xref ref-type="bibr" rid="ref19">31</xref>
        ], namely Rouge-n (n = 1, 2) and
Rouge-L. The former is based on n-grams, while the latter is based on the longest common sub-sequence
(LCS) between two strings, as implemented in the Hugging Face evaluate library.
      </p>
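      <p>A minimal example of the metric computation with the evaluate library, where predictions and references are lists of linearised graph strings (the example strings are illustrative):</p>
      <preformat>
import evaluate

rouge = evaluate.load("rouge")

predictions = ["[(pencil # IsOn # table)]"]
references = ["[(pencil # IsOn # table) | (table # IsIn # room)]"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"], scores["rougeL"])
</preformat>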
      <p>
        Models. We adopt models from two families of pre-trained encoder-decoder LLMs: T5 and BART.
We adopt three sizes in the T5 family, namely 60.5M parameters (t5-small), 223M (t5-base)
and 738M (t5-large). As for BART, we use the BART model (bart-large) introduced in
[
        <xref ref-type="bibr" rid="ref20">32</xref>
        ], as well as a version from [
        <xref ref-type="bibr" rid="ref21">33</xref>
        ], tuned on the REBEL dataset [
        <xref ref-type="bibr" rid="ref22">34</xref>
        ], a large relation extraction
dataset designed for text-to-text modelling.
      </p>
      <p>
        We then opt for two sets of pre-trained decoder-only LLMs: Mistral-v0.1 and Llama-2. We
include two sizes, one with 7B parameters (Llama-2-chat-7b-hf) and the largest model in
our analysis with 13B parameters (Llama-2-chat-13b-hf), as well as a Llama-2 model
fine-tuned on biomedical knowledge and question-answering (meditron-7b) [
        <xref ref-type="bibr" rid="ref23">35</xref>
        ], to investigate
the beneficial effect of domain-specific training in the biomedical domain.
      </p>
      <p>
        As for the Mistral family, we adopt three models. The original 7B model (Mistral-v0.1), a
version fine-tuned on a variety of open-source conversation datasets ( Mistral-Instruct-v0.1),
and finally a version fine-tuned by OpenOrca ( Mistral-OpenOrca) [
        <xref ref-type="bibr" rid="ref24">36</xref>
        ] on a reproduction
attempt of the Orca dataset [
        <xref ref-type="bibr" rid="ref25">37</xref>
        ], leveraging the Flan Collection for effective instruction-tuning
[
        <xref ref-type="bibr" rid="ref26">38</xref>
        ]. Importantly, all models adopted in this work are fully open-source and accessible through
Hugging Face via the transformers library [
        <xref ref-type="bibr" rid="ref27">39</xref>
        ].
      </p>
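      <p>For reference, the decoder-only models can be loaded for inference through the transformers library along the following lines; the checkpoint identifier, half-precision setting and example prompt are illustrative, and gated checkpoints such as Llama-2 additionally require accepting the licence on the Hugging Face Hub:</p>
      <preformat>
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative; the Llama-2 and Meditron checkpoints load the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 7B model on a single 48GB RTX 8000
    device_map="auto",
)

prompt = "Convert the text into a sequence of triplets:\nText: Rome is the capital of Italy.\nGraph:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128, do_sample=False)
new_tokens = generated[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
</preformat>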
      <p>
        Learning methods. We adopt two distinct task learning methods: fine-tuning for the smaller
encoder-decoder models and iCL [24] for the larger decoder-only models. All fine-tuning
experiments are based on the Trainer class implementation from Hugging Face. Given a
text-graph pair (t, g) in the pre-defined training set, each model undergoes an additional
fine-tuning phase in which it is trained to generate the graph g as output, using the text t as
input. All models are tuned end-to-end for up to ten epochs, selecting the best model based on
the validation Rouge-1 score, as per standard practice in the NLP and knowledge graph literature
[
        <xref ref-type="bibr" rid="ref28 ref29">40, 41</xref>
        ]. The training hyper-parameters are given in Table 2 of Appendix A.
      </p>
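      <p>A condensed sketch of this fine-tuning set-up using the Hugging Face Seq2SeqTrainer; the toy dataset and the hyper-parameters shown here are placeholders rather than the values of Table 2:</p>
      <preformat>
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-small"  # illustrative; the same recipe applies to the other encoder-decoder models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy stand-in for the BioEvent train/validation splits described above.
train_ds = Dataset.from_dict({
    "text": ["Rome is the capital of Italy."],
    "graph": ["[(Rome # CapitalOf # Italy)]"],
})
val_ds = train_ds

def preprocess(batch):
    # Tokenise the reference text as input and the linearised graph as target.
    model_inputs = tokenizer(batch["text"], truncation=True)
    labels = tokenizer(text_target=batch["graph"], truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_ds = train_ds.map(preprocess, batched=True)
val_ds = val_ds.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="t2g-t5-small",
    num_train_epochs=10,          # up to ten epochs, as described above
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # the paper selects the best model on validation Rouge-1,
                                  # which requires a compute_metrics callback (omitted here)
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
</preformat>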
      <p>
        For the iCL setting, each pre-trained model is queried with a simple prompt containing a set
of k solved text-graph examples taken from the available training set. To limit the impact of
selecting a set of poor examples, we sample the k examples randomly from the training set for each
test instance at inference time. Moreover, we omit time-consuming prompt
engineering and computationally expensive prompt tuning, to resemble the common practice of
end-users. However, we highlight the importance of such practices in preventing model hallucinations
and, more generally, spurious features in prompt design along the lines of [
        <xref ref-type="bibr" rid="ref30">42</xref>
        ]. To
provide a fair estimate of iCL performance, we introduce a simple post-hoc hallucination-control
heuristic to determine the end of the desired structured output (i.e. the end of a knowledge
graph). Simply put, we truncate the model output at the first appearance of the tokens “)]”, which signal
the end of a knowledge graph in our graph structure. An example of the finalised iCL prompt
(with k = 2) is presented in Figure 1.
      </p>
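      <p>The truncation heuristic itself reduces to a few lines; the sketch below is an illustrative rendering of the rule described above rather than the exact implementation:</p>
      <preformat>
def truncate_graph_output(generated: str, end_marker: str = ")]") -> str:
    """Cut a model continuation at the first end-of-graph marker.

    Anything the model produces after the first ")]" is treated as
    hallucinated continuation and discarded; if no marker is found,
    the raw output is returned unchanged.
    """
    idx = generated.find(end_marker)
    if idx == -1:
        return generated
    return generated[: idx + len(end_marker)]

# The model keeps generating after the graph has ended:
raw = "[(pencil # IsOn # table)] Text: The cat sat on the mat. Graph: [(cat"
print(truncate_graph_output(raw))  # -> [(pencil # IsOn # table)]
</preformat>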
      <p>Experimental setup. The main experiment is designed to unveil the approximate overall
power of selected models and task learning methods, as well as to understand what impacts
and shapes their performance. All models and algorithms were implemented in Python
version 3.8.5, and all computations were run on a single RTX 8000 GPU with an AMD EPYC 7282
16-core 64-bit processor at 1.50GHz and 512GB RAM. We do recognise that assuming
larger computational power could significantly improve results, especially by including large
decoder-only models or by fine-tuning the mid-sized Mistral-v0.1 and Llama-2 families, which
is out of the computational reach of the current setup.</p>
      <sec id="sec-2-1">
        <title>Convert the text into a sequence of triplets:</title>
      </sec>
      <sec id="sec-2-2">
        <title>Text: Further investigation using inhibition or genetic deletion of Erbb2 in vitro revealed reduced Cdc25a levels and increased S-phase arrest in UV-irradiated cells lacking Erbb2 activity.</title>
      </sec>
      <sec id="sec-2-3">
        <title>Graph: [(reduced # Theme # Cdc25a) | (reduced # Cause # genetic deletion) | (genetic deletion # Theme # Erbb2)]</title>
      </sec>
      <sec id="sec-2-4">
        <title>Text: In this study, we showed that iNOS was ubiquitinated and degraded dependent on CHIP (COOH terminus of heat shock protein 70-interacting protein), a chaperone-dependent ubiquitin ligase.</title>
      </sec>
      <sec id="sec-2-5">
        <title>Graph: [(dependent # Theme # ubiquitinated) | (ubiquitinated # Theme # iNOS) | (dependent # Cause # CHIP)]</title>
      </sec>
      <sec id="sec-2-6">
        <title>Text: Such activity was abolished in mechanically stimulated mouse MRTF-A(-/-) cells or upon inhibition of CREB-binding protein (CBP)</title>
      </sec>
      <sec id="sec-2-7">
        <title>Graph:</title>
        <p>
          The main goal of the experiment is to understand how LLM characteristics and task-learning
methods perform in our text-to-graph task, under fixed computational resources. Throughout,
we aim to guide the general AI practitioner to understand which combination is most suited
for such a task and to showcase how to navigate (part of) the vast and complex spectrum of
model design choices. Given the fixed computational resources, we fine-tune the previously
introduced set of smaller encoder-decoder models and compare performance to the set of larger
decoder-only models in combination with iCL. This choice is framed in the context of a given
computational resource such that fine-tuning is computationally infeasible for larger models.
At the same time, the short context window of the T5 and BART families (1k tokens or below)
proves iCL unsuitable. Following [
          <xref ref-type="bibr" rid="ref31">43</xref>
          ], we adopt  = 8 for the amount of in-context examples.
        </p>
      </sec>
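      <p>For completeness, the following sketch illustrates how such a prompt can be assembled at inference time, sampling k examples per test instance as described above; apart from the instruction line mirrored from Figure 1, the details are illustrative assumptions:</p>
      <preformat>
import random

INSTRUCTION = "Convert the text into a sequence of triplets:"

def build_icl_prompt(query_text, train_pairs, k=8, rng=random):
    """Assemble a k-shot prompt from randomly sampled (text, graph) training pairs."""
    blocks = [INSTRUCTION]
    for text, graph in rng.sample(train_pairs, k):
        blocks.append(f"Text: {text}\nGraph: {graph}")
    blocks.append(f"Text: {query_text}\nGraph:")
    return "\n\n".join(blocks)

train_pairs = [
    ("Rome is the capital of Italy.", "[(Rome # CapitalOf # Italy)]"),
    ("The pencil is on the table.", "[(pencil # IsOn # table)]"),
]
print(build_icl_prompt("A dog is an animal.", train_pairs, k=2))
</preformat>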
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>
        The overall results of various combinations of model architecture, family, size, relevant
pre-training data and task learning method are shown in Table 1. First, we can observe a clear
benefit in fine-tuning smaller encoder-decoder models. We hypothesise this relates to issues
regarding benchmark quality: BioEvent presents a high number of unique entities and triplets,
creating a complex distribution of patterns in reference texts that is difficult to infer correctly
from just 8 in-context examples. Overall, the best performance across metrics is reached with
fine-tuned encoder-decoder models, i.e. the largest model in the T5 family.
Moreover, within both the T5 and Llama-2 families, we find a clear positive correlation between
model size and performance. This is in line with the well-documented phenomenon of
power-law scaling of LLM performance in the number of model parameters [
        <xref ref-type="bibr" rid="ref32">44</xref>
        ]. Focusing on the
BART family, we see that adopting an additional relation extraction dataset during pre-training
(REBEL) yields universally superior results. This is in sharp contrast to other pre-training
additions, since neither conversation data, instruction data (OpenOrca), nor the Meditron dataset seems
to affect performance on the benchmark. We hypothesise none are particularly relevant to
our text-to-graph task, although this is most surprising for the biomedical knowledge
in the Meditron pre-training data.
      </p>
      <p>Table 1 also shows that our hallucination-control heuristic for iCL models yields a large
performance boost, independently of architecture, family, size, or pre-training data. To briefly
reiterate, this was put in place to avoid computationally and experimentally demanding prompt
engineering or tuning, and implemented by truncating model output after tokens signalling
a graph’s end (i.e., “)]”). The jump in performance can reach more than 20 points, and is
consistent across all Rouge scores. Concerning specific metrics, we found Rouge-1 (R-1) scores
to be consistently higher, especially for decoder-only models, indicating stronger entity and
relation recognition. We also found R-L scores to be systematically above Rouge-2 (R-2) scores,
and closer to R-1. This suggests that the identified entities and relations are often in the right
order, but certain entities or relations are missing such that correct 2-grams are lacking.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>
        This work is directed at biomedical researchers and practitioners aiming to develop an
end-to-end LLM-based automatic graph extraction system from textual sources. Assuming a realistic
computational baseline, our large-scale comparison contributes to the development of a more
effective and efficient pipeline for biomedical knowledge extraction and representation tasks by
highlighting the impact of a plethora of design choices, and provides several empirical insights.
Indeed, off-the-shelf LLMs together with a task learning method can achieve strong entity
and relation recognition, and reach moderate yet promising overall results on knowledge
graph completion. The optimal performance of LLMs is likely higher than displayed here,
e.g. due to prompt engineering/tuning, hyper-parameter tuning, more computational power
and more model parameters. Our results indicate that, without fine-tuning, LLMs might not
be directly suitable for biomedical text-to-graph tasks. Fine-tuning has proven more robust
than iCL, since mid-sized decoder-only models adopting iCL show weak performance, while
small fine-tuned encoder-decoder models achieve robust moderate results. We hypothesise that
the expert knowledge contained in reference texts in the biomedical domain poses a more difficult
knowledge extraction problem, such that iCL with a small number of in-context examples is
not sufficient to correctly extrapolate said task. That is, knowledge graphs in the biomedical
domain might require knowledge obtained across a large set of examples. However, we provide
strong and consistent evidence that our simple truncation-based heuristic is highly effective in
boosting model performance without time-expensive prompt engineering and computationally
expensive prompt tuning, which is not necessarily generalisable across subsets of the same task
[
        <xref ref-type="bibr" rid="ref33">45</xref>
        ]. Crucially, this suggests that when the output of a model follows a constrained structure,
simple rule-based heuristics can be an efficient method to limit undesired output.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This work examined the ability of LLMs to generate biomedical knowledge graphs from
reference texts, comparing end-to-end fine-tuned encoder-decoder models against decoder-only
models used with in-context learning (iCL). Our results showed how small fine-tuned
encoder-decoder models consistently outperform mid-sized decoder-only models adopting iCL. We found
evidence that our simple heuristic to control for model hallucination has a consistently positive
impact on the performance of decoder-only models, but no connection between performance
and including additional datasets during pre-training that are not directly linked to the
text-to-graph task, such as conversation-tuning, instruction-tuning and biomedical expert knowledge.
On the contrary, we found that including a relation extraction dataset like REBEL showed a
notable boost in the performance of encoder-decoder models, for which we also observed a
power-law connection between model size and performance.
</p>
      <p>
[17] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, Graph Attention
Networks, in: International Conference on Learning Representations, 2018. URL: https:
//openreview.net/forum?id=rJXMpikCZ.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I.
Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30
(2017).
[19] F. Radulovic, N. Mihindukulasooriya, R. García-Castro, A. Gómez-Pérez, A comprehensive
quality model for linked data, Semantic Web 9 (2018). doi:10.3233/SW-170267.
[20] M. R. A. Rashid, G. Rizzo, M. Torchiano, N. Mihindukulasooriya, O. Corcho, R.
García-Castro, Completeness and consistency analysis for evolving knowledge bases, Journal of
Web Semantics 54 (2019). doi:10.1016/j.websem.2018.11.004.
[21] B. Jin, G. Liu, C. Han, M. Jiang, H. Ji, J. Han, Large language models on graphs: A
comprehensive survey, arXiv preprint arXiv:2312.02783 (2023).
[22] J. Liu, C. Yang, Z. Lu, J. Chen, Y. Li, M. Zhang, T. Bai, Y. Fang, L. Sun, P. S. Yu, et al., Towards
graph foundation models: A survey and beyond, arXiv preprint arXiv:2310.11829 (2023).
[23] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, X. Wu, Unifying large language models and
knowledge graphs: A roadmap, IEEE Transactions on Knowledge and Data Engineering
(2024). doi:10.1109/TKDE.2024.3352100.
[24] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan,
P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan,
R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin,
S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei,
Language models are few-shot learners, Advances in Neural Information
Processing Systems 33 (2020). URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/
1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
[25] H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, C. A. Raffel, Few-shot
parameter-efficient fine-tuning is better and cheaper than in-context learning, Advances
in Neural Information Processing Systems 35 (2022).
[26] Q. Guo, Z. Jin, X. Qiu, W. Zhang, D. Wipf, Z. Zhang, CycleGT: Unsupervised
graph-to-text and text-to-graph generation via cycle training, in: T. Castro Ferreira, C. Gardent,
N. Ilinykh, C. van der Lee, S. Mille, D. Moussallem, A. Shimorina (Eds.), Proceedings of
the 3rd International Workshop on Natural Language Generation from the Semantic Web
(WebNLG+), Association for Computational Linguistics, Dublin, Ireland (Virtual), 2020.</p>
      <p>URL: https://aclanthology.org/2020.webnlg-1.8.
[27] Z. Jin, Q. Guo, X. Qiu, Z. Zhang, GenWiki: A dataset of 1.3 million content-sharing
text and graphs for unsupervised graph-to-text generation, in: D. Scott, N. Bel, C. Zong
(Eds.), Proceedings of the 28th International Conference on Computational Linguistics,
International Committee on Computational Linguistics, Barcelona, Spain (Online), 2020.
doi:10.18653/v1/2020.coling-main.217.
[28] L. Wang, Y. Li, O. Aslan, O. Vinyals, WikiGraphs: A Wikipedia text - knowledge graph
paired dataset, in: A. Panchenko, F. D. Malliaros, V. Logacheva, A. Jana, D. Ustalov,
P. Jansen (Eds.), Proceedings of the Fifteenth Workshop on Graph-Based Methods for
Natural Language Processing (TextGraphs-15), Association for Computational Linguistics, Mexico City, Mexico, 2021. doi:10.18653/v1/2021.textgraphs-1.7.</p>
    </sec>
    <sec id="sec-6">
      <title>A. Fine-tuning hyper-parameters</title>
      <p>The set of hyper-parameters, with their respective values, adopted for our experiments with the
encoder-decoder models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ortíz-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Abbés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. U.</given-names>
            <surname>Usip</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hantach</surname>
          </string-name>
          ,
          <string-name>
            <surname>Semantic</surname>
            <given-names>AI</given-names>
          </string-name>
          <source>in Knowledge Graphs</source>
          , Taylor &amp; Francis, Boca Raton,
          <string-name>
            <surname>US</surname>
          </string-name>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .1201/9781003313267.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Knowledge graph refinement: A survey of approaches and evaluation methods</article-title>
          ,
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <surname>C. D'Amato</surname>
            ,
            <given-names>G. D.</given-names>
          </string-name>
          <string-name>
            <surname>Melo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Kirrane</surname>
            ,
            <given-names>J. E. L.</given-names>
          </string-name>
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>A.-C. N.</given-names>
          </string-name>
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Rashid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Schmelzeisen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          ). doi:
          <volume>10</volume>
          .1145/3447772.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naseriparsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <source>Knowledge Graphs: Opportunities and Challenges, Artificial Intelligence Review</source>
          <volume>56</volume>
          (
          <year>2023</year>
          ).
          <source>doi: 10.1007/s10462-023-10465-9.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ait-Mlouk</surname>
          </string-name>
          , L. Jiang,
          <article-title>KBot: A Knowledge Graph Based ChatBot for Natural Language Understanding over Linked Data, IEEE Access 8 (</article-title>
          <year>2020</year>
          ). doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2020</year>
          .
          <volume>3016142</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muthukrishnan</surname>
          </string-name>
          , G. De Melo,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Reinforcement Knowledge Graph Reasoning for Explainable Recommendation, Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <year>2019</year>
          . doi:
          <volume>10</volume>
          .1145/3331184.3331203.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph Embedding Based Question Answering, Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <year>2019</year>
          . doi:
          <volume>10</volume>
          .1145/3289600. 3290956.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kejriwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <article-title>Knowledge Graphs: Construction, Management and Querying</article-title>
          ,
          <source>Semantic Web</source>
          <volume>10</volume>
          (
          <year>2019</year>
          ). doi:
          <volume>10</volume>
          .3233/SW-190370.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kejriwal</surname>
          </string-name>
          ,
          <article-title>Knowledge Graphs: A Practical Review of the Research Landscape</article-title>
          , Information
          <volume>13</volume>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .3390/info13040161.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <surname>A Review</surname>
          </string-name>
          :
          <article-title>Knowledge Reasoning Over Knowledge Graph</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>141</volume>
          (
          <year>2020</year>
          ). doi:
          <volume>10</volume>
          .1016/j.eswa.
          <year>2019</year>
          .
          <volume>112948</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          , E. Cambria,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marttinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          <article-title>Survey on Knowledge Graphs: Representation, Acquisition, and Applications</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>33</volume>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .1109/TNNLS.
          <year>2021</year>
          .
          <volume>3070843</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <article-title>Knowledge graph construction techniques</article-title>
          ,
          <source>Journal of Computer Research and Development</source>
          <volume>53</volume>
          (
          <year>2016</year>
          ). doi:
          <volume>10</volume>
          .7544/issn1000-
          <fpage>1239</fpage>
          .
          <year>2016</year>
          .
          <volume>20148228</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph Embedding: A Survey of Approaches and Applications</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>29</volume>
          (
          <year>2017</year>
          ). doi:
          <volume>10</volume>
          .1109/TKDE.
          <year>2017</year>
          .
          <volume>2754499</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. O.</given-names>
            <surname>Sing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of graph neural networks for knowledge graphs</article-title>
          ,
          <source>IEEE Access 10</source>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2022</year>
          .
          <volume>3191784</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <source>Graph Neural Networks for Natural Language Processing: A Survey, Foundations and Trends in Machine Learning</source>
          <volume>16</volume>
          (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .1561/2200000096.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Maciejewski</surname>
          </string-name>
          , Graph Convolutional Networks: Algorithms, Applications and Open Challenges, Springer International Publishing, Cham,
          <year>2018</year>
          . doi:10.1007/978-3-030-04648-4_7.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Colas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sadeghian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Eventnarrative: A large-scale event-centric dataset for knowledge graph-to-text generation</article-title>
          ,
          <source>in: Thirty-fifth Conference on Neural Information Processing (NeurIPS 2021) Track on Datasets and Benchmarks</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Frisoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moro</surname>
          </string-name>
          , L. Balzani,
          <article-title>Text-to-text extraction and verbalization of biomedical event graphs</article-title>
          ,
          <source>in: Proceedings of the 29th International Conference on Computational Linguistics</source>
          ,
          <source>International Committee on Computational Linguistics</source>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .coling-
          <volume>1</volume>
          .
          <fpage>238</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [31]
          <string-name>
            <surname>C.-Y. Lin</surname>
            ,
            <given-names>ROUGE:</given-names>
          </string-name>
          <article-title>A package for automatic evaluation of summaries, in: Text Summarization Branches Out, Association for Computational Linguistics</article-title>
          , Barcelona, Spain,
          <year>2004</year>
          . URL: https://www.aclweb.org/anthology/W04-1013.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer, BART:
          <article-title>Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          , in: D.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Chai</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Schluter</surname>
          </string-name>
          , J. Tetreault (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .acl-main.
          <volume>703</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rossiello</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. F. M. Chowdhury</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Mihindukulasooriya</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Cornec</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          <string-name>
            <surname>Gliozzo</surname>
          </string-name>
          ,
          <article-title>Knowgl: Knowledge generation and linking from text</article-title>
          ,
          <source>in: The Thirty-Seventh AAAI Conference on Artificial Intelligence</source>
          , AAAI Press,
          <year>2023</year>
          , pp.
          <fpage>16476</fpage>
          -
          <lpage>16478</lpage>
          . doi:
          <volume>10</volume>
          .1609/ aaai.v37i13.
          <fpage>27084</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [34]
          <string-name>
            <surname>P.-L. Huguet</surname>
            <given-names>Cabot</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          , REBEL:
          <article-title>Relation extraction by end-to-end language generation, in: Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics</article-title>
          , Punta Cana, Dominican Republic,
          <year>2021</year>
          . URL: https://aclanthology.org/
          <year>2021</year>
          .findings-emnlp.
          <volume>204</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Cano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Romanou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Matoba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Salvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pagliardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Köpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohtashami</surname>
          </string-name>
          , et al., Meditron-70b:
          <article-title>Scaling medical pretraining for large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2311.16079</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goodson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pentland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vong</surname>
          </string-name>
          ,
          <article-title>"Teknium", MistralOrca: Mistral-7B Model Instruct-tuned on Filtered OpenOrcaV1 GPT-4 Dataset, HuggingFace repository (</article-title>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mitra</surname>
          </string-name>
          , G. Jawahar,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Palangi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awadallah</surname>
          </string-name>
          , Orca:
          <article-title>Progressive learning from complex explanation traces of gpt-4</article-title>
          , arXiv preprint arXiv:
          <volume>2306</volume>
          .02707 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>S.</given-names>
            <surname>Longpre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Webson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          , et al.,
          <article-title>The flan collection: Designing data and methods for effective instruction tuning</article-title>
          ,
          <source>arXiv preprint arXiv:2301.13688</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>von Platen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Le</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rush</surname>
          </string-name>
          ,
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          , in: Q. Liu, D. Schlangen (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          . doi:10.18653/v1/2020.emnlp-demos.6.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>I.</given-names>
            <surname>Balazevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Multi-relational Poincaré graph embeddings</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ). URL: https://proceedings.neurips.cc/paper_files/paper/2019/file/f8b932c70d0b2e6bf071729a4fa68dfc-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Juan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          ,
          <article-title>Low-dimensional hyperbolic knowledge graph embeddings</article-title>
          , in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</source>
          ,
          <year>2020</year>
          . doi:10.18653/v1/2020.acl-main.617.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sclar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsvetkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Suhr</surname>
          </string-name>
          ,
          <article-title>Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting</article-title>
          ,
          <source>arXiv preprint arXiv:2310.11324</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          ). URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hestness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ardalani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Diamos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kianinejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M. A.</given-names>
            <surname>Patwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Deep learning scaling is predictable, empirically</article-title>
          ,
          <source>arXiv preprint arXiv:1712.00409</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bertolini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weeds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weir</surname>
          </string-name>
          ,
          <article-title>Testing large language models on compositionality and inference with phrase-level adjective-noun entailment</article-title>
          , in: N. Calzolari, C.-R. Huang, H. Kim, J. Pustejovsky, L. Wanner, K.-S. Choi, P.-M. Ryu, H.-H. Chen, L. Donatelli, H. Ji, S. Kurohashi, P. Paggio, N. Xue, S. Kim, Y. Hahm, Z. He, T. K. Lee, E. Santus, F. Bond, S.-H. Na (Eds.),
          <source>Proceedings of the 29th International Conference on Computational Linguistics</source>
          ,
          <source>International Committee on Computational Linguistics</source>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          . URL: https://aclanthology.org/2022.coling-1.359.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>