K-Flares: A K-Adapter Based Approach for the FLARES Challenge

Joel Pardo1,*,†, Jiayun Liu1,†, Virginia Ramón-Ferrer1,†, Elvira Amador-Domínguez1,2,† and Pablo Calleja1,†

1 Ontology Engineering Group, Departamento de Inteligencia Artificial, ETSI Informáticos, Universidad Politécnica de Madrid, 28660 Boadilla del Monte, Madrid, Spain
2 Departamento de Sistemas Informáticos, ETSI Sistemas Informáticos, Universidad Politécnica de Madrid, 28031 Madrid, Spain

Abstract
The 5W1H method is a technique used to extract comprehensive information from textual data by answering the questions Who, What, When, Where, Why, and How. Identifying these instances within a text provides a structured way to better understand the details it presents, which in turn helps to assess the reliability of the information displayed. In this context, we present K-Flares, an adaptation of the original K-Adapter for 5W1H instance recognition in the context of the FLARES 2024 challenge. The challenge aims to detect 5W1H instances in text and to assess the reliability of the information displayed in each of them. We focus on the first part of the task: the detection of 5W1H instances.

Keywords
K-adapter, Covid-19, News, NER

1. Introduction

With the increasing use of the Internet, there has been a growing emergence of news providers that deliver information to society. However, this growth in information sources has also raised questions about the reliability of the information being disseminated. Many current online sources lack the fact-checking and editorial standards that traditional media, such as newspapers or television news broadcasts, used to maintain, leading to the growing spread of non-factual information and biased reporting. The sheer amount of information continuously generated makes correct manual monitoring of the veracity of each and every piece of information impossible.
This has led to a growing effort to research new approaches capable of monitoring the reliability of this type of information without human supervision, together with its derived tasks. An example of a task in this line of research is the detection of fake news, where, among other approaches, Natural Language Processing (NLP) techniques, such as Global Vectors (GloVe) [1], have been repeatedly combined with Deep Learning (DL) methods, such as Long Short-Term Memory networks (LSTMs) [2], to tackle the problem [3]. In this context, the 5W1H approach poses an interesting choice for text reliability detection. The 5W1H method is a technique used to extract comprehensive information from textual data by answering the questions Who, What, When, Where, Why, and How. The identification of these key elements within a text provides a structured way to better understand the details presented in the text, which can help in assessing the reliability of the information displayed. For example, identifying the source of a statement, that is, a "who" in a text, can greatly impact the human perception of the credibility of the information. If a statement is reported to come from a commonly trusted source, for example "The Organisation for Economic Co-operation and Development (OECD) has reported...", it will likely be more believable. On the other hand, if the source has a vaguer identity, such as "Experts have reported...", it can lead to distrust in the information, originating from the lack of a precise source.

IberLEF 2024, September 2024, Valladolid, Spain
* Corresponding author.
† These authors contributed equally.
Email: joel.pardof@upm.es (J. Pardo); jiayun.liu@upm.es (J. Liu); virginia.ramon@upm.es (V. Ramón-Ferrer); elvira.amador@upm.es (E. Amador-Domínguez); p.calleja@upm.es (P. Calleja)
ORCID: 0000-0001-8064-0128 (J. Pardo); 0009-0008-0740-4936 (J. Liu); 0009-0009-7197-1676 (V. Ramón-Ferrer); 0000-0001-6838-1266 (E. Amador-Domínguez); 0000-0001-8423-8240 (P. Calleja)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The mistrust originating from this lack of precision can affect all the 5W1H elements of a text, which motivated the introduction of FLARES 2024 (Fine-Grained Language-based Reliability Detection in Spanish News) [4], a shared task introduced at IberLEF 2024 [5] whose goal is to assess the reliability of the language used in Spanish news writing, basing its efforts on the 5W1H method. The main task is divided into two subtasks: Subtask 1, where, given a text, the goal is to find and annotate all the 5W1H instances, and Subtask 2, where, for each 5W1H element detected in the previous subtask, the goal is to determine whether the language used is "reliable", "semi-reliable" or "unreliable" ("confiable", "semiconfiable" or "no confiable" in Spanish). To tackle these tasks, a manually annotated dataset consisting of 190 news items is provided, containing over 9,000 5W1H annotations. Annotations consist of the 5W1H elements of each piece of news, together with the reliability label of each text snippet. A single sample may contain zero, one, or multiple instances of the same 5W1H label. When analysing the FLARES dataset, we found a lack of clear syntactic consistency in the labelling of the text. For example, some annotations refer to nominal chunks, while others do not. Moreover, nominal chunks that are usually the direct object of the verb receive different annotation labels among themselves. These inconsistencies make it difficult to tackle this problem using more traditional approaches, such as dependency parsing.
On the other hand, we do find a regular interconnection among the 5W1H instances of each individual text, which we can exploit to generate a graph-like structure and use it to inject information into the model chosen to tackle the task. In this work, we present the KFloeg team submission for Subtask 1 of FLARES 2024. Our proposal is a modification of the original K-Adapter [6] that approaches the 5W1H task as a Named Entity Recognition (NER) task. The adapters used to enrich the fine-tuning process are based on the Wikidata relations of the entities and on the relations between the 5W1H labels inside each text of the training corpus. For Subtask 2, we conducted a simple fine-tuning of the Mistral-7B [7] model. Our approach involved using an instruction prompt adapted to the specific task and fine-tuning the model, through its causal language modelling architecture, to generate the expected output in an easy-to-process format. However, due to time constraints and poor results (precision lower than 0.30), we did not pursue further improvements or explore new approaches. Our primary efforts and contributions were concentrated on Subtask 1. Experimental results show that our approach obtains the best results using just one adapter and a low number of epochs to avoid overfitting. Moreover, our proposal ranked 1st in the first task of the challenge. This work is structured as follows: Section 2 reviews the work related to our approach; Section 3 presents our approach, the data and its pre-processing, and the experimental setup used. Finally, Sections 4 and 5 present the results obtained from our experiments and the conclusions extracted from them.

2. Related work

Pretrained language models have proven to be a powerful approach for various NLP tasks. However, language models tend to suffer from hallucination problems or a lack of factuality.
In this regard, knowledge injection offers a potential solution to this problem: factual knowledge extracted from structured databases or knowledge graphs is included to boost model performance. One of the first works in this area is K-BERT [8], which uses a domain-specific KG in the inference process; the knowledge triples coming from the graph are injected into the input sentence. Another relevant work is Kformer [9], a variation of the RoBERTa architecture [10] in which knowledge is included in the final encoder layers of the model. This work was evaluated on question answering tasks. K-Adapter [6] is another method that integrates linguistic and factual knowledge into language models using adapters. These adapters, trainable networks inserted in the middle of the transformer layers, allow the injection of knowledge without modifying the existing parameters of the language model. The adapters operate independently from each other and can be trained in parallel, providing a flexible and efficient approach to fusing different types of knowledge into language models. This approach has been evaluated on entity typing, relation extraction, and question answering. In this context, K-Adapter fits the main problems identified and faced in this challenge. First, the training data available is scarce, and the learning process cannot be reinforced with external knowledge directly. The concept of adapters aligns perfectly with this purpose: the model can be fine-tuned for a task while incorporating other aspects of the same corpora that are not reflected directly in the natural language text. Second, the 5W1H instance labels annotated in the corpus contain many inconsistencies and do not present a clear common linguistic pattern.
However, these instances are interconnected inside the context of the document (e.g., the WHAT labels with the WHO labels in the same document), so we can generate a graph in which the elements are interconnected to reinforce the learning process, which is the specific purpose of adapters.

3. Methodology

In this section we present the approach we followed to detect 5W1H entities in Spanish texts. Specifically, we present the dataset used for our task and the modifications made to the original K-Adapter model in Sections 3.1 and 3.2, respectively, followed by the data preparation in Section 3.3 and the experimental setup in Section 3.4.

3.1. FLARES dataset

The FLARES 2024 dataset [4] is used in our experiments. It consists of 9,034 5W1H annotations across 190 news articles. The dataset is divided into 70% for training (6,934 annotations) and 30% for testing (2,100 annotations). These annotations are based on the 5W1H journalistic technique, which labels entities in the text corresponding to the questions WHAT, WHO, WHERE, WHEN, WHY, and HOW. The annotations were performed by two linguists and one sociologist specialised in NLP, following the RUN-AS annotation guidelines [11] established for this dataset. Figure 1 depicts an example of the annotated text provided, where the position of each detected 5W1H instance is identified together with its 5W1H label and reliability label (in Spanish).

Figure 1: Example of FLARES annotation [4].

The data for Subtask 1 is divided into two subsets: training and testing. A trial dataset is also provided but, since it duplicates part of the training data, it is discarded. The training set consists of 1,585 samples extracted from news items, with a total of 6,934 annotations.
Figure 2 shows a large gap in the number of annotation instances per 5W1H label: roughly 40% of the annotations are WHAT instances and over 25% are WHO instances, leaving about 35% of the annotations for the other four labels. The test data consists of 402 non-annotated texts, which were evaluated through the Kaggle challenge provided by FLARES 2024¹ [4].

Figure 2: Distribution of 5W1H labels in the Subtask 1 training dataset. The bar plot shows the count of each 5W1H label (Who, What, When, Where, Why, How) across the training data.

3.2. K-Flares model

For our approach², we took the original K-Adapter [6] implementation³ and adapted it to our needs, as shown in Figure 3. The first change was motivated by the language: the original K-Adapter was designed to run on English data, not Spanish. Consequently, we substituted the pre-trained language model, the original RoBERTa-large [12], with a Spanish version of this model, RoBERTa-large-bne [13], a RoBERTa-large model pre-trained on a corpus of over 500 GB extracted from the National Library of Spain (Biblioteca Nacional de España)⁴.

Figure 3: Overview of the K-Flares proposal.

The second change concerns the adapters used. Originally, K-Adapter used two adapters: a factual adapter, which injects factual knowledge from the relationships among entities in natural language text, and a linguistic adapter, which injects linguistic knowledge from dependency relationships among words in natural language text. In our case, we decided to use only one adapter: a factual adapter that maintains some syntactic information related to the data represented. We initially tested the use of two factual adapters but, on the basis of the results obtained, decided to keep only one of them. The adapters were trained using the FLARES 2024 training data presented in Section 3.1.
This data was transformed from its original state into a triple-like format, that is, relationships between entities, in which the WHAT elements of the texts act as central nodes. In Section 3.3, we further explain the data preparation carried out for this training.

¹ https://kaggle.com/competitions/flares-subtask-1-5w1hs-identification
² https://github.com/oeg-upm/k-flares
³ https://github.com/microsoft/k-adapter
⁴ https://www.bne.es/en

Once the adapter is trained, the model is fine-tuned for the Named Entity Recognition (NER) task. The fundamental purpose of NER is to identify people, organisations, and locations in text, but it can also be used to identify other entities. Given that our task is fundamentally the recognition of 5W1H instances in text, our goal can be devised as a variation of the classical NER approach in which the detected entities correspond to the 5W1H labels.

3.3. Data preparation

Triples are needed to train the adapter. A triple is composed of a subject, a relation, and an object: the subject is the entity performing the action, the relation represents the action being performed, and the object is the entity that receives the action. These triples follow the format subject-relation-object. Different approaches can be employed to extract triples from text depending on the goal of the adapter. In this case, we trained two different adapters: one based on the relations between the 5W1H labels inside the document (5W1H labels factual adapter), and another based on the relations that the entities have in Wikidata (Wikidata factual adapter). For the 5W1H labels factual adapter, we used a rule-based approach in which the relation is represented by the combination of the labels of the subject and the object. The subject always corresponds to the WHAT, as it appears to be the core of each sentence and is the most common label.
If a sample contains two WHAT annotations, the first is related to the rest of the annotations, and then the second becomes the subject and is related to the rest as well. If a sample lacks a WHAT annotation, the subject defaults to the first label present and is then related to the remaining labels. For an example, see the 'relation' field in the data structure below. Figure 4 shows the graph generated from the sample of Figure 1.

Figure 4: Overview of the triple-extraction output for the 5W1H labels factual adapter.

For the Wikidata factual adapter, we used the mREBEL (multilingual REBEL) model [14]. This autoregressive method employs a seq2seq model that, given a text input, directly outputs triples. It is trained with around 400 relation types. For both adapters, the data is formatted with the following structure and stored in JSON files:

• docid: a unique identifier for the document from which the data is extracted. It is an integer stored as a string, ensuring each document can be distinctly recognised and referenced.
• token: a list of strings, where each string represents a token in the document. The tokens include words, punctuation marks, and other elements, reflecting the segmentation of the text. This list is generated with the SpaCy model es_core_news_lg.
• relation: the type of relationship identified between two segments of the document. It is represented by a code (e.g., "R1", "R2") that maps to a specific type of relationship between two specific entity types. For example, "R1" refers to the relationship between WHAT and HOW entities, and "R23" to the relationship between WHY and WHERE entities. This set of relations corresponds to all possible combinations of the 5W1H labels. This encoding was used for the 5W1H labels factual adapter.
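The rule described above for the 5W1H labels factual adapter can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name, the data layout, and the exact R-code numbering are our own assumptions (the paper's numbering, where "R1" is WHAT–HOW, may differ from this enumeration order).

```python
from itertools import product

LABELS = ["WHAT", "WHO", "WHERE", "WHEN", "WHY", "HOW"]
# Enumerate every ordered pair of distinct labels as a relation code.
PAIRS = [p for p in product(LABELS, repeat=2) if p[0] != p[1]]
RELATIONS = {pair: f"R{i}" for i, pair in enumerate(PAIRS, start=1)}

def build_triples(annotations):
    """annotations: list of (label, text_span) tuples for one sample."""
    whats = [a for a in annotations if a[0] == "WHAT"]
    rest = [a for a in annotations if a[0] != "WHAT"]
    triples = []
    if whats:
        # Each WHAT in turn acts as the subject, related to the rest.
        for _, subj_span in whats:
            for obj_label, obj_span in rest:
                triples.append((subj_span, RELATIONS[("WHAT", obj_label)], obj_span))
    elif annotations:
        # Without a WHAT, the first annotation becomes the subject.
        subj_label, subj_span = annotations[0]
        for obj_label, obj_span in annotations[1:]:
            triples.append((subj_span, RELATIONS[(subj_label, obj_label)], obj_span))
    return triples
```

Each sample then yields a small star-shaped graph centred on its WHAT spans, matching the structure depicted in Figure 4.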
For the Wikidata factual adapter, we assess the variety of relations extracted by the mREBEL model and enumerate them using the same nomenclature the model returns (e.g., if the subject is labelled "concept" and the object is labelled "loc", the relationship according to the model would be "country"). The two encodings are stored in two separate JSON files.

• subj_start: an integer indicating the starting index, at the character level, of the subject segment in the text. This index marks the beginning of the segment identified as the subject in the relationship.
• subj_end: an integer indicating the ending index, at the character level, of the subject segment in the text. This index marks the end of the segment identified as the subject in the relationship.
• obj_start: an integer indicating the starting index, at the character level, of the object segment in the text. This index marks the beginning of the segment identified as the object in the relationship.
• obj_end: an integer indicating the ending index, at the character level, of the object segment in the text. This index marks the end of the segment identified as the object in the relationship.
• subj_label: a string that labels the subject segment, categorising it according to its role in the relationship. For the 5W1H labels factual adapter, it takes one of the six labels (WHAT, WHO, WHERE, WHEN, HOW, WHY) present in the original data; for the Wikidata adapter, there are as many labels as the model outputs.
• obj_label: a string that labels the object segment, categorising it according to its role in the relationship. For the 5W1H labels factual adapter, it takes one of the six labels (WHAT, WHO, WHERE, WHEN, HOW, WHY) present in the original data; for the Wikidata adapter, there are as many labels as the model outputs.

In addition, the dataset is provided in IOB format for the NER task. Each document is identified by a unique "docid", and for each token within a document's token list, a corresponding label is assigned.
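As a simplified illustration of this token-level label assignment, the character-level span annotations can be projected onto IOB tags roughly as follows. The tokenisation shown is an assumption for brevity; the actual pipeline tokenises with SpaCy's es_core_news_lg model.

```python
# Illustrative sketch: project character-level span annotations onto
# token-level IOB tags (B- marks the first token of a span, I- the
# following tokens, O everything outside any span).
def to_iob(tokens, spans):
    """tokens: list of (text, char_offset) pairs; spans: list of (label, start, end)."""
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        covered = [i for i, (_, pos) in enumerate(tokens) if start <= pos < end]
        if covered:
            tags[covered[0]] = f"B-{label}"
            for i in covered[1:]:
                tags[i] = f"I-{label}"
    return tags

tokens = [("La", 0), ("OCDE", 3), ("informa", 8)]
print(to_iob(tokens, [("WHO", 0, 7)]))  # ['B-WHO', 'I-WHO', 'O']
```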
These labels denote whether a token is Inside, Outside, or at the Beginning of a particular span, marked with the prefixes "I-", "O-", and "B-", respectively.

3.4. Experimental setup

All experiments were conducted on a remote server equipped with one NVIDIA A100-PCIE-40GB GPU. To obtain the final model, we tried a series of configurations combining the two adapters, the factual adapter created with the 5W1H labels and the factual adapter created with Wikidata, to determine the optimal architecture. For each adapter and for the final model, specific training hyperparameters were used to optimise the training process while avoiding overfitting. Table 1 details the hyperparameters that yielded the best results.

Figure 5: Overview of the preprocessing workflow of the FLARES dataset for both adapters.

Table 1
Comparison of hyperparameters across the 5W1H labels and Wikidata factual adapters, and the complete K-Adapter configuration.

Parameter                  | 5W1H labels adapter  | Wikidata adapter     | Complete K-Adapter
epochs                     | 5                    | 10                   | 8
model                      | roberta-large        | roberta-large        | roberta-large
per_gpu_train_batch_size   | 64                   | 32                   | 8
per_gpu_eval_batch_size    | 64                   | 8                    | 8
max_seq_length             | 64                   | 64                   | 512
scheduler                  | WarmupLinearSchedule | WarmupLinearSchedule | WarmupLinearSchedule
optimizer                  | AdamW                | AdamW                | AdamW
learning_rate              | 5e-5                 | 2e-5                 | 5e-5
warmup_steps               | 1200                 | 500                  | 120
freeze_adapter             | False                | False                | True
adapter_size               | 768                  | 768                  | 768
adapter_list               | "0,11,22"            | "0,11,22"            | "0,11,22"
adapter_transformer_layers | 2                    | 2                    | -
fusion_mode                | -                    | -                    | add

4. Evaluation and Results

Once our model was fine-tuned for the NER task, it generated predictions that assigned to each token the appropriate 5W1H label. To fulfil the submission requirements of the challenge, it was essential to convert the identified textual fragments into a structured format. This format required the inclusion of both the starting and ending indices of each text fragment identified under a 5W1H label.
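A minimal sketch of this index recovery, under the assumption that each predicted fragment is located by exact string matching against the source text (the function name is ours, not the authors'):

```python
# Illustrative sketch: recover character offsets for predicted fragments
# by exact string matching against the original text.
def fragment_indices(text, fragments):
    """fragments: list of (label, fragment_text); returns (label, start, end) triples."""
    results = []
    for label, frag in fragments:
        start = text.find(frag)
        if start != -1:  # keep only fragments found verbatim in the text
            results.append((label, start, start + len(frag)))
    return results

text = "La OCDE ha informado hoy de nuevos datos."
print(fragment_indices(text, [("WHO", "La OCDE"), ("WHEN", "hoy")]))
# [('WHO', 0, 7), ('WHEN', 21, 24)]
```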
This was achieved by locating the exact match of each fragment within the original text and extracting its positional indices. The final submission format comprises the unique identifier along with the indices and labels of the tags, which are submitted to the Kaggle challenge. We submitted three different configurations of our model, each differing in its use of adapters and hyperparameters; the comparative results are detailed in Table 2. The model labelled "K-adapter KG", which used only the 5W1H labels factual adapter and was trained for 8 epochs, achieved the 1st ranking in the competition. This underscores the effectiveness of this factual adapter in enhancing model performance for the task. The second model, "K-adapter KG 2A", which used both adapters (5W1H and Wikidata), did not perform well: the noise produced by the external relations of Wikidata does not improve the knowledge acquired for the NER task. Finally, "K-adapter KG+" is the same as the first model but trained for 12 epochs; it exhibited signs of overfitting during training, achieving a perfect score on the training set, which likely caused its poorer performance compared to the first configuration.

Table 2
Scores obtained in the submissions on the Kaggle platform. The table shows the submitted model, the ranking obtained, the adapters used, and the evaluation results. The team submitted three different models, the first one being the best of them.

Submitted (Ranking)    | Adapters        | 30% split test set | 70% split test set
K-adapter KG (1st)     | 5W1H            | 0.65957            | 0.66544
K-adapter KG 2A (3rd)  | 5W1H + Wikidata | 0.39649            | 0.40070
K-adapter KG+ (3rd)    | 5W1H            | 0.39578            | 0.39572

5. Conclusions

This paper presents K-Flares, a K-Adapter-based approach for the FLARES challenge. The challenge is composed of two different subtasks. The first subtask comprises the identification of the 5W1H labels within a set of textual data extracted from news items.
In the second subtask, the goal is to classify each 5W1H instance according to its reliability level. K-Flares addresses the first subtask. For this task, two different adapters were devised, using a Spanish version of RoBERTa as the baseline language model. The task was framed as a Named Entity Recognition problem in which the entities to be detected correspond to the 5W1H labels. The first adapter, the 5W1H labels factual adapter, builds on the idea that a knowledge graph can be generated for each sample by considering the WHAT as the core and relating it to the rest of the elements in the sample. For the second adapter, the Wikidata factual adapter, the mREBEL model was used to generate triples from the text samples according to their knowledge in Wikidata. Both adapters were used independently and then conjunctively, with the 5W1H adapter yielding the best results. This approach reached a score of 0.65957 on the test set, achieving the winning result of the challenge. The results show that including knowledge of how the labels interact with each other reinforces the learning process of the NER task. Moreover, the experiments also showed that including factual knowledge from external knowledge graphs such as Wikidata decreases the performance of the model. Additionally, it has been demonstrated that treating the problem as a Named Entity Recognition problem in combination with external knowledge is a feasible and effective approach for this challenge.

Acknowledgments

This work has been partially funded by the INESData project (https://inesdata-project.eu/), funded by the Spanish Ministry of Digital Transformation and Public Affairs and NextGenerationEU, in the framework of the UNICO I+D CLOUD Program - Real Decreto 959/2022.

References

[1] J. Pennington, R. Socher, C. Manning, GloVe: Global vectors for word representation, in: A. Moschitti, B. Pang, W.
Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1532–1543. URL: https://aclanthology.org/D14-1162. doi:10.3115/v1/D14-1162.
[2] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780. URL: https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.1162/neco.1997.9.8.1735.
[3] M. F. Mridha, A. J. Keya, M. A. Hamid, M. M. Monowar, M. S. Rahman, A comprehensive review on fake news detection with deep learning, IEEE Access 9 (2021) 156151–156170. doi:10.1109/ACCESS.2021.3129329.
[4] R. Sepúlveda-Torres, A. Bonet-Jover, I. Diab, I. Guillén-Pacho, I. Cabrera-de Castro, C. Badenes-Olmedo, E. Saquete, M. T. Martín-Valdivia, P. Martínez-Barco, L. A. Ureña-López, Overview of FLARES at IberLEF 2024: Fine-Grained Language-based Reliability Detection in Spanish News, Procesamiento del Lenguaje Natural 73 (2024).
[5] L. Chiruzzo, S. M. Jiménez-Zafra, F. Rangel, Overview of IberLEF 2024: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org, 2024.
[6] R. Wang, D. Tang, N. Duan, Z. Wei, X. Huang, J. Ji, G. Cao, D. Jiang, M. Zhou, K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters, in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics, Online, 2021, pp. 1405–1418. URL: https://aclanthology.org/2021.findings-acl.121. doi:10.18653/v1/2021.findings-acl.121.
[7] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, 2023.
URL: https://arxiv.org/abs/2310.06825. arXiv:2310.06825.
[8] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, P. Wang, K-BERT: Enabling language representation with knowledge graph, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 2901–2908.
[9] Y. Yao, S. Huang, L. Dong, F. Wei, H. Chen, N. Zhang, Kformer: Knowledge injection in transformer feed-forward layers, in: Natural Language Processing and Chinese Computing, Springer International Publishing, Cham, 2022, pp. 131–143.
[10] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, CoRR (2019).
[11] A. Bonet-Jover, R. Sepúlveda-Torres, E. Saquete, P. Martínez-Barco, M. Nieto-Pérez, RUN-AS: a novel approach to annotate news reliability for disinformation detection, Language Resources and Evaluation (2023) 1–31.
[12] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.
[13] A. Gutiérrez-Fandiño, J. Armengol-Estapé, M. Pàmies, J. Llop-Palao, J. Silveira-Ocampo, C. P. Carrino, C. Armentano-Oller, C. Rodriguez-Penagos, A. Gonzalez-Agirre, M. Villegas, MarIA: Spanish language models, Procesamiento del Lenguaje Natural (2022) 39–60. URL: https://doi.org/10.26342/2022-68-3. doi:10.26342/2022-68-3.
[14] P.-L. H. Cabot, S. Tedeschi, A.-C. N. Ngomo, R. Navigli, REDFM: a filtered and multilingual relation extraction dataset, arXiv preprint arXiv:2306.09802 (2023).