<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Dataset for Extracting Relations in the Climate-Change Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrija Poleksić</string-name>
          <email>andrija.poleksic@uniri.hr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanda Martinčić-Ipšić</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics and Digital Technologies (University of Rijeka)</institution>
          ,
          <addr-line>Radmile Matejčić 2, Rijeka, 51000</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The impacts of global warming and climate change on ecosystems, weather patterns and human societies pose a significant threat to biodiversity and the sustainability of our planet. Despite the widespread scientific consensus, climate change denial persists among a segment of the population, either due to misconceptions or vested interests. Recent research shows that progress is being made in addressing climate denial, as a majority acknowledges man-made climate change. However, the spread of misinformation remains a challenge, often perpetuated by corporate interests. To overcome these challenges, we propose constructing a dataset tailored for automated extraction and structuring of climate change-related scientific findings, focusing on relation extraction (RE) from scientific papers. Our research outlines the steps involved, including the preparation of the dataset for further training of a BERT-based model and the formulation of the downstream relation extraction task. We discuss the process of data collection, preprocessing techniques and preliminary dataset analysis. Additionally, we highlight the need for a specialized Named Entity Recognition model for the climate-change domain and underline the need for annotation of domain-specific relations.</p>
      </abstract>
      <kwd-group>
        <kwd>dataset</kwd>
        <kwd>climate change</kwd>
        <kwd>relation extraction</kwd>
        <kwd>scientific papers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Global warming and climate change have profound and far-reaching effects on global ecosystems,
weather patterns, sea levels, and human societies, constituting a critical threat to the planet’s
biodiversity and the prospect of a sustainable future [1]. Despite the widespread acceptance
and scientific backing of climate change concepts, there remains a segment of the population
that denies human impact on climate change, referred to as climate denial. Climate denial is
driven either by misguided beliefs [2] or vested corporate interests [
        <xref ref-type="bibr" rid="ref1 ref38">3</xref>
        ]. A study by Areni et
al. [2] investigates the dynamics between supporter and denier groups of Reddit users. They
observe that supporters frequently reference scientific work, whereas deniers tend to rely more
on alternative media and sources. Recent comprehensive research conducted by Andre et al. [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ]
demonstrates significant strides in addressing the issue of climate change denial. Their findings
reveal that up to 86% of individuals acknowledge the reality of human-induced climate change
and endorse measures aimed at mitigating human impact on the climate. Substantial climate
denial stems from the dissemination of misinformation by large companies, often driven by
vested interests, such as oil companies [
        <xref ref-type="bibr" rid="ref3">5</xref>
        ] and from the creation of false scientific doubt, as elaborated by
Oreskes and Conway [
        <xref ref-type="bibr" rid="ref4">6</xref>
        ]. Furthermore, the ever-increasing amount of data and information,
including scientific papers, propels the need for automated information processing to speed up
informed research decisions and facilitate fact-checking.
      </p>
      <p>
        Motivated by both of these challenges - the information deluge and climate change - in this paper, we
propose steps to construct a dataset fit for automatically extracting and structuring climate
change-related scientific findings using information extraction (IE) methods. Specifically, we
focus on the preliminary steps for relation extraction (RE) from scientific papers. Relation
extraction (RE) is tasked with the identification of relations between entities in sentences,
paragraphs or larger units of text. Sentence-level relation extraction involves identifying and
classifying relations between entities in a single sentence. The goal is to determine the relation
or association between two entities, typically represented by nouns or noun phrases such as
people, organizations, or locations - named entities [
        <xref ref-type="bibr" rid="ref5">7</xref>
        ]. Our overall research plan consists of
several steps:
• Preparation of the dataset of scientific papers for a climate-change domain suitable for
the training of a BERT-like model;
• Additional pretraining (training with available pretrained weights) of the BERT-like model
to adapt to the climate-change domain;
• Definition of relation types for relation extraction and construction of the dataset for the
fine-tuning of the newly trained model(s) on the task of sentence-level relation extraction;
• Construction and curation of the climate-change knowledge graph from a high-quality
journal.
      </p>
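      <p>As an illustration of the sentence-level RE formulation described above, the following minimal Python sketch shows how a single training instance could be represented: a sentence, two entity spans and a relation label. The field names and the relation label are our own illustrative choices, not a fixed schema.</p>
      <preformat>
from dataclasses import dataclass

@dataclass
class RelationInstance:
    """One sentence-level relation extraction example (illustrative schema)."""
    sentence: str   # the full sentence
    head: tuple     # (start, end) character offsets of the first entity
    tail: tuple     # (start, end) character offsets of the second entity
    relation: str   # relation label to be predicted by the classifier

# Example drawn from the climate-change domain (hypothetical relation label):
s = "Atlantic cyclones have been well documented as causing high surge levels."
head_text, tail_text = "Atlantic cyclones", "high surge levels"
example = RelationInstance(
    sentence=s,
    head=(s.index(head_text), s.index(head_text) + len(head_text)),
    tail=(s.index(tail_text), s.index(tail_text) + len(tail_text)),
    relation="cause",
)
      </preformat>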
      <p>Section 2 gives a short overview of the related work on pretrained language models,
relation extraction datasets and relation annotation. Section 3 elaborates on data collection,
preprocessing and a preliminary analysis of the data. The final Sections 4 and 5 cover the
results, discussion and conclusions, respectively.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Recent research efforts [
        <xref ref-type="bibr" rid="ref10 ref6 ref7 ref8 ref9">8, 9, 10, 11, 12</xref>
        ] report using pretrained models for text classification
and sequence labelling tasks. One of the prominent ones is BERT (Bidirectional Encoder
Representations from Transformers), an encoder-only transformer model trained on masked
language modelling (MLM) task [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ]. Although it has been shown that encoder-decoder architecture
models such as BART [
        <xref ref-type="bibr" rid="ref12">14</xref>
        ] and T5 [
        <xref ref-type="bibr" rid="ref13">15</xref>
        ] provide comparable and sometimes better results [
        <xref ref-type="bibr" rid="ref14">16</xref>
        ],
they require the training of a larger number of parameters, which ultimately requires a larger
amount of data and computational resources.
      </p>
      <p>
        Lee et al. [
        <xref ref-type="bibr" rid="ref6">8</xref>
        ] perform additional training of the original BERT deep neural model [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ]
for the biomedical domain - BioBERT. They report that no new WordPiece vocabulary is
needed, ensuring the compatibility of the two pretrained models (BioBERT and BERT). BioBERT
achieves new SOTA results on benchmarks for relation extraction and named entity recognition.
ClinicalBERT model [
        <xref ref-type="bibr" rid="ref9">11</xref>
        ] follows the same principle and further trains the BERT and BioBERT
models on a large multicenter dataset.
      </p>
      <p>
        The other line of research by Beltagy et al. [
        <xref ref-type="bibr" rid="ref7">9</xref>
        ] is training a new model SciBERT from scratch,
which is also based on the BERT architecture [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ], using scientific papers as the training data.
For SciBERT, they construct a new vocabulary, SciVocab. An overall improvement of 0.61
F1 score on the downstream tasks is achieved using SciVocab compared to the original BERT
vocabulary. Additionally, several SOTA results are reported, also surpassing the BioBERT
results on the ChemProt [
        <xref ref-type="bibr" rid="ref15">17</xref>
        ] benchmark by a fairly large margin. A similar strategy is applied
in Chalkidis et al. [
        <xref ref-type="bibr" rid="ref10">12</xref>
        ], where a family of LegalBERT models is trained to support legal NLP
research, computer-assisted law and legal technology applications.
      </p>
      <p>
        Webersinke et al. in [
        <xref ref-type="bibr" rid="ref8">10</xref>
        ] train the RoBERTa model [
        <xref ref-type="bibr" rid="ref16">18</xref>
        ], which was adapted using a distillation
process [
        <xref ref-type="bibr" rid="ref17">19</xref>
        ], on the climate-change domain - ClimateBERT. The model is trained on
climate-related news articles and posts on social media.
      </p>
      <p>
        In our research we will extend our previous work [
        <xref ref-type="bibr" rid="ref18">20</xref>
        ] by performing additional training
on two models: SciBERT will receive additional training for the climate-change domain using
scientific papers, while ClimateBERT's parametrized domain knowledge will be extended with a
carefully curated high-quality dataset. This addresses their respective drawbacks: SciBERT's
out-of-climate-change-domain vocabulary and ClimateBERT's reliance on media-collected information,
which we complement with scientifically obtained facts. To this end, in this paper, we propose the
construction of a new dataset for the climate-change domain obtained from scientific papers published in high-quality
journals.
      </p>
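      <p>A minimal sketch of how such additional masked language modelling (MLM) pretraining could be set up with the Hugging Face transformers and datasets libraries is given below; the checkpoint name, corpus file and hyperparameters are illustrative assumptions, not our final training configuration.</p>
      <preformat>
# Sketch of additional MLM pretraining on the collected corpus.
# Checkpoint, corpus path and hyperparameters are placeholders, not final choices.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "allenai/scibert_scivocab_uncased"   # could equally be a ClimateBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# one paper paragraph or sentence per line in a plain-text file
corpus = load_dataset("text", data_files={"train": "climate_papers.txt"})
tokenized = corpus.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
                       batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="scibert-climate", per_device_train_batch_size=16,
                         num_train_epochs=1, learning_rate=5e-5)
Trainer(model=model, args=args, train_dataset=tokenized["train"], data_collator=collator).train()
      </preformat>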
      <p>
        For joint entity and relation extraction downstream tasks [
        <xref ref-type="bibr" rid="ref19">21</xref>
        ] the model is trained to perform
both tasks simultaneously while benefiting from the use of interrelated signals. Relation
extraction can be set as a supervised task and requires a huge amount of labelled (i.e. annotated)
training data. To speed up the process, many researchers are turning to the idea of distant
supervision1 [
        <xref ref-type="bibr" rid="ref20">22</xref>
        ]. This includes datasets such as FewRel [
        <xref ref-type="bibr" rid="ref21">23</xref>
        ] and T-REx [
        <xref ref-type="bibr" rid="ref22">24</xref>
        ] for RE at sentence
level and datasets such as DocRED [
        <xref ref-type="bibr" rid="ref23">25</xref>
        ] and Wiki20m [
        <xref ref-type="bibr" rid="ref24">26</xref>
        ] for RE on larger text sections.
      </p>
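      <p>The distant supervision idea can be illustrated with a short sketch: a sentence is automatically labelled with a relation whenever an entity pair stored in a knowledge base co-occurs in it. The tiny knowledge base and sentences below are made-up illustrations, not part of our dataset.</p>
      <preformat>
# Distant supervision sketch: label a sentence with a KG relation whenever a known
# entity pair co-occurs in it. KG content and sentences are made-up illustrations.
knowledge_base = {
    ("Atlantic cyclones", "heavy precipitation"): "causes",
    ("ENSO", "winter temperature"): "affects",
}

sentences = [
    "Atlantic cyclones have been well documented as causing heavy precipitation.",
    "ENSO is another important factor for winter temperature in China.",
]

def distant_labels(sentences, knowledge_base):
    """Yield (sentence, head, tail, relation) for every KG pair found in a sentence."""
    for sent in sentences:
        for (head, tail), relation in knowledge_base.items():
            if head in sent and tail in sent:
                yield sent, head, tail, relation

for example in distant_labels(sentences, knowledge_base):
    print(example)
      </preformat>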
      <p>
        Recently, the use of Large Language Models (LLMs) for the annotation of relations and entities
has been reported [
        <xref ref-type="bibr" rid="ref25">27</xref>
        ], either to augment and speed up the annotation process for human
annotators [
        <xref ref-type="bibr" rid="ref26 ref27">28, 29</xref>
        ] or to completely replace human efforts [
        <xref ref-type="bibr" rid="ref28">30</xref>
        ]. Besides annotation, LLMs are
considered as synthetic data generators [
        <xref ref-type="bibr" rid="ref29 ref30">31, 32</xref>
        ] or for assessing the LLM-annotation quality
[
        <xref ref-type="bibr" rid="ref31">33</xref>
        ]. In our research, we plan to engage LLMs for the relation annotation subtask, leveraging
off-the-shelf pretrained LLMs to speed up the process, as opposed to training specialised in-house
LLMs and using them directly for RE.
      </p>
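      <p>As a preview of how LLM-enabled annotation could be guided by previously detected candidates, the sketch below builds a prompt from a sentence and its candidate entities; call_llm is a hypothetical stand-in for whichever off-the-shelf model API is eventually used, and the prompt wording is illustrative only.</p>
      <preformat>
# Sketch of LLM-guided relation annotation. `call_llm` is a hypothetical placeholder
# for an off-the-shelf LLM client; the prompt wording is illustrative only.
PROMPT_TEMPLATE = (
    "Sentence: {sentence}\n"
    "Candidate entities: {entities}\n"
    "Return one triple (entity1, relation, entity2) expressed in the sentence, "
    "or 'none' if no relation holds."
)

def build_prompt(sentence, candidate_entities):
    return PROMPT_TEMPLATE.format(sentence=sentence, entities=", ".join(candidate_entities))

def annotate(sentence, candidate_entities, call_llm):
    """Ask the LLM for a relation triple; the caller supplies the model client."""
    return call_llm(build_prompt(sentence, candidate_entities))

# Usage, with any callable that maps a prompt string to a model response:
# triple = annotate("ENSO is another important factor for winter temperature in China.",
#                   ["ENSO", "winter temperature in China"], call_llm=my_client)
      </preformat>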
    </sec>
    <sec id="sec-4">
      <title>3. Dataset Preparation</title>
      <p>Adapting one of the BERT models to the RE task in the climate-change domain requires the
construction of an appropriate dataset (i.e. a scientific and high-quality source). To this end
we selected the highest-ranked scientific journals on climate change based on the Scimago
Journal &amp; Country Rank (SJR)2 and ScienceWatch Rank3, as well as open access MDPI journals that
are associated with the topic of climate change, offer a substantial quantity of available papers and
have a consistent format for parsing. Table 2 (Appendix A) lists information on 194,673 retrieved
research papers from the selected journals, where 77.35% (150,583) are available in HTML format,
while the remaining 22.65% (44,090) are only available in PDF format.</p>
      <p>1Distant supervision assumes that the presence of a given entity pair in a given text implies a relation between them
such that it is found in a Knowledge Graph/Base.</p>
      <p>
        The PDF documents were first processed with the pdfminer.six4 library [
        <xref ref-type="bibr" rid="ref32">34</xref>
        ] to extract their
content. They were converted to HTML format, retaining the available
information for each parsed element, including position, font and font size. This information was
obtained with the layout analysis algorithm5, which groups characters into words and lines, lines
into boxes and finally into text boxes, hierarchically, based on the position of each character. Hence,
we developed a parser fine-tuned to each journal's formatting style and position information,
enabling correct and complete text extraction. For navigation through HTML files, we used the
BeautifulSoup6 library [
        <xref ref-type="bibr" rid="ref33">35</xref>
        ].
      </p>
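      <p>The sketch below shows, in simplified form, how the position and font information used by our parsers can be read out with pdfminer.six; the journal-specific rules built on top of it (e.g. which font sizes mark section titles) are not shown and differ per journal.</p>
      <preformat>
# Read text lines from a PDF together with their position and font information,
# the raw material for the journal-specific parsing rules (simplified sketch).
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

def layout_lines(pdf_path):
    """Yield (page_no, text, bbox, fonts) for every text line in the document."""
    for page_no, page in enumerate(extract_pages(pdf_path), start=1):
        for element in page:
            if isinstance(element, LTTextContainer):
                for line in element:
                    if isinstance(line, LTTextLine):
                        fonts = {(ch.fontname, round(ch.size, 1))
                                 for ch in line if isinstance(ch, LTChar)}
                        yield page_no, line.get_text().strip(), line.bbox, fonts

# for page_no, text, bbox, fonts in layout_lines("paper.pdf"):
#     print(page_no, bbox, fonts, text)
      </preformat>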
      <p>As already mentioned, a specific parser was needed for each journal. Next, we drew a random
sample of 100 papers for each journal to evaluate the parsing procedure. Based on the random
sample, we created a parser that successfully extracts the content of the papers in 100% of the
cases, ranging from pure content to metadata such as authors, affiliations, references and DOI
information. The parsing procedure allows extracting the data to the full extent. This is manually
validated on a random sample of 10 papers per journal by comparing the texts from PDF/HTML
with the data stored in pandas dataframes7. Table 3 (Appendix C) highlights some of the most
common problems encountered during PDF and HTML parsing. Still, despite many problems,
we obtained a well-documented, comprehensive dataset, which is appropriate for further model
training. Table 1 reports the comparison of the total training data used for each of the neural models
(BERT, SciBERT and ClimateBERT). Our dataset contains ∼35% of the tokens used
for training SciBERT, and surpasses the number of tokens used for ClimateBERT by six times.
The average number of sentences per paper in our dataset is ∼160% of the average reported
for SciBERT. These numbers are encouraging, suggesting that we have collected sufficient
high-quality texts for training a BERT-based model.</p>
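      <p>For the HTML part of the collection, each per-journal parser essentially reduces to selecting the right elements. The sketch below illustrates the idea with hypothetical CSS selectors (the real selectors are tuned to each journal's formatting) and collects the results into a pandas dataframe.</p>
      <preformat>
# Per-journal HTML parsing sketch. The CSS selectors below are hypothetical examples;
# the real selectors differ per journal and are validated on random samples of papers.
import pandas as pd
from bs4 import BeautifulSoup

def parse_paper(html):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.select_one("h1.article-title").get_text(strip=True),
        "doi": soup.select_one("a.doi-link").get_text(strip=True),
        "authors": [a.get_text(strip=True) for a in soup.select("span.author-name")],
        "paragraphs": [p.get_text(" ", strip=True) for p in soup.select("div.article-body p")],
    }

# records = [parse_paper(open(path, encoding="utf-8").read()) for path in html_files]
# papers = pd.DataFrame(records)   # one row per paper, ready for validation and statistics
      </preformat>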
      <p>
        To further explore the dataset content we report statistics using a readily available
part-of-speech (POS) tagger and a named entity recognition (NER) model from the flair8 framework [
        <xref ref-type="bibr" rid="ref34">36</xref>
        ].
First, we take a random sample of 10,000 research papers to perform the analysis. Then we
tokenize into sentences and perform POS tagging9 and NER. In each POS-tagged sentence, we
determine noun and verb phrases. Non-traditionally, we define heuristic noun and verb
phrases as sequences of words with specific POS tags, as listed (a condensed sketch of the pipeline follows the footnotes below):
• Noun phrase: Cardinal number (CD), Adjective (JJ), Determiner (DT), Noun (NN),
Foreign word (FW), Possessive ending (POS), Hyphen (HYPH), Symbol (SYM).
      </p>
      <p>2https://www.scimagojr.com/journalrank.php?category=2306
3http://archive.sciencewatch.com/ana/st/climate/journals/
4https://github.com/pdfminer/pdfminer.six/tree/master
5https://pdfminersix.readthedocs.io/en/latest/topic/converting_pdf_to_text.html#id1
6https://www.crummy.com/software/BeautifulSoup/bs4/doc/
7https://pandas.pydata.org/
8https://github.com/flairNLP/flair
9The full list of POS tags for the model used can be found here: https://huggingface.co/flair/pos-english.</p>
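      <p>A condensed sketch of this analysis pipeline (sentence splitting, POS tagging, NER and the heuristic noun-phrase chunking) is given below, assuming the flair pos-english and ner-english taggers; treating all NN* tag variants alike is our simplifying assumption.</p>
      <preformat>
# Sketch of the analysis pipeline: sentence splitting, POS tagging, NER and the
# heuristic noun-phrase chunking. Treating all NN* tags alike is a simplifying assumption.
from flair.data import Sentence
from flair.models import SequenceTagger
from flair.splitter import SegtokSentenceSplitter

splitter = SegtokSentenceSplitter()
pos_tagger = SequenceTagger.load("flair/pos-english")
ner_tagger = SequenceTagger.load("flair/ner-english")

NOUN_TAGS = {"CD", "JJ", "DT", "NN", "NNS", "NNP", "NNPS", "FW", "POS", "HYPH", "SYM"}

def noun_phrases(sentence):
    """Group consecutive tokens whose POS tag is in NOUN_TAGS into heuristic noun phrases."""
    phrases, current = [], []
    for token in sentence:
        if token.get_label("pos").value in NOUN_TAGS:
            current.append(token.text)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

text = "El Niño–Southern Oscillation (ENSO) is another important factor for winter temperature in China."
sentences = splitter.split(text)
pos_tagger.predict(sentences)
ner_tagger.predict(sentences)
for sentence in sentences:
    print(noun_phrases(sentence))
    print([(span.text, span.get_label("ner").value) for span in sentence.get_spans("ner")])
      </preformat>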
      <p>
        This modification, despite being imperfect, allows for analysis of the most frequent verb and
noun phrases, providing insights into possible types of relations between entities, possible
named entities and entity types (e.g. person, organization, location, etc.). With this
approximation, we further estimated the number of total and unique triples. Figure 1 shows the total
number of verb phrases, noun phrases, entities (tagged by the NER model) and possible triples
occurring in the sample of 10,000 papers. The sample consists of 2,406,799 sentences, from
which we extracted a total of 15,238,265 noun phrases and 1,790,745 entities. The ratio of noun
phrases to extracted entities (∼8:1) indicates the need for a NER model that is better fitted to the
climate-change domain vocabulary. Table 4 (Appendix D) lists the top noun phrases consisting
of 1, 2 and 3 words respectively. Table 5 (Appendix E) lists the top entities for three entity
types: Location Name (LOC), Organization Name (ORG) and Other Name (MISC). The number
of entity types will be addressed in future work, employing more recent methods such
as GLiNER [
        <xref ref-type="bibr" rid="ref35">37</xref>
        ]. Since the list contains many acronyms and abbreviations, the expansion and
disambiguation problem needs to be addressed as well.
      </p>
      <p>Similarly, we analyze the occurrence of verb phrases: a total of 5,934,949 verb phrases forming
486,632 unique expressions. Although this is promising, the number of unique expressions
needs to be reduced to a feasible set enabling the training of a classifier to extract relations in
downstream tasks. Moreover, this is an indication that many climate-change-specific relations
are present, which need to be addressed in the downstream training as well. Table 6 (Appendix
F) reports the 30 most frequently occurring verb phrases by number of words (1, 2 and 3
respectively). We observe a high similarity between many unique verb phrases, such as: “is
shown”, “shows”, “are shown” and “has been shown”, indicating that the obvious next step of data
quality improvement is deduplication.</p>
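      <p>A first data-quality step in this direction could be to normalise verb phrases before counting them, as sketched below; stripping auxiliaries and lemmatising the remaining verbs with NLTK is our illustrative choice, not the final deduplication method.</p>
      <preformat>
# Illustrative verb-phrase normalisation: drop auxiliaries and lemmatise the remaining
# verbs, so that "is shown", "are shown" and "has been shown" all collapse to "show".
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()
AUXILIARIES = {"is", "are", "was", "were", "be", "been", "being", "has", "have", "had"}

def normalise(verb_phrase):
    content = [w for w in verb_phrase.lower().split() if w not in AUXILIARIES]
    if not content:                       # the phrase consisted only of auxiliaries
        return verb_phrase.lower()
    return " ".join(lemmatizer.lemmatize(w, pos="v") for w in content)

for phrase in ["is shown", "shows", "are shown", "has been shown"]:
    print(phrase, "->", normalise(phrase))   # all four map to "show"
      </preformat>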
    </sec>
    <sec id="sec-5">
      <title>4. Relation Annotation</title>
      <p>
        To effectively train and evaluate supervised relation extraction models, annotated data is
needed [
        <xref ref-type="bibr" rid="ref22">24</xref>
        ]. To this end, we plan to leverage advanced LLM capabilities in the context of
automatic or enhanced annotation of relation triples. With POS tagging and NER on the sample
of 10,000 papers, we have established the foundation for possible triple detection. We anticipate
that a relation exists if there is a verb between two entities, where entities are
either approximated by noun phrases that we have heuristically recognised or named entities
recognised by the flair model. Moreover, we hypothesize that this will allow guided annotation
by providing better context to LLM-enabled annotation. In the remainder of this section, we
preview some examples of possible entities and relations in the climate-change domain10, which
remain an open question to be addressed in the future (a sketch of the candidate-detection heuristic follows the examples):
• 'For example, Atlantic cyclones have been well documented as causing high surge levels
and heavy precipitation.' - (Atlantic cyclones, cause, high surge levels)
• 'El Niño–Southern Oscillation (ENSO) is another important factor for
winter temperature in China.' - (ENSO, affects, winter temperature in China)
• 'The concentration map captured a significantly high hazard of groundwater arsenic in
the north and northeast India, particularly in Assam and West Bengal, ... .' - (West
Bengal, high hazard of, groundwater arsenic)
      </p>
      <p>10Underlined words are suggested entities in the sentence, where the bold parts are recognized by the flair NER
model. Each sentence has a suggested triple in the form: (entity1, relation, entity2).</p>
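      <p>The candidate-detection heuristic described above can be summarised in a few lines: a triple is proposed whenever a verb phrase lies between two entity or noun-phrase spans of the same sentence. The sketch below works on character offsets and is only an approximation of the planned procedure.</p>
      <preformat>
# Heuristic triple candidates: propose (entity1, verb phrase, entity2) whenever a verb
# phrase lies between two entity/noun-phrase spans of the same sentence (approximation).
def candidate_triples(entity_spans, verb_spans):
    """Spans are (start, end, text) tuples with character offsets into one sentence."""
    triples = []
    entities = sorted(entity_spans)
    for verb_start, verb_end, verb_text in verb_spans:
        left = [e for e in entities if verb_start >= e[1]]    # entities ending before the verb
        right = [e for e in entities if e[0] >= verb_end]     # entities starting after the verb
        if left and right:
            # take the closest entity on each side of the verb phrase
            triples.append((left[-1][2], verb_text, right[0][2]))
    return triples

sentence = "Atlantic cyclones have been well documented as causing high surge levels."
entities = [(0, 17, "Atlantic cyclones"), (55, 72, "high surge levels")]
verbs = [(47, 54, "causing")]
print(candidate_triples(entities, verbs))
# [('Atlantic cyclones', 'causing', 'high surge levels')]
      </preformat>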
    </sec>
    <sec id="sec-6">
      <title>5. Discussion and Conclusion</title>
      <p>In this paper, we report on the first steps towards creating a dataset suitable for training
a BERT-like model that will subsequently be used for downstream climate-change relation
extraction tasks. We have collected and analyzed a set of almost 200,000 carefully selected scientific
papers as high-quality content of the climate-change domain. We discuss technical details
and common pitfalls in parsing PDF and HTML documents as the first steps needed to obtain
a sufficient quantity of domain-specific data to train a BERT-based model. Next, we report
preliminary statistics of the dataset to ensure its appropriateness for downstream relation
extraction. During preliminary analysis, we identified a high number of possible different
relations, indicating that further distilling of relations and relation types should be implemented.
Moreover, our preliminary findings suggest that a new NER model tailored to the vocabulary
of the climate-change domain is required.</p>
      <p>
        With these preliminary results, we open several research directions. First, the collected
dataset will be used for additional training of the SciBERT and ClimateBERT models involving
different configurations of masked language modelling (MLM) principles. Second, to reduce the
abundance of different but similar domain-specific relations, we will need to develop a method
for refining the annotated relations used to fine-tune a sentence-level relation extraction (RE) model.
This will involve the disambiguation of related relations and relation types and LLM-enabled
annotation. Finally, the main goal of this research is the construction and curation of a
knowledge graph for the climate-change content captured in a high-quality journal. In future
work, we plan to address KG construction-related challenges, relying on existing literature,
such as the work of Dessi et al. [
        <xref ref-type="bibr" rid="ref36">38</xref>
        ] and Chessa et al. [
        <xref ref-type="bibr" rid="ref37">39</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the University of Rijeka under project number
uniri-drustv-18-20. The Croatian Science Foundation supports AP under the project DOK-2021-02.</p>
    </sec>
    <sec id="sec-7b">
      <title>References</title>
      <p>[1] O. Hoegh-Guldberg et al., Impacts of 1.5°C global warming on natural and human systems, in: Global
Warming of 1.5°C. An IPCC Special Report on the impacts of global warming of 1.5°C
above pre-industrial levels and related global greenhouse gas emission pathways, in the
context of strengthening the global response to the threat of climate change, sustainable
development, and efforts to eradicate poverty, Cambridge University Press, Cambridge,
UK and New York, NY, USA, 2018, pp. 175–312. doi:10.1017/9781009157940.005.</p>
      <p>[2] C. S. Areni, Motivated reasoning and climate change: Comparing news sources,
politicization, intensification, and qualification in denier versus believer subreddit comments,
Applied Cognitive Psychology 38 (2024). doi:10.1002/acp.4167.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Data statistics</title>
      <p>Table 2 lists the number of retrieved research papers per selected journal.</p>
    </sec>
    <sec id="sec-8b">
      <title>B. Training data comparison calculations</title>
      <p>
• a: Calculated from the reported average number of words [
        <xref ref-type="bibr" rid="ref8">10</xref>
        ].
• b: Approximation from a tokenizer trained on the 10,000 papers sample according to The
Tokenization pipeline (https://huggingface.co/docs/tokenizers/python/latest/pipeline.html).
• c: Approximation from flair/splitter.py (https://github.com/flairNLP/flair/blob/master/flair/splitter.py).
      </p>
    </sec>
    <sec id="sec-9">
      <title>C. Common extraction problems</title>
      <p>Table 3 lists common problems encountered during PDF and HTML parsing, such as the first line of a paragraph missing.</p>
    </sec>
    <sec id="sec-10">
      <title>D. Most common noun phrases</title>
    </sec>
    <sec id="sec-11">
      <title>E. Most common entities</title>
    </sec>
    <sec id="sec-12">
      <title>F. Most common verb phrases</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Farrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>McConnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brulle</surname>
          </string-name>
          ,
          <article-title>Evidence-based strategies to combat scientific misinformation</article-title>
          ,
          <source>Nature Climate Change</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <fpage>191</fpage>
          -
          <lpage>195</lpage>
          . doi:
          <volume>10</volume>
          .1038/s41558-018-0368-6.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Andre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Boneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Falk</surname>
          </string-name>
          ,
          <article-title>Globally representative evidence on the actual and perceived support for climate action</article-title>
          ,
          <source>Nature Climate Change</source>
          (
          <year>2024</year>
          ). doi:
          <volume>10</volume>
          .1038/ s41558-024-01925-3.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Debnath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ebanks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mohaddes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Roulet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Alvarez</surname>
          </string-name>
          ,
          <article-title>Do fossil fuel firms reframe online climate and sustainability communication? a data-driven analysis</article-title>
          ,
          <source>npj Climate Action</source>
          <volume>2</volume>
          (
          <year>2023</year>
          )
          <article-title>47</article-title>
          . doi:
          <volume>10</volume>
          .1038/s44168-023-00086-x.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Oreskes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Conway</surname>
          </string-name>
          , Merchants of Doubt:
          <article-title>How a Handful of Scientists Obscured the Truth on Issues From Tobacco Smoke to Global Warming</article-title>
          , Bloomsbury Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pawar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Palshikar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharyya</surname>
          </string-name>
          , Relation extraction : A survey,
          <year>2017</year>
          . arXiv:
          <volume>1712</volume>
          .
          <fpage>05191</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>Biobert: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <year>2019</year>
          )
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          . doi:
          <volume>10</volume>
          .1093/bioinformatics/btz682.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Cohan,
          <article-title>SciBERT: A pretrained language model for scientific text</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3615</fpage>
          -
          <lpage>3620</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          -1371.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Webersinke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kraus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bingler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leippold</surname>
          </string-name>
          ,
          <article-title>Climatebert: A pretrained language model for climate-related text</article-title>
          ,
          <source>SSRN</source>
          (
          <year>2022</year>
          ). URL: https://ssrn.com/abstract=4229146. doi:
          <volume>10</volume>
          .2139/ssrn.4229146.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alsentzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Boag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-H.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. B. A. McDermott</surname>
          </string-name>
          ,
          <source>Publicly available clinical bert embeddings</source>
          ,
          <year>2019</year>
          . arXiv:
          <year>1904</year>
          .03323.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chalkidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fergadiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malakasiotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aletras</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Androutsopoulos</surname>
          </string-name>
          ,
          <article-title>Legal-bert: The muppets straight out of law school</article-title>
          ,
          <year>2020</year>
          . arXiv:
          <year>2010</year>
          .02559.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: J.
          <string-name>
            <surname>Burstein</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Doran</surname>
          </string-name>
          , T. Solorio (Eds.),
          <source>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>N19</fpage>
          -1423.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer, BART:
          <article-title>Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>7871</fpage>
          -
          <lpage>7880</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .acl-main.
          <volume>703</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .acl-main.
          <volume>703</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Matena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Exploring the limits of transfer learning with a unified text-to-text transformer</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <year>1910</year>
          .10683.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Phan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Anibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chanana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bahadroglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peltekian</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>AltanBonnet, Scifive: a text-to-text transformer model for biomedical literature</article-title>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2106</volume>
          .
          <fpage>03598</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Kringelum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Kjaerulf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brunak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Oprea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Taboureau</surname>
          </string-name>
          ,
          <article-title>ChemProt-3.0: a global chemical biology diseases mapping</article-title>
          ,
          <source>Database</source>
          (Oxford)
          <year>2016</year>
          (
          <year>2016</year>
          )
          <article-title>bav123</article-title>
          . doi:
          <volume>10</volume>
          .1093/database/bav123.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          ,
          <year>2019</year>
          . arXiv:
          <year>1907</year>
          .11692.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , T. Wolf,
          <article-title>Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter</article-title>
          , ArXiv abs/
          <year>1910</year>
          .01108 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poleksić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Martinčić-Ipšić</surname>
          </string-name>
          ,
          <article-title>Effects of pretraining corpora on scientific relation extraction using bert and scibert</article-title>
          , in: Joint Workshop Proceedings of 5th (
          <article-title>Sem4Tra) and 2nd NLP4KGC: Natural Language Processing for Knowledge Graph Construction co-located with the 19th</article-title>
          <source>International Conference on Semantic Systems (SEMANTiCS</source>
          <year>2023</year>
          ), volume Vol-
          <volume>3510</volume>
          <source>of CEUR Workshop Proceedings</source>
          , Leipzig, Germany,
          <year>2023</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3510</volume>
          /paper_nlp_3.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , H. Cheng, W. Lam,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on deep learning for relation extraction: Recent advances</article-title>
          and new frontiers,
          <year>2023</year>
          . arXiv:
          <fpage>2306</fpage>
          .
          <year>02051</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mintz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bills</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Snow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Distant supervision for relation extraction without labeled data</article-title>
          , in: K.
          <string-name>
            <surname>-Y. Su</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Wiebe</surname>
          </string-name>
          , H. Li (Eds.),
          <source>Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Suntec, Singapore,
          <year>2009</year>
          , pp.
          <fpage>1003</fpage>
          -
          <lpage>1011</lpage>
          . URL: https://aclanthology.org/P09-1113.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          , M. Sun,
          <article-title>FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Brussels, Belgium,
          <year>2018</year>
          , pp.
          <fpage>4803</fpage>
          -
          <lpage>4809</lpage>
          . URL: https://aclanthology.org/D18-1514. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D18</fpage>
          - 1514.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H.</given-names>
            <surname>Elsahar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vougiouklis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Remaci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gravier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Laforest</surname>
          </string-name>
          , E. Simperl, T-REx:
          <article-title>A large scale alignment of natural language with knowledge base triples</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Cieri</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Goggi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Hasida</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Mazo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Piperidis</surname>
          </string-name>
          , T. Tokunaga (Eds.),
          <source>Proceedings of the LREC</source>
          <year>2018</year>
          ,
          <article-title>European Language Resources Association (ELRA), Miyazaki</article-title>
          , Japan,
          <year>2018</year>
          . URL: https://aclanthology.org/L18-1544.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , M. Sun,
          <article-title>DocRED: A large-scale document-level relation extraction dataset, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>764</fpage>
          -
          <lpage>777</lpage>
          . URL: https://aclanthology.org/P19-1074. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>P19</fpage>
          - 1074.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <surname>T</surname>
          </string-name>
          . Gao,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>More data, more relations, more context and more openness: A review and outlook for relation extraction</article-title>
          ,
          <source>in: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing</source>
          , Association for Computational Linguistics, Suzhou, China,
          <year>2020</year>
          , pp.
          <fpage>745</fpage>
          -
          <lpage>758</lpage>
          . URL: https://aclanthology.org/2020.aacl-main.75.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beigi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          , L. Cheng, H. Liu,
          <article-title>Large language models for data annotation: A survey</article-title>
          ,
          <year>2024</year>
          . arXiv:2402.13446.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gueta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gilon</surname>
          </string-name>
          , C. Liu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Erell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jaber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kartha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Laish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feder</surname>
          </string-name>
          ,
          <article-title>LLMs accelerate annotation for medical information extraction</article-title>
          ,
          <year>2023</year>
          . arXiv:2312.02296.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Semi-automatic data enhancement for document-level relation extraction with distant supervision from large language models</article-title>
          , in: H.
          <string-name>
            <surname>Bouamor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Bali (Eds.),
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>5495</fpage>
          -
          <lpage>5505</lpage>
          . doi:10.18653/v1/2023.emnlp-main.334.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , L. Zou,
          <article-title>LLMaAA: Making large language models as active annotators</article-title>
          , in: H.
          <string-name>
            <surname>Bouamor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Bali (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2023</year>
          ,
          Association for Computational Linguistics
          , Singapore,
          <year>2023</year>
          , pp.
          <fpage>13088</fpage>
          -
          <lpage>13103</lpage>
          . doi:10.18653/v1/2023.findings-emnlp.872.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>R.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Does synthetic data generation of LLMs help clinical text mining?</article-title>
          ,
          <year>2023</year>
          . arXiv:2303.04360.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Improving unsupervised relation extraction by augmenting diverse sentence pairs</article-title>
          , in: H.
          <string-name>
            <surname>Bouamor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Pino</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Bali (Eds.),
          <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Singapore,
          <year>2023</year>
          , pp.
          <fpage>12136</fpage>
          -
          <lpage>12147</lpage>
          . URL: https://aclanthology.org/2023.emnlp-main.745. doi:10.18653/v1/2023.emnlp-main.745.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>H.</given-names>
            <surname>Khorashadizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mihindukulasooriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiwari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Groppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Groppe</surname>
          </string-name>
          ,
          <article-title>Exploring in-context learning capabilities of foundation models for generating knowledge graphs from text</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.08804.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shinyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Guglielmetti</surname>
          </string-name>
          , P. Marsman, pdfminer.six,
          <year>2018</year>
          . URL: https://pdfminersix.readthedocs.io/.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>L.</given-names>
            <surname>Richardson</surname>
          </string-name>
          , Beautiful Soup documentation,
          <year>2007</year>
          . URL: https://www.crummy.com/software/BeautifulSoup/bs4/doc/.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Akbik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blythe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>FLAIR: An easy-to-use framework for state-of-the-art NLP</article-title>
          , in:
          <source>NAACL 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>U.</given-names>
            <surname>Zaratiana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Holat</surname>
          </string-name>
          , T. Charnois,
          <article-title>GLiNER: Generalist model for named entity recognition using bidirectional transformer</article-title>
          ,
          <year>2023</year>
          . arXiv:2311.08526.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Reforgiato</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buscaldi</surname>
          </string-name>
          , E. Motta,
          <article-title>SCICERO: A deep learning and NLP approach for generating scientific knowledge graphs in the computer science domain</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>258</volume>
          (
          <year>2022</year>
          )
          <fpage>109945</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0950705122010383. doi:10.1016/j.knosys.2022.109945.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chessa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fenu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Reforgiato</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salatino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Secchi</surname>
          </string-name>
          ,
          <article-title>Data-driven methodology for knowledge graph generation within the tourism domain</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>67567</fpage>
          -
          <lpage>67599</lpage>
          . doi:10.1109/ACCESS.2023.3292153.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [Misplaced table fragment: data sources and per-source article counts, covering Ecological Applications, Ecosystem Health and Sustainability, Journal of Climate, Climate Dynamics, Journal of Geophysical Research: Atmospheres, NPJ Climate and Atmospheric Science, NPJ Climate Action, Nature Climate Change, PNAS, and the MDPI journals Water, Atmosphere, Climate, Ecologies, Energies, Forests, Fuels, Meteorology, Sustainable Chemistry, and Oceans.]
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>