<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Expanding the Vocabulary of BERT for Knowledge Base Construction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dong Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xu Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Remzi Celebi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Data Science, Department of Advanced Computing Sciences, Maastricht University</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge base construction entails acquiring structured information to create a knowledge base of factual and relational data, facilitating question answering, information retrieval, and semantic understanding. The challenge “Knowledge Base Construction from Pretrained Language Models” at the International Semantic Web Conference 2023 defines tasks focused on constructing knowledge bases using language models. Our focus was on Track 1 of the challenge, where the parameters are constrained to a maximum of 1 billion and the inclusion of entity descriptions within the prompt is prohibited. Although the masked language model offers sufficient flexibility to extend its vocabulary, it is not inherently designed for multi-token prediction. To address this, we present Vocabulary Expandable BERT for knowledge base construction, which expands the language model's vocabulary while preserving semantic embeddings for newly added words. We adopt task-specific re-pre-training on the masked language modeling task to further enhance the language model. Experimental results show the effectiveness of our approaches. Our framework achieves an F1 score of 0.323 on the hidden test set and 0.362 on the validation set, both of which are provided by the challenge. Notably, our framework adopts a lightweight language model (BERT-base, 0.13 billion parameters) and surpasses direct prompting of a large language model (GPT-3, 175 billion parameters). In addition, Token-Recode achieves performance comparable to Re-pretrain. This research advances language understanding models by enabling the direct embedding of multi-token entities, a substantial step forward for link prediction in knowledge graphs and metadata completion in data management.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Knowledge bases have a profound impact across diverse domains, offering transformative
benefits. They enhance information retrieval systems, leading to increased efficiency and accuracy,
thereby enabling users to swiftly locate relevant data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the context of natural language
processing, knowledge bases play a crucial role in elevating semantic comprehension and
facilitating a range of language-related tasks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, knowledge bases promote
data integration and interoperability, making substantial contributions to the advancement of
initiatives such as the Semantic Web and Linked Data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In this work, we present our approach
for the LM-KBC challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] at ISWC 2023, which focuses on knowledge base construction
for 21 relations. The task of the challenge involves predicting objects based on given
subject-relation pairs. For example, given the subject-relation pair &lt;Canada, CountryBordersCountry&gt;,
the goal is to predict appropriate objects such as United States of America and Greenland. Each
participant receives a set of subject-relation pairs and is tasked with identifying
the appropriate objects for these pairs. Each subject-relation pair can be associated with zero,
one, or multiple true objects, reflecting the complex nature of real-world scenarios.
      </p>
      <p>LM-KBC’23: Knowledge Base Construction from Pre-trained Language Models, Challenge at ISWC 2023.
Our code and data are available at https://github.com/MaastrichtU-IDS/LMKBC-2023.
Contact: dong.yang@maastrichtuniversity.nl (D. Yang); xu.wang@maastrichtuniversity.nl (X. Wang);
remzi.celebi@maastrichtuniversity.n (R. Celebi).
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>
        We participate in Track 1 of the challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where the number of language model parameters is
limited to 1 billion. We selected the BERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] model as the encoder and used the
filled-mask task to retrieve object candidates [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For each [mask] token, the language model
independently assigns a confidence score to every token in its vocabulary. However, the
original filled-mask task was designed to select the single best candidate, rather than multiple
top candidates. Furthermore, the target object entity may consist of multiple tokens, and for a
given subject-relation pair, the number of potential objects can vary. The original model
for the filled-mask task is not inherently formulated to predict multiple objects composed of
several tokens. For example (Figure 1 (a)), the correct answers for the subject-relation pair
&lt;Canada, CountryBordersCountry&gt; are combinations of tokens, such as (’Greenland’) and (’United’,
’States’, ’of’, ’America’). Extracting entities from the filled-mask output is therefore not
straightforward: because the language model predicts each token independently, assembling
entities from the per-token predictions becomes a permutation problem.
      </p>
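      <p>To make the combinatorial issue concrete, the following minimal sketch enumerates candidate entities from independent per-slot predictions. The token lists and confidence scores are made up for illustration and are not model outputs, and the function name enumerate_candidates is hypothetical.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Hypothetical top-3 tokens (with made-up confidence scores) that a
# filled-mask model might return independently for each of four [mask] slots.
per_position_top_k = [
    [("United", 0.6), ("Green", 0.3), ("New", 0.1)],
    [("States", 0.7), ("land", 0.2), ("Kingdom", 0.1)],
    [("of", 0.8), ("and", 0.1), ("the", 0.1)],
    [("America", 0.5), ("Mexico", 0.3), ("Canada", 0.2)],
]


def enumerate_candidates(top_k_lists):
    """Combine independent per-slot predictions into scored token sequences."""
    candidates = []
    for combo in product(*top_k_lists):
        tokens = [token for token, _ in combo]
        score = 1.0
        for _, prob in combo:
            score *= prob  # slots are predicted independently, so multiply
        candidates.append((" ".join(tokens), score))
    # Highest joint score first.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


candidates = enumerate_candidates(per_position_top_k)
num_candidates = len(candidates)      # k ** m combinations for k tokens, m slots
best_candidate = candidates[0][0]
```
]]></preformat>
      <p>With only three tokens per slot and four slots, the search space already contains 3 ** 4 sequences, most of which are not valid entities. This is the motivation for treating each multi-token entity as a single vocabulary item.</p>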
      <p>To predict multiple candidates with multiple tokens, we modify the token
embedding layer and the output embedding layer of BERT. We expand the vocabulary of the language
model (e.g., BERT) so that an object composed of multiple tokens (e.g., “United States of America”)
is treated as a single distinct token. As shown in Figure 1 (b), our approach groups each token
combination into a single token. However, a drawback of this method is that the
newly created tokens cannot leverage the information already learned by the language model.</p>
      <p>To address this challenge, we propose Vocabulary Expandable BERT (VE-BERT), which aims
to provide an initial semantic vector for newly added entities. We build a vocabulary by
querying entities on WikiData using the predicates defined by the challenge. We conduct
experiments to evaluate the effectiveness of Token Recode (TR), and the results demonstrate a clear improvement
in the F1 score. This indicates that Token Recode enhances the model’s
ability to align newly defined entities with their constituent tokens, thereby providing a more
accurate semantic representation for newly added entities than randomly initialized
embeddings. Compared with re-pretraining the model on additional raw text, TR needs no
extra training and achieves a comparable improvement.</p>
      <p>We collect sentences from Wikipedia and filter them according to the frequency of the
entities in our vocabulary. Experiments show that this task-specific pre-training is effective. We also
categorize the entities and find the best threshold for each predicate on the validation set.</p>
      <p>The main contributions of this paper are:</p>
      <list list-type="bullet">
        <list-item>
          <p>Proposing a method to expand the vocabulary of a language model (e.g., BERT) while
preserving the semantic meaning of the newly added entities.</p>
        </list-item>
        <list-item>
          <p>Conducting experiments to verify the effectiveness of re-pretraining on raw text for
knowledge base construction.</p>
        </list-item>
      </list>
      <p>We achieved an F1 score of 0.362 on the validation set and 0.323 on the hidden test set with the bert-base-cased model,
a relatively lightweight language model with only 110 million parameters.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Masked Language Model</title>
        <p>
          The Bidirectional Encoder Representations from Transformers (BERT) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] model has
transformed the field of natural language processing (NLP) since its introduction. BERT’s primary
contribution lies in its ability to generate contextualized word embeddings, capturing
bidirectional context information, and producing rich semantic representations. By pre-training on
large-scale unlabeled text using a masked language modeling objective, BERT learns a deep
representation of language structures, enabling it to capture complex linguistic patterns and
relationships. BERT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] uses two fundamental types of embeddings, input embeddings and output embeddings, each
of which serves a distinct yet interrelated role in the model’s functioning. Consider an input
sequence of tokens <inline-formula><tex-math>X = (x_1, x_2, \ldots, x_n)</tex-math></inline-formula>, where each <inline-formula><tex-math>x_i</tex-math></inline-formula> represents a token (word or sub-word) in
the sequence. The input tokens are first transformed into embeddings <inline-formula><tex-math>E = (e_1, e_2, \ldots, e_n)</tex-math></inline-formula>, where
each <inline-formula><tex-math>e_i</tex-math></inline-formula> is the embedding representation of the token <inline-formula><tex-math>x_i</tex-math></inline-formula>. These embeddings are then processed
through multiple layers of the Transformer architecture:
          <disp-formula>
            <label>(1)</label>
            <tex-math>P(x_i \mid e_1, e_2, \ldots, e_n) = T(E) \times W</tex-math>
          </disp-formula>
          where <inline-formula><tex-math>P(x_i \mid e_1, e_2, \ldots, e_n)</tex-math></inline-formula> is the predicted probability distribution over the vocabulary for the
i-th position, conditioned on the embeddings of all tokens in the sequence, <inline-formula><tex-math>E</tex-math></inline-formula> refers to the
token embeddings of the input tokens <inline-formula><tex-math>X</tex-math></inline-formula>, and <inline-formula><tex-math>T</tex-math></inline-formula> refers to the stacked Transformer layers. <inline-formula><tex-math>W \in \mathbb{R}^{l \times v}</tex-math></inline-formula>,
where <inline-formula><tex-math>l</tex-math></inline-formula> is the width of the last hidden layer of the stacked Transformer layers <inline-formula><tex-math>T</tex-math></inline-formula> and <inline-formula><tex-math>v</tex-math></inline-formula> is
the size of the vocabulary.
        </p>
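        <p>As a rough illustration of Equation (1), the sketch below projects one hidden vector onto a toy vocabulary. The names project_to_vocab and hidden_i and the toy matrix W are assumptions made for this example, and the softmax step is an assumed normalization that Equation (1) does not show explicitly.</p>
        <preformat><![CDATA[
```python
import math


def project_to_vocab(hidden, W):
    """Multiply a hidden vector (length l) by W (l x v) to get v scores."""
    l, v = len(W), len(W[0])
    if len(hidden) != l:
        raise ValueError("hidden width must match the number of rows in W")
    return [sum(hidden[k] * W[k][j] for k in range(l)) for j in range(v)]


def softmax(scores):
    """Turn raw scores into a probability distribution (assumed step)."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: hidden width l = 2, vocabulary size v = 3.
hidden_i = [1.0, 2.0]          # stands in for the i-th row of T(E)
W = [
    [0.5, 0.0, 1.0],
    [0.0, 1.0, 0.5],
]
scores_i = project_to_vocab(hidden_i, W)
probs_i = softmax(scores_i)
```
]]></preformat>
        <p>Each column of W acts as the output embedding of one vocabulary item, which is why adding a new entity to the vocabulary requires a new column in W as well as a new input embedding.</p>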
        <p>The input embeddings capture the inherent semantic and contextual information of the
input tokens. Specifically, BERT breaks down words into sub-word units (sub-tokens). These
subword embeddings are then combined with positional embeddings to encode both the content
and the position of the tokens within the input sequence. The input embeddings go through
a series of transformations as they pass through BERT’s layers. Initially, these embeddings
are fed into the model’s self-attention mechanism, which enables the model to capture
contextual relationships between tokens in both directions (left-to-right and right-to-left) in the
input sequence. This bidirectional context is a significant departure from previous models that
relied solely on left-to-right or right-to-left information flow. The output embeddings refer
to the representations of the tokens that are obtained after the input embeddings have been
processed through BERT’s layers. These output embeddings encapsulate the model’s learned
understanding of the input text’s semantics and context. Output embeddings can be utilized
for various downstream tasks, such as text classification, named entity recognition, question
answering, etc.</p>
        <p>
          XLNet [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and GPT (Generative Pre-trained Transformer) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] are two other
prominent advancements in NLP. Both models
use Transformers and self-supervised learning, leading to significant
contributions to language understanding and generation tasks. Gururangan et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] continued pre-training the general-purpose
language model RoBERTa on corpora from four
domains (biomedical publications, computer science publications, news, and reviews) and evaluated it on eight
classification tasks (two in each domain). Their experiments showed that continued pre-training on an
additional in-domain corpus consistently improves performance on tasks from the target
domain, in both high- and low-resource settings.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge Base Construction</title>
        <p>
          Kalo et al. [10] introduced a hybrid query-answering architecture that integrates knowledge
graphs with the BERT masked language model to improve the precision of query results.
This approach fuses the structural and semantic attributes of knowledge graphs with the textual
knowledge from language models. The model uses multiple [MASK] tokens to predict tokens
and then finds the most likely entities among the combinations of the individual predictions. Li et al.
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] won the LM-KBC 2022 challenge. They proposed a model based on BERT-large-cased that
improves performance in three aspects: (1) LM representation of masked object
tokens; (2) entity generator; (3) candidate object selection. The authors used additional triples
to train the language model, which led to a significant improvement. Their prompting techniques
can be categorized into four types: (1) incorporating type information for
entities; (2) simplifying and condensing prompts; (3) generating prompts by extracting relevant
sentences from Wikipedia; (4) selecting different prompts for the same relation based on the
type of entity. In addition, they remove pronouns and determiners from the candidates, find
the optimal threshold for each object-relation pair, and use the original scores of the predictions
rather than softmax-normalized scores.
        </p>
        <p>[Figure 1. Example prompt: “the country Canada borders the country [mask]”.]</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Model</title>
      <p>Figure 2 gives an overview of our framework. We introduce the Token Recode method,
which modifies the token embedding layer and the output embedding layer so that the model can encode
and predict object entities with multiple tokens. The modified BERT is named
Vocabulary Expandable BERT (VE-BERT). In the Token Recode layer, we first initialize the token
embeddings and output embeddings for multi-token entities in a well-known language model,
specifically “bert-base-cased” in this work. We re-pretrain the VE-BERT-based
filled-mask model on a corpus sourced from Wikipedia pages, called
WikiCorpus. Sentences are chosen for pre-training based on how frequently the entities from our
WikiData-derived vocabulary appear together in the sentence. We describe the details of
the vocabulary generation in the following subsection. Finally, we disambiguate the predicted
entities using the WikiData API.</p>
      <p>We then fine-tune this filled-mask model using the training set provided by the
challenge and use the model to predict the object candidates for the validation set and test set. In
the inference step, we select the best threshold for each relation type based on performance on the
validation set, and use the selected thresholds to predict the object candidates for the test set. In
addition, for the relations “PersonHasNumberOfChildren” and “SeriesHasNumberOfEpisodes”,
whose range consists of numbers, we check whether each resulting object candidate is
a number.</p>
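      <p>The threshold selection and the numeric check described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the challenge's evaluation code: the helper names (f1_score, best_threshold, keep_numeric), the candidate scores, the threshold grid, and the convention that two empty sets give an F1 of 1.0 are all assumptions.</p>
      <preformat><![CDATA[
```python
def f1_score(predicted, gold):
    """Set-based F1; two empty sets count as a perfect match (assumption)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_threshold(examples, thresholds):
    """Pick the threshold with the highest mean F1 for one relation.

    examples: list of (scored_candidates, gold_objects) pairs, where
    scored_candidates is a list of (object, score) tuples.
    """
    best = (None, -1.0)
    for threshold in thresholds:
        f1_values = []
        for scored, gold in examples:
            kept = [obj for obj, score in scored if score >= threshold]
            f1_values.append(f1_score(kept, gold))
        mean_f1 = sum(f1_values) / len(f1_values)
        if mean_f1 > best[1]:
            best = (threshold, mean_f1)
    return best


def keep_numeric(candidates):
    """Keep only candidates that are plain non-negative integers."""
    return [c.strip() for c in candidates if c.strip().isdigit()]


# Made-up validation examples for a single relation.
validation = [
    ([("United States of America", 0.9), ("Greenland", 0.4), ("Mexico", 0.1)],
     ["United States of America", "Greenland"]),
    ([("France", 0.8), ("Spain", 0.35), ("Italy", 0.2)],
     ["France", "Spain"]),
]
threshold, mean_f1 = best_threshold(validation, [0.1, 0.3, 0.5])
numbers = keep_numeric(["2", "two", " 12 "])
```
]]></preformat>
      <p>Selecting one threshold per relation lets relations with many true objects keep more candidates than relations that usually have a single answer.</p>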
      <sec id="sec-3-1">
        <title>3.1. Vocabulary</title>
        <p>We categorize the entities by their roles in a triple (subject or object), as shown in Table 1.</p>
        <p>We created a task-specific vocabulary using entities from the challenge dataset (i.e., the training,
validation, and test sets). Additionally, we extended the entity set with entities from the WikiData knowledge
graph: we queried all entities that have at least one relation listed in the challenge dataset.
Table 2 gives the number of collected entities for each type in the vocabulary.</p>
        <p>The BERT model is primarily designed for a ”filled-mask” or ”masked language modeling”
task, where certain tokens in a sentence are masked, and the model is trained to predict the
original tokens. The objective is to learn contextualized representations of words that take into
account their surrounding context.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Token Recode</title>
        <p>We introduce modifications to the token embedding and output embedding components of the
BERT model, enabling the generation of embeddings for newly introduced phrases based on
their constituent tokens. For instance, consider the entity ’United States of America’, which
is partitioned into individual tokens as [’United’, ’States’, ’of’, ’America’]. The token and
output embeddings for the phrase ’United States of America’ are computed as the average of the
respective embeddings for its tokens [’United’, ’States’, ’of’, ’America’].</p>
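        <p>More generally, we denote a newly introduced phrase as <inline-formula><tex-math>w</tex-math></inline-formula>, with its associated
tokens represented as <inline-formula><tex-math>T_w = (t_1, t_2, t_3, \ldots, t_n)</tex-math></inline-formula>. The initial token embeddings of these tokens
are denoted as <inline-formula><tex-math>E = (e_1, e_2, e_3, \ldots, e_i, \ldots, e_n)</tex-math></inline-formula>, while the original output embeddings are denoted as
<inline-formula><tex-math>O = (o_1, o_2, o_3, \ldots, o_i, \ldots, o_n)</tex-math></inline-formula>. The new token embedding for word <inline-formula><tex-math>w</tex-math></inline-formula> is denoted
as <inline-formula><tex-math>e_w</tex-math></inline-formula>, and the new output embedding is denoted as <inline-formula><tex-math>o_w</tex-math></inline-formula>. We obtain the token embedding
and output embedding of a word <inline-formula><tex-math>w</tex-math></inline-formula> by averaging the original token embeddings <inline-formula><tex-math>E</tex-math></inline-formula> and output
embeddings <inline-formula><tex-math>O</tex-math></inline-formula> of its corresponding tokens <inline-formula><tex-math>T_w</tex-math></inline-formula>:
          <disp-formula>
            <label>(2)</label>
            <tex-math>e_w = \frac{1}{n} \sum_{i=1}^{n} e_i</tex-math>
          </disp-formula>
          <disp-formula>
            <label>(3)</label>
            <tex-math>o_w = \frac{1}{n} \sum_{i=1}^{n} o_i</tex-math>
          </disp-formula>
        </p>
        <p>We then normalize the new token embedding <inline-formula><tex-math>e_w</tex-math></inline-formula> and the new output embedding <inline-formula><tex-math>o_w</tex-math></inline-formula> of word <inline-formula><tex-math>w</tex-math></inline-formula>.
Here <inline-formula><tex-math>e_i \in \mathbb{R}^{l}</tex-math></inline-formula> and <inline-formula><tex-math>o_i \in \mathbb{R}^{l}</tex-math></inline-formula>, and the subscript <inline-formula><tex-math>i</tex-math></inline-formula> is the index of a particular
embedding vector.</p>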
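        <p>A minimal sketch of the Token Recode initialization for one entity is given below. The averaging follows the description above; the L2 normalization is an assumption, since the exact form of the normalization is not recoverable here, and the two-dimensional embeddings and the function names are purely illustrative.</p>
        <preformat><![CDATA[
```python
import math


def average_vectors(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / count for d in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def recode_entity(entity_tokens, token_emb, output_emb):
    """Initialize embeddings of a new entity from its constituent tokens."""
    e_w = average_vectors([token_emb[t] for t in entity_tokens])
    o_w = average_vectors([output_emb[t] for t in entity_tokens])
    return l2_normalize(e_w), l2_normalize(o_w)


# Toy 2-dimensional embeddings; real BERT-base vectors have width 768.
token_emb = {
    "United": [1.0, 0.0], "States": [0.0, 1.0],
    "of": [1.0, 1.0], "America": [2.0, 2.0],
}
output_emb = {
    "United": [0.0, 2.0], "States": [0.0, 2.0],
    "of": [0.0, 2.0], "America": [0.0, 2.0],
}
e_w, o_w = recode_entity(
    ["United", "States", "of", "America"], token_emb, output_emb
)
```
]]></preformat>
        <p>In a real model, e_w would be appended as a new row of the token embedding matrix and o_w as a new entry of the output embedding matrix, so that “United States of America” can be both read and predicted as a single token.</p>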
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Pre-training on Wikipedia</title>
        <p>We derive the embedding of each new word in the BERT model from the embeddings of its
constituent tokens. Furthermore, we pre-train the model on the filled-mask task using our
collected Wikipedia corpus. The sentences in our corpus are selected on the criterion that
they include entities listed in our vocabulary. Table 3 gives the number of
sentences that include each entity type. Note that a sentence can contain multiple
entities, so the cumulative count of sentences across entity types is greater than
the overall sentence count.</p>
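        <p>The sentence selection could look roughly like the sketch below. Plain substring matching, the min_entities parameter, the entity types, and the example sentences are all assumptions made for illustration; the actual matching procedure may differ.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def find_entities(sentence, vocabulary):
    """Return vocabulary entities that occur as substrings of the sentence."""
    return [entity for entity in vocabulary if entity in sentence]


def select_sentences(sentences, vocabulary, min_entities=1):
    """Keep sentences mentioning at least min_entities vocabulary entities.

    vocabulary maps each entity string to its (hypothetical) entity type.
    Also counts, per entity type, how many kept sentences mention it.
    """
    selected = []
    type_counts = Counter()
    for sentence in sentences:
        found = find_entities(sentence, vocabulary)
        if len(found) >= min_entities:
            selected.append(sentence)
            for entity_type in {vocabulary[e] for e in found}:
                type_counts[entity_type] += 1
    return selected, type_counts


vocabulary = {
    "Canada": "Country",
    "United States of America": "Country",
    "Ottawa": "City",
}
sentences = [
    "Canada borders the United States of America.",
    "Ottawa is the capital of Canada.",
    "The weather is nice today.",
]
selected, type_counts = select_sentences(sentences, vocabulary)
```
]]></preformat>
        <p>In this toy run two sentences are kept, yet the per-type counts add up to three, because the second sentence mentions both a country and a city. This is the same effect that makes the per-type sentence counts exceed the overall sentence count.</p>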
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Fine-tune on knowledge base construction</title>
        <p>
          Contemporary language models, exemplified by BERT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], have been trained extensively
on large and diverse text corpora. This breadth
suggests that “rekindling” the models’ awareness of the specific categories
of information they are expected to recall, through fine-tuning, could yield
performance gains. In line with this perspective, Li et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] have empirically demonstrated
the utility of fine-tuning for improving the performance of language models in knowledge
base construction.
        </p>
        <p>The fine-tuning process for knowledge base construction can be summarized as
follows: a subject-relation-object triple is transformed into a coherent sentence
using the corresponding prompt template. The tokens of the object entity are then
masked within the sentence, and BERT is trained on the masked
sentence as input, with the aim of recovering the masked tokens.</p>
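        <p>The construction of one fine-tuning example can be sketched as follows. The template text is taken from the Figure 1 example prompt; the dictionary layout, the function name build_training_example, and the use of a single mask slot for the whole object (which presumes the object is a single token in the expanded vocabulary) are assumptions.</p>
        <preformat><![CDATA[
```python
# One prompt template per relation; only one relation is shown here.
PROMPT_TEMPLATES = {
    "CountryBordersCountry": "the country {subject} borders the country {object}",
}
MASK_TOKEN = "[MASK]"


def build_training_example(subject, relation, obj):
    """Turn a (subject, relation, object) triple into a masked sentence."""
    template = PROMPT_TEMPLATES[relation]
    target_sentence = template.format(subject=subject, object=obj)
    # The object is hidden behind a single mask slot; with an expanded
    # vocabulary a multi-token entity is one token, so one slot suffices.
    masked_sentence = template.format(subject=subject, object=MASK_TOKEN)
    return {"input": masked_sentence, "label": obj, "target": target_sentence}


example = build_training_example(
    "Canada", "CountryBordersCountry", "United States of America"
)
```
]]></preformat>
        <p>A subject with several true objects would simply produce one such example per object.</p>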
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We build our system with the Transformers library on PyTorch and conduct experiments on an NVIDIA V100 GPU with 32 GB of memory.
For pre-training on Wikipedia sentences, the learning rate is set to 2e-5 and the number of epochs is 20.
For fine-tuning, the learning rate is set to 2e-5 and the number of epochs is 5. We adopt the
disambiguation function released by the challenge. However, for predicates whose
object entities are of type “Number”, we use the predicted numbers directly. The code is available at
https://github.com/MaastrichtU-IDS/LMKBC-2023</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>[Table 4. Parameter quantity (billion) of the compared models: 0.11, 0.345, 1.3, 175.]</p>
      <p>
        We conducted experiments to evaluate the performance of different methods for knowledge
base construction. Table 4 summarizes the results of our experiments. The baseline
models initialize prompt templates using subject-predicate pairs and subsequently predict missing
object entities. These models and their hyperparameters are provided by the LM-KBC 2023 challenge
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>The Token-Recode method improves the extraction of correct object entities, achieving an
improvement of 2 percentage points. Re-pretraining on Wikipedia sentences improves
knowledge base construction by almost 2 percentage points. The performance of Token-Recode is
comparable to that of Re-pretrain, without additional computation.</p>
      <p>Our final model, Vocabulary Expandable BERT (VE-BERT), achieves an improvement of nearly 5 percentage
points through the combination of the Token-Recode and Re-pretrain methods. This
combined improvement is greater than the sum of the improvements
obtained with each method separately, which suggests that Token-Recode provides
high-quality initial embeddings for newly added entities and thereby makes re-pretraining more effective.</p>
      <p>
        Table 5 shows the final results of our model VE-BERT on the validation set of the
LM-KBC 2023 challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. VE-BERT is proficient at predicting general
knowledge predicates, such as CompoundHasParts and CountryHasOfficialLanguage. However, it
shows limitations in predicting personal information, exemplified by predicates like
PersonHasAutobiography and PersonHasSpouse. Additionally, VE-BERT can only predict
multi-token entities that are present in its vocabulary. Owing to the relatively low occurrence of
Name entities in the vocabulary, the framework underperforms on predicates
whose object type is Name, such as PersonHasSpouse and BandHasMember.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Our research aims to enhance the construction of knowledge bases with
lightweight language models, adhering to the constraints of Track 1, which restricts
the model parameters to one billion. We introduce the Vocabulary Expandable BERT
(VE-BERT), a modification of the standard BERT architecture that alters both the
input and output embedding layers. The model is further enriched by pre-training on a
task-specific corpus and subsequent fine-tuning on a training set. Experimental results validate the
efficacy of our proposed token-recode method, which performs even better when
coupled with a re-pretraining task.</p>
      <p>As we navigate the complexities of factual statement extraction from language models, we
identify two pivotal areas warranting future investigation: the application of VE-BERT to
link prediction tasks within knowledge graphs and to missing value prediction within data
management.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The authors thank the challenge organizers for their timely and helpful response to inquiries,
and the reviewers for their valuable comments. This work is supported by China Scholarship
Council (202207010004).</p>
      <p>the 58th Annual Meeting of the Association for Computational Linguistics, Association
for Computational Linguistics, Online, 2020, pp. 8342–8360. URL: https://aclanthology.org/2020.acl-main.740. doi:10.18653/v1/2020.acl-main.740.</p>
      <p>[10] J.-C. Kalo, L. Fichtel, P. Ehler, W.-T. Balke, KnowlyBERT - Hybrid Query Answering over
Language Models and Knowledge Graphs, in: J. Z. Pan, V. Tamma, C. d’Amato, K. Janowicz,
B. Fu, A. Polleres, O. Seneviratne, L. Kagal (Eds.), The Semantic Web – ISWC 2020,
Lecture Notes in Computer Science, volume 12506, Springer International Publishing, Cham, 2020, pp. 294–310.
URL: https://link.springer.com/10.1007/978-3-030-62419-4_17. doi:10.1007/978-3-030-62419-4_17.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-V.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-T.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. T.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <article-title>Design intelligent educational chatbot for information retrieval based on integrated knowledge bases</article-title>
          ,
          <source>IAENG International Journal of Computer Science</source>
          <volume>49</volume>
          (
          <year>2022</year>
          )
          <fpage>531</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>36</volume>
          (
          <year>2022</year>
          )
          <fpage>11703</fpage>
          -
          <lpage>11711</lpage>
          . URL: https://ojs.aaai.org/index.php/AAAI/article/view/21425. doi:
          <pub-id pub-id-type="doi">10.1609/aaai.v36i10.21425</pub-id>
          , number:
          <issue>10</issue>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bouaicha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ghemmaz</surname>
          </string-name>
          ,
          <article-title>A Semantic Interoperability Approach for Heterogeneous Meteorology Big IoT Data</article-title>
          , in:
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Laouar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. E.</given-names>
            <surname>Balas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lejdel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Eom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Boudia</surname>
          </string-name>
          (Eds.),
          <source>12th International Conference on Information Systems and Advanced Technologies “ICISAT 2022”, Lecture Notes in Networks and Systems</source>
          , Springer International Publishing, Cham,
          <year>2023</year>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>225</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-031-25344-7_20</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>LM-KBC: Knowledge base construction from pre-trained language models</article-title>
          ,
          <source>Semantic Web Challenge @ ISWC, CEUR-WS</source>
          (
          <year>2023</year>
          ). URL: https://lm-kbc.github.io/challenge2023/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics</article-title>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Papasarantopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vougiouklis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <article-title>Task-specific Pretraining and Prompt Decomposition for Knowledge Graph Population with Language Models</article-title>
          ,
          <year>2022</year>
          . URL: http://arxiv.org/abs/2208.12539. arXiv:2208.12539 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>XLNet: Generalized autoregressive pretraining for language understanding</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sigler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          ,
          in:
          <source>Proceedings of the 34th International Conference on Neural Information Processing Systems</source>
          , NIPS'20, Vancouver, BC, Canada, Curran Associates Inc., Red Hook, NY, USA,
          <year>2020</year>
          . Article no. 159, 25 pages.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gururangan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marasović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swayamdipta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Don't Stop Pretraining: Adapt Language Models to Domains and Tasks</article-title>
          , in: Proceedings of
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>