<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Base Construction from Pre-trained Language Models by Prompt Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiao Ning</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Remzi Celebi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Data Science, Faculty of Science and Engineering, Maastricht University</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Biological Science and Medical Engineering, Southeast University</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Pre-trained language models (LMs) have advanced the state of the art for many semantic tasks and have also been proven effective for extracting knowledge from the models themselves. Although several works have explored the capability of LMs for constructing knowledge bases, including via prompt learning, this potential has not yet been fully explored. In this work, we propose a method of extracting factual knowledge from LMs for given subject-relation pairs and explore the most effective strategy to generate the missing object entities for each relation of triples. We design prompt templates for each relation using personal knowledge and the descriptive information available on the web, such as WikiData. Our LM probing approach is tested on the dataset provided by the International Semantic Web Conference (ISWC 2022) LM-KBC Challenge. To cope with the problem of varying performance for each relation, we designed a parameter selection strategy for each relation. Using the test dataset, we obtain an F1-score of 49.35%, which is higher than the baseline of 31.08%.</p>
      </abstract>
      <kwd-group>
        <kwd>Prompt learning</kwd>
        <kwd>Pre-trained language model</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Link Prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        tasks based on a gradient-guided search [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Prompt learning does not require a large amount of
labeled data or introduce a large number of additional parameters, which makes it a useful
analysis tool; it has been widely used in many domains, such as named entity recognition [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
information extraction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and question answering [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Nevertheless, prompting requires manually
designing the context fed into the model, and designing efficient prompt templates directly
affects the performance of the model.
      </p>
      <p>
        In this work, we develop a system for track 1 of the LM-KBC challenge, a challenge that aims
to explore the viability of knowledge base construction from BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with low computational
requirements. We propose an automatic method to systematically improve the performance of
the prompts used to query relations from the pre-trained model. Our method is based on
bert-large-cased 1 due to existing studies demonstrating its outstanding performance. It is
based on mining or paraphrasing, and feeds one prompt at a time to the model. Considering that
different prompts may perform differently when used to query different relations, we
also combined answers from different prompts. The data, code and learned models
associated with this work can be accessed in the GitHub repository 2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Prompt Generation</title>
      <p>
        We define prompt generation as the task of generating a set of prompts for each relation r,
where at least some of the prompts effectively trigger LMs to predict ground-truth object entities.
Our method is inspired by template-based relation extraction methods, which are based on
the observation that words in the vicinity of the subject s and object o in a large corpus often
describe the relation r. We obtained an alternative description for each relation from
the descriptive information in WikiData. Inspired by a template-based approach for relation
extraction, we created prompt templates based on different descriptive information combined
with professional knowledge. The three main methods [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] we used in this challenge are described below.
      </p>
      <p>Middle-word Prompt Based on the observation that words in the middle of the subject and
object are often indicative of the relation, we directly use those words as prompts. For example,
Sergey Brin set up Google is converted into a prompt s set up o by replacing the subject and object
with placeholders. For the CountryBordersWithCountry relation, we designed "{_}
shares border with {_}." as one prompt.</p>
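      <p>As a minimal illustration (not the authors' exact code), such a two-slot template can be instantiated by substituting the subject and masking the object slot before querying a masked LM such as bert-large-cased; the helper name fill_prompt below is hypothetical.</p>
      <preformat>
```python
def fill_prompt(template, subject, mask_token="[MASK]"):
    """Substitute the subject into the first "{_}" slot and mask the second.

    The masked string can then be scored with a fill-mask model,
    e.g. a HuggingFace pipeline over bert-large-cased.
    """
    with_subject = template.replace("{_}", subject, 1)   # fill the subject slot
    return with_subject.replace("{_}", mask_token, 1)    # mask the object slot

# Using the CountryBordersWithCountry template from the text:
prompt = fill_prompt("{_} shares border with {_}.", "France")
print(prompt)  # France shares border with [MASK].
```
      </preformat>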
      <p>
        Dependency-based Prompt In cases where template words do not appear in the middle,
templates based on syntactic analysis of the sentence can be more effective for relation extraction
tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. For instance, the dependency path in The capital of China is Beijing gives a prompt of
capital of s is o. For the CompanyParentOrganization relation, we designed "The parent organisation
of {_} is {_}." as one prompt.
      </p>
      <p>Paraphrasing-based Generation To improve lexical diversity while remaining relatively
faithful to the original prompt, we paraphrased the original prompt with other semantically
similar or identical expressions. When the prompt is s shares a border with o, it may be
paraphrased as s borders with o or s is next to o. This is conceptually similar to the query expansion
techniques used in information retrieval to reformulate a given query to improve
retrieval performance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Notes</title>
        <p>1. https://huggingface.co/bert-large-cased</p>
        <p>2. https://github.com/xiao-nx/LMKBC_2022</p>
      </sec>
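      <p>One way to organize the paraphrased templates is per relation; the variants below are the ones given in the text, while the structure and the names PARAPHRASES and expand_prompts are assumptions.</p>
      <preformat>
```python
# Hypothetical per-relation registry of paraphrased prompt templates.
# Each variant is tried as a separate prompt and scored on the training data.
PARAPHRASES = {
    "CountryBordersWithCountry": [
        "{_} shares border with {_}.",
        "{_} borders with {_}.",
        "{_} is next to {_}.",
    ],
}

def expand_prompts(relation, paraphrases=PARAPHRASES):
    """Return all paraphrased prompt templates registered for a relation."""
    return paraphrases.get(relation, [])
```
      </preformat>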
    </sec>
    <sec id="sec-3">
      <title>3. Prompt Selection and Ensemble</title>
      <p>In the previous section, we described methods of generating a set of candidate prompts
for a particular relation r. Each of these prompts may be more or less effective in eliciting
knowledge from the LMs, and thus it is necessary to decide how to use these generated prompts
during testing. In this section, we discuss the approaches explored for generating better
candidate objects by prompt-based link prediction. Our efforts here can be broadly classified
into two categories: using better prompts and ensembling prompts.</p>
      <sec id="sec-3-1">
        <title>3.1. Selection of the Top-k Prompts</title>
        <p>To find the prompts which better elicit the pre-trained model, we designed prompts
considering both a priori knowledge and synonyms as potential prompts. For each prompt, we
can measure its precision, recall and F1-score in predicting the ground-truth objects on the
training data, and keep several of the top-performing prompts based on F1-score.</p>
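        <p>A minimal sketch of this selection step, assuming predictions and ground truths are sets of object entities; the function names prf1 and select_top_prompts are hypothetical.</p>
        <preformat>
```python
def prf1(predicted, gold):
    """Precision, recall and F1 of a predicted object set vs. the ground truth."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted.intersection(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def select_top_prompts(prompt_f1, k=3):
    """Rank prompts by their training-set F1 and keep the k best."""
    ranked = sorted(prompt_f1, key=prompt_f1.get, reverse=True)
    return ranked[:k]

print(prf1({"Germany", "Spain"}, {"Germany", "Belgium"}))  # (0.5, 0.5, 0.5)
```
        </preformat>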
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Ensemble Prompts</title>
        <p>We do not observe the same scale of improvement with an increasing number of prompts involved;
in fact, most of the time the best F1 score is achieved with one prompt template. We argue that
this difference is due to the difference in evaluation metrics: we pay attention to the F1
scores rather than the macro-averaged accuracy scores, which give higher importance to the
precision of methods. Therefore, considering that having a variety of prompts may allow for
elicitation of knowledge that appeared in these different contexts, we rank all the prompts based
on their performance in predicting the objects in the training set and keep the prompts with an
F1-score higher than 0.1, up to the top 5. Treating the top-k prompts equally is sub-optimal, however,
as some prompts are more reliable than others.</p>
        <p>For every relation in the dataset, we use all filtered prompts to query the pre-trained language
model, and every prompt returns a set of object entities. It is then important to select the
most accurate object entities. Here, we developed an algorithm that jointly considers the
frequency and probability of each predicted object entity, and finally keeps the top 5 candidates.
Note that the top predicted objects often contain pronouns, such as him, them, it, determiners,
such as the, a, any, or other symbols, such as ?, 1970s, -s, so we removed these words.
In addition, we mapped music in the predicted results to producer, acting to actor,
teacher to professor, and water to hydrogen.</p>
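        <p>The paper does not spell out the exact aggregation rule; the sketch below scores each candidate by its summed probability across prompts (so both frequency and per-prompt confidence count), applies the word filtering and remapping described above, and keeps the top 5. All names here are assumptions.</p>
        <preformat>
```python
from collections import defaultdict

STOPWORDS = {"him", "them", "it", "the", "a", "any"}      # pronouns and determiners
REMAP = {"music": "producer", "acting": "actor",
         "teacher": "professor", "water": "hydrogen"}     # mappings from the text

def aggregate_candidates(per_prompt_predictions, top_n=5):
    """Merge (token, probability) predictions from several prompts.

    Summing probabilities rewards tokens that are both frequent across
    prompts and predicted with high confidence by individual prompts.
    """
    scores = defaultdict(float)
    for predictions in per_prompt_predictions:
        for token, prob in predictions:
            token = REMAP.get(token, token)
            if token.lower() in STOPWORDS:
                continue                                  # drop pronouns/determiners
            scores[token] += prob
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]
```
        </preformat>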
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>The dataset for this challenge is divided into training data, development data and test data,
each covering a different set of subject-entities, along with a complete list of ground-truth
object-entities per subject-relation pair. The training subject-relation-object triples can be
used for training or probing the language models in any form, while the development data can be used
for hyper-parameter tuning, and the test data is used to measure the performance of the final
submitted system. Our proposed method is free from fine-tuning, so we only used the training
data to test the performance of the system and adjust parameters manually, then submitted the
developed system.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Experimental Settings</title>
        <p>Single Prompt Experiments For each prompt we designed, its corresponding performance
was tested on the training set. The performance of the top 3 prompts is shown in Table 1.</p>
        <p>Ensemble Prompts Experiments For some relations with low recall, we combined several
prompts and ranked their outputs as the final results to obtain more object entities. We labeled the top
3 prompts as prompt1, prompt2, prompt3, and evaluated the performance of the
ensembles [prompt1, prompt2, prompt3], [prompt1, prompt2], [prompt1, prompt3], [prompt2, prompt3],
then took the best performing combination on the training data.</p>
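        <p>The combination search can be sketched as follows; the score callable (mapping a prompt combination to its training-set F1) and the interface are assumptions, not the authors' code.</p>
        <preformat>
```python
CANDIDATE_COMBINATIONS = {
    "1+2+3": ["prompt1", "prompt2", "prompt3"],
    "1+2":   ["prompt1", "prompt2"],
    "1+3":   ["prompt1", "prompt3"],
    "2+3":   ["prompt2", "prompt3"],
}

def best_combination(combinations, score):
    """Return the name of the prompt combination with the highest training F1.

    score: callable mapping a list of prompt ids to an F1 value, assumed
    to run the ensemble against the training data.
    """
    return max(combinations, key=lambda name: score(combinations[name]))
```
        </preformat>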
        <p>Search Threshold Experiments Another observation is that the threshold strongly affects
the recall of the prediction results, and it is possible to obtain more object entities by lowering
the threshold. Thus, we searched various thresholds to optimize the F1 scores, and selected the
best thresholds based on the training data. According to the formula of the F1 score, it is known
that the F1 score achieves its maximum value when precision and recall are close to each
other, so we adjusted the threshold to search for the optimal F1 score.
In our experiments we performed only a small range of searching, but in order to
show the effect of the threshold on the F1 score clearly, we searched thresholds between 0.01 and
0.99 in steps of 0.01 and plotted the results in Figure 1.</p>
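        <p>The sweep described above can be sketched as follows, assuming each candidate object comes with a model probability; the function name and interface are hypothetical.</p>
        <preformat>
```python
def search_threshold(scored_candidates, gold, step=0.01):
    """Sweep thresholds from 0.01 to 0.99 and return the F1-optimal one.

    scored_candidates: list of (object_entity, probability) pairs;
    gold: set of ground-truth objects for the subject-relation pair.
    """
    best_t, best_f1 = 0.0, -1.0
    for i in range(1, 100):
        t = i * step
        predicted = {o for o, p in scored_candidates if p >= t}
        tp = len(predicted.intersection(gold))
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```
        </preformat>
        <p>In practice the sweep would be run per relation on the training data, since the best threshold varies by relation.</p>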
        <p>System Tool In this section, we present the prompt or the combination of prompts used for
each relation and the corresponding threshold value, as shown in Table 2.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Final Test Results</title>
        <p>As for the models to probe, in our main experiments we use the BERT-large model. We use
three metrics to evaluate the success of the prompts in probing LMs: precision, recall, and
F1-score. The final performance of our proposed method on the test data can be seen in Table 3,
as recorded on CodaLab 3.</p>
        <sec id="sec-4-3-1">
          <title>Notes</title>
          <p>3. https://codalab.lisn.upsaclay.fr/competitions/5815</p>
          <p>s denotes {_}, o denotes {_}.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Prompt learning exploits the powerful capability of pre-trained language models, and
significantly reduces the dependence on supervised data. Prompt learning enables few-shot learning and
even zero-shot learning, which is promising for NLP downstream tasks, especially
information extraction. In this paper, we have applied different prompting techniques to extract
factual knowledge from pre-trained language models. We also designed various templates to
generate diverse prompts to query specific pieces of relational knowledge. Experiments show
that LMs are more reliable knowledge sources than initially indicated by previous results,
but they are also quite sensitive to the way we query them. We have made significant improvements
over the baseline method by generating more effective prompts, ensembling prompts
and searching different thresholds. It is promising to improve the accuracy of factual knowledge
retrieval through prompt design strategies for each relation. However, how to create a prompt, how
to select the language model, how to construct answer candidates, how to map answers to final
outputs, and how to find an optimal configuration for downstream tasks are still under
exploration.</p>
      <p>[Figure 1: per-relation threshold curves for ChemicalCompoundElement, CompanyParentOrganization,
CountryBordersWithCountry, CountryOfficialLanguage, PersonCauseOfDeath, PersonEmployer,
PersonInstrument, PersonLanguage, PersonPlaceOfDeath, PersonProfession, RiverBasinsCountry,
and StateSharesBorderState; y-axis from 0.1 to 1.0.]</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>Thanks to Shuai Wang, an excellent software engineer from Amazon, who introduced several
practical scripts to automate running the code, which significantly increased the efficiency
of experiments. Furthermore, the experimental part of this research was made possible, in part,
using the Data Science Research Infrastructure (DSRI) hosted at Maastricht University.</p>
      <p>[Table residue: per-relation rows for ChemicalCompoundElement, CompanyParentOrganization,
CountryBordersWithCountry, CountryOfficialLanguage, PersonCauseOfDeath, PersonEmployer,
PersonInstrument, PersonLanguage, PersonPlaceOfDeath, PersonProfession, RiverBasinsCountry,
StateSharesBorderState, and Average; values not recovered.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          ,
          <source>arXiv preprint arXiv:1907.11692</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neubig</surname>
          </string-name>
          ,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>arXiv preprint arXiv:2107.13586</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Making pre-trained language models better few-shot learners</article-title>
          ,
          <source>arXiv preprint arXiv:2012.15723</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Araki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neubig</surname>
          </string-name>
          ,
          <article-title>How can we know what language models know?</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics 8</source>
          (
          <year>2020</year>
          )
          <fpage>423</fpage>
          -
          <lpage>438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Razeghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Logan IV</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Autoprompt: Eliciting knowledge from language models with automatically generated prompts</article-title>
          ,
          <source>arXiv preprint arXiv:2010.15980</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Template-based named entity recognition using bart</article-title>
          ,
          <source>arXiv preprint arXiv:2106.01760</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Unified structure generation for universal information extraction</article-title>
          ,
          <source>arXiv preprint arXiv:2203.12277</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lazaridou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gribovskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Stokowiec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Grigorev</surname>
          </string-name>
          ,
          <article-title>Internet-augmented language models through few-shot prompting for open-domain question answering</article-title>
          ,
          <source>arXiv preprint arXiv:2203.05115</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pantel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Choudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gamon</surname>
          </string-name>
          ,
          <article-title>Representing text for joint embedding of text and knowledge bases</article-title>
          ,
          <source>in: Proceedings of the 2015 conference on empirical methods in natural language processing</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1499</fpage>
          -
          <lpage>1509</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          ,
          <article-title>A survey of automatic query expansion in information retrieval</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 44</source>
          (
          <year>2012</year>
          )
          <fpage>1</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>