<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Language Model Predictions via Prompts Enriched with Knowledge Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ryan Brate</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Hoang Dang</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabian Hoppe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuan He</string-name>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Albert Meroño-Peñuela</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vijay Sadashivaiah</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe, Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KNAW Humanities Cluster, Digital Humanities Lab</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Karlsruhe Institute of Technology, Institute AIFB</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>King's College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>LS2N, Université de Nantes, Faculté des Sciences et Techniques (FST)</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Rensselaer Polytechnic Institute</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>University of Oxford</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Despite advances in deep learning and knowledge graphs (KGs), using language models for natural language understanding and question answering remains a challenging task. Pre-trained language models (PLMs) have been shown to leverage contextual information to complete cloze prompts, next-sentence completion, and question answering tasks in various domains. Unlike structured data querying in e.g. KGs, mapping an input question to data that may or may not be stored by the language model is not a simple task. Recent studies have highlighted the improvements that can be made to the quality of information retrieved from PLMs by adding auxiliary data to otherwise naive prompts. In this paper, we explore the effects on language model performance of enriching prompts with additional contextual information leveraged from the Wikidata KG. Specifically, we compare the performance of naive vs. KG-engineered cloze prompts for entity genre classification in the movie domain. Selecting a broad range of commonly available Wikidata properties, we show that enrichment of cloze-style prompts with Wikidata information can result in a significantly higher recall for the investigated BERT and RoBERTa large PLMs. However, it is also apparent that the optimum level of data enrichment differs between models.</p>
      </abstract>
      <kwd-group>
        <kwd>Prompt Learning</kwd>
        <kwd>Pre-trained Language Model</kwd>
        <kwd>Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Pre-trained language models (PLMs) [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], based on deep learning attention-based architectures,
have demonstrated outstanding performance at various natural language processing (NLP)
tasks predicated on natural language understanding. However, the extent to which they capture
domain knowledge and empirical semantics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] — i.e. the use of formal domain properties
in practice — is not well understood. In this work, we narrow the focus to cloze-style
completion: the task of predicting the masked entity text in a sentence. For example, given
“The Klingons are a species in the franchise [MASK]”, the PLM is expected to predict “Star
Trek” for [MASK]. This task aims to extract the implicit knowledge entailed by PLMs, since such
knowledge can be used for downstream NLP applications like sentiment analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], dialogue
systems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and natural language inference [6], as well as for completing the missing information
of knowledge graphs (KGs) or ontologies [7], and even constructing new ones [8].
      </p>
      <p>In recent years, PLMs have improved on the state of the art in many NLP tasks by leveraging
large text corpora [9], but most of the time they still require annotated data for task-specific
fine-tuning [10]. However, the empirical semantics gathered by these models is limited to
distributional aspects [11]. Therefore, performance, especially in the few- and zero-shot
setting, depends heavily on the provided prompt, i.e. a snippet of contextual information for a
specific task. In many cases, however, the engineering of prompts is naive and simplistic,
giving the PLM too little context to provide an accurate answer, and unsystematic, offering
few principles on how these prompts should be composed in order to obtain predictable
behaviour. Indeed, recent studies [12] have highlighted the improvements that can be made to
the quality of information retrieved from PLMs by amending these prompts.
This casts doubt on studies [13] claiming that a PLM cannot answer easy questions about
e.g. culture (movies, books, music, ...): it is reasonable to postulate that PLMs could
answer those questions accurately if they were provided with systematically engineered prompts
containing richer contexts.</p>
      <p>
        Existing approaches to prompt engineering include: (i) learn-by-example, where the prompt
consists of the concatenation of correct examples we expect a PLM to predict [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; (ii) manually
designed prompts of different granularities [13]; (iii) automatically searched prompts optimized
on few-shot samples [14]. All of these rely on the implicit semantics of natural language texts.
In this paper, we investigate how incorporating explicit knowledge from external sources such as
KGs can help prompt engineering and thus enhance the cloze-style question answering of PLMs.
Specifically, we explore cloze-style prompts in the movie domain with respect to the
performance of the BERT and RoBERTa large PLMs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Studies of prompt learning are based on the hypothesis that pre-trained language models
(PLMs) have learnt abundant knowledge and merely require sufficiently detailed contexts for
predictions [
        <xref ref-type="bibr" rid="ref2">2, 10, 15</xref>
        ] — and in this way, it is possible to apply PLMs without data-driven
fine-tuning. A (hard¹) prompt is the conditioning text which is combined with the input to
provide contexts or hints for the PLM. A template (i.e. pattern) is a function that integrates the
inputs and prompts. Answers are then given by the PLMs conditioned on the prompts, and a
further function (i.e. verbalizer) is often required to map the answers to the final outputs. The
reason is that the prompt learning paradigm is typically formulated as a task similar to the
PLM’s pre-training task, which does not necessarily yield the desired outputs of downstream
applications.
¹Soft prompts are learnt at the embedding level.
      </p>
      <p>
        An important part of prompt learning is prompt engineering, i.e., designing template(s), either
manually or automatically, to support downstream applications. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Brown et al. proposed
to use demonstrations, i.e., a sequence of input-output texts, as the prompts, expecting that the
PLM can implicitly learn to predict from examples. For instance, if we want the PLM to predict
the masked position in “[MASK] is the capital of China.”, we can demonstrate by appending
“London is the capital of the UK” after the masked sentence. Schick et al. [16] manually designed
different templates, each corresponding to an individual PLM trained on few-shot examples. The
predictions of downstream text classification and natural language inference tasks were then
made according to an ensemble of trained PLMs. Shin et al. [14] argued that manually designed
templates suffer from the uncertainty of guesswork or a lack of domain expertise. Therefore,
they proposed to search for templates using gradient-based optimization. More recently, Lu
et al. [17] have shown that PLM performance varies with the order of these prompts, and
used generative language models and entropy statistics over the prompt permutations to identify
prompts with good performance.
      </p>
      <p>
        KGs or ontologies are excellent sources for providing explicit knowledge to enrich prompts or
verbalizers. West et al. [18] considered distilling a student model in the common sense domain
from the enormously large PLM GPT-3 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which serves as the teacher model. They adopted
the prompt learning scheme to extract triples from the teacher model with templates created
and examples extracted from the common sense KG Atomic [19]. Hu et al. [7] argued that the
label word space (i.e., the answer space) can be well expanded by adding in external knowledge
about related words. They employed different refinement heuristics to shortlist candidates
to benefit the downstream classification task. For instance, if some “Person” is classified as a
“Physicist” in the ground truth data, then answers like “Scientist” will also be accepted.
      </p>
      <p>
        Our work was motivated by the probing study of Penha et al. [13] that investigates whether
BERT (a well-known PLM consisting of stacked transformer encoders [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) actually knows
superficial cultural knowledge about books, movies, and music. Cloze-style questions for
classifying the genre of (Wikidata) entities for different books, movies, and music were given
for the PLM to answer, often with unsatisfying performance. However, their work considered
naive prompts without sufficient contexts, while ours examines whether KGs can enrich
these prompts, especially by giving additional contexts (e.g., attributes, n-hop neighbours) of the
entities in order to help the PLM generate better predictions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The basic idea of our method is to use the information about entities in KGs to expand cloze-style
prompts with richer entity descriptions; it is summarized in Figure 1. We enrich a naive
prompt, for example “Die Hard is of genre [MASK]”, by matching the movie Die Hard
to the corresponding Wikidata item, extracting auxiliary knowledge with SPARQL queries,
and generating an enriched prompt using this auxiliary data. We use datatype properties and
verbalize entities using rdfs:label to compose valid phrases. As a result, we obtain e.g. “Die
Hard is a movie, starring Bruce Willis, directed by John McTiernan,
of the genre [MASK].”</p>
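      <p>The enrichment step amounts to simple string templating. The following is a minimal sketch (not the exact implementation); the property verbalisations are those of the Die Hard example above:</p>

```python
def build_enriched_prompt(title, enrichments):
    """Compose a cloze prompt from a movie title and verbalised
    (enrichment-text, value) pairs extracted from Wikidata.
    Fragments are joined with commas (verbalisation strategy A)."""
    parts = [f"{title} is a movie"]
    parts += [f"{text} {value}" for text, value in enrichments]
    return ", ".join(parts) + ", of the genre [MASK]."

prompt = build_enriched_prompt(
    "Die Hard",
    [("starring", "Bruce Willis"), ("directed by", "John McTiernan")],
)
# -> "Die Hard is a movie, starring Bruce Willis,
#     directed by John McTiernan, of the genre [MASK]."
```

      <p>With an empty enrichment list, the same function yields the naive prompt, so one template covers both conditions.</p>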
      <p>We then use both (a) the naive prompts and (b) the KG-enriched prompts to query various
language models, and compare their performance on the entity genre classification task. In
the following paragraphs the enrichment by KG querying and the prompt engineering step are
described in detail.</p>
      <sec id="sec-3-1">
        <title>3.1. Knowledge Graph Querying</title>
        <p>The auxiliary data for each movie is extracted from Wikidata. This is done in a simple
two-step process using SPARQL queries. The queries operate on a batch of input records to
reduce the number of requests and avoid timeout errors.</p>
        <p>First, the movies are linked to their respective Wikidata entities by IMDb or TMDb ID, utilizing
the Wikidata properties IMDb ID (wdt:P345) and TMDb movie ID (wdt:P4947). If this does not
yield an entity, exact string matching on the title is attempted as well.</p>
        <p>SELECT ?mlId ?imdbId ?tmdbId ?movie
WHERE {
  VALUES (?mlId ?imdbId ?tmdbId) { ("1" "tt0114709" "862") ... }
  { ?movie wdt:P345 ?imdbId . }
  UNION
  { ?movie wdt:P4947 ?tmdbId . }
}</p>
        <p>Listing 1: SPARQL query used for entity linking with the IMDb or TMDb ID.</p>
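        <p>The batching of input records can be sketched as follows; the query text mirrors Listing 1, and the record shape (MovieLens ID, IMDb ID, TMDb ID) is an assumption based on its VALUES clause:</p>

```python
def linking_query(records):
    """Build one SPARQL entity-linking query for a batch of
    (movielens_id, imdb_id, tmdb_id) records, mirroring Listing 1.
    Batching many records into a single VALUES clause keeps the
    number of HTTP requests low and helps avoid endpoint timeouts."""
    rows = " ".join(
        f'("{ml}" "{imdb}" "{tmdb}")' for ml, imdb, tmdb in records
    )
    return (
        "SELECT ?mlId ?imdbId ?tmdbId ?movie WHERE { "
        f"VALUES (?mlId ?imdbId ?tmdbId) {{ {rows} }} "
        "{ ?movie wdt:P345 ?imdbId . } UNION { ?movie wdt:P4947 ?tmdbId . } }"
    )

q = linking_query([("1", "tt0114709", "862")])
```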
        <p>The second step queries the entities for the auxiliary data used to enrich the prompts with
additional contextual information. Overall, a set of 28 properties was extracted and investigated
for each entity. A simplified version of the SPARQL query is given in Listing 2; it can
easily be adapted to query other properties by adding them to the ?property values.
From this set of properties, a subset of 10 manually selected domain-specific properties is used
to construct the enriched prompts. The properties were selected based on human intuition and
the most frequent co-occurrence for the given entities.</p>
        <p>SELECT ?mlId (SAMPLE(?movieLabel) AS ?movieLabel)
       (SAMPLE(?propertyLabel) AS ?propertyLabel)
       (GROUP_CONCAT(DISTINCT ?objectLabel; SEPARATOR=", ") AS ?objectList)
WHERE {
  VALUES (?mlId ?movie) { ("1" ) ... }
  ?movie rdfs:label ?movieLabel .
  FILTER (LANG(?movieLabel)="en")
  VALUES ?property { wdt:P144 wdt:P179 ... }
  ?p1 wikibase:directClaim ?property .
  ?p1 rdfs:label ?propertyLabel .
  FILTER (LANG(?propertyLabel)="en")
  OPTIONAL {
    ?movie ?property ?object .
    ?object rdfs:label ?objectLabel .
    FILTER (LANG(?objectLabel)="en")
  }
}
GROUP BY ?mlId ?property</p>
        <p>Listing 2: Simplified SPARQL query used to retrieve additional movie knowledge from Wikidata.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Prompt Engineering</title>
        <p>Similarly to [13], we consider an entity genre classification task. The prompts are of the
form “&lt;title&gt; is a movie &lt;Wikidata enrichment&gt;, of the genre [MASK].”, where &lt;Wikidata
enrichment&gt; is an aggregation, in some natural language format, of movie properties and
corresponding values extracted from Wikidata pertaining to the title in question. Table 1 lists the
Wikidata properties used to assemble values for &lt;Wikidata enrichment&gt;.</p>
        <p>Table 1: Wikidata properties used for prompt enrichment.
Wikidata property | Property Label | Enrichment Text
wdt:P161 | cast member | starring
wdt:P57 | director | directed by
wdt:P162 | producer | produced by
wdt:P58 | screenwriter | screenwriter
wdt:P86 | composer | music by
wdt:P1040 | film editor | edited by
wdt:P577 | year | released
wdt:P750 | distributed by | distributed by
wdt:P495 | country of origin | originating from</p>
        <p>The Wikidata properties listed in Table 1 are broadly ranked in descending information
specificity. It was in this order, that ten variations for a probe were constructed, by sequentially
adding Wikidata properties to prompts, building gradually more contextual-information dense
prompts. In adding property information, only the first value of each Wikidata property was
used where more than one was available (e.g., the first listed cast member). E.g., as follows; the
unenriched prompt, the first two successive prompt enrichments, and the final enriched form
pertaining to the movie Die Hard.:
• non-enriched prompt: Die Hard is a movie, of the genre [MASK].
• enriched Prompt 1(A): Die Hard is a movie, starring Bruce Willis, of the genre [MASK].
• enriched Prompt 2(A): Die Hard is a movie, starring Bruce Willis, directed by John
McTiernan, of the genre [MASK].
• enriched Prompt 9(A): Die Hard is a movie, starring Bruce Willis, directed by John
McTiernan, produced by Joel Silver, screenwriter Roderick Thorp, music by Michael Kamen,
edited by John F. Link, released 1988, distributed by Netflix, originating from United States
of America, of the genre [MASK].</p>
        <p>Given the potential for sensitivity of PLMs to the verbalisation strategy used in the construction
of cloze-stype prompts, we considered two verbalisation strategies for aggregation of the
additional Wikidata properties. Whereas the above verbalisation strategy A form is aggregated
with commas, the verbalisation strategy B form is aggregated with and tokens. E.g.:
• enriched Prompt 1(B): Die Hard is a movie and starring Bruce Willis, of the genre [MASK].
• enriched Prompt 2(B): Die Hard is a movie and starring Bruce Willis and directed by John</p>
        <p>McTiernan, of the genre [MASK].
• enriched Prompt 9(B): Die Hard is a movie and starring Bruce Willis and directed by
John McTiernan and produced by Joel Silver and screenwriter Roderick Thorp and music by
Michael Kamen and edited by John F. Link and released 1988 and distributed by Netflix and
originating from United States of America, of the genre [MASK].</p>
        <p>Thus, in total 19 prompt variations are considered for each movie.</p>
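        <p>The construction of all 19 variants can be sketched as follows, assuming the Table 1 ordering and first-listed values are supplied as (enrichment text, value) pairs:</p>

```python
def prompt_variants(title, enrichments):
    """Generate the non-enriched prompt plus, per verbalisation
    strategy, one prompt per cumulative enrichment level:
    1 + 2 * len(enrichments) variants in total."""
    base = f"{title} is a movie"
    suffix = ", of the genre [MASK]."
    variants = [base + suffix]
    for sep in (", ", " and "):  # strategy A: commas; strategy B: 'and'
        for level in range(1, len(enrichments) + 1):
            body = sep.join(f"{t} {v}" for t, v in enrichments[:level])
            variants.append(base + sep + body + suffix)
    return variants

# Nine properties (Table 1) give 1 + 9 + 9 = 19 variants per movie.
assert len(prompt_variants("Die Hard", [("starring", "Bruce Willis")] * 9)) == 19
```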
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>
          In order to test our approach, we use the BERT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and RoBERTa large [20] pre-trained models.
The test dataset we use is a subset of the MovieLens 25M (ML25M) dataset [21]. ML25M contains the title
and ground-truth genre classification of 54,758 movies. A subset of this dataset was
then assembled, comprising those movies for which the Wikidata properties listed in Table 1 were
all present. This resulted in a test set of 9,596 movie titles. The Wikidata properties, and
thus the corresponding data subset, were selected as a compromise between a large dataset
and a diverse set of domain-relevant Wikidata properties, following exploratory analysis of the
ML25M dataset.
        </p>
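        <p>The subset selection amounts to keeping only the movies whose Wikidata records contain every Table 1 property; a minimal sketch with illustrative data:</p>

```python
REQUIRED = {"cast member", "director", "producer", "screenwriter",
            "composer", "film editor", "year", "distributed by",
            "country of origin"}

def full_coverage_subset(movies):
    """Keep only the movies whose extracted Wikidata records contain
    every required property (here: the Table 1 property labels)."""
    return {ml_id: props for ml_id, props in movies.items()
            if REQUIRED <= props.keys()}
```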
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>[Results table: recall@1 and recall@5 per prompt variant for BERT and RoBERTa large. Only the
Prompt R@1 column is clearly attributable: non-enriched 0.136; prompt 1: 0.139; prompt 2: 0.161;
prompt 3: 0.092; prompt 4: 0.024; prompt 5: 0.017; prompt 6: 0.004; prompt 7: 0.055; prompt 8: 0.047;
prompt 9: 0.020. Remaining extracted values, whose column assignment is not recoverable: 0.448, 0.487,
0.498, 0.305, 0.258, 0.090, 0.062, 0.214, 0.065, 0.083, 0.198, 0.264, 0.297, 0.180, 0.100, 0.115,
0.053, 0.556, 0.466, 0.536.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Discussion</title>
        <p>The results and analysis of Section 4.2 support the position that, considered
en masse, enrichment of prompts with domain-relevant information from Wikidata can improve
cloze-style genre prediction in the movie domain. This is the case for both of the investigated
verbalisation strategies.</p>
        <p>[Statistical summary table: mean difference, test statistic, and p-values for BERT and
RoBERTa large under each verbalisation strategy; only the values 0.0245 and 0.0672 (verbalisation
strategy A) are recoverable. Note: * denotes that the p-value is 0 to at least 3 significant
figures.]</p>
        <p>It is noteworthy, however, that the BERT and RoBERTa large models behave very differently
in terms of both their non-enriched performance and their performance when subject to varying
levels of enrichment. This demonstrates that the potential for PLM improvement via prompt
enrichment is highly specific to the model in question. BERT achieves its best aggregate recall
for enriched prompts with relatively low levels of information enrichment, followed by a very
rapid reduction in recall@n for further-enriched prompts. RoBERTa large, by contrast, shows
fluctuating performance relative to the non-enriched prompt, with the greatest performance on
the more information-rich prompts.</p>
        <p>It is beyond the scope of this paper to disentangle the influence on prediction outcomes of
information variety versus the specific information types themselves. However, there are
preliminary indications of complex interactions. For example, as shown in Table 2, prompt 7
(verbalisation strategy A) applied to RoBERTa large shows a large spike in performance over the
worst-performing prompt 6; prompt 7 adds the release-date information. Analysis of a
verbalisation strategy A prompt enriched by release date alone explains a large portion of the
improvement (recall@1 = 0.167, recall@5 = 0.48). However, the overall context provided by
prompt 7 results in the best performance overall: a one-tailed dependent t-test between prompt 7
and enrichment by release date alone demonstrates significant non-zero differences, in the
direction of greater prompt 7 performance, for each of recall@1 and recall@5, with both tests
reporting a p-value close to 0 at the 0.05 significance level. Accordingly, the results suggest
that further investigative work is required to better understand the interactive effect of
information enrichment for whatever model, domain, and task such enriched prompts may be
applied to.</p>
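        <p>The one-tailed dependent t-test used above can be sketched in a few lines; the use of per-movie 0/1 hit indicators as the paired scores is an assumption:</p>

```python
from math import sqrt

def paired_t_statistic(xs, ys):
    """One-tailed dependent (paired) t-test statistic for
    H1: mean(xs) > mean(ys). xs and ys are paired per-item scores
    (e.g. 1/0 hit indicators for recall@k under two prompt
    conditions). Assumes the differences are not all identical."""
    n = len(xs)
    diffs = [x - y for x, y in zip(xs, ys)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    # Compare the statistic against the upper tail of t with n-1 dof
    return mean / sqrt(var / n)
```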
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Given that PLMs are limited in performance on domain-specific cloze-style question answering
prompts, in this paper we examine how adding context from KGs to naive prompts can
improve the performance of PLMs on a movie genre prediction task. Through our experiments,
we show a statistically significant improvement in recall for prompts enriched with information
from the Wikidata KG, in comparison to non-enriched prompts, for the BERT and RoBERTa large
PLMs.</p>
      <p>As future work, we plan to expand our study to include more domains, such as books and music,
to better understand domain-specific optimum characteristics for enrichment, and to cover
the same domains as similar previous work [13]. Additionally, we plan to enrich
prompts using web entities [22]: entities embedded in HTML pages on the web
using Microformats, Microdata, and RDFa, drawn from the Common Crawl web corpus, the largest and
most up-to-date web data corpus available to the public. As more and more websites embed
structured data describing, for instance, products, people, organizations, places, events, resumes,
and cooking recipes, such engineered prompts could cover domain-specific knowledge that is not
present in the encyclopedic Wikidata.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We would like to thank the International Semantic Web Summer School 2022, which initiated
the collaboration between the authors of this paper. This work was funded in part by:
‘Culturally Aware AI’, funded by NWO; the ANR-19-CE23-0014 DeKaloG project (CE23 -
Intelligence artificielle); the CominLabs MiKroloG project; and Samsung Research UK. This
project has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 101004746.</p>
      <p>[6] K. Qi, H. Wan, J. Du, H. Chen, Enhancing cross-lingual natural language inference by
prompt learning from cross-lingual templates, in: Proceedings of the 60th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 1910–1923.
[7] S. Hu, N. Ding, H. Wang, Z. Liu, J. Wang, J. Li, W. Wu, M. Sun, Knowledgeable prompt-tuning:
Incorporating knowledge into prompt verbalizer for text classification, in: Proceedings of the 60th
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022,
pp. 2225–2240.
[8] B. Heinzerling, K. Inui, Language models as knowledge bases: On entity representations, storage
capacity, and paraphrased queries, ArXiv abs/2008.09036 (2021).
[9] S. Ruder, M. E. Peters, S. Swayamdipta, T. Wolf, Transfer learning in natural language processing,
in: Proceedings of the 2019 Conference of the North American Chapter of the Association for
Computational Linguistics: Tutorials, 2019, pp. 15–18.
[10] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, G. Neubig, Pre-train, prompt, and predict: A systematic
survey of prompting methods in natural language processing, arXiv preprint arXiv:2107.13586 (2021).
[11] T. Mickus, D. Paperno, M. Constant, K. van Deemter, What do you mean, BERT? Assessing BERT as a
distributional semantics model, ArXiv abs/1911.05758 (2019).
[12] Z. Jiang, F. F. Xu, J. Araki, G. Neubig, How can we know what language models know?, 2019. URL:
https://arxiv.org/abs/1911.12543. doi:10.48550/ARXIV.1911.12543.
[13] G. Penha, C. Hauff, What does BERT know about books, movies and music? Probing BERT for
conversational recommendation, Fourteenth ACM Conference on Recommender Systems (2020).
[14] T. Shin, Y. Razeghi, R. L. Logan IV, E. Wallace, S. Singh, Eliciting knowledge from language models
using automatically generated prompts, ArXiv abs/2010.15980 (2020).
[15] S. Min, M. Lewis, H. Hajishirzi, L. Zettlemoyer, Noisy channel language model prompting for
few-shot text classification, in: Proceedings of the 60th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), 2022, pp. 5316–5330.
[16] T. Schick, H. Schütze, Exploiting cloze-questions for few-shot text classification and natural language
inference, in: Proceedings of the 16th Conference of the European Chapter of the Association for
Computational Linguistics: Main Volume, 2021, pp. 255–269.
[17] Y. Lu, M. Bartolo, A. Moore, S. Riedel, P. Stenetorp, Fantastically ordered prompts and where to
find them: Overcoming few-shot prompt order sensitivity, arXiv preprint arXiv:2104.08786 (2021).
[18] P. West, C. Bhagavatula, J. Hessel, J. D. Hwang, L. Jiang, R. L. Bras, X. Lu, S. Welleck, Y. Choi,
Symbolic knowledge distillation: from general language models to commonsense models, ArXiv
abs/2110.07178 (2021).
[19] M. Sap, R. Le Bras, E. Allaway, C. Bhagavatula, N. Lourie, H. Rashkin, B. Roof, N. A. Smith, Y. Choi,
Atomic: An atlas of machine commonsense for if-then reasoning, in: Proceedings of the AAAI
Conference on Artificial Intelligence, volume 33, 2019, pp. 3027–3035.
[20] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov,
RoBERTa: A robustly optimized BERT pretraining approach, CoRR abs/1907.11692 (2019). URL:
http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
[21] F. M. Harper, J. A. Konstan, The MovieLens datasets: History and context, ACM Transactions on
Interactive Intelligent Systems (TiiS) 5 (2015) 1–19.
[22] H. Mühleisen, C. Bizer, Web Data Commons - extracting structured data from two large web corpora,
in: LDOW, 2012.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , Bert:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          , ArXiv abs/
          <year>1810</year>
          .04805 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          , G. Krueger,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          , E. Sigler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Berner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          , in:
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balcan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lin</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <publisher-name>Curran Associates, Inc.</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          . URL: <ext-link ext-link-type="uri" xlink:href="https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf">https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf</ext-link>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Asprino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Beek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ciancarini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <article-title>Observing LOD using equivalent set graphs: it is mostly flat and sparsely linked</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2019</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Adaptive prompt learning-based few-shot sentiment analysis</article-title>
          ,
          <source>ArXiv abs/2205.07220</source>
          (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kasahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kawahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shinzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <article-title>Building a personalized dialogue system with prompt-tuning</article-title>
          ,
          <source>ArXiv abs/2206.05399</source>
          (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>