<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Challenge on Knowledge Base Construction from Pre-trained Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan-Christoph Kalo</string-name>
          <email>j.c.kalo@uva.nl</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sneha Singhania</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Razniewski</string-name>
          <email>Simon.Razniewski@de.bosch.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeff Z. Pan</string-name>
          <email>j.z.pan@ed.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <kwd-group>
          <kwd>Large Language Models</kwd>
          <kwd>Knowledge Base Construction</kwd>
          <kwd>Information Extraction</kwd>
          <kwd>Prompting</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Bosch Center for AI</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Max Planck Institute for Informatics</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The University of Edinburgh</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large language models (LLMs) like ChatGPT [1] have advanced a range of semantic tasks and are being ubiquitously used for knowledge extraction. Although several works have explored this ability by crafting prompts with in-context or instruction learning, the viability of complete and precise knowledge base construction from LMs is still nascent. In the 2nd edition of this challenge, we invited participants to extract disambiguated knowledge triples from LMs for a given set of subjects and relations. In crucial difference to existing probing benchmarks like LAMA [2], we made no simplifying assumptions on relation cardinalities, i.e., a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions needed to go beyond just ranking predicted surface strings and materialize disambiguated entities in the output, which were evaluated using the established KB metrics of precision, recall, and F1-score. The challenge had two tracks: (1) a small model track, where models with &lt; 1 billion parameters could be probed, and (2) an open track, where participants could use any LM of their choice. We received seven submissions, two for track 1 and five for track 2. We present the contributions and insights of the peer-reviewed submissions and lay out possible paths for future work. All details of the challenge can be found on our website at https://lm-kbc.github.io/challenge2023/.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. About the Challenge</title>
      <sec id="sec-2-1">
        <title>Background</title>
        <p>
          Large-scale Language Models (LLMs) like BERT [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], LLaMA [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and ChatGPT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] have been optimized to predict missing parts of textual inputs or to complete sentences, and have significantly improved performance on various NLP tasks such as question answering and machine translation. Lately, these LLMs have been recognized for their potential ability to
generate structured knowledge directly from their parameters. This is a promising development
since existing knowledge bases (KBs) like Wikidata [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], DBpedia [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], YAGO [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and
ConceptNet [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] are crucial components of the Semantic Web ecosystem but are inherently limited and incomplete due to their manual or (semi-)automatic construction.
        </p>
        <p>https://people.mpi-inf.mpg.de/~ssinghan/ (S. Singhania); http://simonrazniewski.com/ (S. Razniewski)</p>
        <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org).</p>
        <p>
          The groundbreaking LAMA paper by Petroni et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] demonstrated promising results on
knowledge extraction from pre-trained language models. Over the years, there have been
several advancements and criticisms in follow-up research, investigating the possibility of using
LLMs for KB construction [
          <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref9">9, 10, 11, 12, 13</xref>
          ]. Both new datasets and a variety of new techniques have been proposed, either to probe LLMs for factual knowledge or to construct KBs directly from the language model. To establish a competitive venue for evaluating this promising
research, we introduced the Knowledge Bases from Pre-trained Language Models (LM-KBC)
challenge at the International Semantic Web Conference 2022 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>This paper describes the 2nd edition of the LM-KBC challenge at ISWC 2023, with the
following changes compared to the previous year: (1) we added an entity disambiguation
component to overcome problems associated with entity aliases and also to enable participants
to probe LLMs directly for entity identity, (2) a new and larger dataset comprising diverse relations that incorporate popular entities, long-tail entities, and literal values, and (3) a small track for LMs with up to 1B parameters, and an open track for any choice of LM.</p>
        <p>Task Description Given an input pair consisting of a subject-entity s and a relation r, the objective is to generate accurate object-entities [o1, o2, o3, ..., on] by probing the language model.</p>
        <p>In contrast to last year, we added an entity disambiguation component to the task. While language models operate on natural language, KBs usually work with abstract identifiers to prevent ambiguity among labels. Hence, a KB can distinguish between distinct entities that share the same name. For example, in Wikidata, Athens, the capital of Greece, is identified by Q1524, while the city of Athens in Ohio, USA, is identified by Q755420. Hence, in this edition, we ask the participants to predict Wikidata identifiers instead of just names.</p>
        <p>For instance, Table 1 illustrates GPT-3's behavior when probed with a sample prompt containing a subject-entity and relation pair and a set of few-shot examples. The model returns the predictions in the format provided by the few-shot examples, in this case, a list.</p>
        <p>Similar to last year, the challenge had two tracks:
• Track 1: Small model track, where LMs with up to 1 billion parameters were allowed;
• Track 2: Open track, where LMs of arbitrary size, including retrieval-augmented models,
could be used.</p>
        <p>
          LM-KBC’23 Dataset Compared to last year, where we had only 12 relations, this edition presents 21 relations, comprising a diverse set of subjects and a complete list of ground-truth objects. Each relation has a maximum of 100 unique subject-entities in all data splits. Table 2 provides more details. The object-entities can be persons, organizations, countries, counts, or even “none”. The ground truth consists of the exact identifiers from Wikidata [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Evaluation Each test instance prediction is evaluated using precision, recall, and F1-score. Let P be the list of predicted object-entities for a test subject-entity, and let GT be the corresponding ground-truth list of object-entities. The metrics are precision = |P ∩ GT| / |P|, recall = |P ∩ GT| / |GT|, and F1 = 2 · precision · recall / (precision + recall).
        </p>
        <p>[Table 1: Sample prompts and LM predictions. Few-shot examples (Cologne, CityLocatedAtRiver, [Rhine]), (Hexadecane, CompoundHasParts, [carbon, hydrogen]), and (Antoine Griezmann, FootballerPlaysPosition, [forward]) precede the query (Japan, CountryOfficialLanguage), for which the model predicts [Japanese] or, as an identifier, [Q5287]; few-shot examples (State of Palestine, CountryBordersCountry, [Q801]), (Paraguay, CountryBordersCountry, [Q155, Q414, Q750]), and (Lithuania, CountryBordersCountry, [Q34, Q36, Q159, Q184, Q211]) precede the query (Italy, CountryBordersCountry), for which the model predicts [Q142].]</p>
        <p>When P is empty and GT is not, precision = 1 and recall = 0, leading to F1 = 0. On the other hand, when GT is empty, recall = 1, but precision = 1 only when P is also empty, else precision = 0, leading to an F1-score of either 1 or 0. Scores were macro-averaged across subjects and relations, and systems were ranked by the final macro-F1-score.</p>
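        <p>The evaluation conventions above can be made concrete with a short sketch. The following illustrates the per-instance metrics; it is not the official evaluation script, and it assumes predictions and ground truth are given as sets of Wikidata IDs:</p>
        <preformat>
```python
def instance_scores(pred, gt):
    """Precision, recall, and F1 for one test instance, where pred and gt
    are sets of Wikidata IDs and the empty set encodes 'no object'."""
    if not pred and not gt:
        return 1.0, 1.0, 1.0   # both empty: perfect score
    if not pred:
        return 1.0, 0.0, 0.0   # empty prediction, non-empty ground truth
    if not gt:
        return 0.0, 1.0, 0.0   # non-empty prediction, empty ground truth
    correct = len(pred.intersection(gt))
    p = correct / len(pred)
    r = correct / len(gt)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_f1(instances):
    """Mean F1 over (pred, gt) pairs; averaging per relation first,
    as in the challenge ranking, is analogous."""
    scores = [instance_scores(pred, gt)[2] for pred, gt in instances]
    return sum(scores) / len(scores)
```
        </preformat>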
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Systems</title>
      <p>Participants submitted their predictions on CodaLab at https://codalab.lisn.upsaclay.fr/competitions/14777 to get scores on the private test dataset. The leaderboard can be seen in Table 4. Below, we explain the baselines and provide insights from the seven submissions.</p>
      <sec id="sec-3-1">
        <title>2.1. Baselines</title>
        <p>We provide several baselines:
• Standard prompt for HuggingFace models with Wikidata default disambiguation: These baselines can be instantiated with various HuggingFace models (e.g., BERT, OPT), generate entity surface forms, and use the Wikidata entity disambiguation API to generate IDs.
• Few-shot GPT-3 directly predicting IDs: This baseline uses a few samples to instruct GPT-3 to directly predict Wikidata IDs.
• Few-shot GPT-3 w/ NED: Like above, but predicting surface forms that are disambiguated via Wikidata’s default disambiguation.</p>
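        <p>As an illustration of the disambiguation step, a surface-form baseline can query Wikidata's MediaWiki Action API. The sketch below builds a wbsearchentities request and, as a stand-in for the default disambiguation, picks the top-ranked candidate; the helper names are ours, not the baseline code, and taking the first hit is an assumption:</p>
        <preformat>
```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def search_url(surface_form, language="en"):
    """Build a wbsearchentities query URL for a predicted surface form."""
    params = {
        "action": "wbsearchentities",
        "search": surface_form,
        "language": language,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)

def top_candidate_id(response):
    """Pick the first (top-ranked) candidate entity ID, or None.
    'response' is the parsed JSON payload returned by the API."""
    hits = response.get("search", [])
    return hits[0]["id"] if hits else None
```
        </preformat>
        <p>Fetching search_url("Athens") with urllib.request.urlopen and parsing the JSON yields a ranked candidate list from which top_candidate_id extracts an identifier such as Q1524.</p>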
        <p>Baseline Performance is shown in Table 3.</p>
        <p>[Table 3: Baseline performance (avg. precision, avg. recall, avg. F1-score) of GPT-3 NED (Curie model), GPT-3 predicting IDs directly (Curie model), and BERT.]</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Track 1</title>
        <p>Winner: Expanding the Vocabulary of BERT for Knowledge Base Construction
Dong Yang, Xu Wang and Remzi Celebi</p>
        <p>The submission utilizes the BERT language model to improve knowledge base construction,
specifically addressing the challenge of multi-token object extraction. A novel approach called
“Token Recode” is introduced to expand the model’s vocabulary while retaining semantic
context. This method shows a noticeable improvement in F1-scores, confirming its effectiveness.
The study also employs task-specific pre-training and category-based filtering to enhance
performance further, achieving promising results even with a lightweight BERT model. The
code for this system is available at https://github.com/MaastrichtU-IDS/LMKBC-2023.</p>
        <p>Broadening BERT vocabulary for Knowledge Graph Construction using Wikipedia2Vec
Debanjali Biswas, Stephan Linzbach, Dimitar Dimitrov, Hajira Jabeen and Stefan Dietze</p>
        <p>
          The paper proposes an interesting novel method for predicting multi-token entities with
the BERT model by using Wikipedia2Vec [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] embeddings. The resulting model is trained
with the prompt tuning technique OptiPrompt [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. While this is an exciting new idea, the
performance of the resulting system is similar to that of the BERT baseline. Further experiments would be needed to understand these results. The code for this system is available at
https://github.com/debanjali05/LM-KBC2023-GESIS.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Track 2</title>
        <p>Winner: Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata
Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño Peñuela and Elena Simperl</p>
        <p>The paper outlines a two-step pipeline for KBC: knowledge probing and entity mapping. In
the probing step, GPT-3.5 Turbo and GPT-4 are utilized. Three types of settings are tested:
question prompting, triple completion prompting, and context-enriched prompting. Few-shot
learning techniques are used across all settings to improve result formatting. In the entity
mapping step, the MediaWiki Action API is used to obtain candidate Wikidata entities for object
strings. Three methods are employed for final disambiguation: case-based, keyword-based, and
LM-based. The code for this system is available at https://github.com/bohuizhang/LLMKE.</p>
        <p>Limits of Zero-shot Probing on Object Prediction
Shrestha Ghosh</p>
        <p>The paper introduces a system called “Minimal Probe”, which mainly emphasizes two areas:
prompt design and answer post-processing. For prompt design, the study constructs prompts
consisting of a task description, optional demonstrations, and the task itself. It shows that
introducing a well-framed task description can improve model performance substantially over
a baseline. In answer post-processing, the paper employs manually designed cleaning steps
to ensure the answers align with the desired format. The code for this system is available at
https://github.com/ghoshs/LM-KBC2023.</p>
        <p>Knowledge-centric Prompt Composition for Knowledge Base Construction from
Pre-trained Language Models
Xue Li, Anthony Hughes, Majlinda Llugiqi, Fina Polat, Paul Groth and Fajar J. Ekaputra</p>
        <p>The authors introduce a pipeline for constructing knowledge bases using large language
models, specifically GPT-3.5 and GPT-4. The research explores various configurations involving
in-context learning, employing an example selector and knowledge-enriched prompts for
better contextual relevance. Findings indicate that rule-based example selectors, which
consider cardinality per relation, significantly improve performance. Additionally, augmenting
entities and relations with extra properties sourced from GPT-4 further enhances the system’s
effectiveness. The code for this system is available at https://github.com/effyli/lm-kbc/.</p>
        <p>Enhancing Knowledge Base Construction from Pre-trained Language Models using Prompt Ensembles
Fabian Biester, Daniel Del Gaudio, and Mohamed Abdelaal</p>
        <p>
          The paper centers on the idea of “prompt ensembles” for improving knowledge base
construction from language models. The researchers initially evaluated baseline prompts with ChatGPT and then kept only the top-performing ones. Then, a few-shot learning approach on LLaMA 2 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] with 70B parameters is applied. As a last step, fact-checking is performed. The code for this system is available at https://github.com/asdfthefourth/lmkbc.
        </p>
        <p>LLM2KB: Constructing Knowledge Bases using instruction tuned context aware Large Language Models
Anmol Nayak and Hari Prasad Timmapathini</p>
        <p>
          This paper uses instruction-fine-tuned LLAMA2 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and StableBeluga [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] models with LoRA [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] together with dense passage retrieval to extend the prompt. The authors prepared a Wikipedia corpus with textual data about the subject entities of interest and put it into a FAISS index for use with DPR [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Then, they fine-tune two LLMs with three instruction prompts on the training dataset; the instruction prompts are extended with Wikipedia paragraphs retrieved via DPR. A similar process is performed at inference time. However, an additional entity
disambiguation step is performed, where the Wikidata API baseline disambiguation method
is used to retrieve candidate entities that are then sent to an LLM to perform disambiguation
via prompting. The code for this system is available at https://github.com/anmoln94/Team_
LLM2KB_LM-KBC-2023.
        </p>
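        <p>The retrieval-augmented prompting described above can be sketched as follows. For brevity, a plain inner-product search over toy embeddings stands in for the FAISS index and the DPR encoders; the function names and prompt format are illustrative assumptions, not the submitted system:</p>
        <preformat>
```python
import numpy as np

def retrieve(passage_vecs, passages, query_vec, k=2):
    """Score passages by inner product with the query embedding and
    return the top-k texts (a FAISS IndexFlatIP performs the same
    maximum-inner-product search at scale)."""
    scores = passage_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

def extend_prompt(instruction, retrieved):
    """Prepend retrieved Wikipedia paragraphs to an instruction prompt."""
    return "Context:\n" + "\n".join(retrieved) + "\n\nInstruction:\n" + instruction
```
        </preformat>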
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Discussions</title>
      <p>The second edition of our LM-KBC challenge received encouraging uptake, with seven teams
going past the finish line and submitting both code and system descriptions. Table 4 presents
the final leaderboard of our challenge.</p>
      <sec id="sec-4-1">
        <title>3.1. Takeaway</title>
        <p>The main findings across all the submissions are:
1. Larger models beat smaller models by a large margin. We observe that the submissions in track 2, which mainly use relation-specific prompts for probing the GPT-4 model, achieve considerably higher performance than the track 1 submissions.</p>
        <p>[Table 4: Final leaderboard with columns System, Track, Precision, Recall, and F1-score. Listed systems: Zhang et al., Li et al., Biester et al., Nayak and Timmapathini, Ghosh, GPT-3 baseline, GPT-3 baseline (IDs directly), Yang et al., Biswas et al., and BERT baseline.]</p>
        <p>2. Possible saturation using GPT-4 probing. The top submissions on the leaderboard, which correspond to track 2, have a similar performance range across all three metrics. Since the prompts are manually crafted and relation-specific, the similarity in performance could be due to similar prompt formulations. However, post-processing the prediction list leads to the highest performance.
3. Easy vs. hard relations. Across submissions, we notice that the 21 relations can be categorized into (1) the top-5 easy relations: [PersonHasNoblePrize, CompoundHasParts, CountryHasOfficialLanguage, RiverBasinsCountry, PersonCauseOfDeath], and (2) the top-5 hard relations: [PersonHasEmployer, PersonHasAutobiography, PersonHasProfession, StateBordersState, BandHasMember]. This insight could help us better curate the next edition's dataset by considering object list cardinality and entity popularity.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Possible Extensions</title>
        <p>Deciding on the challenge complexity required navigating a trade-off between ease of access and realism. Several avenues for extension are:
1. Only small-scale models: Given the easy API access to GPT-4 model variants and the associated monetary cost, it might be interesting to host the challenge with only a small-scale model track. This might lead to innovative solutions focusing on retrieval augmentation, model compression, and knowledge transfer.
2. Temporal object list: Real-world knowledge bases with factual information keep evolving with new information. A special track incorporating a time component into the object entities, to study the consistency and stability of LMs for knowledge base construction, could be helpful.
3. Other metrics: Our evaluation focused on macro-averaged F1-scores, which give equal weight to precision and recall. It could be interesting to explore other trade-offs, as for KBs, precision is often far more critical than recall. Also, as subjects with no objects dominate many domains (e.g., very few people hold political offices), a higher presence of, or more weight on, no-object subjects might be interesting.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Acknowledgments</title>
      <p>We thank Huawei for sponsoring a 1000 Euro prize for the winning systems. We also thank
the semantic web challenge chairs, Valentina Ivanova and Wen Zhang, for helping us host a
successful second edition of our challenge. Finally, we thank the participating teams for their
enthusiasm and contributions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] OpenAI,
          <source>GPT-4 technical report</source>
          ,
          <year>2023</year>
          . URL: https://doi.org/10.48550/arXiv.2303.08774. doi:10.48550/arXiv.2303.08774. arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Petroni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bakhtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Language models as knowledge bases?</article-title>
          ,
          <source>in: Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2463</fpage>
          -
          <lpage>2473</lpage>
          . URL: https://aclanthology.org/D19-1250. doi:10.18653/v1/D19-1250.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>NAACL</source>
          (
          <year>2019</year>
          ). URL: https://doi.org/10.18653/v1/n19-1423. doi:10.18653/v1/n19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          , et al.,
          <article-title>LLaMA: Open and efficient foundation language models</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krötzsch</surname>
          </string-name>
          ,
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <year>2014</year>
          )
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          . URL: https://doi.org/10.1145/2629489. doi:10.1145/2629489.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          ,
          <source>in: The semantic web</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , G. Kasneci, G. Weikum,
          <article-title>YAGO: A core of semantic knowledge</article-title>
          ,
          <source>in: Proceedings of the 16th International Conference on World Wide Web, WWW '07</source>
          ,
          <year>2007</year>
          , p.
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          . URL: https://doi.org/10.1145/1242572.1242667. doi:10.1145/1242572.1242667.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Commonsense reasoning in and over natural language</article-title>
          ,
          <source>in: KnowledgeBased Intelligent Information and Engineering Systems</source>
          ,
          <year>2004</year>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Language models are open knowledge graphs</article-title>
          ,
          <source>arXiv preprint arXiv:2010.11967</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. P.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>BertNet: Harvesting knowledge graphs from pretrained language models</article-title>
          ,
          <source>arXiv preprint arXiv:2206.14268</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Veseli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          , G. Weikum,
          <article-title>Evaluating language models for knowledge base completion</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>227</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Veseli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          ,
          <article-title>Evaluating the knowledge base completion potential of GPT</article-title>
          ,
          <source>Findings of EMNLP</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jabeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Omeliyanenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lissandrini</surname>
          </string-name>
          , et al.,
          <article-title>Large language models and knowledge graphs: Opportunities and challenges</article-title>
          ,
          <source>TGDK</source>
          (to appear) (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Singhania</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Razniewski</surname>
          </string-name>
          ,
          <article-title>LM-KBC: Knowledge base construction from pre-trained language models</article-title>
          ,
          <source>Semantic Web Challenge</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Yamada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Asai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sakuma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shindo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Takefuji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          ,
          <article-title>Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia</article-title>
          , in:
          <source>Empirical Methods in Natural Language Processing: System Demonstrations</source>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          . URL: https://aclanthology.org/2020.emnlp-demos.4. doi: 10.18653/v1/2020.emnlp-demos.4.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Factual probing is [MASK]: Learning vs. learning to recall</article-title>
          , in:
          <source>North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>5017</fpage>
          -
          <lpage>5033</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.398. doi: 10.18653/v1/2021.naacl-main.398.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <collab>Stability AI</collab>
          ,
          <article-title>Stable Beluga</article-title>
          ,
          <year>2023</year>
          . URL: https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>LoRA: Low-rank adaptation of large language models</article-title>
          ,
          <year>2021</year>
          . arXiv:2106.09685.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.</given-names>
            <surname>Karpukhin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Oguz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Edunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-t.</given-names>
            <surname>Yih</surname>
          </string-name>
          ,
          <article-title>Dense passage retrieval for open-domain question answering</article-title>
          , in:
          <string-name>
            <given-names>B.</given-names>
            <surname>Webber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>6769</fpage>
          -
          <lpage>6781</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.550. doi: 10.18653/v1/2020.emnlp-main.550.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>