                                Ontology-grounded Automatic Knowledge Graph
                                Construction by LLM under Wikidata schema
                                Xiaohan Feng1 , Xixin Wu1 and Helen Meng1,*
                                1
                                    Department of System Engineering and Engineering Management, Chinese University of Hong Kong


                                              Abstract
We propose an ontology-grounded approach to Knowledge Graph (KG) construction using Large Language Models (LLMs) on a knowledge base. An ontology is authored by generating Competency Questions (CQs) on the knowledge base to discover the knowledge scope, extracting relations from the CQs, and attempting to replace equivalent relations with their counterparts in Wikidata. To ensure consistency and interpretability in the resulting KG, we ground the generation of the KG with the authored ontology based on the extracted relations. Evaluation on benchmark datasets demonstrates competitive performance on the knowledge graph construction task. Our work presents a promising direction for a scalable KG construction pipeline with minimal human intervention that yields high-quality, human-interpretable KGs, which are interoperable with Wikidata semantics for potential knowledge base expansion.

                                              Keywords
                                              Knowledge Graph, Relation Extraction, Large Language Model, Wikidata, Interpretable AI




                                1. Introduction
                                Knowledge Graphs (KGs) are structured representations of information that capture entities
                                and their relationships in a graph format. By organizing knowledge in a machine-readable
                                way, KGs enable a wide range of intelligent applications, such as semantic search, question
                                answering, recommendation systems, and decision support [1]. The ability to construct high-
                                quality, comprehensive KGs is thus critical for harnessing the power of these technologies
                                across various domains.
                                   Traditionally, the process of constructing KGs has relied heavily on manual effort by domain
                                experts to define the relevant entities and relationships, populate the graph with valid facts,
                                and ensure logical consistency [2]. However, this manual curation approach is time-consuming,
                                expensive, and difficult to scale to large, evolving domains. There is a strong need for (semi-
                                )automatic methods that can aid the KG construction process by extracting structured knowledge
                                from unstructured data sources such as text.
                                   Recent years have seen growing interest in leveraging Large Language Models (LLMs) for
                                various knowledge capture and reasoning tasks [3]. Pre-trained on vast amounts of text data,
                                LLMs can generate fluent natural language and have been shown to memorize and recall
                                factual knowledge [4], [5]. However, directly applying LLMs to KG construction still faces
                                several challenges. First, LLMs may generate inconsistent or redundant facts due to the lack

HI-AI@KDD, Human-Interpretable AI Workshop at the KDD 2024, 26th of August 2024, Barcelona, Spain
                                *
                                 Corresponding author.
                                $ xhfeng@se.cuhk.edu.hk (X. Feng); wuxx@se.cuhk.edu.hk (X. Wu); hmmeng@se.cuhk.edu.hk (H. Meng)
                                            Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
of an explicit, unified schema [6]. Second, the generated KGs may be incomplete or biased
towards the knowledge present in the LLM's training data, which may not fully cover the
target domain, especially for proprietary documents not included in the pre-training set. Finally,
it can be challenging to integrate LLM-generated KGs with existing knowledge bases due to
misalignment with standard ontologies.
   In this work, we propose a novel approach that harnesses the reasoning power of LLMs and the structured schema of Wikidata to construct high-quality KGs for proprietary knowledge domains. Our approach begins by discovering the scope of knowledge through the generation of Competency Questions (CQs) and answers from unstructured documents. We then summarize the relations and properties from these QA pairs into an ontology, matching candidate properties against those defined in Wikidata and extending the schema as needed. Finally, we use the resulting ontology to ground the transformation of CQ-answer pairs into a structured KG.

Figure 1: Flowchart of proposed approach.

   By incorporating the Wikidata schema into our pipeline and grounding the generation of the KG on the same ontology, we aim to reduce redundancy, leverage the implicit knowledge captured during LLM pretraining while improving interpretability, and ensure interoperability with public knowledge bases. The generated KGs can be parsed with RDF parsers and used in downstream applications, or audited for correctness.
   The main contributions of this work are as follows:
   1. We propose a novel ontology-grounded approach to LLM-based KG construction that
      leverages an ontology based on the Wikidata schema to guide the extraction and integration
      of knowledge from unstructured text.
   2. We introduce a pipeline that combines competency question generation, ontology align-
      ment, and KG grounding to systematically construct high-quality KGs that are consistent,
      complete, and interoperable with existing knowledge bases.
   3. We demonstrate the effectiveness of our approach through experiments on benchmark
      datasets, showing improvements in KG quality compared to traditional methods, along
      with the interpretability and utility of the generated KGs.


2. Literature Review
Knowledge graph construction has been an active area of research in recent years, with a wide
range of approaches proposed for extracting structured knowledge from unstructured data
sources [2]. Early methods relied heavily on rule-based systems and hand-crafted features to
identify entities and relations in text [7]. With the advent of deep learning, neural network-
based approaches have become increasingly popular, enabling more flexible and scalable KG
construction [8].
   One prominent line of work focuses on using distant supervision to automatically generate
training data for relation extraction [9]. These methods assume that if two entities are mentioned
together in a sentence and also appear in a knowledge base as subject and object of a relation,
then that sentence is likely to express the relation. While distant supervision has been shown
to be effective at scale, it often suffers from noise and incomplete coverage.
   Another important direction is the development of unsupervised and semi-supervised meth-
ods for KG construction [10]. These approaches aim to reduce the reliance on large amounts
of labeled data by leveraging techniques such as bootstrapping, graph-based inference, and
representation learning. However, they often struggle with consistency and quality control
issues. More recently, there has been growing interest in using large language models for
KG construction [11], [12], [13]. These methods take advantage of the vast knowledge captured
in pretrained Language Models (LMs) to generate KG triples through prompt engineering
and fine-tuning. While promising, these LM-based approaches only produce triplets without
canonicalization, which makes portability and interoperability difficult. Additionally, some
methods rely on vector-based similarity measures to deduce relationships between entities in
the KG, which yields good performance but falls short in interpretability [14].
   As mentioned in the Introduction, despite the significant progress in KG construction and LLM
applications, performance, interpretability, coverage of proprietary documents, and interaction
with other knowledge bases remain issues. Our pipeline addresses these by grounding KG
generation on an ontology based on the Wikidata schema, which ensures that the output KG is
human-readable and makes integration with Wikidata or other KGs easier. In the experiments
below, we show that these benefits can also be achieved on private documents with decent
performance.


3. Method: Ontology-grounded KG Construction
Our proposed approach for ontology-grounded KG construction using LLMs consists of four
main stages: 1) Competency Question Generation, 2) Relation Extraction and Ontology Matching,
3) Ontology Formatting, and 4) KG Construction. Figure 1 provides an overview of the pipeline.

3.1. Competency Question (CQ)-Answer Generation
The first step in our pipeline is to generate a set of competency questions (CQs) and answers
that capture the key information needs of the target domain. We employ an LLM to generate
CQs based on the input documents. The LLM is provided with a set of instructions and examples
to guide the generation process, encouraging the creation of well-formed, relevant questions
that can be answered using the given documents. This step helps to scope the KG construction
task within the knowledge domain and ensures that the resulting KG aligns with the intended
use cases. It also allows further ontology expansion: when interacting with the knowledge base,
users can submit new domain-defining CQs, which serve as a user-friendly interface for refining
the ontology, and our proposed pipeline attaches the incremental knowledge scope to the
existing ontology.
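As a concrete illustration, the CQ-generation prompt could be assembled as below; the instruction wording and the few-shot example are our own placeholder assumptions, not the exact prompts used in the pipeline.

```python
def build_cq_prompt(document: str, n_questions: int = 5) -> str:
    """Assemble a few-shot prompt asking an LLM for competency questions
    (CQs) answerable from the given document. The instruction text and
    the example below are illustrative placeholders."""
    instructions = (
        f"Read the document below and write {n_questions} well-formed "
        "competency questions that can be answered using only its content. "
        "After each question, give the answer found in the document."
    )
    example = (
        "Example:\n"
        "Q: Who founded the company described in the text?\n"
        "A: Jane Doe.\n"
    )
    return f"{instructions}\n\n{example}\nDocument:\n{document}\n\nQuestions:"
```

The returned string would be sent to the LLM, and the (CQ, answer) pairs parsed from its response feed the later stages of the pipeline.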
3.2. Relation Extraction and Ontology Matching
During our preliminary experiments on prompting LLMs to directly generate ontologies from
documents, we noted that the LLM spontaneously recalled Wikidata knowledge in its responses,
consistent with previous work [15]. In our preliminary experiments, this behaviour also
transfers to small 7B/14B models.
   Following this direction, in the second step we extract relations from the CQs and match them
against Wikidata properties to better elicit the model's memories of Wikidata when constructing
and using the ontology. We first prompt the LLM to extract properties from the CQs and write a
brief description of the usage of each extracted property, including its domain and range,
following the editing guidelines of Wikidata. To match these properties against existing entries
in the Wikidata ontology, we pre-populate a candidate property list with all Wikidata properties,
after filtering out properties related to external database/knowledge base IDs. The extracted
properties are then matched against the candidate list by a vector similarity search over property
descriptions: each property is represented by a sentence embedding of its description, and the
top-1 closest candidate is retrieved for each extracted property. Each pair of extracted property
and matched top-1 candidate is then vetted by the LLM to check whether the two are truly
semantically equivalent, as a final deduplication step. If a match is validated, the candidate
property is added to the final property list; otherwise, the newly minted property is kept in the
final list if we allow expansion beyond the candidate property list derived from Wikidata, and
discarded when the final property list is required to be a subset of the candidate property list.
The first scenario is suitable when no prior schema is known for the domain and some new
properties outside of the common ontology are expected, whereas the latter is for a known
target list of possible properties.
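The matching and vetting logic above can be sketched as follows, assuming precomputed sentence embeddings for all property descriptions (the paper uses bge-small-en); the LLM vetting call is stubbed as a callable, and `match_property` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def match_property(extracted, candidates, vet, allow_expansion=True):
    """Match one extracted property against Wikidata candidates.

    extracted:  (label, embedding) of the LLM-extracted property
    candidates: list of (wikidata_pid, embedding) pairs
    vet:        callable deciding whether the top-1 match is truly
                equivalent (an LLM performs this step in the pipeline)
    Returns the matched Wikidata PID, the new property label when
    expansion is allowed, or None when it must be discarded.
    """
    label, emb = extracted
    pid, _ = max(candidates, key=lambda c: cosine(emb, c[1]))
    if vet(label, pid):
        return pid
    return label if allow_expansion else None
```

Setting `allow_expansion=False` corresponds to the second configuration, where the final property list must stay a subset of the Wikidata-derived candidates.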

3.3. Ontology Formatting
In the third stage, we use the LLM to generate an OWL ontology based on the matched and
newly created properties. We copy the description, domain, and range fields of all matched
properties from Wikidata semantics. For new properties, the LLM is prompted to infer and
summarize classes for the domain and range of the relations, and to output a complete OWL
ontology following the format of the copied Wikidata properties. This step ensures that the
resulting KG is grounded in a formal, machine-readable ontology that captures the relationships
between entities and stays close to the semantics of Wikidata for interoperability.
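For illustration, a matched property could be serialized in Turtle roughly as below; the prefix choices and exact layout are our assumptions about the output format, not a specification from the paper (P112 "founded by" is a real Wikidata property).

```python
def property_to_turtle(pid, label, description, domain, range_):
    """Render one property as an OWL/Turtle snippet, keeping the Wikidata
    PID as the property identifier for interoperability. Prefixes such as
    wd: and schema: are assumed to be declared elsewhere in the ontology."""
    return (
        f"wd:{pid} a owl:ObjectProperty ;\n"
        f'    rdfs:label "{label}" ;\n'
        f'    schema:description "{description}" ;\n'
        f"    rdfs:domain :{domain} ;\n"
        f"    rdfs:range :{range_} .\n"
    )
```

Calling `property_to_turtle("P112", "founded by", "founder or co-founder of this organization", "Organization", "Person")` would yield one property block of the ontology.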

3.4. KG Construction
In the final stage, we use the LLM to construct a KG from the CQs and their answers, grounded
by the ontology generated in the previous stage. For each (CQ, answer) pair, the LLM extracts
the relevant entities and maps them to the ontology using the defined properties. The output is
a set of RDF triples that constitutes the final KG.
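Since only valid RDF is kept for evaluation (Section 4.1), this last step amounts to parsing the LLM output and discarding malformed lines. A simplified sketch for N-Triples-style output is below; a real pipeline would likely use a full RDF parser such as rdflib, so the regex here is an illustrative assumption.

```python
import re

# A deliberately simple pattern for '<s> <p> <o> .' or '<s> <p> "literal" .'
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(<[^>]+>|"[^"]*")\s*\.$')

def extract_triples(llm_output: str):
    """Keep only lines of the LLM output that parse as well-formed triples."""
    triples = []
    for line in llm_output.splitlines():
        m = TRIPLE.match(line.strip())
        if m:
            triples.append(m.groups())
    return triples
```

Malformed lines are silently dropped, so the downstream evaluation only ever sees triples that an RDF consumer could load.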
4. Experiments and Discussion
4.1. Experiment settings
We evaluate our ontology-grounded approach to KG construction (KGC) on three KGC datasets:
Wiki-NRE [16], SciERC [17], and WebNLG [18]. As Wiki-NRE and WebNLG are partially based
on Wikidata and DBpedia (derived from Wikipedia contents), and our proposed pipeline utilizes
the Wikidata schema, we include SciERC for a more robust evaluation, since SciERC contains
relation types that are not by nature equivalent to properties in Wikidata.
   We used a subset of Wiki-NRE's test set containing 1,000 samples with 45 relation types,
following the split in [19], due to cost constraints. SciERC's test set contains 974 samples under
a schema with 7 relation types. For WebNLG, we used the test set of the Semantic Parsing (SP)
task, with 1,165 samples and 159 relation types. For evaluation, we adopt partial F1 on KG
triplets based on the standards in [18]. All experiments are conducted in a single pass.
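For intuition, an exact-match triple F1 can be computed as in the sketch below; note that the actual protocol in [18] grants partial credit for partially correct triple elements, so this simplified exact-match version is an illustrative assumption only.

```python
def triple_f1(predicted, gold):
    """Exact-match F1 over sets of (subject, predicate, object) triples."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                      # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```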
   We note that, as reported previously, annotations in KGC datasets may be incomplete in
terms of both possible relation types and KG triplets [19], [20].
   As our pipeline is designed to autonomously uncover knowledge structure with no prior
assumption about the knowledge schema, we report our results in two ways, corresponding to
the two configurations of the final de-duplication step in Section 3.2:

      1. Target schema constrained: In this setting, we match all relation types in the test sets to
         their closest equivalents in Wikidata and restrict the ontology to the relation universe of
         the test set.
      2. No schema constraint: In this setting, we do not filter the matched ontology, even if
         entries are not in the schema of the test dataset. This setting is closer to real-life
         applications, where documents with unknown schemas are processed.

   For the properties conjunction, evaluate for, compare, and feature of in SciERC, we select the
closest properties proposed by the LLM based on our subjective judgment.
   To highlight our system's competency, rather than directly prompting for triplets, we parse
the output KG with an RDF parser, extract all valid RDF triples for the KG related to each
document in the test set, and present the triplets to the evaluation script for assessment. This
ensures that our evaluation is performed on a generated KG that is ready to be consumed in
downstream applications.
   We test our pipeline on both Mistral-7B-instruct [21] and GPT-4o¹. Due to cost constraints,
we have only tested GPT-4o in the target schema constrained setting. For embedding the
property usage comments, we select bge-small-en [22]. We use GenIE [23], PL-Marker [24],
and ReGen [25] as fine-tuned baselines for the Wiki-NRE, SciERC, and WebNLG datasets,
respectively (collectively named Non-LLM Baseline). For LLM-based systems, we use the results
reported in [19] for Wiki-NRE and WebNLG on the same Mistral model, and the GPT-4 results
in [3] for SciERC (collectively named LLM Baseline). We note that it is highly unlikely that
Mistral-7B poses an advantage over an earlier version of GPT-4 when interpreting the SciERC
results.

4.2. Result


1
    https://openai.com/index/hello-gpt-4o/
Table 1 shows the performance of our method compared to state-of-the-art baselines on this
subset. Our proposed approach exceeds all baselines under the target schema constrained
setting on the Wiki-NRE and SciERC datasets, while displaying a small performance regression
without the schema constraint. On the WebNLG dataset, our pipeline maintained competitive-
ness against the fine-tuned SOTA when constrained to the target schema. These results validate
the quality of the KGs generated by our pipeline, especially on SciERC, whose semantics
contains properties that are not native to Wikidata. We also note a performance improvement
when using GPT-4o.

Method               Wiki-NRE     SciERC       WebNLG
Non-LLM Baseline     0.484        0.532        0.767
LLM Baseline         0.647        0.07         0.728
Proposed (Mistral)   0.66/0.60    0.73/0.58    0.74/0.68
Proposed (GPT-4o)    0.71/N/A     0.77/N/A     0.76/N/A

Table 1: Partial F1 scores on test datasets. Best result is bolded. Results of the proposed pipeline
under the two settings are presented as Target schema constrained / no schema constraint.

4.3. Discussion
4.3.1. Performance discrepancy on different grounding ontologies
It is worth noting that the relatively lower performance in the no schema constraint setting
across all datasets is due to the LLM discovering a richer ontology than the predefined target
schema. While this expanded schema may capture additional relevant information, it can
hinder extraction performance when evaluated solely against the limited target schema. This
showcases the trade-off between schema completeness and strict adherence to a predefined
ontology; our pipeline performs best on a large set of documents with a limited scope of
knowledge, requiring a concise schema.
    Furthermore, the flip side of the performance deficit in the absence of schema constraints,
i.e., additional ontology entries outside of the dataset-defined properties, cannot be evaluated
against the dataset directly, as the ontology is not entirely covered by the test set annotations.
Hence, the virtue of the no schema constraint setting is to demonstrate that our pipeline can
provide coverage of the properties in the test set, though somewhat limited compared to
baselines, while also capturing ontology outside the test set schema; this is potentially more
useful when discovering the ontology of a novel document set with no expert knowledge of its
schema composition. This ability may be validated by manual evaluation of the full set of
captured ontology in future work.
    Nevertheless, the marginal performance deficit leaves room for improvement. Recent reports
have found that long input contexts may pose a challenge to LLMs even when such context
lengths are technically supported [26]. We conjecture that, aside from trimming the grounding
ontology, which hinders the knowledge coverage of our pipeline, few-shot fine-tuning on the
new ontology or general pretraining on the KG construction task may be helpful. We leave
these as possible future directions.
4.3.2. Utility of generated KG
It should be emphasized that, while the selected evaluation tasks assess the correctness of the
extracted triplets, the extracted knowledge graph can do more than that. With ongoing
discussion on grounding LLM knowledge in trusted knowledge sources to reduce hallucination
[6], explicitly generating a KG provides a path to audit the knowledge elicited when interacting
with an LLM. Furthermore, with evidence demonstrating that LLMs have the potential to
reason over graphs and generate an explicit path to retrieve required knowledge [27], our
pipeline may serve as the foundation for an interpretable QA system, in which an LLM
autonomously extracts an ontology and deduces the correct retrieval query based on that
ontology when handling a set of unstructured documents. The interpretability arises from the
fact that the KG and the query can be understood and verified by users. Moreover, our use of
the Wikidata schema offers potential interoperability with the whole Wikidata knowledge base,
which safely expands the knowledge scope of the QA system. We propose to continue research
in this significant direction.

4.3.3. Computational resources
We note the growing concern about sustainability in LLM applications due to their intensive
computational requirements. Our pipeline consumes three separate LLM calls per document,
plus one call per extracted relation. It is not straightforward to compare the carbon footprint
of our approach with the Non-LLM baselines, as our work at this stage does not require
model fine-tuning, whereas all of the Non-LLM baselines employed various tuning techniques
to produce their results. On the other hand, our smallest adopted model, Mistral-7B, is more
than 10x larger in parameter count than the T5 models used in the Non-LLM baselines.
Larger models naturally require more powerful GPU clusters in terms of both GPU quantity
and capability, but our zero-shot approach may provide an advantage in resource cost
compared to the Non-LLM baselines when processing a small number of documents with no
training requirement.
   When comparing with the LLM baselines, we note that the approaches of [14] and [3]
consume 1 and 2 LLM calls per document, respectively. However, these baselines treat
knowledge triplets as the evaluation target, while we additionally generate a formatted
ontology, which is more useful. Nevertheless, we recognize the performance burden and
propose to explore fine-tuning and guided decoding techniques to achieve better performance
with smaller models and better reproducibility.


5. Conclusion
We have demonstrated the effectiveness of our ontology-grounded approach to KG construction
using LLMs. By leveraging the structured knowledge of Wikidata captured during LLM
pretraining and grounding KG construction with the generated ontology, our pipeline is able
to construct high-quality KGs across various domains while maintaining competitive
performance with state-of-the-art baselines. Generated KGs that conform to the Wikidata
schema leave the possibility wide open of building an interpretable QA system with robust
access to both common knowledge and a proprietary knowledge base.
Acknowledgments
This work is supported by the Centre for Perceptual and Interactive Intelligence (CPII) Ltd, a
CUHK-led InnoCentre under the InnoHK scheme of the Innovation and Technology Commission.


References
 [1] A. Hogan, E. Blomqvist, M. Cochez, C. D’amato, G. D. Melo, C. Gutierrez, S. Kirrane,
     J. E. L. Gayo, R. Navigli, S. Neumaier, A.-C. N. Ngomo, A. Polleres, S. M. Rashid, A. Rula,
     L. Schmelzeisen, J. Sequeda, S. Staab, A. Zimmermann, Knowledge graphs, ACM Comput.
     Surv. 54 (2021). URL: https://doi.org/10.1145/3447772. doi:10.1145/3447772.
 [2] S. Ji, S. Pan, E. Cambria, P. Marttinen, P. S. Yu, A survey on knowledge graphs: Represen-
     tation, acquisition, and applications, IEEE Transactions on Neural Networks and Learning
     Systems 33 (2020) 494–514. URL: https://api.semanticscholar.org/CorpusID:211010433.
 [3] Y. Zhu, X. Wang, J. Chen, S. Qiao, Y. Ou, Y. Yao, S. Deng, H. Chen, N. Zhang, LLMs for
     Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportu-
     nities, 2024. doi:10.48550/arXiv.2305.13168. arXiv:2305.13168.
 [4] F. Petroni, T. RocktΓ€schel, P. Lewis, A. Bakhtin, Y. Wu, A. H. Miller, S. Riedel, Language
     models as knowledge bases?, in: Conference on Empirical Methods in Natural Language
     Processing, 2019. URL: https://api.semanticscholar.org/CorpusID:202539551.
 [5] OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida,
     J. Altenschmidt, S. Altman, S. Anadkat, R. Avila, I. Babuschkin, S. Balaji, V. Balcom,
     P. Baltescu, H. Bao, M. Bavarian, J. Belgum, I. Bello, J. Berdine, G. Bernadett-Shapiro,
     C. Berner, L. Bogdonoff, O. Boiko, M. Boyd, A.-L. Brakman, G. Brockman, T. Brooks,
     M. Brundage, K. Button, T. Cai, R. Campbell, A. Cann, B. Carey, C. Carlson, R. Carmichael,
     B. Chan, C. Chang, F. Chantzis, D. Chen, S. Chen, R. Chen, J. Chen, M. Chen, B. Chess,
     C. Cho, C. Chu, H. W. Chung, D. Cummings, J. Currier, Y. Dai, C. Decareaux, T. Degry,
     N. Deutsch, D. Deville, A. Dhar, D. Dohan, S. Dowling, S. Dunning, A. Ecoffet, A. Eleti,
     T. Eloundou, D. Farhi, L. Fedus, N. Felix, S. P. Fishman, J. Forte, I. Fulford, L. Gao, E. Georges,
     C. Gibson, V. Goel, T. Gogineni, G. Goh, R. Gontijo-Lopes, J. Gordon, M. Grafstein, S. Gray,
     R. Greene, J. Gross, S. S. Gu, Y. Guo, C. Hallacy, J. Han, J. Harris, Y. He, M. Heaton,
     J. Heidecke, C. Hesse, A. Hickey, W. Hickey, P. Hoeschele, B. Houghton, K. Hsu, S. Hu,
     X. Hu, J. Huizinga, S. Jain, S. Jain, J. Jang, A. Jiang, R. Jiang, H. Jin, D. Jin, S. Jomoto,
     B. Jonn, H. Jun, T. Kaftan, Łukasz Kaiser, A. Kamali, I. Kanitscheider, N. S. Keskar, T. Khan,
     L. Kilpatrick, J. W. Kim, C. Kim, Y. Kim, J. H. Kirchner, J. Kiros, M. Knight, D. Kokotajlo,
     Łukasz Kondraciuk, A. Kondrich, A. Konstantinidis, K. Kosic, G. Krueger, V. Kuo, M. Lampe,
     I. Lan, T. Lee, J. Leike, J. Leung, D. Levy, C. M. Li, R. Lim, M. Lin, S. Lin, M. Litwin,
     T. Lopez, R. Lowe, P. Lue, A. Makanju, K. Malfacini, S. Manning, T. Markov, Y. Markovski,
     B. Martin, K. Mayer, A. Mayne, B. McGrew, S. M. McKinney, C. McLeavey, P. McMillan,
     J. McNeil, D. Medina, A. Mehta, J. Menick, L. Metz, A. Mishchenko, P. Mishkin, V. Monaco,
     E. Morikawa, D. Mossing, T. Mu, M. Murati, O. Murk, D. MΓ©ly, A. Nair, R. Nakano, R. Nayak,
     A. Neelakantan, R. Ngo, H. Noh, L. Ouyang, C. O’Keefe, J. Pachocki, A. Paino, J. Palermo,
     A. Pantuliano, G. Parascandolo, J. Parish, E. Parparita, A. Passos, M. Pavlov, A. Peng,
     A. Perelman, F. de Avila Belbute Peres, M. Petrov, H. P. de Oliveira Pinto, Michael, Pokorny,
     M. Pokrass, V. H. Pong, T. Powell, A. Power, B. Power, E. Proehl, R. Puri, A. Radford, J. Rae,
     A. Ramesh, C. Raymond, F. Real, K. Rimbach, C. Ross, B. Rotsted, H. Roussez, N. Ryder,
     M. Saltarelli, T. Sanders, S. Santurkar, G. Sastry, H. Schmidt, D. Schnurr, J. Schulman,
     D. Selsam, K. Sheppard, T. Sherbakov, J. Shieh, S. Shoker, P. Shyam, S. Sidor, E. Sigler,
     M. Simens, J. Sitkin, K. Slama, I. Sohl, B. Sokolowsky, Y. Song, N. Staudacher, F. P. Such,
     N. Summers, I. Sutskever, J. Tang, N. Tezak, M. B. Thompson, P. Tillet, A. Tootoonchian,
     E. Tseng, P. Tuggle, N. Turley, J. Tworek, J. F. C. Uribe, A. Vallone, A. Vijayvergiya, C. Voss,
     C. Wainwright, J. J. Wang, A. Wang, B. Wang, J. Ward, J. Wei, C. Weinmann, A. Welihinda,
     P. Welinder, J. Weng, L. Weng, M. Wiethoff, D. Willner, C. Winter, S. Wolrich, H. Wong,
     L. Workman, S. Wu, J. Wu, M. Wu, K. Xiao, T. Xu, S. Yoo, K. Yu, Q. Yuan, W. Zaremba,
     R. Zellers, C. Zhang, M. Zhang, S. Zhao, T. Zheng, J. Zhuang, W. Zhuk, B. Zoph, Gpt-4
     technical report, 2024. arXiv:2303.08774.
 [6] G. Agrawal, T. Kumarage, Z. Alghamdi, H. Liu, Can Knowledge Graphs Reduce
     Hallucinations in LLMs? : A Survey, 2024. doi:10.48550/arXiv.2311.07914.
     arXiv:2311.07914.
 [7] E. Agichtein, L. Gravano, Snowball: extracting relations from large plain-text collections,
     in: Proceedings of the Fifth ACM Conference on Digital Libraries, DL ’00, Association for
     Computing Machinery, New York, NY, USA, 2000, p. 85–94. URL: https://doi.org/10.1145/
     336597.336644. doi:10.1145/336597.336644.
 [8] Y. Zhang, V. Zhong, D. Chen, G. Angeli, C. D. Manning, Position-aware attention and
     supervised data improve slot filling, in: M. Palmer, R. Hwa, S. Riedel (Eds.), Proceedings of
     the 2017 Conference on Empirical Methods in Natural Language Processing, Association
     for Computational Linguistics, Copenhagen, Denmark, 2017, pp. 35–45. URL: https://
     aclanthology.org/D17-1004. doi:10.18653/v1/D17-1004.
 [9] M. Mintz, S. Bills, R. Snow, D. Jurafsky, Distant supervision for relation extraction without
     labeled data, in: K.-Y. Su, J. Su, J. Wiebe, H. Li (Eds.), Proceedings of the Joint Conference
     of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on
     Natural Language Processing of the AFNLP, Association for Computational Linguistics,
     Suntec, Singapore, 2009, pp. 1003–1011. URL: https://aclanthology.org/P09-1113.
[10] X. L. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun,
     W. Zhang, Knowledge vault: A web-scale approach to probabilistic knowledge fusion,
     in: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data
     Mining, KDD ’14, New York, NY, USA, August 24-27, 2014, 2014, pp. 601–610. URL:
     http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf.
[11] A. Bosselut, H. Rashkin, M. Sap, C. Malaviya, A. Celikyilmaz, Y. Choi, COMET: Com-
     monsense transformers for automatic knowledge graph construction, in: A. Korhonen,
     D. Traum, L. MΓ rquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for
     Computational Linguistics, Association for Computational Linguistics, Florence, Italy, 2019,
     pp. 4762–4779. URL: https://aclanthology.org/P19-1470. doi:10.18653/v1/P19-1470.
[12] Y. Chen, Y. Liu, L. Dong, S. Wang, C. Zhu, M. Zeng, Y. Zhang, AdaPrompt: Adap-
     tive model training for prompt-based NLP, in: Y. Goldberg, Z. Kozareva, Y. Zhang
     (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2022, Associ-
     ation for Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 6057–
     6068. URL: https://aclanthology.org/2022.findings-emnlp.448. doi:10.18653/v1/2022.
     findings-emnlp.448.
[13] S. Yu, T. He, J. Glass, AutoKG: Constructing Virtual Knowledge Graphs from Unstruc-
     tured Documents for Question Answering, 2021. doi:10.48550/arXiv.2008.08995.
     arXiv:2008.08995.
[14] B. Chen, A. L. Bertozzi, AutoKG: Efficient Automated Knowledge Graph Generation for
     Language Models, 2023. doi:10.48550/arXiv.2311.14740. arXiv:2311.14740.
[15] S. J. Semnani, V. Z. Yao, H. C. Zhang, M. S. Lam, WikiChat: Stopping the Hallucina-
     tion of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia, 2023.
     arXiv:2305.14292.
[16] B. D. Trisedya, G. Weikum, J. Qi, R. Zhang, Neural relation extraction for knowledge base
     enrichment, in: A. Korhonen, D. Traum, L. MΓ rquez (Eds.), Proceedings of the 57th Annual
     Meeting of the Association for Computational Linguistics, Association for Computational
     Linguistics, Florence, Italy, 2019, pp. 229–240. URL: https://aclanthology.org/P19-1023.
     doi:10.18653/v1/P19-1023.
[17] Y. Luan, L. He, M. Ostendorf, H. Hajishirzi, Multi-task identification of entities, relations,
     and coreference for scientific knowledge graph construction, in: Proc. Conf. Empirical
     Methods Natural Language Process. (EMNLP), 2018.
[18] T. Castro Ferreira, C. Gardent, N. Ilinykh, C. van der Lee, S. Mille, D. Moussallem, A. Shi-
     morina, The 2020 bilingual, bi-directional WebNLG+ shared task: Overview and evalua-
     tion results (WebNLG+ 2020), in: T. Castro Ferreira, C. Gardent, N. Ilinykh, C. van der
     Lee, S. Mille, D. Moussallem, A. Shimorina (Eds.), Proceedings of the 3rd International
     Workshop on Natural Language Generation from the Semantic Web (WebNLG+), Asso-
     ciation for Computational Linguistics, Dublin, Ireland (Virtual), 2020, pp. 55–76. URL:
     https://aclanthology.org/2020.webnlg-1.7.
[19] B. Zhang, H. Soh, Extract, Define, Canonicalize: An LLM-based Framework for Knowledge
     Graph Construction, 2024. arXiv:2404.03868.
[20] R. Han, T. Peng, C. Yang, B. Wang, L. Liu, X. Wan, Is information extraction solved by
     ChatGPT? An analysis of performance, evaluation criteria, robustness and errors, 2023.
     arXiv:2305.14450.
[21] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bres-
     sand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao,
     T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, 2023. arXiv:2310.06825.
[22] J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, Z. Liu, BGE M3-embedding: Multi-lingual,
     multi-functionality, multi-granularity text embeddings through self-knowledge distillation,
     2023. arXiv:2309.07597.
[23] M. Josifoski, N. De Cao, M. Peyrard, F. Petroni, R. West, GenIE: Generative information
     extraction, in: M. Carpuat, M.-C. de Marneffe, I. V. Meza Ruiz (Eds.), Proceedings of the
     2022 Conference of the North American Chapter of the Association for Computational
     Linguistics: Human Language Technologies, Association for Computational Linguistics,
     Seattle, United States, 2022, pp. 4626–4643. URL: https://aclanthology.org/2022.naacl-main.
     342. doi:10.18653/v1/2022.naacl-main.342.
[24] D. Ye, Y. Lin, P. Li, M. Sun, Packed levitated marker for entity and relation extraction, in:
     S. Muresan, P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of
     the Association for Computational Linguistics (Volume 1: Long Papers), Association for
     Computational Linguistics, Dublin, Ireland, 2022, pp. 4904–4917. URL: https://aclanthology.
     org/2022.acl-long.337. doi:10.18653/v1/2022.acl-long.337.
[25] P. Dognin, I. Padhi, I. Melnyk, P. Das, ReGen: Reinforcement learning for text and
     knowledge base generation using pretrained language models, in: M.-F. Moens, X. Huang,
     L. Specia, S. W.-t. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods
     in Natural Language Processing, Association for Computational Linguistics, Online and
     Punta Cana, Dominican Republic, 2021, pp. 1084–1099. URL: https://aclanthology.org/2021.
     emnlp-main.83. doi:10.18653/v1/2021.emnlp-main.83.
[26] T. Li, G. Zhang, Q. D. Do, X. Yue, W. Chen, Long-context LLMs struggle with long in-context
     learning, 2024. arXiv:2404.02060.
[27] F. Brei, J. Frey, L.-P. Meyer, Leveraging small language models for Text2SPARQL tasks to
     improve the resilience of AI assistance, 2024. arXiv:2405.17076.



A. Sample generated KG
This KG was generated under the no-schema-constraint setting for the following document: "Mohammad
Firouzi (Born 1958 Tehran) is a prolific Iranian musician, whose primary instrument is the
barbat."
<Prefixes and definition of dependencies omitted>
wd:Mohammad_Firouzi a wd:human ;
    rdfs:label "Mohammad Firouzi"@en ;
    wdt:occupation wd:Musician ;
    wdt:CountryOfCitizenship wd:Iran ;
    wdt:PlaceOfBirth wd:Tehran ;
    wdt:DateOfBirth "1958"^^xsd:date .
Note that in the official annotation, only the triplets for place of birth and nationality exist;
hence the evaluation penalizes this output with low precision.
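The penalty can be made concrete with a toy exact-match triple scorer. This is a minimal sketch of the idea; the function name and the exact matching granularity used by the benchmark scripts are our own illustrative assumptions:

```python
def triple_prf(predicted: set, gold: set) -> tuple:
    # Micro precision/recall/F1 over exact (subject, property, object) matches.
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Gold annotation covers only place of birth and nationality ...
gold = {
    ("Mohammad_Firouzi", "PlaceOfBirth", "Tehran"),
    ("Mohammad_Firouzi", "CountryOfCitizenship", "Iran"),
}
# ... while the unconstrained KG above also asserts occupation and birth date.
pred = gold | {
    ("Mohammad_Firouzi", "occupation", "Musician"),
    ("Mohammad_Firouzi", "DateOfBirth", "1958"),
}
p, r, f1 = triple_prf(pred, gold)  # recall stays at 1.0, but precision drops to 0.5
```

The two extra (correct but unannotated) triples halve precision even though recall is perfect, which is the penalty described above.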


B. Preprocessing of Wikidata schema
To save LLM input context and mitigate the performance drop on the selected target schema
when the ontology is large, we include only commonly used properties by restricting data types
to item, quantity, string, monolingual text, and point in time.² To align with common LLM
pretraining objectives, we substitute entity identifiers (e.g., P19) with their literal labels
(rdfs:label) in PascalCase (e.g., PlaceOfBirth).
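The label substitution can be sketched as follows. This is a hedged illustration; the actual normalization in our pipeline may handle punctuation and casing edge cases differently:

```python
import re


def to_pascal_case(label: str) -> str:
    # Split the rdfs:label into alphanumeric words, capitalize each, and join.
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(w[:1].upper() + w[1:] for w in words)


# P19's English label "place of birth" becomes the identifier PlaceOfBirth.
assert to_pascal_case("place of birth") == "PlaceOfBirth"
assert to_pascal_case("country of citizenship") == "CountryOfCitizenship"
```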




2
    https://www.wikidata.org/wiki/Help:Data_type
C. Prompts
All prompts are reused across all datasets.

C.1. CQ generation
We prompt the LLM to generate up to 3 CQs per document for efficiency, given the nature of the
test datasets; this limit may be adjusted.
Write competency questions based on the abstract level concepts in the document. Write
questions that can be answered using the document only.
Write up to 3 questions per document.
Below are the examples and follow the same format when generating competency questions:

####
Document: Douglas Noel Adams (11 March 1952 – 11 May 2001) was an English author, humourist,
and screenwriter, best known for The Hitchhiker’s Guide to the Galaxy (HHGTTG). Originally a
1978 BBC radio comedy, The Hitchhiker’s Guide to the Galaxy developed into a "trilogy" of five
books that sold more than 15 million copies in his lifetime. It was further developed into a
television series, several stage plays, comics, a video game, and a 2005 feature film. Adams’s
contribution to UK radio is commemorated in The Radio Academy’s Hall of Fame.

####
Questions:
CQ1. What is the date of birth of Douglas Noel Adams?
CQ2. What is the date of death of Douglas Noel Adams?
CQ3. What is the occupation of Douglas Noel Adams?
CQ4. What is the country of citizenship of Douglas Noel Adams?
CQ5. What is the most notable work of Douglas Noel Adams?
CQ6. What is the original medium of The Hitchhiker’s Guide to the Galaxy?
CQ7. In what year was The Hitchhiker’s Guide to the Galaxy originally broadcast?
CQ8. How many books are in The Hitchhiker’s Guide to the Galaxy "trilogy"?
CQ9. What other media adaptations were created based on The Hitchhiker’s Guide to the Galaxy?

####
Document:
{document to be processed}

####
Questions:


C.2. CQ answering

Use the provided document to answer user query. If you don’t know the answer, just say that
you don’t know, don’t try to make up an answer.
Passage: {doc}
Query: {query}


C.3. Relation extraction

You are an assistant in building a knowledge graph. Analyze the following competency questions
and identify all relationships and concepts mentioned in the question.
Extract relation first, then describe the usage of each relation based on your understanding
given the context of competency questions.
Afterwards, extract all relation-related concepts.
You should only extract properties between entities and literals, not entities themselves, or
classes of entities. Therefore, not all CQs contain valid properties.
If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.
Merge all relations into one list and all concepts into one list.
Do not reply using a complete sentence, and only give the answer in the following format.

Below are the examples and follow the same format to extract the relations:

####
Document: Douglas Noel Adams (11 March 1952 – 11 May 2001) was an English author, humourist,
and screenwriter, best known for The Hitchhiker’s Guide to the Galaxy (HHGTTG). Originally a
1978 BBC radio comedy, The Hitchhiker’s Guide to the Galaxy developed into a "trilogy" of five
books that sold more than 15 million copies in his lifetime. It was further developed into a
television series, several stage plays, comics, a video game, and a 2005 feature film. Adams’s
contribution to UK radio is commemorated in The Radio Academy’s Hall of Fame.

####
Questions:
CQ1. What is the date of birth of Douglas Noel Adams?
CQ2. What is the date of death of Douglas Noel Adams?
CQ3. What is the occupation of Douglas Noel Adams?
CQ4. What is the country of citizenship of Douglas Noel Adams?
CQ5. What is the most notable work of Douglas Noel Adams?
CQ6. What is the original medium of The Hitchhiker’s Guide to the Galaxy?
CQ7. In what year was The Hitchhiker’s Guide to the Galaxy originally broadcast?
CQ8. How many books are in The Hitchhiker’s Guide to the Galaxy "trilogy"?
CQ9. What other media adaptations were created based on The Hitchhiker’s Guide to the Galaxy?

####
Relations:
(date of birth, The date on which the subject was born.)
(date of death, The date on which the subject died.)
(occupation, The occupation of a person.)
(country of citizenship, The country of which the subject is a citizen.)
(notable work, The most notable work of a person.)
(genre, The genre or type of work.)
(publication date, The date or period when a work was first published or released.)
(has part, Indicates that the subject has a certain part, component, or element.)
(series, Indicates that the subject is part of a series, such as a book series, film series, or
television series.)

####
Document:
{document to be processed}

####
Questions:
{CQs}

####
Relations:


C.4. Ontology matching

Decide if the two properties are semantically similar in an ontology.
You should say yes if you decide that these properties are similar, or if they are inverse
properties.
Answer in "yes" or "no" only.
Property 1: {p1}
Property 2: {p2}
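In our pipeline this prompt is applied pairwise to decide whether an extracted relation can be replaced by a Wikidata property. A minimal sketch of the surrounding control flow follows; the `ask_llm` helper and the first-match policy are illustrative assumptions, not the exact implementation:

```python
def align_to_wikidata(relation, candidate_props, ask_llm):
    """Replace an extracted relation with the first Wikidata property
    the LLM judges semantically similar; otherwise keep it as-is."""
    for prop in candidate_props:
        reply = ask_llm(p1=relation, p2=prop)  # fills the C.4 prompt template
        if reply.strip().lower().startswith("yes"):
            return prop
    return relation


# With a stubbed-out LLM that only accepts PlaceOfBirth:
stub = lambda p1, p2: "Yes" if p2 == "PlaceOfBirth" else "No"
aligned = align_to_wikidata("birthplace", ["DateOfBirth", "PlaceOfBirth"], stub)
```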


C.5. Ontology formatting
For properties under the Wikidata schema, we retrieve schema:description, rdfs:domain, and
rdfs:range for each property and include them in the resulting ontology. Otherwise, the LLM is
prompted to author the ontology as follows:
Use the relations (properties) and their usage comments to build an ontology in RDF format.
If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.
Don’t provide anything other than an ontology in RDF format.
Infer and summarize classes for domain and range of the relations across the concepts
provided, and add these classes to relations only if required for closure of relations.
For each relation, add relevant ontology entry for it.
Add rdfs:comment based on the usage comments.
Use wdt: namespace for all relations discovered. Use entities under these prefixes if
necessary:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix schema: <http://schema.org/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
Use turtle syntax.
Below is an example:

####
Relations:
(results, results: results of a competition such as sports or elections)

####
Ontology:
wdt:Results a wikibase:Property ;
    schema:description "results of a competition such as sports or elections" ;
    rdfs:label "results" ;
    rdfs:domain wd:referendum, wd:competition, wd:party_conference, wd:sporting_event ;
    rdfs:range wd:electoral_result, wd:voting_result, wd:sport_result, wd:race_result .

####
Relations:
{relation}

####
Ontology:


C.6. KG generation

Your task is to construct a knowledge graph based on the provided ontology.
Focus on understanding relationships from the question answer pair and document,
and extract related entities, then mapping them to the ontology using the properties defined
in the ontology.
Do not include new properties other than those in ontology. Only use those properties in the
ontology.
Output in turtle format following the ontology provided.
You should only include knowledge in question answer pairs and the document.
Do not make up answers.

Use this ontology based on Wikidata as the starting point:
{ont}
Below is an example:

####
Document:
Douglas Noel Adams (11 March 1952 – 11 May 2001) was an English author, humourist, and
screenwriter, best known for The Hitchhiker’s Guide to the Galaxy (HHGTTG). Originally a 1978
BBC radio comedy, The Hitchhiker’s Guide to the Galaxy developed into a "trilogy" of five
books that sold more than 15 million copies in his lifetime. It was further developed into a
television series, several stage plays, comics, a video game, and a 2005 feature film. Adams’s
contribution to UK radio is commemorated in The Radio Academy’s Hall of Fame.

####
Question answer pairs:
Q: What is Douglas Adams an instance of?
A: Douglas Adams is an instance of human.

Q: What is Douglas Adams’ sex or gender?
A: Douglas Adams’ sex or gender is male.

Q: Where was Douglas Adams born?
A: Douglas Adams was born in Cambridge.

Q: Where did Douglas Adams die?
A: Douglas Adams died in Santa Barbara, California.

Q: When was Douglas Adams born?
A: Douglas Adams was born on 1952-03-11.

Q: On what date did Douglas Adams die?
A: Douglas Adams died on 2001-05-11.

Q: What occupation did Douglas Adams have?
A: Douglas Adams was a writer, comedian, and dramatist.

Q: What languages did Douglas Adams speak, write, or sign?
A: Douglas Adams spoke, wrote, or signed English.

Q: Where was Douglas Adams educated?
A: Douglas Adams was educated at St John’s College, Cambridge and Brentwood School, Essex.

Q: What institution is Douglas Adams an alumni of?
A: Douglas Adams is an alumni of St John’s College.

Q: What are some notable works by Douglas Adams?
A: Some notable works by Douglas Adams include The Hitchhiker’s Guide to the Galaxy and Dirk
Gently’s Holistic Detective Agency.

Q: Was Douglas Adams a member of any notable organizations?
A: Yes, Douglas Adams was a member of Monty Python and The Independent on Sunday.

Q: What award did Douglas Adams receive?
A: Douglas Adams received the Locus Award for Best Science Fiction Novel.

Q: What is the Commons Category for Douglas Adams?
A: The Commons Category for Douglas Adams is "Douglas Adams".

####
Ontology:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
wd:Douglas_Adams rdfs:label "Douglas Adams"@en ;
    wdt:InstanceOf wd:human ;
    wdt:SexOrGender wd:male ;
    wdt:PlaceOfBirth wd:Cambridge ;
    wdt:PlaceOfDeath wd:Santa_Barbara_California ;
    wdt:DateOfBirth "1952-03-11"^^xsd:date ;
    wdt:DateOfDeath "2001-05-11"^^xsd:date ;
    wdt:Occupation wd:writer ;
    wdt:Occupation wd:comedian ;
    wdt:Occupation wd:dramatist ;
    wdt:LanguagesSpokenWrittenOrSigned wd:English ;
    wdt:EducatedAt wd:St_Johns_College_Cambridge ;
    wdt:EducatedAt wd:Brentwood_School_Essex ;
    wdt:AlumniOf wd:St_Johns_College ;
    wdt:NotableWork wd:The_Hitchhikers_Guide_to_the_Galaxy ;
    wdt:NotableWork wd:Dirk_Gentlys_Holistic_Detective_Agency ;
    wdt:MemberOf wd:Monty_Python ;
    wdt:MemberOfOrganization wd:The_Independent_on_Sunday ;
    wdt:Award wd:Locus_Award_for_Best_Science_Fiction_Novel ;
    wdt:CommonsCategory "Douglas Adams"@en .

####
Document:
{doc}

####
Questions and Answer pairs:
{qa}

####
Ontology:
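Since the prompt forbids properties outside the provided ontology, a lightweight post-hoc check can flag violations in the generated Turtle. The regex-based sketch below is our own illustration (not part of the published pipeline) and only inspects `wdt:` predicate tokens rather than fully parsing the Turtle:

```python
import re

WDT = re.compile(r"wdt:([A-Za-z0-9_]+)")


def undeclared_properties(kg_ttl: str, ontology_ttl: str) -> set:
    # Properties used in the generated KG but never declared in the ontology.
    return set(WDT.findall(kg_ttl)) - set(WDT.findall(ontology_ttl))


ontology = "wdt:PlaceOfBirth a wikibase:Property ."
kg = "wd:X wdt:PlaceOfBirth wd:Tehran ; wdt:Award wd:Y ."
violations = undeclared_properties(kg, ontology)  # flags the out-of-ontology wdt:Award
```

A non-empty result indicates the LLM ignored the constraint, and the offending triples can be dropped or the generation retried.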