<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Prompt-Time Ontology-Driven Symbolic Knowledge Capture with Large Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tolga Çöplü</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arto Bendiken</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Skomorokhov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eduard Bateiko</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephen Cobb</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In applications such as personal assistants, large language models (LLMs) must consider the user's personal information and preferences. However, LLMs lack the inherent ability to learn from user interactions. This paper explores capturing personal information from user prompts using ontology and knowledge-graph approaches. We use a subset of the KNOW ontology, which models personal information, to train the language model on these concepts. We then evaluate the success of knowledge capture using a specially constructed dataset. Our code and datasets are publicly available at https://github.com/HaltiaAI/paper-PTODSKC.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology-driven symbolic knowledge capture</kwd>
        <kwd>KNOW ontology</kwd>
        <kwd>symbolic representation</kwd>
        <kwd>knowledge graphs</kwd>
        <kwd>large language models</kwd>
        <kwd>fine-tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Currently, many generative artificial intelligence (AI) applications, particularly personal assistants, strive to offer users personalized experiences. To achieve this, AI applications must learn personal information and preferences from user interactions (knowledge capture) and use this learned knowledge in future conversations (knowledge utilization). Implementing this fundamental personal AI approach depends on addressing several complex subproblems, such as discerning which user prompt information is personal, extracting it, determining whether the extracted information is duplicate, and associating it with other personal data.</p>
      <p>These challenges have been the focus of extensive research within the AI field for many years. However, the emergence of neurosymbolic approaches through the collaboration between large language models (LLMs) and symbolic AI has provided researchers with new perspectives [1, 2, 3, 4]. LLMs’ capabilities in natural language processing can be integrated with the representational and factual reasoning abilities of knowledge graphs, enhanced by the structure, rules, and inference mechanisms offered by an ontology. For targeted personal AI applications, this ontology approach presents several benefits:</p>
      <p>• Ontology schemas enable language models to determine which personal information will be captured and how it will be associated with other captured knowledge. • Ontology rules can help identify inconsistencies in the captured knowledge, allowing for validation before storage. • Ontology relationships allow the extraction of implicit information from captured knowledge, effectively enabling automatic inference that expands the knowledge graph. • A robust, personalized knowledge graph forms a reliable foundation for facilitating personalized interactions with the application through language models.</p>
      <p>In this paper, we address a specific aspect of the AI personalization challenge by focusing on prompt-time, ontology-driven symbolic knowledge capture using language models. We explore the extraction from user prompts of subject-predicate-object triples, as defined by RDF (https://www.w3.org/TR/rdf12-concepts/), that conform to a specified ontology. We have investigated various methods to enable the underlying language model to comprehend a pre-defined ontology, ensuring effective symbolic knowledge capture. By utilizing a specially designed dataset, we evaluate the effectiveness of these methods, emphasizing their strengths and identifying potential areas for improvement.</p>
      <p>The structure of this paper is as follows: Section 2 discusses in-context learning and fine-tuning approaches for ontology-driven symbolic knowledge capture and focuses on the details of the fine-tuning approach. Section 3 describes the experimental setup by presenting the development framework, the language model selection, and the ontology and dataset creation process. Section 4 outlines our performance evaluation framework and the test results. Finally, Section 5 concludes the paper and suggests future directions.</p>
      <p>KiL’24: Workshop on Knowledge-infused Learning, co-located with the 30th ACM KDD Conference, August 26, 2024, Barcelona, Spain. * Corresponding author. tolga@haltia.ai (T. Çöplü); arto@haltia.ai (A. Bendiken); andriy@haltia.ai (A. Skomorokhov); eduard@haltia.ai (E. Bateiko); steve@haltia.ai (S. Cobb). ORCID: 0000-0002-6729-5611 (E. Bateiko); 0009-0004-0476-6000 (S. Cobb). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Ontology-Driven Symbolic Knowledge Capture</title>
      <p>In the literature, language models have demonstrated their capability to transform unstructured text into a knowledge graph [5, 6, 7, 8, 9]. However, the process of populating a knowledge graph from user prompts in alignment with a pre-defined ontology has been explored only marginally [10, 11, 12, 13, 14]. Research typically centers on in-context learning, which heavily relies on prompt engineering. A significant limitation of this approach is the requirement to incorporate the entire custom ontology into the prompt. This necessity not only slows down the knowledge capture process because of the high token overhead, but also restricts the use of larger ontologies due to the constraint on context-window length. Given these constraints, in-context learning methods do not provide a scalable solution for ontology-driven symbolic knowledge capture.</p>
      <p>An alternative approach involves training a language model with a pre-defined ontology, so that the model internalizes it. There are two strategies to consider: pre-training the LLM on the ontology or fine-tuning it. This paper does not explore pre-training due to its extensive data, computational resource, energy, and time requirements. Additionally, pre-training does not offer a flexible response to ongoing changes or expansions in the ontology. Therefore, this paper will focus on fine-tuning as a method to train language models on personal ontologies, highlighting its advantages in feasibility and maintainability.</p>
      <sec id="sec-3-1">
        <title>2.1. Ontology-Driven Knowledge Capture with Fine-Tuning</title>
        <p>Fine-tuning is a process whereby a pre-trained language model is further trained on a specific dataset to tailor its capabilities to a particular task. In our study, the language model is expected to learn the classes, object properties, and data properties defined in an ontology, and to use them to populate a knowledge graph from user prompts. The first step involves preparing a fine-tuning dataset, which includes user prompts, system prompts, and expected model responses for each concept in the ontology. This dataset is used to fine-tune the language model, which is then evaluated by testing it with new prompts to assess the effectiveness of the knowledge capture operation. We define a system prompt for this task with the requirement of maintaining the model’s generality across other tasks.</p>
        <p>The following points highlight the key aspects of ontology fine-tuning:</p>
        <p>• The training dataset’s coverage and diversity are vital for successful fine-tuning. These characteristics greatly influence the LLM’s ability to effectively capture knowledge. Details about the dataset and how it is constructed are discussed in Section 3.4. • The training dataset must include a variety of examples for each element of the predefined ontology. This approach avoids the scalability issues typically associated with in-context learning and ensures comprehensive learning coverage. • If the LLM encounters a user prompt that is not relevant to the predefined ontology concepts, it should not attempt to capture knowledge. Therefore, the dataset should also contain sufficient out-of-context samples to enable the LLM to distinguish between relevant and irrelevant information for capture.</p>
      </sec>
    </sec>
    <sec id="sec-3b">
      <title>3. Experimental Setup</title>
      <p>This section explores the components of our experimental setup.</p>
      <sec id="sec-3b-1">
        <title>3.1. Development Framework</title>
        <p>The methods suggested in this paper have been implemented using the Apple MLX framework [15]. MLX is a specialized array framework designed for machine learning applications, akin to NumPy, PyTorch, or JAX, with the distinction of being exclusive to Apple silicon. Ontology fine-tuning has been conducted using the parameter-efficient QLoRA approach [16] on our custom dataset, comprising randomly selected, non-overlapping sets of training, validation, and test samples.</p>
      </sec>
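      <p>To make the fine-tuning data flow concrete, the sketch below builds hypothetical prompt/response records and partitions them into non-overlapping training, validation, and test JSON Lines files. The record fields, split ratios, and file names are our assumptions for illustration, not a format required by the tooling.</p>
      <preformat>
```python
import json
import random

# Hypothetical fine-tuning samples: each pairs a user prompt with the
# Turtle statements the model should emit (identifiers are invented;
# @prefix declarations are omitted for brevity). Empty completions
# mark out-of-context prompts where nothing should be captured.
samples = [
    {"prompt": f"My sister's name is Name{i}.",
     "completion": f':me know:sister :p{i} . :p{i} know:name "Name{i}" .'}
    for i in range(36)
] + [
    {"prompt": "What a rainy day!", "completion": ""},   # out-of-context
    {"prompt": "Tell me a joke.", "completion": ""},     # out-of-context
] * 2

# Randomly partition into non-overlapping training, validation, and
# test sets, mirroring the dataset handling described in the text.
random.seed(7)
indices = list(range(len(samples)))
random.shuffle(indices)
n_train, n_valid = int(0.7 * len(indices)), int(0.15 * len(indices))
splits = {
    "train": indices[:n_train],
    "valid": indices[n_train:n_train + n_valid],
    "test":  indices[n_train + n_valid:],
}

# Serialize each split as JSON Lines (one record per line); each string
# would be written to train.jsonl / valid.jsonl / test.jsonl for tuning.
jsonl = {name: "\n".join(json.dumps(samples[i]) for i in idx)
         for name, idx in splits.items()}
```
      </preformat>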
      <sec id="sec-3-2">
        <title>3.2. Language Model</title>
        <p>The methods we have developed here do not have a
structural dependency on a particular underlying foundation
model. The key factors guiding our language model
selection were its proven effectiveness across diverse
domains in community benchmarks and its prevalence in
the field. Owing to its performance in the Hugging Face
Open LLM Leaderboard [17] and its robust ecosystem,
the Mistral-7B-Instruct-v0.2 [18], based on the Llama 2
[19] architecture, was selected for our research. We ran
all examples, tests, and benchmarks on the MLX 4-bit
quantized version of this model so that the tests could be
run on personal laptops.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Predefined Ontology</title>
        <p>Our study is inspired by KNOW [20], the Knowledge
Navigator Ontology for the World, and utilizes it for
representing personal information. KNOW is introduced as
a pioneering framework designed to capture everyday
knowledge to enhance language models in real-world
generative AI applications such as personal AI assistants.
The ontology focuses on human life, encompassing
everyday concerns and significant milestones, and limits its
initial scope to established human universals, including
spacetime (places, events) and social dimensions (people,
groups, organizations). This pragmatic approach
emphasizes universality and utility, contrasting with previous
works like Schema.org[21] and Cyc[22] by building on
language models’ inherent encoding of salient
commonsense knowledge.</p>
        <p>Due to the requirement that each element in the
ontology be associated with a diverse set of prompt and
response samples within the training dataset, our research
focuses on a specific subset of the KNOW ontology. This
subset concentrates on core family relationships with
four ontology classes, eleven object properties, and one
data property. A visual depiction of this subset is
presented in Figure 1.</p>
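        <p>As a rough sketch of this subset, the snippet below declares the class and property counts given above as Turtle-style statements. The know: prefix, the exact term spellings, and the choice of Person, Sex, Female, and Male as the four classes are our reading of Figure 1, not the actual KNOW identifiers; @prefix declarations are omitted.</p>
        <preformat>
```python
# Illustrative core-family subset: four classes, eleven object
# properties, one data property, serialized as Turtle-style lines.
classes = ["Person", "Sex", "Female", "Male"]
object_properties = ["knows", "parent", "child", "sibling", "spouse",
                     "partner", "mother", "father", "sister", "brother",
                     "sex"]
data_properties = ["name"]

lines = []
lines += [f"know:{c} a owl:Class ." for c in classes]
lines += [f"know:{p} a owl:ObjectProperty ." for p in object_properties]
lines += [f"know:{p} a owl:DatatypeProperty ." for p in data_properties]
turtle = "\n".join(lines)
```
        </preformat>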
        <p>Figure 1: The core family relationship subset of the KNOW ontology: the classes Person (a Thing, with the data property name: string and the property sex), Sex (Female or Male), Female, and Male; the object properties knows, parent, child, sibling, spouse, and partner; and the specializations mother and father (parent with sex value Female or Male) and sister and brother (sibling with sex value Female or Male).</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Dataset</title>
        <p>For a language model to effectively learn a predefined ontology and use it to perform knowledge extraction and capture, a robust and diverse training dataset is essential. Our paper focuses on a subset of the KNOW ontology that includes the concepts of ‘person’, ‘name’, ‘sex’, ‘child’, ‘father’, ‘mother’, ‘sibling’, ‘sister’, ‘brother’, ‘spouse’, ‘partner’, and ‘knows’. We created 143 manually crafted user prompts along with their respective ontology responses for training and tests. Additionally, to manage inputs that fall outside these ontology concepts, we included 32 generic user prompts in the dataset. The composition of this dataset, which consists of 175 user prompts, is illustrated in Figure 2. Concepts not associated with the ontology are labeled under the ‘none’ legend in the figure. As each sample prompt typically contains multiple modeled concepts, the chart shows a total number of concept occurrences greater than the number of prompts.</p>
        <p>Figure 2: Occurrences of ontology concepts in the dataset (legend: Person, Name, Sex, Child, Father, Mother, Sibling, Sister, Brother, Spouse, Partner, Knows, None; x-axis: Number of Occurrences).</p>
        <p>The Turtle format was chosen for serializing the ontology population in our research because of its straightforward structure, readability, and prevalent use in existing pre-training datasets for LLMs.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Performance Evaluation</title>
      <p>Our research focuses on fine-tuning a language model with predefined ontology concepts and capturing knowledge from user prompts that fits the ontology. This section details the performance evaluations associated with these efforts.</p>
      <p>Initially, we investigated how many training samples each ontology concept required to effectively teach the core family ontology to the selected Mistral-7B-Instruct-v0.2 model. From the dataset described in Section 3.4, 41 random samples were selected and reserved for method evaluation. The distribution of ontology concepts within this test set is shown in Figure 3.</p>
      <p>Figure 3: Occurrences of ontology concepts in the test dataset (x-axis: Number of Occurrences).</p>
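      <p>For illustration, a single dataset record of the kind described in Section 3.4 might pair a user prompt with the Turtle statements the fine-tuned model is expected to produce. The system prompt wording, names, and prefixes below are invented; @prefix declarations are omitted.</p>
      <preformat>
```python
# A hypothetical training/test record: user prompt plus the expected
# Turtle statements (one statement per line for simple comparison).
record = {
    "system": "Capture personal facts from the user's message as "
              "Turtle triples using the predefined family ontology.",
    "user": "My sister Maya is getting married next month.",
    "expected": ':me know:sister :maya .\n'
                ':maya a know:Person .\n'
                ':maya know:name "Maya" .',
}

# Parse each line into a (subject, predicate, object) tuple; these
# tuples serve as the ground truth for triple-by-triple scoring.
gold_triples = [tuple(line.rstrip(" .").split(None, 2))
                for line in record["expected"].splitlines()]
```
      </preformat>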
      <p>From the remaining 134 samples, we created three
different training datasets. In these three datasets, we
have ensured the inclusion of 2, 4, and 8 sample prompts
for the ontology concepts ‘child’, ‘father’, ‘mother’,
‘sibling’, ‘sister’, ‘brother’, ‘spouse’, ‘partner’, and ‘knows’.</p>
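      <p>Building a split with a fixed number of sample prompts per ontology concept can be sketched as below. The labeled pool is synthetic, and the helper name is ours; the paper's actual selection procedure may differ.</p>
      <preformat>
```python
import random
from collections import defaultdict

# Synthetic labeled pool: (concept, prompt) pairs, ten per concept.
CONCEPTS = ["child", "father", "mother", "sibling", "sister",
            "brother", "spouse", "partner", "knows"]
pool = [(c, f"{c} sample prompt #{i}") for c in CONCEPTS for i in range(10)]

def per_concept_sample(pool, k, seed=0):
    """Pick k prompts for every concept from a (concept, prompt) pool."""
    rng = random.Random(seed)
    by_concept = defaultdict(list)
    for concept, prompt in pool:
        by_concept[concept].append(prompt)
    chosen = []
    for concept in sorted(by_concept):
        chosen += [(concept, p) for p in rng.sample(by_concept[concept], k)]
    return chosen

# Training sets with 2, 4, and 8 samples per concept, as in the text.
train_2 = per_concept_sample(pool, k=2)
train_4 = per_concept_sample(pool, k=4)
train_8 = per_concept_sample(pool, k=8)
```
      </preformat>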
      <p>Subsequent evaluations using the test set measured the precision, recall, and F1-scores for each fine-tuning session. During these evaluations, the generated prompt responses were processed triple by triple and compared against the ground truth established for the test set. The findings are displayed in Figure 4.</p>
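      <p>The triple-by-triple comparison described above can be sketched as follows, with triples modeled as (subject, predicate, object) tuples; the example triples are invented.</p>
      <preformat>
```python
def triple_scores(predicted, gold):
    """Return (precision, recall, f1) for two collections of triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # triples the model got exactly right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(":me", "know:sister", ":maya"),
        (":maya", "know:name", '"Maya"')}
pred = {(":me", "know:sister", ":maya"),       # correct triple
        (":me", "know:brother", ":bob")}       # spurious triple
p, r, f1 = triple_scores(pred, gold)  # p = 0.5, r = 0.5, f1 = 0.5
```
      </preformat>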
      <p>The tests were conducted using the default QLoRA hyperparameters specified in the MLX framework: • Layer keys to apply: ["self_attn.q_proj", "self_attn.v_proj"] • Rank: 8 • Alpha: 16 • Scale: 10 • Optimizer: Adam • Learning rate: 1 × 10⁻⁵ • Number of layers to fine-tune: 16 • Minibatch size: 4. To ensure consistent test results, each training session was set to run for 18 epochs.</p>
      <p>Figure 4: Performance vs. samples per ontology concept (precision, recall, and F1-score for 2 samples/concept, 4 samples/concept, 8 samples/concept, and the entire training dataset).</p>
      <p>As illustrated in Figure 4, our study, which encompasses twelve key ontology concepts, demonstrates that providing eight diverse examples for each concept yields acceptable success rates. Although our training and test datasets are not sufficiently diverse or large enough to generalize the results to real user scenarios, the high success achieved with a small number of training samples is promising for the feasibility of the proposed approach.</p>
      <p>In the subsequent phase, we explored the optimal number of training epochs required to achieve maximum performance for the training set. For this analysis, we continued using the default MLX QLoRA hyperparameters with 8 samples per ontology concept, but trained the QLoRA adapter over various epoch lengths. We then conducted evaluations on the test set using each trained adapter, and the findings are presented in Figure 5.</p>
      <p>Figure 5: Performance vs. epochs of training (precision, recall, and F1-score after 9, 18, 27, and 36 training epochs).</p>
      <p>As depicted in Figure 5, the success rate of the ontology population increases with longer training. However, considering the resource usage and energy consumption, we observe that 18 epochs are sufficient for fine-tuning.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this paper, we first explored prompt-driven, ontology-based symbolic knowledge capture and its importance in the generative AI domain. We then discussed the ontology approach and how to teach ontology concepts to the language model through in-context learning and training. The language model was fine-tuned using a custom dataset focused on core family relationships, and we evaluated the model’s ability to learn personal ontology concepts.</p>
      <p>Our findings indicate that fine-tuning is particularly effective for teaching ontology concepts to a language model for prompt-time knowledge capture. In our future work, we aim to integrate the generated knowledge graph with the language model for knowledge utilization, combining the strengths of the neural and symbolic AI approaches.</p>
      <p>Please refer to the paper’s corresponding GitHub repository at https://github.com/HaltiaAI/paper-PTODSKC.</p>
      <p>[20] A. Bendiken, KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models, 2024. URL: https://arxiv.org/abs/2405.19877. arXiv:2405.19877.</p>
      <p>[21] R. V. Guha, D. Brickley, S. Macbeth, Schema.org: evolution of structured data on the web, Communications of the ACM 59 (2016) 44–51. URL: https://doi.org/10.1145/2844544. doi:10.1145/2844544.</p>
      <p>[22] D. B. Lenat, R. V. Guha, Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project, 1st ed., Addison-Wesley Longman Publishing Co., Inc., USA, 1989.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>