<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Explainable Commonsense Reasoning: Semantic Rule Generation from Text using LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad Raza Naqvi</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arkopaul Sarkar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antoine Zimmermann</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernard Archimede</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Linda Elmhadhbi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Hedi Karray</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Physics, Georgetown University</institution>
          ,
          <addr-line>37th St NW, Washington, DC 20057</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>INSA Lyon, Université Lumière Lyon 2, Université Claude Bernard Lyon 1, Université Jean Monnet Saint-Etienne, DISP UR4570</institution>
          ,
          <addr-line>Villeurbanne</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Laboratoire Génie de Production, Université de Technologie Tarbes Occitanie Pyrénées (UTTOP)</institution>
          ,
          <addr-line>47 Av. d'Azereix, Tarbes, 65016</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Mines Saint-Etienne, Univ Clermont Auvergne, INP Clermont Auvergne</institution>
          ,
          <addr-line>CNRS, UMR 6158 LIMOS, F-42023 Saint-Etienne</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Commonsense knowledge (CSK) is critical for enhancing artificial intelligence (AI) systems by improving their understanding, reasoning, and interaction with the human world, particularly in planning and decision-making tasks. To be practically applicable, CSK must be expressed using the standard vocabulary of the target domain and must be available in sufficient quantity and specificity. Large Language Models (LLMs) have shown promise in efficiently curating domain-specific CSK as natural language statements. However, transforming these statements into formal semantic rules, such as those written in the Semantic Web Rule Language (SWRL) or Datalog, requires further processing and structured prompt engineering. These models also often fail to incorporate standard vocabularies, such as those defined by ISO 21838, when generating such rules, limiting their interoperability and reuse. This paper addresses the interoperability challenges in capturing CSK and highlights the importance of standardized vocabularies for semantic integration. Our findings reveal key limitations in the ability of LLMs to align output with standard ontologies. To address this, we propose a template-based prompt-engineering method combined with a predefined vocabulary-to-ontology mapping to guide LLMs in generating semantic rules from natural language CSK. Comparative evaluation shows that our approach improves consistency and enhances alignment with upper-level ontologies when expressing CSK as semantic rules.</p>
      </abstract>
      <kwd-group>
        <kwd>Common Sense Knowledge</kwd>
        <kwd>Knowledge Engineering</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Manufacturing Commonsense Knowledge</kwd>
        <kwd>Semantic Explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the era of AI, CSK is a crucial component of AI systems that enables them to make rational
and explainable decisions, much like humans do [1]. It is an essential element of today’s
AI-driven decision-making applications: for sophisticated tasks and interactions
across different domains, AI systems need to be equipped with this type of knowledge.
CSK comprises the implicit and frequently used understanding of the world
that humans naturally possess [2]. Researchers in various domains are increasingly emphasizing
the importance of acquiring and integrating appropriate domain-specific CSK [3, 4, 5].</p>
      <p>Using a large volume of domain-specific CSK improves an AI system’s ability to make
decisions effectively and in an explainable manner. It also enables the system to adapt appropriately
to different scenarios [6, 7]. The emergence of LLMs has initiated a new era in which these
models embed vast amounts of knowledge across numerous domains. As a
result, LLMs can serve as a surface-level source of
commonsense-like assertions across a wide range of domains; however, their reliability and
semantic coherence as knowledge sources remain limited and often require external validation.
Because these models are trained on massive textual datasets that reflect a broad
spectrum of human knowledge, they are nevertheless valuable for capturing CSK [8].</p>
      <p>However, the transition from the unstructured, semantically ambiguous CSK generated by
LLMs to structured, logically coherent semantic rules remains a significant challenge [9, 10, 11].
Semantic rules are crucial for explainability because they provide explicit, human-readable logic
that governs system decisions, enabling users to trace why and how a particular output was
produced. Unlike black-box models, semantic rules support transparent reasoning by linking inputs
to conclusions through well-defined conditions grounded in domain knowledge. This clarity
enables justified explanations, particularly in high-stakes contexts such as manufacturing or
healthcare. The challenge is particularly evident when using GPTs (Generative Pre-trained Transformers)
for automatic knowledge engineering [12, 13]. While the evolution of LLMs raises questions
about the extent to which such models might be integrated into various industries, businesses,
and, most importantly, education, it also brings forward critical issues such as ethics [14, 15],
trustworthiness [16, 17], and adherence to the Findable, Accessible, Interoperable, Reusable (FAIR)
data principles [18, 19]. In the specific context of knowledge engineering, particularly
ontology development, one might ask: Are we heading toward a future where LLMs automatically
generate ontologies, potentially rendering human ontologists obsolete [20]?</p>
      <p>We argue that ontology engineering and mapping extend far beyond mere linguistic tasks.
While LLMs are proficient at tasks such as relation extraction and entity recognition, both of
which support ontology engineering, true ontology development requires input from domain
experts to define terms, structure hierarchical relationships, and provide formal representations
grounded in logical inference. Furthermore, one of the keys to making ontologies FAIR and
interoperable is aligning them with standard vocabularies, such as ISO 21838, which are closely
linked to commonsense knowledge. Ontologies are consistently validated by domain experts
and evolve over time, whereas the validation of information produced by LLMs remains an open issue
[16, 17].</p>
      <p>This paper proposes a methodology for generating semantic rules from CSK statements,
guided by predefined mappings to standard ontologies. It argues for the continued importance
of ontologies and the necessity of human involvement in the development process, particularly
for capturing and incorporating CSK from LLMs. This approach enables the derivation of
rule-based expressions, such as First-Order Logic (FOL) rules, that can inform the creation of ontology
classes and properties using standard vocabularies. LLMs are first prompted to generate
natural language (NL) statements expressing CSK; these statements are then transformed into FOL
rules based on CSK patterns aligned with standard vocabulary. This rule-based method
addresses key limitations of LLMs by producing formal, vocabulary-aligned rules that support
the development of consistent, standardized, and semantically rich ontologies, in adherence to
principles of formal logic.</p>
      <p>The remainder of the paper is structured as follows. Section 2 provides a brief overview of
how LLMs operate based on textual patterns. Section 3 discusses prompt engineering techniques
and examines the limitations of LLMs in knowledge engineering, along with the need for human
involvement. Section 4 presents the proposed methodology based on CSK-driven semantic
rules and predefined mappings to standard ontologies. Section 5 assesses the effectiveness and
applicability of our method, as well as its limitations. Section 6 demonstrates the applicability of
the proposed methodology in the manufacturing domain. Finally, Section 7 concludes the
paper and outlines directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>Understanding the distinction between facts and knowledge is pivotal to addressing the challenge
of ontology development using LLMs [21]. A fact is “a statement that can be proven to be
true or false,” whereas knowledge, in the context of ontologies and knowledge engineering,
encompasses a broader understanding that includes the interpretation and inference of facts
within a certain domain. Knowledge is not about truthfulness but involves the structured
organization and representation of information that can be used to infer new insights [22]. CSK
is a subset of knowledge that is considered to be universally true [23] and is crucial for the
development of ontologies that accurately reflect real-world semantics [24]. In the context of
CSK, the Cyc project aimed to develop an ontology containing common knowledge terms, facts,
concepts, and rules. The project also focused on creating a system capable of communicating in
English and learning from human interactions [25]. This goal has now been partially achieved
by LLMs, which can interact with users in natural language and answer queries using different
prompts, although without creating ontologies, a central aim of the Cyc project.</p>
      <sec id="sec-2-1">
        <title>2.1. The Evolution and Impact of LLMs in AI</title>
        <p>LLMs represent a significant era in technological innovation. They have not only reshaped the
landscape of AI but also ushered in a new research era focused on Generative AI. The evolution
of Natural Language Processing (NLP) has played a critical role in enabling machines to read,
understand, and make sense of human language, alongside Machine Learning (ML) systems that
facilitate the development of models capable of making predictions from data [26, 27, 28, 29].
As we explore LLMs further, it becomes evident that models like GPTs (Generative Pre-trained
Transformers) signify a major leap in AI. LLMs are designed to understand, interact, and generate
language at an unprecedented scale, owing to their access to massive text corpora from which
they learn linguistic patterns and structures. This enables them to perform a wide range of
language-based tasks with remarkable efficiency [30]. The capabilities of these models have
sparked debates about whether LLMs are on par with humans in everyday tasks and the societal
implications of their use [31]. Today, it is common to find examples showcasing LLMs as either
impressively intelligent or inexplicably flawed, the latter often referred to as “hallucinations.” Regardless,
they demonstrate a notable ability to process and respond to human language, often requiring
substantial background knowledge [32].</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Challenges in Using LLMs for Ontology Development</title>
        <p>Despite the substantial advancements achieved by LLMs, their application in ontology
development presents unique challenges due to insufficient knowledge modeling and limited reasoning
capabilities [33]. In knowledge engineering, ontologies represent a set of concepts within a
domain and the relationships between them. They are essential for reasoning about entities
and making inferences. Ontology development entails creating a standardized vocabulary and
expressing formal semantics through axioms. The challenge lies in the design of LLMs: they
excel in statistical pattern recognition but struggle with the deep semantic structures and formal
logic required for ontology construction [34, 35]. LLMs lack the ability to grasp intricate and
nuanced relationships and classifications, which are essential for accurate ontology development
[36]. Although LLMs can comprehend and generate text using statistical patterns, they cannot
understand the semantic relationships and logical structures necessary for creating meaningful
ontologies [37, 38]. This limitation highlights the need for innovative approaches that meet the
requirements of formal logic and semantic complexity [39].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Ontology Development Practices and the Role of Pre-Trained LLMs</title>
      <p>Ontologies, especially reference ontologies, are developed to provide a standardized, controlled
vocabulary for a specific domain. For example, the IOF Core Ontology encompasses notions
common across multiple manufacturing domains, while top-level ontologies such as the Basic
Formal Ontology (BFO)<sup>1</sup>, Suggested Upper Merged Ontology (SUMO), and OpenCyc address
more general conceptualizations [40].</p>
      <p>An open-source reference ontology offers human- and machine-readable definitions of its
vocabulary. A major use case for reference ontologies is enabling interoperability between
datasets that use these standardized terms in their data or metadata.</p>
      <p>
        LLMs can provide diverse information based on user prompts via natural language interactions.
Numerous prompt engineering techniques are available, as summarized by Schmidt et al.
[41]. Pre-trained LLMs such as OpenAI’s ChatGPT series and Google’s BERT have shown
effectiveness in various NLP tasks. While their primary role is content generation and language
translation, they are now also being explored for ontology creation due to several capabilities:
(1) Semantic Understanding, (2) Entity Recognition, (3) Relation Extraction, and (4) Concept
Generation. Trained on vast amounts of text, these models capture relationships such as
synonymy, hyponymy, and hypernymy [42]. They also
perform well on Named Entity Recognition (NER) tasks, identifying entities such as locations,
organizations, and people, and can extract relationships between mentioned entities [43].
According to Grandi et al., LLMs can generate conceptual designs based on prompts [44].
      </p>
      <sec id="sec-4-1">
        <title>3.1. Limitations of LLMs in Knowledge Engineering</title>
        <p>
          Despite these capabilities, LLMs exhibit several limitations that render them unsuitable as
standalone tools for ontology development [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57]:
(1) lack of explicit knowledge representation, (2) lack of logical consistency, (3) semantic
ambiguity and inconsistent responses, (4) domain specificity, (5) data bias and incompleteness,
(6) limited multi-modal understanding, and (7) scalability issues.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Comparison with Existing Pre-Trained GPTs</title>
        <p>LLMs generate responses based on patterns in training data, not on explicit semantic structures
like those in ontologies. When generating definitions or relationships between terms, LLMs may
produce inconsistent information due to inaccurate representations in their training data. A key
issue is the inability of LLMs to generate ontologies aligned with formal ontological standards.
While they can convert natural language into semantic rules (e.g., SPARQL, SWRL, or Datalog),
they often invent predicates instead of reusing existing ones from standard ontologies. This
behavior underscores the need for aligning LLM-generated content with established standards,
such as Top-Level Ontologies (TLOs) like the Basic Formal Ontology (BFO) [ISO/IEC
21838-2:2021] and mid-level ontologies like the Industrial Ontologies Foundry (IOF)<sup>2</sup> and the Machine
Service Description Language (MSDL)<sup>3</sup>. For general-purpose vocabularies like FOAF (Friend of
a Friend) or family trees, LLMs can effectively model text and align with standard ontologies,
given the generality of linguistic terms and labels. However, modeling abstract concepts using
vocabularies like BFO is more difficult. In our prompt-based use case with GPT-3.5<sup>4</sup>, the model
often failed to reuse the provided ontological terms and instead generated plausible-sounding
but ontologically invalid relations, highlighting its dependence on linguistic surface patterns
rather than formal alignment.</p>
        <p>GPT-3.5 also faces input limitations due to the 4096-token cap, which prevents users from
uploading large RDF or TTL files. In GPT-4.0<sup>5</sup>, users can upload ontologies as files, improving
results. Nevertheless, the model still fails to consistently reuse the provided ontological classes
and relations.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology</title>
      <p>The proposed method<sup>6</sup> addresses the significant challenge of translating CSK
into semantic rules. CSK is retrieved from an LLM as
natural language (NL) statements, which are subsequently converted into ontology classes, subclasses,
instances, or relationships, while ensuring alignment with standard vocabularies. The primary
goal is to bridge the gap between the flexible nature of natural language and the structured,
rigid requirements of ontologies necessary for effective reasoning and data integration.</p>
      <sec id="sec-5-1">
        <p><sup>2</sup> https://spec.industrialontologies.org/iof/</p>
        <p><sup>3</sup> https://labs.engineering.asu.edu/semantics/ontology-download/msdl-ontology/</p>
        <p><sup>4</sup> https://chat.openai.com/share/44a3d1d6-66b2-407d-b6db-509f158a30f9</p>
        <p><sup>5</sup> https://chat.openai.com/share/86565869-7a00-47e8-9067-d6359f61c32c</p>
        <p><sup>6</sup> https://github.com/MRNaqvi/Common-Sense-Knowledege-Driven-SemanticRule-Base-Ontology-Mapping</p>
        <p>To achieve this, a rule-based mechanism identifies relevant concepts within CSK statements
and systematically integrates them into ontology elements. This process enables the structured
transformation of CSK into semantic rules aligned with ontologies. For ontology manipulation
we use Owlready2<sup>7</sup>, a Python module for loading,
editing, and reasoning with OWL ontologies, chosen for its integration with Python-based
systems and its support for ontology-driven rule reasoning.</p>
        <sec id="sec-5-1-1">
          <title>Formalizing Semantic Rule Specialization using Common Sense Knowledge (CSK)</title>
          <p>Definition 1: Rule Template.
A Rule Template, T, is defined as a logical expression containing placeholders that represent
general classes or relationships.</p>
          <p>Example:
<disp-formula><label>(1)</label><tex-math>\text{process}(x) \rightarrow \exists y\, \text{process}(y) \wedge \text{comesAfter}(x, y)</tex-math></disp-formula></p>
          <p>Definition 2: Common Sense Knowledge (CSK).
A CSK statement, C, is a natural language statement that provides specific knowledge about a particular case,
such as classes or instances related to a process (e.g., painting). CSK is extracted from LLMs
using the chain-of-thought prompt engineering method. The CSK serves as the source from
which specific information is extracted to replace placeholders in the Rule Template.</p>
          <p>Example 1: “The result of the painting process is a painted object.”</p>
          <p>Example 2: “After the painting process, you do the drying process.”</p>
          <p>Example 3: “The drying process involves a dryer machine.”</p>
          <p>Definition 3: Concrete Rule.
A Concrete Rule, R, is derived from a Rule Template by replacing its placeholders with specific
classes or instances extracted from a CSK statement.</p>
          <p>Function: SpecializeRule</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Formally defined, the function SpecializeRule is:</title>
        <p><disp-formula><label>(2)</label><tex-math>\text{SpecializeRule} : T \times C \rightarrow R</tex-math></disp-formula></p>
        <p>Process:</p>
        <p>1. Input: A Rule Template, T, and a CSK statement, C.</p>
        <p>2. Extraction: Identify and extract specific classes or instances from C.</p>
        <p>3. Substitution: Systematically replace the placeholders in T with the extracted classes
or instances.</p>
        <p>4. Output: Return a Concrete Rule, R.</p>
        <p><sup>7</sup> https://owlready2.readthedocs.io/en/v0.48/</p>
        <p>Rule 1</p>
        <p>Given the Rule Template T, based on standard vocabulary classes and property relations from
BFO and IOF:</p>
        <p><disp-formula><label>(3)</label><tex-math>\text{IOF:MaterialProduct}(x) \rightarrow \exists y\, \text{BFO:process}(y) \wedge \text{IOF:isOutputOf}(x, y)</tex-math></disp-formula></p>
      </sec>
      <sec id="sec-5-3">
        <title>CSK: “The result of painting is a painted object.”</title>
        <p>Applying SpecializeRule:</p>
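        <p><disp-formula><label>(4)</label><tex-math>\text{paintedObject}(x) \rightarrow \exists y\, \text{painting}(y) \wedge \text{IOF:isOutputOf}(x, y)</tex-math></disp-formula></p>
        <p>Rule 2</p>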
      </sec>
      <sec id="sec-5-4">
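        <title>Given the Rule Template T:</title>
        <p><disp-formula><label>(5)</label><tex-math>\text{BFO:process}(x) \rightarrow \exists y\, \text{BFO:process}(y) \wedge \text{BFO:precedes}(y, x)</tex-math></disp-formula></p>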
      </sec>
      <sec id="sec-5-5">
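        <title>CSK: “After the painting process, you should perform a drying process.”</title>
        <p>Applying SpecializeRule:</p>
        <p><disp-formula><label>(6)</label><tex-math>\text{drying}(x) \rightarrow \exists y\, \text{painting}(y) \wedge \text{BFO:precedes}(y, x)</tex-math></disp-formula></p>
        <p>Rule 3</p>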
      </sec>
      <sec id="sec-5-6">
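        <title>Given the Rule Template T:</title>
        <p><disp-formula><label>(7)</label><tex-math>\text{BFO:process}(x) \rightarrow \exists y\, \text{MSDL:productionEquipment}(y) \wedge \text{BFO:participatesInAtSomeTime}(y, x)</tex-math></disp-formula></p>
        <p>CSK: “The drying process involves a dryer machine.”</p>
        <p>Applying SpecializeRule:</p>
        <p><disp-formula><label>(8)</label><tex-math>\text{drying}(x) \rightarrow \exists y\, \text{dryer}(y) \wedge \text{BFO:participatesInAtSomeTime}(y, x)</tex-math></disp-formula></p>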
        <p>Explanation: The function SpecializeRule refines a general rule template based on CSK to
yield a concrete rule suitable for a particular context. In this case, the rule template associates a
process with the equipment involved in it. The CSK statement “The drying process involves a
dryer machine” allows us to adapt the template to indicate that a dryer machine participates in
the drying process.</p>
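        <p>The substitution step can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the names RuleTemplate, specialize_rule, and render are hypothetical, the classes are extracted from the CSK statement by hand rather than by an LLM, and the argument order of the relation is an assumption. The sketch only shows that specialization replaces the two class placeholders while leaving the standard-vocabulary relation untouched.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RuleTemplate:
    """Rule of the form: body(x) -> EXISTS y head(y) AND relation(args)."""
    body_class: str       # class placeholder for x
    head_class: str       # class placeholder for y
    relation: str         # property taken from the standard vocabulary
    relation_args: tuple  # argument order of the relation (assumed here)


def specialize_rule(template, body_class, head_class):
    """Replace the class placeholders; the relation is kept unchanged."""
    return replace(template, body_class=body_class, head_class=head_class)


def render(rule):
    """Render a rule as a plain-text FOL-like string."""
    args = ", ".join(rule.relation_args)
    return (f"{rule.body_class}(x) -> EXISTS y {rule.head_class}(y) "
            f"AND {rule.relation}({args})")


# Template of Rule 3: a process involves some production equipment.
T3 = RuleTemplate("BFO:process", "MSDL:productionEquipment",
                  "BFO:participatesInAtSomeTime", ("y", "x"))

# Classes picked by hand from: "The drying process involves a dryer machine."
R3 = specialize_rule(T3, body_class="drying", head_class="dryer")
print(render(R3))
```
]]></preformat>
        <p>Keeping the relation fixed in the template is what prevents an invented predicate from entering the concrete rule; only the class names vary with the CSK statement.</p>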
        <p>
          Our method utilizes the pattern recognition capabilities of LLMs to interpret varied
expressions of CSK and map them into predefined rule structures. LLMs can often
extract and preserve the core semantics of such patterns, even when they are expressed in diverse ways.
This adaptability helps the system remain robust when handling the complex language
often found in technical or industrial texts. The primary purpose of translating CSK into
semantic rules is to ensure that ontology classes, subclasses, and predicates align with standard
vocabularies. This alignment helps resolve LLMs’ common issues with ontology-incompatible
outputs. Our methodology employs Owlready2 for ontology manipulation. When CSK is
converted into ontology elements through semantic rule generation, we verify whether the
resulting classes and predicates match a standard vocabulary. If a close match is found based on
definitions and axioms, the concept is used directly or mapped via a predefined semantic rule.
        </p>
        <p>For instance, in the CSK statement “The result of painting is a painted object,” the term
“painting” aligns with the concept BFO:Process, while “painted object” corresponds to
IOF:MaterialProduct. Our semantic rule-based mapping establishes that a painted object
is the output of a painting process using the IOF property IOF:isOutputOf.</p>
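        <p>The vocabulary check described above might be sketched in Python as a simple lookup. The two dictionaries are hand-written stand-ins for terms that a real system would read from the loaded BFO, IOF, and MSDL ontologies, and the function name check_alignment is hypothetical; matching on definitions and axioms, as the method requires, is not modeled here.</p>
        <preformat><![CDATA[
```python
# Hypothetical, hand-written subset of standard terms (an assumption; a real
# system would collect these from the loaded ontologies).
STANDARD_TERMS = {
    "BFO:process",
    "BFO:precedes",
    "BFO:participatesInAtSomeTime",
    "IOF:MaterialProduct",
    "IOF:isOutputOf",
    "MSDL:productionEquipment",
}

# Predefined mapping from domain terms found in CSK to standard classes.
DOMAIN_TO_STANDARD = {
    "painting": "BFO:process",
    "drying": "BFO:process",
    "paintedObject": "IOF:MaterialProduct",
    "dryer": "MSDL:productionEquipment",
}


def check_alignment(terms):
    """Split rule terms into aligned ones and ones that need expert review."""
    aligned, unaligned = {}, []
    for term in terms:
        if term in STANDARD_TERMS:
            aligned[term] = term                      # already a standard term
        elif term in DOMAIN_TO_STANDARD:
            aligned[term] = DOMAIN_TO_STANDARD[term]  # mapped to a standard class
        else:
            unaligned.append(term)                    # likely an invented term
    return aligned, unaligned


# "resultsIn" stands for a predicate invented by the LLM.
aligned, unaligned = check_alignment(
    ["paintedObject", "painting", "IOF:isOutputOf", "resultsIn"]
)
print(aligned)
print(unaligned)
```
]]></preformat>
        <p>Terms that end up in the unaligned list are the ones a human ontologist would need to review before the rule is accepted.</p>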
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Evaluation</title>
      <p>To evaluate the effectiveness of our approach for transforming CSK into semantic rules aligned
with standard ontologies, we conducted a study focusing on three aspects: (i) the correctness of class
and relation extraction, (ii) semantic alignment with reference ontologies, and (iii) the practical
usability of the generated rules for ontology population and reasoning.</p>
      <p>We compiled a dataset of 50 CSK statements related to manufacturing processes (e.g., painting,
drying, welding), extracted from LLM queries using chain-of-thought prompting. Each statement
was processed using our SpecializeRule function to generate corresponding semantic rules.
The evaluation was conducted in three stages:
1. Class/Relation Extraction Accuracy: Manually annotated gold-standard mappings of
ontology classes and relations were created for the CSK statements.
2. Semantic Alignment: We assessed whether the generated rules used vocabulary terms
consistent with BFO, IOF, and MSDL ontologies.
3. Usability in Ontology Population: We tested the generated rules in populating OWL
ontologies via the owlready2 API and evaluated their syntactic and semantic correctness.</p>
      <sec id="sec-6-1">
        <title>5.1. Metrics</title>
        <p>We used the following metrics:</p>
        <p>• Precision: Fraction of correctly mapped classes/relations over all predicted mappings.</p>
        <p>• Recall: Fraction of correct mappings in the gold standard that were retrieved by the
system.</p>
        <p>• Semantic Validity: Percentage of rules whose predicates and classes corresponded to
terms defined in BFO, IOF, or MSDL.</p>
        <p>• Rule Usability: Proportion of rules successfully instantiated and executed within an
OWL ontology environment.</p>
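        <p>As a rough illustration of how the first three metrics could be computed, consider the following Python sketch. The function names are hypothetical and the data are toy values made up for the example; they are not the evaluation dataset or results of this study.</p>
        <preformat><![CDATA[
```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted (term, ontology_term) mappings."""
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def semantic_validity(rules, vocabulary):
    """Share of rules whose terms all belong to the standard vocabulary."""
    if not rules:
        return 0.0
    valid = sum(1 for terms in rules if all(t in vocabulary for t in terms))
    return valid / len(rules)


# Toy, made-up mappings for illustration only.
gold = {
    ("painting", "BFO:process"),
    ("drying", "BFO:process"),
    ("paintedObject", "IOF:MaterialProduct"),
    ("dryer", "MSDL:productionEquipment"),
}
predicted = {
    ("painting", "BFO:process"),
    ("paintedObject", "IOF:MaterialProduct"),
    ("dryer", "BFO:process"),  # wrong mapping
}
p, r = precision_recall(predicted, gold)  # 2 of 3 predicted; 2 of 4 gold

# Each rule is represented by the list of vocabulary terms it uses.
vocabulary = {"BFO:process", "BFO:precedes"}
rules = [["BFO:process", "BFO:precedes"], ["BFO:process", "comesAfter"]]
validity = semantic_validity(rules, vocabulary)
print(p, r, validity)
```
]]></preformat>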
      </sec>
      <sec id="sec-6-2">
        <title>5.2. User Study: Expert Evaluation of Semantic Rules</title>
        <p>To complement the quantitative metrics, we conducted a small-scale user study involving five
domain experts from the fields of manufacturing and knowledge engineering. The participants
were asked to evaluate a randomized subset of 20 generated semantic rules corresponding to
CSK statements.</p>
        <p>Each expert assessed the following dimensions on a 5-point Likert scale:
1. Correctness: Does the rule correctly represent the intended meaning of the CSK
statement?
2. Ontological Alignment: Are the classes and predicates correctly aligned with standard
vocabularies (BFO, IOF, MSDL)?
3. Usefulness: Would this rule be useful for automating ontology population or reasoning?
Results:</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Application of Proposed Methodology in Manufacturing</title>
      <p>We propose MACS-KG<sup>8</sup> [59], a specialized knowledge graph that incorporates Manufacturing
Commonsense Knowledge (MCSK) to enhance reasoning and explainability within
manufacturing decision-making processes. The core innovation of MACS-KG lies in its ability to extract
domain-specific knowledge from pre-trained Large Language Models (LLMs) through
Chain-of-Thought prompt engineering, leveraging established MCSK patterns [1, 59]. The extracted
knowledge is then automatically transformed into First-Order Logic (FOL) representations that
align with standard ontological vocabularies such as the Basic Formal Ontology (BFO)<sup>9</sup>, the
Industrial Ontologies Foundry (IOF)<sup>10</sup>, and the Machine Service Description Language (MSDL)<sup>11</sup>.
This semantic alignment is critical for ensuring interoperability and consistent reasoning across
heterogeneous manufacturing data sources. Once users validate the generated FOL rules, they
are converted into executable SPARQL and Datalog rules.</p>
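      <p>One possible reading of the conversion to SPARQL, sketched below in Python, is to turn a validated rule into a query that lists the instances violating it. This is an assumption made for illustration, not the conversion implemented in MACS-KG; the function name rule_to_sparql_check, the ex: prefix, and its IRI are hypothetical placeholders rather than real BFO or IOF identifiers.</p>
      <preformat><![CDATA[
```python
def rule_to_sparql_check(body_class, head_class, relation, subject_var, object_var):
    """Build a SPARQL query that lists ?x instances lacking the required ?y."""
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"  # placeholder namespace
        "SELECT ?x WHERE {\n"
        f"  ?x a ex:{body_class} .\n"
        "  FILTER NOT EXISTS {\n"
        f"    ?y a ex:{head_class} .\n"
        f"    ?{subject_var} ex:{relation} ?{object_var} .\n"
        "  }\n"
        "}"
    )


# Rule: drying(x) -> EXISTS y painting(y) AND precedes(y, x)
query = rule_to_sparql_check("drying", "painting", "precedes", "y", "x")
print(query)
```
]]></preformat>
      <p>Run against a knowledge graph, such a query would return every drying process that has no preceding painting process, which is one way a validated rule can support explainable checks.</p>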
      <p>The MACS-KG user interface provides two primary functionalities. Building MACS-KG:
Users generate semantic rules grounded in MCSK using LLMs or manual rule templates (Fig. 1).
After rule creation, the system allows users to save these rules in a graph database, ensuring
structured storage and enabling efficient retrieval.</p>
      <p>Exploring MACS-KG: Users can query the knowledge graph using SPARQL queries, visualize
the graph structure, and manage stored rules through an interface intended to be intuitive and
accessible for domain experts, though its usability would benefit from further evaluation (Fig.
2).</p>
      <p>To demonstrate the practical utility of the MACS-KG framework, we applied it to a car
manufacturing scenario (Fig. 3). This use case illustrates how MACS-KG integrates knowledge
of manufacturing processes and production equipment, allowing users to validate and refine
generated semantic rules relevant to automotive production workflows. The system enforces
an initial validation step in which rules must be approved by the user before they are committed to
the knowledge graph. This validation prevents invalid or semantically inconsistent rules from
entering the database and thereby protects the integrity of downstream SPARQL queries and
reasoning tasks.</p>
      <sec id="sec-7-1">
        <p><sup>9</sup> https://basic-formal-ontology.org</p>
        <p><sup>10</sup> https://spec.industrialontologies.org/iof/</p>
        <p><sup>11</sup> https://labs.engineering.asu.edu/semantics/ontology-download/msdl-ontology/</p>
        <p>Figure 4 shows an example of validated rules encompassing car manufacturing processes
and production equipment, as managed within the MACS-KG platform. This step ensures that
only domain-relevant, ontologically aligned knowledge is incorporated, thereby enhancing the
reliability of decision support derived from the knowledge graph.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion</title>
      <p>Ontology development has traditionally required significant manual effort and domain
expertise. This paper proposed a methodology for automating the transformation of commonsense
knowledge (CSK), extracted via Large Language Models (LLMs), into semantically valid rules
aligned with standard ontologies.</p>
      <p>By bridging natural language processing and formal ontology engineering, our approach
enables scalable, explainable, and ontology-aligned rule generation. Adhering to foundational
vocabularies such as BFO, IOF, and MSDL, the method supports semantic interoperability and
logical reasoning in OWL-based environments.</p>
      <p>Future work will focus on expanding domain coverage, improving recall through prompt
optimization, and incorporating human-in-the-loop strategies for validation. While demonstrated
in the manufacturing domain, this approach lays the foundation for integrating LLM-based
commonsense reasoning into broader semantic web applications.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>We acknowledge the use of the OpenAI API for generating experimental content, and of GraphDB
and the RDFox rule engine as the underlying graph database for reasoning tasks. Grammarly
and ChatGPT were employed to assist in language refinement. However, the authors take full
responsibility for the scientific content and research ideas presented in this study.</p>
    </sec>
    <sec id="sec-10">
      <title>Disclaimer</title>
      <p>Since September 2024, Mohamed Hedi Karray has been with the European Innovation Council and
SMEs Executive Agency. The views expressed in this publication are the responsibility of
the authors and do not necessarily reflect the views of the European Commission or of the
European Innovation Council and SMEs Executive Agency. Neither the European Commission nor the
European Innovation Council and SMEs Executive Agency is liable for any consequence
stemming from the reuse of this publication.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgements: References</title>
      <p>This work is performed within the CHAIKMAT project funded by the French National Research
Agency (ANR) under grant agreement "ANR-21-CE10-0004-01".</p>
      <p>[1] Naqvi, S. M. R. (2025). Exploring LLMs and semantic XAI for industrial robot capabilities and
manufacturing commonsense knowledge (Doctoral dissertation, Université de Toulouse).
[2] Schank, R.C. and Abelson, R.P., 2013. Scripts, plans, goals, and understanding: An inquiry
into human knowledge structures. Psychology Press.
(Vol. 35, No. 14, pp. 12710-12718).
[19] Zhang, J., Bao, K., Zhang, Y., Wang, W., Feng, F., &amp; He, X. (2023, September). Is ChatGPT
fair for recommendation? Evaluating fairness in large language model recommendation. In
Proceedings of the 17th ACM Conference on Recommender Systems (pp. 993-999).
[20] Neuhaus, F., 2023. Ontologies in the era of large language models–a perspective. Applied
Ontology, 18(4), pp.399-407.
[21] Kripke SA. The question of logic. Mind. 2024 Jan 1;133(529):1-36.
[22] Zangwill N. Does knowledge depend on truth? Acta Analytica. 2013 Jun;28(2):139-44.
[23] Dupuy JP. Common knowledge, common sense. Theory and Decision. 1989
Jul;27(1-2):37-62.
[24] Berg-Cross G. Commonsense and Explanation. Journal of the Washington Academy of
Sciences. 2020 Dec 1;106(4):39-66.
[25] Lenat DB. From 2001 to 2001: Common Sense and the Mind of HAL. HAL’s Legacy.
2001:193-209.
[26] Dhamani N, Engler M. Introduction to Generative AI. Simon and Schuster; 2024 Feb 27.
[27] Han, Mengjie, et al. "Perspectives of Machine Learning and Natural Language Processing
on Characterizing Positive Energy Districts." Buildings 14.2 (2024): 371.
[28] Barbierato, Enrico, and Alice Gatti. "The Challenges of Machine Learning: A Critical
Review." Electronics 13, no. 2 (2024): 416.
[29] Retzlaff, Carl Orge, Srijita Das, Christabel Wayllace, Payam Mousavi, Mohammad Afshari,
Tianpei Yang, Anna Saranti, Alessa Angerschmid, Matthew E. Taylor, and Andreas Holzinger.
"Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements,
Challenges, and Opportunities." Journal of Artificial Intelligence Research 79 (2024): 359-415.
[30] Long, S., Tan, J., Mao, B., Tang, F., Li, Y., Zhao, M., &amp; Kato, N. (2025). A survey on intelligent
network operations and performance optimization based on large language models. IEEE
Communications Surveys &amp; Tutorials.
[31] Kirk, H. R., Vidgen, B., Röttger, P., &amp; Hale, S. A. (2024). The benefits, risks and bounds
of personalizing the alignment of large language models to individuals. Nature Machine
Intelligence, 6(4), 383-392.
[32] Korinek A. Language models and cognitive automation for economic research. National
Bureau of Economic Research; 2023 Feb 13.
[33] Babaei Giglou H, D’Souza J, Auer S. LLMs4OL: Large language models for ontology learning.
In International Semantic Web Conference 2023 Oct 27 (pp. 408-427). Cham: Springer Nature
Switzerland.
[34] Pan JZ, Razniewski S, Kalo JC, Singhania S, Chen J, Dietze S, Jabeen H, Omeliyanenko J,
Zhang W, Lissandrini M, Biswas R. Large language models and knowledge graphs:
Opportunities and challenges. arXiv preprint arXiv:2308.06374. 2023 Aug 11.
[35] Koubaa A, Boulila W, Ghouti L, Alzahem A, Latif S. Exploring ChatGPT Capabilities and
Limitations: A Survey. IEEE Access. 2023 Oct 23.
[36] Babaei Giglou, H., D’Souza, J., &amp; Auer, S. (2023, October). LLMs4OL: Large language
models for ontology learning. In International Semantic Web Conference (pp. 408-427).
Cham: Springer Nature Switzerland.
[37] Burtsev M, Reeves M, Job A. The Working Limitations of Large Language Models. MIT
Sloan Management Review. 2024;65(2):8-10.
[38] Molinari, A., &amp; Sandri, S. (2024, November). Evolution of lms design and implementation
in the age of ai and large language models. In Proceedings of the Second International
Workshop on Artificial INtelligent Systems in Education co-located with 23rd International
Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), Bolzano, Italy.
[39] Li Y, Huang Y, Lin Y, Wu S, Wan Y, Sun L. I Think, Therefore I am: Awareness in Large
Language Models. arXiv preprint arXiv:2401.17882. 2024 Jan 31.
[40] Jansen L. Categories: The top-level ontology. Applied ontology: An introduction. 2008
Jan:173-96.
[41] Schmidt, Douglas C., Jesse Spencer-Smith, Quchen Fu, and Jules White. "Towards a catalog
of prompt patterns to enhance the discipline of prompt engineering." ACM SIGAda Ada
Letters 43, no. 2 (2024): 43-51.
[42] Li, J., Tang, T., Zhao, W. X., Nie, J. Y., &amp; Wen, J. R. (2024). Pre-trained language models for
text generation: A survey. ACM Computing Surveys, 56(9), 1-39.
[43] Min, B., Ross, H., Sulem, E., Veyseh, A.P.B., Nguyen, T.H., Sainz, O., Agirre, E., Heintz, I.
and Roth, D., 2023. Recent advances in natural language processing via large pre-trained
language models: A survey. ACM Computing Surveys, 56(2), pp.1-40.
[44] Ma, K., Grandi, D., McComb, C., &amp; Goucher-Lambert, K. (2023, August). Conceptual design
generation using large language models. In International Design Engineering Technical
Conferences and Computers and Information in Engineering Conference (Vol. 87349, p.
V006T06A021). American Society of Mechanical Engineers.
[45] Pan S, Luo L, Wang Y, Chen C, Wang J, Wu X. Unifying large language models and
knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering.
2024 Jan 10.
[46] Shafee, S., Bessani, A., &amp; Ferreira, P. M. (2025). Evaluation of LLM-based chatbots for
OSINT-based Cyber Threat Awareness. Expert Systems with Applications, 261, 125509.
[47] Acharya, K., Velasquez, A., &amp; Song, H. H. (2024). A survey on symbolic knowledge
distillation of large language models. IEEE Transactions on Artificial Intelligence.
[48] Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., &amp; Wu, X. (2024). Unifying large language
models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data
Engineering, 36(7), 3580-3599.
[49] Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., ... &amp; Gui, T. (2025). The rise and
potential of large language model based agents: A survey. Science China Information
Sciences, 68(2), 121101.
[50] Saeedizade, M. J., &amp; Blomqvist, E. (2024, May). Navigating ontology development with large
language models. In European Semantic Web Conference (pp. 143-161). Cham: Springer
Nature Switzerland.
[51] Andročec, D. (2025). Using Large Language Models for Ontology Development. Engineering
Proceedings, 104(1), 9.
[52] Joachimiak, M. P., Miller, M. A., Caufield, J. H., Ly, R., Harris, N. L., Tritt, A., ... &amp; Bouchard,
K. E. (2024). The Artificial Intelligence Ontology: LLM-assisted construction of AI concept
hierarchies. Applied Ontology, 19(4), 408-418.
[53] García-Fernández, J., Verhoosel, J., Ubacht, J., &amp; Bakker, R. M. (2025). Ontology Engineering
with Large Language Models: Unveiling the potential of human-LLM collaboration in the
ontology extension process. extraction, 7, 15.
[54] Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y.J., Madotto, A. and Fung, P.,
2023. Survey of hallucination in natural language generation. ACM Computing Surveys,
55(12), pp.1-38.
[55] Zhang, H., Song, H., Li, S., Zhou, M. and Song, D., 2023. A survey of controllable text
generation using transformer-based pre-trained language models. ACM Computing Surveys,
56(3), pp.1-37.
[56] Li, J., Garijo, D., &amp; Poveda-Villalón, M. (2025). Large Language Models for Ontology
Engineering: A Systematic Literature Review.
[57] Mai, Huu Tan, Cuong Xuan Chu, and Heiko Paulheim. "Do LLMs really adapt to domains?
an ontology learning perspective." International Semantic Web Conference. Cham: Springer
Nature Switzerland, 2024.
[58] Naqvi, M. R., Sarkar, A., Ameri, F., Elmhadhbi, L., &amp; Karray, M. H. (2025, June). MACS-KG:
MAnufacturing CommonSense Knowledge Graph. In European Semantic Web Conference
(pp. 120-124). Cham: Springer Nature Switzerland.
[59] Naqvi, M. R., Sarkar, A., Ameri, F., Elmhadhbi, L., &amp; Karray, M. H. (2024, December).
Manufacturing Commonsense Knowledge. In International Knowledge Graph and Semantic
Web Conference (pp. 320-333). Cham: Springer Nature Switzerland.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Carey</surname>
            ,
            <given-names>Susan</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Elizabeth</given-names>
            <surname>Spelke</surname>
          </string-name>
          .
          <article-title>"Domain-specific knowledge and conceptual change."</article-title>
          <source>Mapping the mind: Domain specificity in cognition and culture</source>
          169 (
          <year>1994</year>
          ):
          <fpage>200</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Ontologies. In Ontologies: A silver bullet for knowledge management and electronic commerce</article-title>
          (pp.
          <fpage>11</fpage>
          -
          <lpage>18</lpage>
          ). Berlin, Heidelberg: Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Zang</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>A survey of commonsense knowledge acquisition</article-title>
          .
          <source>Journal of Computer Science and Technology</source>
          ,
          <volume>28</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>689</fpage>
          -
          <lpage>719</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Naqvi</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ameri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Araghi</surname>
            ,
            <given-names>S. N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Karray</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Application of MSDL in Modeling Capabilities of Robots</article-title>
          .
          <source>In CEUR Workshop Proceedings</source>
          (Vol.
          <volume>3595</volume>
          ).
          <publisher-name>CEUR-WS</publisher-name>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Naqvi</surname>
          </string-name>
          , Syed Muhammad Raza.
          <article-title>Exploration des LLM et de l'XAI sémantique pour les capacités des robots industriels et les connaissances communes en matière de fabrication</article-title>
          .
          <source>Diss</source>
          . Université de Toulouse (
          <year>2023</year>
          -....),
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hui</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , ... &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>36</volume>
          ,
          <fpage>42330</fpage>
          -
          <lpage>42357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T. P.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Large-Scale Acquisition of Refined Commonsense Knowledge</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Balakrishna</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Moldovan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2013</year>
          , May.
          <article-title>Automatic building of semantically rich domain models from unstructured data</article-title>
          .
          <source>In The Twenty-Sixth International FLAIRS Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Tekli</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>An overview on xml semantic disambiguation from unstructured text to semi-structured data: Background, applications, and ongoing challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>28</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1383</fpage>
          -
          <lpage>1407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Graham</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yates</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>El-Roby</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <year>2023</year>
          .
          <article-title>Investigating antiquities trafficking with generative pre-trained transformer (GPT)-3 enabled knowledge graphs: A case study</article-title>
          .
          <source>Open Research Europe</source>
          ,
          <volume>3</volume>
          , p.
          <fpage>100</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Yenduri</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramalingam</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chemmalar Selvi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Supriya</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maddikunta</surname>
            ,
            <given-names>P.K.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deepti Raj</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jhaveri</surname>
            ,
            <given-names>R.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prabadevi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Athanasios</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>GPT (Generative Pre-trained Transformer) - A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sha</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez-Maldonado</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gašević</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <year>2024</year>
          .
          <article-title>Practical and ethical challenges of large language models in education: A systematic scoping review</article-title>
          .
          <source>British Journal of Educational Technology</source>
          ,
          <volume>55</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>90</fpage>
          -
          <lpage>112</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seßler</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Küchemann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bannert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dementieva</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gasser</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Günnemann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hüllermeier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Krusche</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <year>2023</year>
          .
          <article-title>ChatGPT for good? On opportunities and challenges of large language models for education</article-title>
          .
          <source>Learning and individual differences</source>
          ,
          <volume>103</volume>
          , p.
          <fpage>102274</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staats</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <year>2023</year>
          , October.
          <article-title>Don't Trust: Verify - Grounding LLM Quantitative Reasoning with Autoformalization</article-title>
          .
          <source>In The Twelfth International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bibi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torr</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <year>2024</year>
          .
          <article-title>Can Large Language Model Agents Simulate Human Trust Behaviors?</article-title>
          <source>arXiv preprint arXiv:2402.04559</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Deshpande</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <year>2021</year>
          , May.
          <article-title>How Linguistically Fair Are Multilingual Pre-Trained Language Models?</article-title>
          <source>In Proceedings of the AAAI Conference on Artificial Intelligence</source>
          (Vol.
          <volume>35</volume>
          , No.
          <issue>14</issue>
          , pp.
          <fpage>12710</fpage>
          -
          <lpage>12718</lpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>