<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontologies, Arguments, and Large-Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>John Beverley</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Franda</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Hedi Karray</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dan Maxwell</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carter Benson</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barry Smith</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Artificial Intelligence and Data Science, University at Buffalo</institution>
          ,
          <addr-line>NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KadSci, LLC</institution>
          ,
          <addr-line>VA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Center for Ontological Research</institution>
          ,
          <addr-line>NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>SUNY University at Buffalo</institution>
          ,
          <addr-line>NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Neuchâtel</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Technology of Tarbes UTTOP</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The explosion of interest in large language models (LLMs) has been accompanied by concerns over the extent to which generated outputs can be trusted, owing to the prevalence of bias, hallucinations, and so forth. Accordingly, there is a growing interest in the use of ontologies and knowledge graphs to make LLMs more trustworthy. This interest rests on the long history of using ontologies and knowledge graphs to construct human-comprehensible justifications for model outputs, as well as to trace the impact of evidence on other evidence. Understanding the nature of arguments and argumentation is critical to each, especially when LLM output conflicts with what is expected by users. The central contribution of this article is to extend the Arguments Ontology (ARGO) - an ontology specific to the domain of argumentation and evidence broadly construed - into the space of LLM fact-checking in the interest of promoting justification and traceability research through the use of ARGO-based ‘blueprints’.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology</kwd>
        <kwd>Arguments</kwd>
        <kwd>Semantic Reasoning</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The explosion of interest in large language models (LLMs) has been accompanied by concerns over the
extent to which generated outputs can be trusted, owing largely to the prevalence of hallucinations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and bias [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Each may be partially addressed by intervention at various points of LLM development,
such as by pre-training on vetted data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or human-in-the-loop reinforcement [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A natural strategy for
addressing hallucinations post-inferencing involves fact-checking, the process of evaluating whether
claims asserted to be true are indeed true [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Traditionally, this process involves identifying asserted
claims, relevant evidence or counter-evidence, and delivering a verdict [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Work exploring fact-checking
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with respect to LLMs has focused on the outputs of domain-specific prompts, for example in domains
such as climate change [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], disease [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], or Twitter [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], but recent work has sought to expand the scope
of this approach [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In naive models, claim and evidence pairs are tagged to indicate whether the
evidence supports or undermines the claim. More sophisticated models attempt to identify missing
evidence for or against claims [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or to correct claims based on existing evidence [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Regardless of the strategies adopted, there is clear reliance on the relationship between claims
and evidence, a relationship often characterized in terms of arguments. In this respect, fact-checking
research for LLMs dovetails with traditional applications of ontologies - representational artifacts whose
representations are intended to designate some combination of classes and certain relationships among
them. Ontologies have proven useful for providing both explicit machine-understandable schemata for
machine-machine interoperability, and explicit human-understandable schemata [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for
human-machine interoperability across otherwise disparate vocabularies. Accordingly, ontologies have been
recognized as crucial for enhancing LLM accuracy [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>Ontology research intersects well with LLM fact-checking research by providing an avenue for
extracting implicit justification – human comprehensible arguments in favor of or against model outputs
- and facilitating traceability – human comprehensible arguments concerning impacts of evidence on
other evidence. Little research has been conducted at this intersection. In what follows, we aim to
remedy this gap by leveraging an ontology of arguments to highlight how ontologies may support
justification and traceability for fact-checking strategies.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Adequacy Constraints for Argument Ontologies</title>
      <sec id="sec-2-1">
        <title>2.1. Hallmarks of Arguments</title>
        <p>Claims are often asserted in the form of sentences, roughly, the smallest complete syntactic patterns
of characters that can be used to convey meaning. For example, “Snow is white” expresses that snow
is white. Sentences are not, however, identical to the contents expressed by them: “Schnee ist weiß”
expresses the same content as “Snow is white”. Sentences and the content they express may also be
used in different ways. The same content might be used to assert that snow is white or to ask whether
snow is white, the former being an assertion and the latter a question. Additionally, sentences and
sentence contents may have different bearers. The sentence you are reading now could be produced on
a piece of paper, a billboard, or another computer monitor.</p>
        <p>Arguments themselves can be described as collections of sentence contents used to support another
sentence content. To illustrate:</p>
        <sec id="sec-2-1-1">
          <title>1. If Susan leaves work early, she will go home and then to the gym.</title>
          <p>2. SUPPOSE Susan leaves work early.
3. Hence, Susan will go home and then to the gym.
4. Hence, Susan will go home.</p>
          <p>5. Hence, if Susan leaves work early then Susan will go home.</p>
          <p>Lines 1 and 2 are intended to support 3, which supports 4; lines 2 and 4 support 5. Line 1 is used to assert
that the corresponding content is true. Line 2 reflects a supposition that the corresponding content is
true in the interest of drawing out consequences. Line 3 follows from 1 and 2; 4 from 3; and 5 from 2
and 4. Arguments, as we see, may be complex, with claims depending on other claims in support of an
overall conclusion.</p>
          <p>This example also highlights how sentence contents occupy distinct roles in the context of arguments.
Lines 1 and 2 are premises of the argument, lines 3 and 4 are sub-conclusions, and line 5 is the main
conclusion. Distinguishing roles that sentence contents may play is important for classifying argument
types. For example:</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>1. Ghosts exist.</title>
          <p>2. Hence, ghosts exist.</p>
        </sec>
        <sec id="sec-2-1-3">
          <title>1. Hold the door if you want to keep your job!</title>
          <p>2. You want to keep your job.</p>
          <p>
            3. Hence, hold the door!
The ghost example above is a textbook case of a question-begging argument in which the same sentence
content plays distinct argument roles. That example involves sentence contents that may be true or false,
but arguments in general need not be so restricted [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]: in the present example, lines 1 and 3 are expressed by commands, the content of which is not typically taken to be true or false
[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ][
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. A general ontology of arguments should thus allow for representing arguments containing
contents that are neither true nor false.
          </p>
          <p>These brief remarks highlight hallmarks of arguments:
* The tripartite relationship among sentences, sentence contents, and bearers
* The supportive roles sentence contents may play in the context of an argument
* Diferent ways sentence contents may be used as parts of arguments
* The potential complexity of arguments</p>
        </sec>
        <sec id="sec-2-1-4">
          <p>Any robust ontological characterization of arguments must respect these hallmarks.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontology Best Practices</title>
        <p>
          A well-designed ontology should satisfy certain adequacy constraints that reflect the main goal of
ontology development, namely, promoting interoperability of heterogeneous data and information systems
[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ][20]. Standard adequacy constraints require that ontologies be accurate, adaptable, consistent, and
able to provide clear annotations to salient data [21][22]. With respect to accuracy, an ontology should
accurately represent entities and relationships within its stated scope. For our purposes, this amounts to
respecting the preceding hallmarks of arguments. With respect to adaptability, an ontology should be
designed for reuse by other ontology developers and users. With respect to clarity, terms in ontologies
should be given clear, unambiguous labels, synonyms, both formal and natural-language definitions,
and definition sources, to promote understanding across a variety of potential stakeholders. Ontologies
should in addition be logically consistent, both internally and with respect to sister ontologies, as
demonstrated, for example, by Web Ontology Language (OWL)-based reasoners [23][24].
        </p>
        <p>These best practices are borne out of years of ontology development by a wide variety of users,
and indeed are codified as principles in large ontology development communities, such as the Open
Biological and Biomedical (OBO) Foundry [25] and the Industrial Ontologies Foundry (IOF) [26]. To
encourage best practices, further principles for ontology design have emerged from these communities,
such as having ontologies within their respective communities extend from a single top-level ontology:
Basic Formal Ontology (BFO) [27][28]. BFO - an ISO/IEC 21838-2 top-level ontology standard [29] -
contains high-level general terms such as object and process. Extending from a common top-level
ontology promotes interoperability by ensuring that no matter how far ontologies extend into specific
domains, relating for example to electrons, tables, whales, they will nevertheless share a common
top-level language and logical framework for definitions. Relevant for our purposes here: extending
from BFO promotes explainability and traceability, as all extended ontology terms will be accompanied
by definitions [30] following the scheme: A is a B that Cs, where “A” is a subclass of “B” and differentiated
from other subclasses of “B” by virtue of satisfying “C” [20]. For example, an agent is a material entity
capable of performing a planned act. Altogether, these observations suggest the following constraints
should be respected for an ontology of arguments; the ontology should:
* Extend from a top-level ontology, such as BFO. [Adaptability]
* Contain clear labels, annotations, and in particular, definitions following the “A is a B that Cs”
scheme. [Clarity]
* Be represented in OWL2 and verified for logical consistency using associated reasoners. [Consistency]
* Distinguish sentences, sentence contents and their uses, the roles of sentence contents in
arguments, bearers of sentences and contents, and simple from complex arguments. [Accuracy]</p>
        <p>In the next section, we examine existing ontologies of arguments or evidence, noting where they fall
short of one or more adequacy constraints.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Literature Review</title>
        <p>Of the ontologies dealing with argumentation and evidence we reviewed, none adequately reflect
the hallmarks of arguments and none satisfy all of the preceding criteria for ontology development.
A common issue is the conflation of arguments and evidence. For example, the Legal Knowledge
Interchange Format (LKIF) Core Ontology [31] views arguments ‘as reasons expressed through a
medium’, thereby overlooking how sentence contents may serve as parts of complex arguments to
support other contents in multiple ways, in addition to serving as ‘reasons’. The Argument Interchange
Format (AIF) does not make such a conflation [32], but instead conflates sentence contents with what
the contents are about [33]. The Argument Model Ontology (AMO) distinguishes claims, evidence,
warrant, rebuttals, and so on, but does not distinguish the contents of sentences composing arguments
from the roles played by such contents within an argument [34][35]. The Semantic Science Integrated
Ontology (SIO) provides a treatment of arguments, validity, soundness, and so forth, but defines the
contents of sentences as “expressing something true or false”, ignoring the fact that arguments may
involve sentence contents that are neither [36]. Most of the preceding do not adopt a top-level ontology,
though there are argument ontologies that do, such as the OBO Foundry Evidence Ontology (EO) [37].
Though EO imports BFO as a top-level, it creates a sibling hierarchy alongside the main hierarchy of
BFO, despite the fact that terms such as evidence are defined to fall under BFO’s root class entity.
Additionally, the scope of EO is restricted to the biological domain. Relatedly, while the Explanation
Ontology (EXO) adopts a top-level ontology, namely SIO, it inherits the serious issues exhibited by that
import [38]. The lack of any existing ontology that respects our adequacy constraints motivated the
development of the Arguments Ontology (ARGO), which we discuss in what follows.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. ARGO</title>
      <p>ARGO is a small ontology designed to satisfy the above adequacy constraints. ARGO extends from BFO,
leveraging resources from other BFO-conformant ontologies such as the Information Artifact Ontology
(IAO) [33], an extension of BFO designed to represent information and information bearers.</p>
      <p>ARGO not only distinguishes sentences and sentence content, it also provides a more general class
under which instances of the former will fall. The ARGO class expression consists of patterns of
character shapes in a language, such as the string of characters comprising this clause. Sentence is
a subclass of expression, instances of which must be usable on their own to express content. For
example, “Happy” is not a sentence, while “Sam is happy” is. Both are distinct from the class sentence
content. The sentence “Susan is happy” expresses the sentence content that Susan is happy, which
is plausibly what the sentence is about.</p>
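      <p>The distinctions among expressions, sentences, and sentence contents just described can be sketched in code. The class names below mirror ARGO terms, but the rendering is our own illustrative simplification, not ARGO's OWL axiomatization.</p>
      <preformat>
```python
from dataclasses import dataclass

# Illustrative sketch of ARGO's distinctions; class names mirror the
# ontology, but this rendering is a simplification, not the OWL version.

@dataclass(frozen=True)
class Expression:
    """A pattern of character shapes in a language."""
    characters: str

@dataclass(frozen=True)
class Sentence(Expression):
    """An expression usable on its own to express content."""

@dataclass(frozen=True)
class SentenceContent:
    """An information content entity expressed by sentences."""
    description: str

# "Happy" is an expression but not a sentence; "Sam is happy" is both.
happy = Expression("Happy")
sam = Sentence("Sam is happy")

# Distinct sentences may express one and the same content:
content = SentenceContent("that snow is white")
english = Sentence("Snow is white")
german = Sentence("Schnee ist weiss")
expresses = {english: content, german: content}
```
      </preformat>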
      <p>Sentence contents are a subclass of the IAO class information content entity, roughly, copyable
patterns that are about things, such as the content of a book or the information encoded in a docx file
on a hard drive. Information content entities allow us to distinguish sentence contents from bearers,
as the same information content entity may be copied across multiple bearers. For example, the
content of a given PDF may exist across distinct laptops. Similarly, a sentence content may have
identical instances across bearers. Two observant friends of Susan, for instance, may both believe Susan
is happy, each expressing this by uttering “Susan is happy”.</p>
      <p>A given sentence content may serve as a conclusion in one argument and premise in another.
Premises are linked to conclusions insofar as they are offered as support for conclusions. This link
between premises and a conclusion may be plausibly understood as an action — a passing from some
collection of sentence contents to another sentence content because one believes the latter is
justified, supported, or entailed by the former. We reflect this link between premises and conclusion
with a class act of inferring. A premise is a sentence content that stands in a particular relation to
an argument as a result of being the input of an act of inferring; a conclusion is a sentence content
that stands in a particular relation to an argument as a result of being the output of an act of inferring.
The relations ‘has input’ and ‘has output’ are reused from the Common Core Ontologies [39].</p>
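      <p>The input/output treatment of premises and conclusions can be sketched as follows, using the Susan example. The Python rendering and helper names are our own illustration of the characterization above, not part of ARGO itself.</p>
      <preformat>
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Content:
    text: str

@dataclass(frozen=True)
class ActOfInferring:
    inputs: tuple    # mirrors 'has input', reused from the Common Core Ontologies
    output: Content  # mirrors 'has output'

def premises(acts):
    """Contents that serve as inputs to some act of inferring."""
    return {c for act in acts for c in act.inputs}

def conclusions(acts):
    """Contents that serve as outputs of some act of inferring."""
    return {act.output for act in acts}

p1 = Content("If Susan leaves work early, she will go home and then to the gym.")
p2 = Content("Susan leaves work early.")  # supposed
c3 = Content("Susan will go home and then to the gym.")
c4 = Content("Susan will go home.")
c5 = Content("If Susan leaves work early then Susan will go home.")

acts = [
    ActOfInferring((p1, p2), c3),
    ActOfInferring((c3,), c4),
    ActOfInferring((p2, c4), c5),
]

# The main conclusion is an output that is never used as an input:
main_conclusions = conclusions(acts) - premises(acts)
```
      </preformat>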
      <p>Premises are often affirmed in arguments; suppositions are always accepted. Supposing that
a sentence content is true is often done for the sake of some further inferential goal, for example
hypothetical deliberation, indirect reasoning, reductio ad absurdum, and so on. We capture these
distinctions in terms of differing acts, namely, an act of affirming in which an agent believes a
sentence content is true based on evidence, and an act of accepting in which an agent entertains a
sentence content as true independent of evidence. Outputs in each case can, moreover, be inputs to
acts of inferring. To illustrate:</p>
      <sec id="sec-3-1">
        <title>1. If Susan leaves work early, she will go home and then to the gym.</title>
        <p>2. SUPPOSE Susan leaves work early.
3. Hence, Susan will go home and then to the gym.
4. Hence, Susan will go home.</p>
        <p>5. Hence, if Susan leaves work early then Susan will go home.</p>
        <p>Here, 1 is affirmed while 2 is accepted. Lines 3 and 4 are sub-conclusions and line 5 is the main
conclusion; line 3 is inferred based on a combination of an affirmed line and an accepted line, which
suggests it is itself accepted. Accordingly, 4 is accepted as it is based on the accepted 3, while 5 is best
described in terms of an agent affirming the connection between what is supposed and a consequence
of it.</p>
        <p>Premises may also be prescribed in arguments, through an act of prescribing, as illustrated by line
2 in the following:</p>
      </sec>
      <sec id="sec-3-2">
        <title>1. Attack now only if the weather is fine!</title>
        <p>2. Attack now!</p>
        <p>3. Hence, the weather is fine.</p>
        <p>Putting aside whether this is a persuasive argument, line 2 is plausibly a premise in the overall argument,
but certainly not one that is believed to be true. In conformance with our adequacy constraints, we
maintain that prescriptive sentence contents may be the inputs or outputs of acts of inferring, and
so operate as premises or conclusions in arguments.</p>
        <p>Interestingly, suppositions are not plausibly understood as being prescribed. To suppose “Attack
now!” is to entertain a sentence content of the sort ‘whomever is directed to attack now, attacks now’
as true. This stems from supposition always being made for the sake of some further inferential goal.
We thus say that suppositions are always accepted. In the interests of space, we focus on acts of
afirming and acts of accepting in the remainder.</p>
        <p>We are now able to characterize arguments as ordered collections of sentence contents involving
premises, suppositions, and a single conclusion. There are, of course, sub-conclusions in arguments,
and they provide the means by which to describe complex arguments. A sub-conclusion is a sentence
content that is: (1) An affirmed, prescribed or accepted input in an act of inferring in an argument;
(2) An affirmed, prescribed, or accepted output in an act of inferring in an argument distinct from the
argument of (1); (3) Both arguments of (1) and (2) are parts of the argument to which the sentence
content stands in the ‘subconclusion in’ relation. Arguments that involve sub-conclusions are
complex arguments.</p>
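        <p>The three sub-conclusion conditions above can be sketched as follows, assuming a simple argument is modeled by its input and output contents and a complex argument by its simple-argument parts; the dictionary encoding and helper names are our illustrative assumptions, not ARGO terms.</p>
        <preformat>
```python
# Minimal sketch of conditions (1)-(3): a sub-conclusion is an output of
# one simple argument and an input of a distinct simple argument, where
# both are parts of the same complex argument.

def is_sub_conclusion(content, simple_arguments):
    """True if the content is produced by one simple argument and
    consumed by a distinct simple argument among the given parts."""
    produced = [a for a in simple_arguments if content in a["outputs"]]
    consumed = [a for a in simple_arguments if content in a["inputs"]]
    return any(p is not c for p in produced for c in consumed)

def is_complex(argument):
    """A complex argument has at least one argument as proper part."""
    return bool(argument.get("parts"))

# Simple argument A: lines 1 and 2 support 3; simple argument B: 3 supports 4.
arg_a = {"inputs": {"line1", "line2"}, "outputs": {"line3"}}
arg_b = {"inputs": {"line3"}, "outputs": {"line4"}}
complex_argument = {"parts": [arg_a, arg_b]}
```
        </preformat>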
        <p>There are many different purposes one might have in constructing an argument. The paradigm
case involves arguing, where an individual provides an argument with the intent of convincing others
that the conclusion of the argument is true. We characterize this process as an act of arguing. One
can argue successfully or unsuccessfully, but one cannot argue without intending to convince one’s
audience of some conclusion; even when arguing against a given conclusion, you are still arguing in
favor of some conclusion. Of course, one may have no intention to convince others of some conclusion;
one may be creating an argument for the purpose of interpretation or to anticipate what an opponent
might say during a debate. In such cases, one is not arguing; rather, one is merely creating arguments,
which we characterize as an act of argument creation. An act of arguing may have an act
of argument creation as a process part, if in the process of arguing one creates an argument. Creating
an argument involves a series of steps, at least one of which is an act of inferring. In Figure 1 we
have a complex argument that is created by an act of argument creation which has two proper act
of argument creation parts, each of which has an act of inferring part, respectively. The sentence
contents exhibited by lines 1 and 2 are inputs to an act of inferring, just as lines 4 and 5 are
inputs to another such act. Each act has as its output a corresponding sentence content that is a part
of some argument, which in turn is a proper part of the overall complex argument. Importantly,
3 and 5 exhibit the same sentence content used differently across two simple arguments that make
up a complex argument. These observations reflect our position that any complex argument is an
argument with at least one argument as proper part.</p>
        <p>In anticipation of applying our results to fact-checking strategies, we now turn to relationships
between sentence contents in distinct arguments. For example, argument A may be a
counterargument to argument B if B has a conclusion that contradicts the conclusion of argument A. It is
however more often the case that counterarguments undermine, but do not contradict, some part of
another argument. For example, argument C may have some counterargument D if the conclusion of D
undermines the justification of one or more premises of C. To capture such generality, ARGO adopts
an ‘opposes’ relation, which holds between sentence contents across arguments. Sub-relations of
opposes include: negates, contradicts, and undermines. Similarly, a ‘supports’ relation is introduced to
illustrate potentially favorable evidence. These are, of course, coarse-grained and a fuller treatment will
introduce degrees of support and opposition. Even so, the resources described here suffice for applying
ARGO to fact-checking.</p>
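        <p>The coarse relation hierarchy just described can be sketched as data; the dictionary encoding, and the entailment that a triple asserted under a sub-relation also holds under its parent, are our illustrative rendering, not ARGO's OWL axiomatization.</p>
        <preformat>
```python
# 'negates', 'contradicts', and 'undermines' are sub-relations of
# 'opposes'; 'supports' is a separate top relation.

SUB_RELATION_OF = {
    "negates": "opposes",
    "contradicts": "opposes",
    "undermines": "opposes",
    "supports": None,
}

def entailed_relations(relation):
    """A pair asserted with a sub-relation also holds under each ancestor."""
    out = [relation]
    parent = SUB_RELATION_OF.get(relation)
    while parent:
        out.append(parent)
        parent = SUB_RELATION_OF.get(parent)
    return out
```
        </preformat>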
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Explainable, Traceable, LLMs</title>
      <p>Having illustrated design features of ARGO and demonstrated how this ontology satisfies our adequacy
constraints, we return to the topic of research at the intersection of ontologies, LLMs, and fact-checking.
As discussed, fact-checking typically involves tagging pairs of claims and evidence, noting when the
latter supports or undermines the former. Claims and evidence are, however, best understood in their
appropriate context, i.e. with respect to relevant supporting or opposing arguments. While situating
claims and evidence in their appropriate argument context can be, admittedly, a monumental task,
ARGO can be leveraged to provide structure to such chaos.</p>
      <p>Suppose we have the following claim-evidence pair: &lt;c,e&gt;, where c is assumed to be part of argument
X and e part of argument Y. This results in four pairs - &lt;c,e&gt;, &lt;c,Y&gt;, &lt;X,e&gt;, &lt;X,Y&gt; - some of which are
connected as parts, some of which are connected by ‘opposes’, ‘negates’, ‘supports’, and so on. Figure 2
illustrates, fixing on ‘opposes’. The pair &lt;c,e&gt; is part of &lt;c,Y&gt; since e is part of Y. Similarly for &lt;c,e&gt; and
&lt;X,e&gt;. Pairs &lt;c,Y&gt; and &lt;X,e&gt; share overlapping parts, namely, c and e, while both are part of &lt;X,Y&gt;.
If we assume &lt;c,e&gt; is tagged with ‘opposes’, we have a method for filtering relevant from irrelevant
arguments. Note that if e opposes c, then Y should oppose c. Indeed, if Y does not oppose c, then at
a minimum Y does not support e opposing c. For example, suppose “Tom is at the grocery store" is
opposed by “Tom is at home". Suppose the latter is part of the argument:</p>
      <sec id="sec-4-1">
        <title>1. If Tom lives in the grocery store then he is at home.</title>
        <p>2. Tom lives in the grocery store.</p>
        <p>3. Hence, Tom is at home.</p>
        <p>This argument does not oppose “Tom is at the grocery store" because line 2 of the argument does
not support “Tom is at home" opposing “Tom is at the grocery store". This is, in broad strokes, the
ARGO-Blueprint strategy, which facilitates expanding the scope of fact-checking beyond tagging claim
and evidence pairs to include arguments relevant to a given tagging.</p>
        <p>
          Because ARGO is represented in the Web Ontology Language (OWL), our ontological representations
are machine-readable and supported by automated reasoners such as HermiT [23], targeted information
extraction using SPARQL, and schema validation using SHACL. ARGO can support quality control
checks for fact-checking in the presence of such tools. For example, if it is asserted that e opposes c,
then a well-designed ontology aimed at filtering relevant arguments should support the inferences: Y
opposes c, e opposes X, and Y opposes X. Such a filtering strategy seems plausible working top-down
as well. If Y opposes X, then some part of X is opposed by Y - namely, c - and some part of Y opposes X
- namely, e. Hence, e opposes c. What goes for ‘opposes’ should hold - modulo where appropriate - for
‘negates’, ‘supports’, and so on. (We leave for another time the case in which Y opposes c but does not
support e, a topic relevant to research on expanding evidence [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and claim correction [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].)
        </p>
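        <p>The lifting of a tagged pair through the part-of relation can be sketched as follows. In practice this would run via an OWL reasoner or SPARQL queries over the ontology; the plain-Python version and the 'part_of' encoding below are illustrative assumptions only, and, as the grocery-store example shows, the lifted pairs are defeasible candidates rather than guaranteed oppositions.</p>
        <preformat>
```python
# Sketch of the ARGO-Blueprint 'opposes' propagation: from a tagged pair
# (e opposes c), with c part of argument X and e part of argument Y,
# derive the candidate 'opposes' pairs at the argument level.

def propagate_opposes(claim, evidence, part_of):
    """Given that evidence opposes claim, lift the tag through part-of,
    returning all derived candidate 'opposes' pairs."""
    X = part_of[claim]      # argument containing the claim
    Y = part_of[evidence]   # argument containing the evidence
    return {(evidence, claim), (Y, claim), (evidence, X), (Y, X)}

part_of = {"c": "X", "e": "Y"}
derived = propagate_opposes("c", "e", part_of)
```
        </preformat>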
        <p>As illustrated, ARGO-blueprints can be used to promote the identification and extraction of
justifications both for and against claims, and to provide the sort of traceability that is needed to evaluate
the impact evidence has on other evidence. In this respect, we envision that, as we work to improve
the trustworthiness of LLMs, ARGO-Blueprints will provide a path towards more encompassing, more
complete, fact-checking results.
[20] R. Arp, A. Spear, B. Smith, Building Ontologies with Basic Formal Ontology, MIT Press, 2015.
[21] D. Vrandecic, Ontology evaluation, in: Handbook on Ontologies, Springer, 200, pp. 293–313.
[22] A. Hogan, et al., An empirical survey of linked data conformance, Journal of Web Semantics 14
(2012) 14–44.
[23] HermiT, Hermit: A highly-eficient owl reasoner, 2023. URL: http://www.hermit-reasoner.com/.
[24] Pellet, Pellet: An owl 2 reasoner, 2023. URL: https://github.com/stardog-union/pellet.
[25] B. Smith, et al., The obo foundry: coordinated evolution of ontologies to support biomedical data
integration, Nat Biotechnol 25 (2007) 1251–1255. doi:10.1038/nbt1346.
[26] B. Kulvatunyou, et al., The industrial ontologies foundry project, 2018.
[27] N. Otte, J. Beverley, A. Ruttenberg, Bfo: Basic formal ontology, Applied Ontology (2022) 17–43.
[28] J. Tchouanguem, et al., Bfo-based ontology enhancement to promote interoperability in bim,</p>
        <p>Applied Ontology 16 (2021) 453–479.
[29] Information technology — top-level ontologies (tlo) — part 2: Basic formal ontology (bfo), 2021.
[30] S. Seppälä, et al., Definitions in ontologies, Cahiers de Lexicologie 109 (2016) 175–207.
[31] J. Breuker, R. Hoekstra, A. e. Boer, OWL Ontology of Basic Legal Concepts
(LKIFCore), Estrella, University of Amsterdam, 2006. URL: www.estrellaproject.org/doc/D1.
4-OWL-Ontology-of-Basic-Legal-Concepts.pdf.
[32] I. Rahwan, B. Banihashemi, Arguments in owl: A progress report, in: Computational Models of</p>
        <p>Argument: Proceedings of COMMA 2008, 2008.
[33] W. Ceusters, B. Smith, Aboutness: Towards foundations for the information artifact ontology, in:
Proceedings of the Sixth International Conference on Biomedical Ontology (ICBO), CEUR Vol.
1515, 2015, pp. 1–5.
[34] S. Peroni, F. Vitali, The arguments model ontology (amo), 2011. Published April 5, 2011.
[35] S. Toulmin, The Uses of Argument, 2nd ed., Cambridge University Press, 2003.
[36] M. Dumontier, et al., The semantiscience integrated ontology (sio) for biomedical knowledge
discovery, Journal of Biomedical Semantics 5 (2014).
[37] M. Chibucos, et al., The Evidence and Conclusion Ontology (ECO): Supporting GO annotations,
Methods in Molecular Biology 1446 (2017) 245–259.
[38] S. Chari, et al., Explanation ontology: A general-purpose, semantic representation for supporting
user-centered explanations, 2020.
[39] M. Jensen, et al., The Common Core Ontologies, arXiv 2404.17758 [cs.AI] (2024). URL: https://doi.org/10.48550/arXiv.2404.17758.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tang</surname>
          </string-name>
          , et al.,
          <article-title>Evaluating large language models on medical evidence summarization</article-title>
          ,
          <source>npj Digital Medicine</source>
          <volume>6</volume>
          (
          <year>2023</year>
          )
          <fpage>158</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Koo</surname>
          </string-name>
          , et al.,
          <article-title>Benchmarking cognitive biases in large language models as evaluators</article-title>
          ,
          <source>arXiv abs/2309.17012</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Siren's song in the ai ocean: A survey on hallucination in large language models</article-title>
          ,
          <source>CoRR abs/2309.01219</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>Factcheck-gpt: End-to-end fine-grained document-level fact-checking and correction of llm output</article-title>
          , arXiv preprint arXiv:2311.09000 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Guo</surname>
          </string-name>
          , et al.,
          <article-title>A survey on automated fact-checking</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>178</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <article-title>Claim check-worthiness detection as positive unlabelled learning</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>476</fpage>
          -
          <lpage>488</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gao</surname>
          </string-name>
          , et al.,
          <article-title>Human-like summarization evaluation with chatgpt</article-title>
          ,
          <source>ArXiv</source>
          (
          <year>2023</year>
          ). arXiv:2304.02554.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Diggelmann</surname>
          </string-name>
          , et al.,
          <article-title>Climate-fever: A dataset for verification of real-world climate claims</article-title>
          ,
          <source>CoRR abs/2012.00614</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorne</surname>
          </string-name>
          , et al.,
          <article-title>Fever: a large-scale dataset for fact extraction and verification</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>809</fpage>
          -
          <lpage>819</lpage>
          . doi:10.18653/v1/N18-1074.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Prollochs</surname>
          </string-name>
          ,
          <article-title>Community-based fact-checking on twitter's birdwatch platform</article-title>
          ,
          <source>ArXiv abs/2104.07175</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions</article-title>
          , arXiv preprint arXiv:2311.05232 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Confalonieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <article-title>On the multiple roles of ontologies in explainable ai</article-title>
          ,
          <source>ArXiv abs/2311.04778</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Menzel</surname>
          </string-name>
          ,
          <article-title>Knowledge representation, the world wide web, and the evolution of logic</article-title>
          ,
          <source>Synthese</source>
          <volume>182</volume>
          (
          <year>2011</year>
          )
          <fpage>269</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          , et al.,
          <article-title>Unifying large language models and knowledge graphs: A roadmap</article-title>
          , arXiv preprint arXiv:2306.08302 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <article-title>A benchmark to understand the role of knowledge graphs on large language model's accuracy for question answering on enterprise sql databases</article-title>
          , arXiv preprint arXiv:2311.07509 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Parsons</surname>
          </string-name>
          ,
          <article-title>Command and consequence</article-title>
          ,
          <source>Philosophical Studies</source>
          <volume>164</volume>
          (
          <year>2013</year>
          )
          <fpage>69</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Clark-Younger</surname>
          </string-name>
          ,
          <article-title>Imperatives and the more generalized Tarski thesis</article-title>
          ,
          <source>Thought: A Journal of Philosophy</source>
          <volume>3</volume>
          (
          <year>2015</year>
          )
          <fpage>314</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jørgensen</surname>
          </string-name>
          ,
          <article-title>Imperatives and logic</article-title>
          ,
          <source>Erkenntnis</source>
          <volume>7</volume>
          (
          <year>1938</year>
          )
          <fpage>288</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abukwaik</surname>
          </string-name>
          , et al.,
          <article-title>Interoperability-related architectural problems and solutions in information systems: a scoping study</article-title>
          , in:
          <source>Software Architecture, ECSA 2014</source>
          , volume
          <volume>8627</volume>
          , Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>