<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Script-based Inferences in an Image Schema Story Understander</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jamie C. Macbeth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boming Tony Zhang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharmin Badhan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Smith College</institution>
          ,
          <addr-line>10 Elm Street, Northampton, Massachusetts, 01063</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Independent Researcher</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Manning College of Information and Computer Sciences, University of Massachusetts</institution>
          ,
          <addr-line>Amherst, 140 Governors Drive, Amherst, Massachusetts, 01003</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Recent studies of large language models (LLMs) have revealed that they lack human-like cognitive models of reasoning and understanding. An important thread of research merges image schemas into symbolic artificial intelligence systems where their use as conceptual building blocks and primitives shows promise for the study of human cognition through intelligent systems that perform neurosymbolically. The work presented in this paper demonstrates image schema primitives being used in structures of a representation system called conceptual dependency (CD) and in broader commonsense knowledge structures called scripts. We present a story understanding system that uses image schemas as primitives in scripts that encode stereotypical sequences of events within familiar contexts, such as dining at a restaurant or visiting a doctor. We explain the content and structure of an image schema script, and demonstrate the Image Schema Script Applier (ISSA) as it processes a story and performs anaphora resolution, inference, and summarization.</p>
      </abstract>
      <kwd-group>
        <kwd>Image Schemas</kwd>
        <kwd>Scripts</kwd>
        <kwd>Story Understanding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recent studies of large language models (LLMs) reveal the ways in which they lack human-like
cognitive models of reasoning and understanding [
        <xref ref-type="bibr" rid="ref1">1, 2, 3</xref>
        ]. A thread of research which merges image
schemas into symbolic artificial intelligence systems where they serve as conceptual building blocks and
primitives shows promise for the study of human cognition through intelligent systems that perform
neurosymbolically.
      </p>
      <p>The work presented in this paper demonstrates image schema primitives being used in structures
of a representation system called conceptual dependency (CD) [4, 5]. The CD framework supports
inference and paraphrase by expressing meaning in a language-independent, structured form that
reveals conceptual relationships between entities and events. CD has been used in broader commonsense
knowledge structures called scripts [6], which encode stereotypical sequences of events within familiar
contexts, such as dining at a restaurant or visiting a doctor. The Script Applier Mechanism (SAM) [7, 8]
used CD-based scripts successfully for natural language understanding and story understanding to
represent events and acts and the actors and objects involved in those acts.</p>
      <p>In this paper, we explore structures of a representation system that combines image schemas and
conceptual dependency, which we call IS-CD, and use them to create script structures. We present
an example script structure composed of a sequence of IS-CD conceptualizations which have image
schema primitives as their central events and conceptual dependency case frames to specify actors,
objects, directions, and other aspects of the event. We also present the Image Schema Script Applier
(ISSA), a rebuild of the original Script Applier Mechanism. ISSA uses image schema-based scripts to
process a narrative posed in natural language and demonstrates its understanding and inferencing
capabilities through summary generation. This work demonstrates image schemas playing a major part
in artificial intelligence systems and structures for language understanding and reasoning.</p>
      <p>The paper has the following structure: Section 2 presents background on image schemas, conceptual
dependency, scripts, and script applier mechanism systems. Section 3 introduces the Image
Schema Script Applier and explains in detail its processing of a natural language story using an example
image schema script, and Section 4 discusses representational issues that the approach raises.
Section 5 discusses related work, and the paper concludes with Section 6, a
discussion of future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Image Schemas and Conceptual Dependency</title>
        <p>Early work in artificial intelligence developed systems for in-depth understanding of natural language
using the conceptual dependency theory of meaning representation structures. Originally developed by
Schank [4, 5] and popularized by Minsky under the term “Trans-Frames” [9], conceptual dependency
(CD) represents the meaning of natural language by abstracting away from surface syntax and focusing
on underlying conceptual structures relying on a small set of abstract primitives. This thread of work
evolved independently from the cognitive linguistics literature on image schemas, which reflect patterns
for understanding and reasoning formed through bodily and sensorimotor experiences [10].</p>
        <p>More recent work on conceptual modeling has explored mappings between image schemas and
conceptual dependency. Macbeth, Gromann, and Hedblom [11] investigated how these systems relate,
especially in representing spatial and physical concepts, and found that several CD primitives that
represent acts such as moving and ingesting correspond closely to image schemas like Containment
and Source_Path_Goal. The comparison opens possibilities for refining CD by merging or simplifying
its components based on image schema theories. Additionally, the connections between image schemas
and CD open possibilities for testing image schema theories by implementing them within artificial
intelligence systems in place of CD primitives.</p>
        <p>In the original CD, a conceptual structure can be composed from one of eleven primitive ACTs:
PROPEL, MOVE, INGEST, EXPEL, GRASP, PTRANS, ATRANS, SPEAK, ATTEND, MTRANS, and MBUILD
[5]. At the conceptual level, CD encodes meaning using networks of interrelated elements: Picture
Producers (PPs) for entities, Action Primitives (ACTs) for basic actions, and modifiers called Picture
Aiders (PAs) and Action Aiders (AAs), which describe attributes of objects and actions, respectively.
These elements are linked through conceptual dependencies that specify how one concept contributes
to the interpretation of another. The structure of a CD conceptualization is also governed by a set of
conceptual rules. For instance, a central rule states that a conceptualization must involve an ACT and a
PP in a two-way dependency, indicating that both elements are essential for the event to be meaningful.
Other rules define how attributes can be predicated of concepts, how objects are related to actions,
and how conceptual relations such as containment or possession are encoded. In this work, we create
CD conceptualization structures conforming to the conceptual cases and rules of CD, but using image
schemas as the ACT primitives.</p>
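        <p>As a minimal illustrative sketch, and not the authors' implementation, an IS-CD conceptualization can be modeled as an image schema in the ACT position together with a set of conceptual case fillers. The class name Conceptualization, the four-schema inventory, and the check that an ACTOR is always present (a simplified stand-in for the two-way ACT and PP dependency) are assumptions made for illustration only.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field

# Image schemas that appear in this paper's example script (not a full inventory).
IMAGE_SCHEMAS = {"Source_Path_Goal", "Support", "Force", "Containment"}

# Conceptual cases named in the text.
CASES = {"ACTOR", "OBJECT", "TO", "FROM"}


@dataclass
class Conceptualization:
    """An IS-CD structure: an image schema in the ACT slot plus case fillers."""

    primitive: str
    cases: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.primitive not in IMAGE_SCHEMAS:
            raise ValueError(f"unknown image schema: {self.primitive}")
        unknown = set(self.cases) - CASES
        if unknown:
            raise ValueError(f"unknown conceptual cases: {sorted(unknown)}")
        # Assumption: model the ACT/PP two-way dependency by requiring an ACTOR.
        if "ACTOR" not in self.cases:
            raise ValueError("a conceptualization needs an ACTOR picture producer")


# "A car swerved off the road": the car moves itself away from the road.
swerved = Conceptualization(
    "Source_Path_Goal", {"ACTOR": "CAR", "OBJECT": "CAR", "FROM": "ROAD"}
)
print(swerved)
```
]]></preformat>
        <p>In this sketch a CD primitive such as PTRANS is rejected in the ACT slot, which reflects the substitution of image schemas for the original primitive ACTs while the conceptual cases are kept.</p>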
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Scripts and The Script Applier Mechanism</title>
        <p>Scripts, as introduced by Schank and Abelson [6], are cognitive structures that encode stereotypical
sequences of events within familiar contexts, with dining at a restaurant being the best known example.
These structured scenarios consist of scenes, roles, and props linked by temporal and causal
relationships. Scripts support efficient comprehension by enabling individuals to infer background events and
disambiguate language based on prior experience. For instance, hearing “I left a tip” activates a
restaurant script that implies a meal and payment occurred. This framework has been influential in cognitive
science and AI, offering a model for how systems can interpret narratives by leveraging structured,
experience-based knowledge. Scripts are one mechanism through which CD conceptualizations can
relate to each other through higher-level conceptual relations like causality, allowing complex events to
be represented as sequences or chains of interdependent actions and states.</p>
        <p>The Script Applier Mechanism (SAM) [7, 8] is an early computational model developed to simulate
human story understanding by using scripts as structured, context-dependent knowledge. It processes
natural language input by identifying relevant scripts and making inferences to fill narrative gaps
through predefined causal chains and role expectations. The original SAM integrates components such
as a conceptual analyzer (the English Language Interpreter, or ELI [12]), a memory module known
as PP-Memory [8], and a script application system to convert text into meaning representations and
generate outputs such as summaries and answers to questions [13]. Although limited in processing
speed and domain coverage, SAM established foundational principles in natural language understanding
by showing how structured world knowledge and inference strategies such as causal chain completion,
role instantiation, and role merging support coherent interpretation of text. The original script applier
mechanism could process stories presented in English, and generate summaries of stories in English,
Spanish, and Chinese. SAM could also provide answers to questions about stories which were posed
in English. In this paper we present a rebuild of SAM which utilizes image schemas in its knowledge
structures.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The Image Schema Script Applier</title>
      <p>In this section we describe the Image Schema Script Applier (ISSA), its components, and the script
application process. We provide an example of a script composed of image schema CD (IS-CD) structures,
and an example of the script being applied to process a brief natural language story. We also show how
the script applier mechanism performs inferences and generates summaries and paraphrases of the
story that it has processed.</p>
      <sec id="sec-3-1">
        <title>3.1. An Image Schema Script</title>
        <p>As originally conceived [6], scripts are composed of “causal chains”, sequences of acts and
events represented in a language-free conceptual representation. Scripts in the original SAM also had
multiple “scenes” and “tracks” within scenes that represent the typical affairs in that situation or place.
For example, the restaurant script had scenes for the different phases of activity: scenes for ordering,
eating, and paying, with the scenes having multiple tracks or paths, often representing the different
ways of accomplishing the activity. The original script applier mechanism was supplied with multiple
scripts for understanding stories about a variety of subjects, from car accidents, train wrecks, and oil
spills, to diplomatic meetings between ministers of foreign affairs. We based our implementation of
SAM and our example of it on detailed logs from Cullingford’s Ph.D. dissertation which showed the
inner workings of SAM as it read a brief story about a car crash [8].</p>
        <p>Table 1 shows a sequence of acts from a script structure that represents knowledge about car accidents.
The script is inspired by a script called $VEHACCIDENT from Cullingford’s work [8]. The original
$VEHACCIDENT had three subscenes corresponding to the crash itself, the treatment of the crash
victims, and (if relevant) the investigation of the accident. The subscenes in the crash scene of the
vehicle accident script have multiple causal chains that represent the many ways and reasons that a
vehicle can end up in a crash and the various objects that it can crash into. Other tracks or subscenes
could handle variations in which, for instance, a vehicle hits another vehicle, or where a vehicle hits a
pedestrian.</p>
        <p>The example we present represents a partial path through subscenes of $VEHACCIDENT in which
a vehicle leaves a road and hits an obstacle on the side of the road. Table 1 shows the causal chain.
The script consists of a special kind of conceptual dependency structure called a pattern which has
variables in some locations of the structures. Each script pattern also has a list containing the pattern
IDs of other patterns which are predicted to be mentioned soon afterward. These lists appear in the
“Predicted” column.</p>
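        <table-wrap id="tab1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Pattern ID</th><th>Image Schema</th><th>Conceptual Cases</th><th>Predicted</th></tr>
            </thead>
            <tbody>
              <tr><td>CRA1</td><td>Source_Path_Goal</td><td>ACTOR &amp;VEHICLE; OBJECT &amp;VEHICLE</td><td>CRA2, CRA3</td></tr>
              <tr><td>CRA2</td><td>Support</td><td>ACTOR &amp;LINK; OBJECT &amp;VEHICLE</td><td>CRA3</td></tr>
              <tr><td>CRA3</td><td>Source_Path_Goal</td><td>ACTOR &amp;VEHICLE; OBJECT &amp;VEHICLE; FROM &amp;LINK</td><td>CRA4</td></tr>
              <tr><td>CRA4</td><td>Force</td><td>ACTOR &amp;VEHICLE; OBJECT &amp;OBSTACLE</td><td>TRE1, TRE2</td></tr>
              <tr><td>TRE1</td><td>Containment</td><td>ACTOR &amp;AMBVEHICLE; OBJECT &amp;HURTGRP</td><td>TRE2</td></tr>
              <tr><td>TRE2</td><td>Source_Path_Goal</td><td>ACTOR &amp;AMBVEHICLE; OBJECT &amp;HURTGRP; TO &amp;HOSPORG</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>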
        <p>The first pattern, CRA1, represents the car in motion. In a conceptual dependency-based script,
this pattern would have been a PTRANS act, representing that an object or being changed its location.
However, in IS-CD, based on an established mapping between CD primitives and image schemas [11],
we represent CRA1 using a Source_Path_Goal image schema. The conceptual structures that make
up the script pattern retain the conceptual cases which are common in CD structures. In conceptual
dependency, PTRANS acts usually have an ACTOR case representing the animate being that performed
the act and an OBJECT case representing the thing that moved. In CRA1, both the ACTOR and the
OBJECT are the script variable &amp;VEHICLE to indicate that the vehicle is “moving itself.” In CD, the
ACTOR and OBJECT cases may be different beings or objects in situations where one animate being
is responsible for changing the location of another thing or being. In CD, PTRANS additionally has
TO and FROM conceptual cases to represent the direction of the movement. Since the TO and FROM
conceptual cases are not part of the pattern and they do not have any script variables associated with
them, the pattern will successfully match any TO and FROM cases in the input structure.</p>
        <p>The second pattern, CRA2, represents a Support relationship between the vehicle and the road or
other surface that it is traveling on. Here the ACTOR is a script variable named &amp;LINK. Interestingly,
while Link is a known image schema [10], in earlier work on script understanding systems, LINK refers
to an abstract class of picture producers that are objects that connect locations together. In Cullingford’s
description, roads, train tracks, ship channels, and other paths are LINKs [8]. The script variable in the
car crash script which is usually assigned to these kinds of picture producers (in English expressions
such as “Route 9” or “Elm Street”) is called &amp;LINK as well.</p>
        <p>CRA3 is a second Source_Path_Goal conceptualization which represents the car leaving the
road, which could be matched to English verbs such as “veer” or “swerve”. As with the earlier
Source_Path_Goal, this appeared in CD scripts as a PTRANS act, and both the ACTOR and
OBJECT are &amp;VEHICLE, again to indicate that no external force is causing the vehicle’s motion. There
is also a FROM case which indicates that the motion is away from the &amp;LINK.</p>
        <p>In the original CD script conception, the next pattern, CRA4, would have been a PROPEL
conceptualization referring to the car colliding with an obstacle. Earlier work [11] mapped the PROPEL CD
primitive to the Force image schema. Therefore, here, in Table 1, CRA4 is a Force image schema
conceptualization which retains &amp;VEHICLE as the ACTOR case and a script variable named &amp;OBSTACLE
as the OBJECT case. The &amp;OBSTACLE variable will match to picture producers capable of damaging a
vehicle in a collision, such as trees, walls, poles and the like.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Input Text</th><th>Conceptual Analysis</th><th>Script Pattern Match</th><th>Variable Assignments</th></tr>
            </thead>
            <tbody>
              <tr><td>“A car swerved off the road.”</td><td>Source_Path_Goal; ACTOR CAR; OBJECT CAR; FROM ROAD</td><td>CRA3</td><td>&amp;VEHICLE → CAR; &amp;LINK → ROAD</td></tr>
              <tr><td>“It hit a tree.”</td><td>Force; ACTOR PHYSOBJECT; OBJECT TREE</td><td>CRA4</td><td>&amp;OBSTACLE → TREE</td></tr>
              <tr><td>“The driver went to the hospital ...”</td><td>Source_Path_Goal; ACTOR DRIVER; OBJECT DRIVER; TO HOSPITAL</td><td>TRE2</td><td>&amp;HURTGRP → DRIVER; &amp;HOSPORG → HOSPITAL</td></tr>
              <tr><td>“... in an ambulance.”</td><td>Containment; ACTOR AMBULANCE; OBJECT DRIVER</td><td>TRE1</td><td>&amp;AMBVEHICLE → AMBULANCE</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The final two patterns in the script are part of a subscene in which one or more persons involved
in the accident are treated for injuries. TRE1 represents one or more persons going into or being put
into an ambulance. In CD there is a predicate called CONTAIN which is mapped to the Containment
image schema. While CONTAIN is not one of the eleven primitive acts of CD, it may be used to indicate
containment relationships between picture producers. TRE1 has a conceptualization based on the
Containment image schema with the ACTOR case representing the containing object and the OBJECT
case representing the contained object.</p>
        <p>In this case, a script variable &amp;AMBVEHICLE represents the vehicle that is transporting injured
persons, and &amp;HURTGRP is a variable that can represent one or more injured persons who are being
transported. TRE2 represents the ambulance vehicle, &amp;AMBVEHICLE, transporting the injured persons,
&amp;HURTGRP, to the hospital organization location, &amp;HOSPORG. In the original SAM, the existence
of these variables allowed the script to handle sentences from newspaper stories that named the
ambulance and hospital organizations, such as “[the driver] was taken to Milford Hospital by Flanagan
Ambulance.”</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The Script Application Process</title>
        <p>The original SAM system “read” various types of newspaper stories. Here we demonstrate ISSA’s script
application process as our car accident script is applied to a brief story about a car accident. The story
below is reminiscent of a newspaper story from the New Haven Register which was processed by the
original SAM [7, 8]. The story involves a vehicle going off a road and striking an obstacle, and then an
injured party being taken to the hospital. The story is:</p>
        <p>“A car swerved off the road. It hit a tree. The driver went to the hospital in an ambulance.”</p>
        <p>All of the sentences in the story correspond directly to patterns in the car accident script, with “a car
swerved off the road” corresponding to CRA3 and “it hit a tree” corresponding to CRA4 (see Table 1).
The last sentence has parts that correspond to two different patterns in the script, with “the driver went
to the hospital” corresponding to TRE2 and “... in an ambulance” corresponding to TRE1. However,
there are patterns in the script which do not match any particular statement in the story.</p>
        <p>The Image Schema Script Applier performs the following steps in processing the story. First a
sentence of the story is fed to the conceptual analyzer system, which analyzes the natural language
input to produce an initial language-independent conceptual representation. The column labeled
“Conceptual Analysis” in Table 2 shows conceptual dependency representations which are the outputs
of a conceptual analysis of sentences and phrases from the input story.</p>
        <p>In the original SAM, the output of a conceptual analysis would have been conceptual dependency
representations which had primitive acts and conceptual “cases” which could indicate the actor or
object of an act or the specification of a directionality of a movement. In CD structures, the conceptual
cases are often filled by picture producer elements (also called PPs) that represent objects, human story
actors, and locations. As with the scripts, in the image schema version of conceptual analysis,
the representations are a hybrid; they are CD structures which have, as primitive acts, image schemas
in place of the CD primitives. ISSA uses a version of a conceptual analyzer called CA [14, 15] which
originally created CD structures but has been modified to produce IS-CD structures with image schemas
as primitives.</p>
        <p>Next, in the most important part of the script applier’s process, a conceptual
representation from the conceptual analysis is matched against conceptual representation patterns in
the script. The original SAM [8] was furnished with multiple scripts and had a script activation process
which simulated how a human understander with a large store of commonsense knowledge determines
which script knowledge structures to call into memory for understanding the story. In contrast, in this
simple demonstration of image schema script application, our simplified car accident script has already
been activated before the first sentence is analyzed, and it has been pre-determined that the
vehicle accident script patterns will be used for matches.</p>
        <p>As stated above, in SAM, scripts are represented as sequences of conceptual dependency structures
(called patterns) which have script variables in some locations of the structures where a picture producer
would have been expected. In the basic mechanism, ISSA attempts to match the structure from the input
against a pattern in the script that represents one of the events expected to come next in the story. The
match takes into account the image schema which is in the typical position of the CD primitive act, and
the actor, object, to, and from cases in the IS-CD structure. Some patterns do not have all of the possible
conceptual “cases”. In that situation the pattern will match with anything that an IS-CD structure has
for that particular case. When there is a successful pattern match, ISSA binds any script variables in
the pattern to picture producers in the IS-CD structure. The “Variable Assignments” column in Table 2
shows how script variables are assigned in the matching process. We enhanced a script matcher from
the Common Lisp version of Micro SAM [16] to perform the matching.</p>
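        <p>The matching and binding step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the enhanced Micro SAM matcher: the dictionary layout, the function names is_variable and match, the GENERIC set that treats PHYSOBJECT as a placeholder to be merged with an existing binding, and the choice to reject an input that lacks a case required by the pattern are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Assumption: placeholder picture producers produced for pronouns such as "it".
GENERIC = {"PHYSOBJECT"}


def is_variable(filler):
    """Script variables are written with a leading ampersand, e.g. &VEHICLE."""
    return isinstance(filler, str) and filler.startswith("&")


def match(pattern, structure, bindings):
    """Return extended bindings if `structure` matches `pattern`, else None.

    Both arguments are dicts with a "schema" key plus conceptual-case keys.
    Cases absent from the pattern are unconstrained, so they match anything.
    """
    if pattern["schema"] != structure["schema"]:
        return None
    new = dict(bindings)  # never mutate the caller's bindings on failure
    for case, expected in pattern.items():
        if case == "schema":
            continue
        actual = structure.get(case)
        if actual is None:
            return None  # assumption: a case required by the pattern must be present
        if is_variable(expected):
            bound = new.get(expected)
            if bound is None:
                new[expected] = actual      # first binding of this variable
            elif actual in GENERIC:
                pass                        # merge the placeholder with the binding
            elif bound in GENERIC:
                new[expected] = actual      # refine an earlier placeholder binding
            elif bound != actual:
                return None                 # conflicting binding
        elif expected != actual:
            return None
    return new


CRA3 = {"schema": "Source_Path_Goal", "ACTOR": "&VEHICLE",
        "OBJECT": "&VEHICLE", "FROM": "&LINK"}
CRA4 = {"schema": "Force", "ACTOR": "&VEHICLE", "OBJECT": "&OBSTACLE"}

swerved = {"schema": "Source_Path_Goal", "ACTOR": "CAR",
           "OBJECT": "CAR", "FROM": "ROAD"}
hit = {"schema": "Force", "ACTOR": "PHYSOBJECT", "OBJECT": "TREE"}

bindings = match(CRA3, swerved, {})
bindings = match(CRA4, hit, bindings)
print(bindings)
# {'&VEHICLE': 'CAR', '&LINK': 'ROAD', '&OBSTACLE': 'TREE'}
```
]]></preformat>
        <p>Under these assumptions, the second call leaves &amp;VEHICLE bound to CAR even though the input structure only supplies PHYSOBJECT, which is one simple way to picture how a pronoun can be merged with an already bound script variable.</p>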
        <p>The script applier checks the story input CD structure against patterns in the script in the order that
they appear in a search list maintained by the script applier [8]. At the start of the script matching
process, the search list is initialized to contain the patterns in the script’s causal chain order. To reflect
script expectations of what events are likely to appear next in the story, when a particular pattern is
matched, the system reorders the search list to bring pattern IDs in the “predicted” list of the matched
pattern to the beginning. This allows conceptual analyses on sentences and phrases in the story to
match with patterns even if they appear slightly “out of order”. This turns out to be the case for the
third and last sentence of the story. The conceptual analysis results in two different CD structures, the
first representing “the driver went to the hospital ...”, and matching with TRE2, the second representing
“... in an ambulance”, and matching with TRE1. The process also performs anaphora resolution. “It”
in the second sentence appears as PHYSOBJECT in the conceptual analysis, and is “merged” with the
&amp;VEHICLE script variable in the script application process.</p>
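        <p>The reordering of the search list can be illustrated with a short Python sketch. It only models the list manipulation, taking the sequence of matched pattern IDs from Table 2 as given; the function name promote and the choices to keep the matched pattern in the list and to preserve the relative order of the remaining patterns are assumptions, since the text does not specify them.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# "Predicted" lists copied from Table 1; TRE2 predicts nothing further.
PREDICTED = {
    "CRA1": ["CRA2", "CRA3"],
    "CRA2": ["CRA3"],
    "CRA3": ["CRA4"],
    "CRA4": ["TRE1", "TRE2"],
    "TRE1": ["TRE2"],
    "TRE2": [],
}


def promote(search_list, matched_id):
    """Move the patterns predicted by `matched_id` to the front of the list.

    Assumption: the matched pattern itself stays in the list, and the
    relative order of all other patterns is preserved.
    """
    predicted = [pid for pid in PREDICTED[matched_id] if pid in search_list]
    rest = [pid for pid in search_list if pid not in predicted]
    return predicted + rest


# The search list starts in causal-chain order: CRA1, CRA2, ..., TRE2.
search_list = list(PREDICTED)

# Order in which the story's conceptual analyses matched patterns (Table 2).
for matched in ["CRA3", "CRA4", "TRE2", "TRE1"]:
    search_list = promote(search_list, matched)
    print(matched, "->", search_list)
```
]]></preformat>
        <p>In this sketch, once CRA4 has matched, both TRE1 and TRE2 sit at the head of the search list, so the two structures from the final sentence can be matched in either order.</p>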
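        <table-wrap id="tab3">
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>Event ID</th><th>Image Schema</th><th>Output Text</th></tr>
            </thead>
            <tbody>
              <tr><td>EVNT1</td><td>Source_Path_Goal</td><td>“A car was moving ...”</td></tr>
              <tr><td>EVNT2</td><td>Support</td><td>“ ... and a road supported the car”</td></tr>
              <tr><td>EVNT3</td><td>Source_Path_Goal</td><td>“The car left the road ...”</td></tr>
              <tr><td>EVNT4</td><td>Force</td><td>“ ... and ran into a tree.”</td></tr>
              <tr><td>EVNT5</td><td>Containment</td><td>“The driver was in an ambulance ...”</td></tr>
              <tr><td>EVNT6</td><td>Source_Path_Goal</td><td>“... and went to the hospital.”</td></tr>
            </tbody>
          </table>
        </table-wrap>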
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Story Representation and Summary Generation</title>
        <p>Once the script application process is complete, the script applier builds a full story representation.
It does this by instantiating all of the patterns in the script and replacing script variables with their
bindings. This includes patterns which matched the story as well as patterns that are present in the
script but did not match any of the CD structures created in the conceptual analysis of the story input.
Because the story representation contains conceptual structures which were part of the script but
not part of the story, these structures comprise inferences of facts and events that the understander
completes which were not explicitly stated in the story. We generated instantiations using code (the
instantiate function) from a Common Lisp version of Micro SAM [16]. Table 3 shows the story
representation built by SAM based on its processing of the story in Table 2 using the script in Table 1.</p>
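        <p>The instantiation step can be sketched in a few lines of Python. This is an illustration under stated assumptions and is not the Micro SAM code: the function name instantiate_pattern, the dictionary layout, and the numbering of events by script order are hypothetical, while the patterns, bindings, and matched pattern IDs are taken from Tables 1 and 2.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Script patterns in causal-chain order (Table 1).
SCRIPT = [
    ("CRA1", {"schema": "Source_Path_Goal", "ACTOR": "&VEHICLE", "OBJECT": "&VEHICLE"}),
    ("CRA2", {"schema": "Support", "ACTOR": "&LINK", "OBJECT": "&VEHICLE"}),
    ("CRA3", {"schema": "Source_Path_Goal", "ACTOR": "&VEHICLE",
              "OBJECT": "&VEHICLE", "FROM": "&LINK"}),
    ("CRA4", {"schema": "Force", "ACTOR": "&VEHICLE", "OBJECT": "&OBSTACLE"}),
    ("TRE1", {"schema": "Containment", "ACTOR": "&AMBVEHICLE", "OBJECT": "&HURTGRP"}),
    ("TRE2", {"schema": "Source_Path_Goal", "ACTOR": "&AMBVEHICLE",
              "OBJECT": "&HURTGRP", "TO": "&HOSPORG"}),
]

# Variable assignments produced by script application (Table 2).
BINDINGS = {
    "&VEHICLE": "CAR",
    "&LINK": "ROAD",
    "&OBSTACLE": "TREE",
    "&HURTGRP": "DRIVER",
    "&HOSPORG": "HOSPITAL",
    "&AMBVEHICLE": "AMBULANCE",
}

# Patterns that matched some part of the story input (Table 2).
MATCHED = {"CRA3", "CRA4", "TRE1", "TRE2"}


def instantiate_pattern(pattern, bindings):
    """Replace each bound script variable with its picture producer.

    Fillers that are not bound variables (including the schema) are kept as-is.
    """
    return {case: bindings.get(filler, filler) for case, filler in pattern.items()}


# Each entry is (event ID, instantiated structure, inferred?).
story = []
for number, (pattern_id, pattern) in enumerate(SCRIPT, start=1):
    event = instantiate_pattern(pattern, BINDINGS)
    inferred = pattern_id not in MATCHED  # present in the script, not in the story
    story.append((f"EVNT{number}", event, inferred))

for event_id, event, inferred in story:
    print(event_id, event, "(inferred)" if inferred else "")
```
]]></preformat>
        <p>With these inputs, only the first two events are flagged as inferred, which corresponds to the patterns CRA1 and CRA2 that no sentence of the story matched.</p>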
        <p>The system is also able to generate natural language summaries of stories based on the full story
representation. For this, we used an enhanced version of Neil Goldman’s BABEL system [13], which
generates natural language from non-linguistic conceptual structures. The Image Schema Script Applier
system is able to generate a summary of the story which paraphrases the original and includes inferences
that are the result of the script application process. Here is an example:</p>
        <p>A car was moving and the road supported the car. The car left the road and ran into a tree.
The driver was in an ambulance and went to the hospital.</p>
        <p>The “Output Text” column of Table 3 shows the sentences and phrases generated by BABEL based on
the story representation. The story understanding structures labeled EVNT1 and EVNT2, corresponding
to “a car was moving and the road supported the car,” are inferences of events and spatial relationships
based on the script application.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>Combining image schemas with conceptual dependency primitives in a story understander provides new
and interesting opportunities for juxtaposing the two systems. The patterns in scripts in the original
SAM were mainly focused on conceptual dependency events and acts which brought about changes in
the world, such as movement and state change. More recent work juxtaposing image schemas with
conceptual dependency primitives [11, 17, 18] has raised the importance of spatial relationships in
primitive decomposition representations. In image schema-based scripts, one point of interest is that
we have an occurrence of a “static” spatial relationship being represented in a script pattern, which we
believe would have been rare or nonexistent in the original SAM. For example, CRA2 (Table 1) has a
Support image schema to represent the car being on the road and being supported by it, while TRE1
has a Containment image schema to represent injured persons being in the ambulance. In the case
of CRA2, this stretches considerably the convention that each pattern in a script sequence is causally
linked to the next or happens temporally before the next, since the Support of the car by the road is
conceived to be happening simultaneously with the Source_Path_Goal event of CRA1.</p>
      <p>During the development of the IS-CD script structures, we considered options for representing the
road that the vehicle is traveling on. We might have been able to represent the road as an abstract space
above the road surface so that we could use a Containment image schema to represent the car being
contained in that space. The original SAM also had representations of “settings” and “locales” where
events occurred. For example, it would create a CD structure using a conceptual dependency LOC
predicate to indicate the location where the car ran into the obstruction as being somewhere near
the road that the car was traveling on. We considered representing the setting of the accident, the
larger area around the road that contains the setting of the accident, and the hospital as larger spatial
objects and using Containment or Location image schemas to indicate that other objects in the story
were located in or at these spaces. This would align with recent work on primitive decompositions of
spatial relationships [17], but we were discouraged from attempting this at such an early stage because of our
concern that the BABEL generator might produce awkward-sounding texts such as “the ambulance
went into the scene of the accident”.</p>
      <p>A story about a car accident raises significant questions regarding the ACTOR case of the
conceptualizations, since the engine which is propelling the vehicle is part of the vehicle, but, also, presumably,
there is a person driving. The original SAM partially addressed this issue by having an embedded
$DRIVE script inside the $VEHACCIDENT script [8]. Ideally there should be additional patterns in the
script to represent driving. In both of the Source_Path_Goal conceptual structures, the ACTOR and
the OBJECT are the same, which is supposed to mean that the object was “moving itself”. When
the driver of the crashed vehicle goes to the hospital in an ambulance, however, this should not imply that the
driver was driving the ambulance. One possible resolution would be to use an Agency image schema
primitive [19]. Analyzing and representing ACTOR roles in these conceptualizations in consistent ways
is a topic of ongoing research.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Related Work</title>
      <p>An important genre of work maps, compares, and merges image schemas with conceptual dependency
and uses the connections to evolve the set of CD primitives that CD-based AI systems use for in-depth
understanding. In [11], Macbeth, Gromann, and Hedblom investigate the relationship between image
schemas and CD, two frameworks used to represent meaning in natural language understanding. Image
schemas come from cognitive linguistics and reflect patterns formed through sensorimotor experiences,
while conceptual dependency primitives are from artificial intelligence and aim to model human-like
understanding through a limited set of abstract actions. The considerable overlap that they find suggests
that CD can be grounded more firmly in cognitive theory and that image schemas may benefit from
formal structuring provided by CD.</p>
      <p>Other related work reveals that some CD primitives are potentially redundant, as they can be
expressed as combinations of other primitives. Macbeth and Gromann [20] investigate the potential
of using a formal logic framework based on image schemas to represent the primitives of conceptual
dependency. By applying a formal system called Image Schema Logic, which integrates elements of
spatial and temporal logic, they show that complex CD primitives such as INGEST and EXPEL can be
modeled using simpler, more general components like movement, containment, and direction. The study
concludes that this approach could streamline the CD inventory and improve formal representations of
language understanding in AI systems.</p>
      <p>Macbeth et al. [18] take this work further. They explore removing the INGEST conceptual primitive
from the set of CD primitives and replacing its use with combinations of other CD primitives, namely
PTRANS and CONTAIN, which are the analogs of Source_Path_Goal and Containment image
schemas. In this work, the BABEL system [13] proved effective in generating paraphrases that reveal
how image schemas and CD primitives can be combined and contrasted, offering insight into improving
CD’s cognitive alignment. The results strongly support replacing INGEST with PTRANS and set
the stage for similar future analyses, including the decomposition of EXPEL, INGEST’s conceptual
opposite. These threads of work seek to make the set of conceptual primitives more compact while increasing
the richness and expressiveness of the primitive-decomposed meaning structures in ways that better
correspond with human cognition and its capability for complex mappings and manipulations of
meaning structures.</p>
      <p>The current paper combines image schemas with scripts in ways that allow for alternative forms
of logical deduction and inference. Enhancements of Micro SAM [16] that combine multiple scripts
have also been used in artificial intelligence systems for story generation [21] and cyberbullying
prevention [22]. In relation to the script applier’s inference capability, Hedblom et al. [23] examine
how image schemas can be used to represent complex events in a formal and meaningful way through
Image Schema Logic, which enables the formalization of these schemas and their interactions over time
and space. They arrange image schemas sequentially to represent the progression of
events, capturing the dynamics of everyday actions in a way that relates to human reasoning.
Other related work uses image schemas for logical inferences [24].</p>
      <p>In the same way that scripts combine CD conceptualizations into more complex structures, there
has been related work on combining image schemas. Hedblom et al. [25] demonstrate how image
schema profiles can effectively represent the conceptualization of events. Drawing on research in
event segmentation and cognitive linguistics, they show that clusters of image schemas can capture
conceptualizations in particular linguistic contexts. The paper introduces three different characters by
which image schemas can be combined: merge, collection and sequence. Again using Image Schema
Logic (ISL), the authors illustrate how everyday actions like dropping an egg and cracking an egg into a
bowl can be decomposed into schema-driven event segments such as Support, Source_Path_Goal,
Containment, and Splitting. The resulting collection of formalized image schemas serves as a
cognitively grounded repository of ontology design patterns for modeling event conceptualizations in
intelligent systems. Besold, Hedblom, and Kutz [26] provide illustrations and a proof of concept for how
the image schemas Object, Contact, and Path are combined in a temporal dimension to form more
complex image schemas and simple events, specifically Blockage, Bouncing, and Caused_Movement.
The authors also present an outline of a proposed conceptual hierarchy of levels of modeling for image
schemas and similar cognitive theories.</p>
      <p>In related work on image schemas, scripts, and narrative, Ranta [27] explores how pictorial storytelling,
especially in static images such as paintings, communicates narrative meaning through cognitive
structures like schemas and scripts. Like script-based inferencing for natural language understanding,
the work highlights how understanding a pictorial narrative often involves the viewer filling in missing
information using prior experience, cultural context, and interpretive expectations. Wicke and Veale
[28] present a novel approach to exploring image schemas in the realm of computational embodied
storytelling and propose a framework that utilizes a storytelling system to explore the causal connections
between image schemas. For this investigation, a system has been implemented on a Nao humanoid
robot, which has a set of 9 prominent image schemas with more than 800 story actions (plot verbs) from
the storytelling system. Kimmel [29] argues that story understanding involves a mental simulation of
interaction between image schemas and, in [30], demonstrates how image schemas are fundamental to
understanding narratives, both in figurative and literal forms. Image schemas such as Path, Container,
Force, Balance, and Part-Whole structure the basic spatial, temporal, and causal-intentional logic of
events, forming the foundation of story comprehension.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>In this paper we presented a script applier system based on image schemas and showed how it could
process a brief natural language story and perform inferencing and understanding. This shows both
how the system is able to perform script-based inferences based on image schemas and how it forms
a language-independent representation of understanding. The findings also move toward unifying
image schemas with CD, grounding the latter in cognitive linguistics and experimental psychology
while supporting cognitive AI applications.</p>
      <p>One important issue that remains unexplored in this work is how to use ontological
knowledge in the script application process. Conceptual dependency theory provides few
primitives for describing or representing the characteristics of objects when they are
picture producers in conceptual structures. The original SAM performed a rolefit process as part of
matching a script pattern to a story structure. Rolefit ascribed a “class” and “type” to all PPs (for example,
making “car” a #STRUCTURE structured object of TYPE *CAR*) and compared these to sets of classes
associated with the &amp;VEHICLE script variable. Image schema research may provide building blocks for
representing PPs in a richer way.</p>
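      <p>The rolefit check described above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the original SAM code: the PictureProducer class, the SCRIPT_VARIABLE_CLASSES table, the &amp;DRIVER variable, and the #PERSON class are hypothetical, while #STRUCTURE, *CAR*, and &amp;VEHICLE follow the example in the text.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureProducer:
    """A PP annotated with the class and type that rolefit ascribes to it."""
    name: str
    pp_class: str  # e.g. "#STRUCTURE"
    pp_type: str   # e.g. "*CAR*"


# Hypothetical table of the classes each script variable accepts.
# Only the &VEHICLE / #STRUCTURE pairing comes from the text above.
SCRIPT_VARIABLE_CLASSES = {
    "&VEHICLE": {"#STRUCTURE"},
    "&DRIVER": {"#PERSON"},
}


def rolefit(variable: str, pp: PictureProducer) -> bool:
    """Return True when the PP's class is among those allowed for the variable.

    Unknown script variables accept nothing in this sketch.
    """
    allowed = SCRIPT_VARIABLE_CLASSES.get(variable, set())
    return pp.pp_class in allowed


car = PictureProducer(name="car", pp_class="#STRUCTURE", pp_type="*CAR*")

print(rolefit("&VEHICLE", car))  # the car can fill the vehicle role
print(rolefit("&DRIVER", car))   # the car cannot fill the driver role
```
]]></preformat>
      <p>The sketch compares only the class; a richer image schema based description of PPs, as suggested above, would replace the single class label with a more structured representation.</p>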
      <p>The original SAM also had many more scripts than the system presented in this paper. Scripts in the
original SAM were larger and had multiple “tracks” and “subscenes”. Future work can explore image
schemas in the script activation process and the simultaneous application of multiple scripts when
understanding a story.</p>
      <p>Having a script applier mechanism for both image schemas and conceptual dependency enriches
the types of studies that can be performed with the two theories. For example, the issues with the CD
ACTOR case in the context of the Source_Path_Goal representing the movement of a vehicle being
controlled by an intelligent being could be resolved with further studies of stories about driving. Further
decompositions of the image schema CD structures’ ACTOR case using an Agency image schema could
be studied with the inferences and paraphrases that are generated using scripts from different systems.
Other possible topics of future work include turning ISSA on its head and making it a story generator,
studies of uncertain or non-monotonic reasoning and question answering, and integrations with large
language models.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We acknowledge Larry Birnbaum and Mallory Selfridge for their development of the original CA
conceptual analyzer. We thank Mark Burstein for providing a version of the original CA code and
transcoding it to Common Lisp, Neil Goldman for providing original code for BABEL, and Richard
Cullingford for helpful discussions about SAM.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools in writing this paper.</p>
      <p>[2] I. Mirzadeh, K. Alizadeh, H. Shahrokhi, O. Tuzel, S. Bengio, M. Farajtabar, GSM-symbolic: Understanding the limitations of mathematical reasoning in large language models, 2024. URL: https://arxiv.org/abs/2410.05229. arXiv:2410.05229.</p>
      <p>[3] L. Berglund, M. Tong, M. Kaufmann, M. Balesni, A. C. Stickland, T. Korbak, O. Evans, The reversal curse: LLMs trained on "A is B" fail to learn "B is A", in: The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024, OpenReview.net, 2024. URL: https://openreview.net/forum?id=GPKTIktA0k.</p>
      <p>[4] R. C. Schank, Conceptual dependency: A theory of natural language understanding, Cognitive Psychology 3 (1972) 552–631.</p>
      <p>[5] R. C. Schank, Conceptual Information Processing, Elsevier, New York, NY, 1975.</p>
      <p>[6] R. C. Schank, R. P. Abelson, Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures, Lawrence Erlbaum Associates, Mahwah, NJ, 1977.</p>
      <p>[7] R. E. Cullingford, Pattern-matching and inference in story understanding, Discourse Processes 2 (1979) 319–334.</p>
      <p>[8] R. E. Cullingford, Script Application: Computer Understanding of Newspaper Stories, Ph.D. thesis, Yale University, New Haven, CT, 1977.</p>
      <p>[9] M. Minsky, Society of Mind, Simon &amp; Schuster, New York, 1988.</p>
      <p>[10] J. M. Mandler, C. Pagán Cánovas, On defining image schemas, Language and Cognition 6 (2014) 510–532.</p>
      <p>[11] J. C. Macbeth, D. Gromann, M. M. Hedblom, Image schemas and conceptual dependency primitives: A comparison, in: Proceedings of The Joint Ontology Workshops, Episode 3: The Tyrolean Autumn of Ontology, The International Association for Ontology and its Applications, Bolzano-Bozen, Italy, 2017.</p>
      <p>[12] C. K. Riesbeck, An expectation-driven production system for natural language understanding, in: D. A. Waterman, F. Hayes-Roth (Eds.), Pattern-Directed Inference Systems, Elsevier, New York, 1978, pp. 399–413.</p>
      <p>[13] N. M. Goldman, Sentence paraphrasing from a conceptual base, Communications of the ACM 18 (1975) 96–106.</p>
      <p>[14] L. Birnbaum, M. Selfridge, Conceptual analysis of natural language, in: R. C. Schank, C. K. Riesbeck (Eds.), Inside Computer Understanding: Five Programs Plus Miniatures, Lawrence Erlbaum Associates, Hillsdale, NJ, 1981, pp. 318–353.</p>
      <p>[15] L. Birnbaum, M. Selfridge, Problems in Conceptual Analysis of Natural Language, Research Report #168, Yale University, Department of Computer Science, New Haven, CT, 1979.</p>
      <p>[16] R. C. Schank, C. K. Riesbeck, Micro SAM, in: Inside Computer Understanding: Five Programs Plus Miniatures, Lawrence Erlbaum Associates, Hillsdale, NJ, 1981, pp. 120–135.</p>
      <p>[17] M. Zhou, B. Duah, J. C. Macbeth, Novel primitive decompositions for real-world physical reasoning, in: K. R. Thórisson (Ed.), Proceedings of the Third International Workshop on Self-Supervised Learning, volume 192 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 22–34. URL: https://proceedings.mlr.press/v192/zhou22a.html.</p>
      <p>[18] J. C. Macbeth, A. Kilayko, Z. Zhao, S. Song, W. X. Zheng, Image schema decompositions of the conceptual dependency ingest primitive: A study of paraphrases, in: Proceedings of The Seventh Image Schema Day (ISD7), The International Association for Ontology and its Applications, Rhodes, Greece, 2023.</p>
      <p>[19] J. M. Mandler, How to build a baby: II. Conceptual primitives, Psychological Review 99 (1992) 587.</p>
      <p>[20] J. C. Macbeth, D. Gromann, Towards modeling conceptual dependency primitives with image schema logic, in: The Fourth Workshop on Cognition And OntologieS (CAOS IV) at The Fifth Joint Ontology Workshop (JOWO’19), The International Association for Ontology and its Applications, Graz, Austria, 2019.</p>
      <p>[21] M. McKenzie, A. Kilayko, J. C. Macbeth, S. Carter, K. Sieck, M. Klenk, Script combination for enhanced story understanding and story generation systems, in: Proceedings of the Tenth Annual Conference on Advances in Cognitive Systems, The Cognitive Systems Foundation, Arlington, VA, 2022.</p>
      <p>[22] J. Macbeth, H. Adeyema, H. Lieberman, C. Fry, Script-based story matching for cyberbullying prevention, in: CHI 2013 Extended Abstracts: ACM SIGCHI Conference on Human Factors in Computing Systems, Paris, France, 2013.</p>
      <p>[23] M. M. Hedblom, O. Kutz, R. Peñaloza, G. Guizzardi, Image schema combinations and complex events, KI - Künstliche Intelligenz 33 (2019) 279–291. doi:10.1007/s13218-019-00605-1.</p>
      <p>[24] M. M. Hedblom, O. Kutz, T. Mossakowski, F. Neuhaus, Between contact and support: Introducing a logic for image schemas and directed movement, in: F. Esposito, R. Basili, S. Ferilli, F. A. Lisi (Eds.), AI*IA 2017 Advances in Artificial Intelligence, Springer International Publishing, Cham, 2017, pp. 256–268.</p>
      <p>[25] M. M. Hedblom, O. Kutz, R. Peñaloza, G. Guizzardi, What’s cracking? How image schema combinations can model conceptualisations of events, in: Proceedings of The Fourth Image Schema Day (ISD4), Bolzano, Italy, 2018.</p>
      <p>[26] T. R. Besold, M. M. Hedblom, O. Kutz, A narrative in three acts: Using combinations of image schemas to model events, Biologically Inspired Cognitive Architectures 19 (2017) 10–20. doi:10.1016/j.bica.2016.11.001.</p>
      <p>[27] M. Ranta, The role of schemas and scripts in pictorial narration, Semiotica 2021 (2021) 1–27. doi:10.1515/sem-2019-0071.</p>
      <p>[28] P. Wicke, T. Veale, Wheels within wheels: A causal treatment of image schemas in an embodied storytelling system, in: ISD4: Image Schema Day IV, Bozen-Bolzano, 2018. URL: https://api.semanticscholar.org/CorpusID:123769744.</p>
      <p>[29] M. Kimmel, From metaphor to the "mental sketchpad": Literary macrostructure and compound image schemas in Heart of Darkness, Metaphor and Symbol 20 (2005) 199–238. URL: https://doi.org/10.1207/s15327868ms2003_3. doi:10.1207/s15327868ms2003_3.</p>
      <p>[30] M. Kimmel, Analyzing image schemas in literature, Cognitive Semiotics 9 (2009) 159–188. doi:10.3726/81609_159.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Greengard</surname>
          </string-name>
          ,
          <article-title>Shining a light on AI hallucinations</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>68</volume>
          (
          <year>2025</year>
          )
          <fpage>9</fpage>
          -
          <lpage>11</lpage>
          . URL: https://doi.org/10.1145/3715691. doi:10.1145/3715691.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>