=Paper=
{{Paper
|id=Vol-3888/paper7
|storemode=property
|title=“The Time for Action has Arrived”: Extending the IS Catalogue Leveraging Large Language Models
|pdfUrl=https://ceur-ws.org/Vol-3888/Paper_7.pdf
|volume=Vol-3888
|authors=Stefano De Giorgis,Guendalina Righetti
|dblpUrl=https://dblp.org/rec/conf/isd2/GiorgisR24
}}
==“The Time for Action has Arrived”: Extending the IS Catalogue Leveraging Large Language Models==
<pdf width="1500px">https://ceur-ws.org/Vol-3888/Paper_7.pdf</pdf>
<pre>
                        “The time for action has arrived”: Extending the IS
                        Catalogue leveraging Large Language Models
                         Stefano De Giorgis*,†1 , Guendalina Righetti*,†2
                         1
                          Institute for Cognitive Sciences and Technologies - National Research Council (ISTC-CNR), Italy
                         2
                          Department of Philosophy, Classics, History of Art and Ideas, University of Oslo, Blindernveien 31 Georg Morgenstiernes hus 0313
                         Oslo


                                      Abstract
                                      Image Schema research has long been hampered by the scarcity of annotated data, limiting advancements in the
                                      field. This paper presents a novel approach to overcoming this challenge by leveraging Large Language Models
                                      (LLMs) to extend the IS Catalogue. We systematically tested various LLMs to identify the most effective model
                                      for this task, ultimately selecting Claude 3.5 Sonnet. We asked the model to extend the IS catalogue in two ways:
                                      first, by expanding the annotations associated with the sentences in the catalogue to include multiple IS; second,
                                      by generating new literal sentences to be added to the catalogue. To evaluate the model we conducted several
                                      analyses, including accuracy ratings with the original annotations. Our approach demonstrated remarkable
                                      efficacy, with the chosen model successfully retrieving the original annotation in 81% of cases when considering
                                      the entire set of image schemas extracted in the profile. This approach enables rapid processing and annotation
                                      of large text volumes while maintaining high accuracy and consistency. Partial evaluation by domain experts has
                                      found the enriched IS Catalogue to be sound and plausible, suggesting that LLM-assisted extension can produce
                                      high-quality synthetic data aligned with expert knowledge. Our method offers a promising solution to the data
                                      scarcity problem in IS research, potentially accelerating advancements in the field.


                         1. Introduction
                         Image schemas (IS) are foundational conceptual structures within the paradigm of embodied cognition.
                         These schemas encapsulate sensorimotor experiences and play a crucial role in shaping abstract cog-
                         nition, including commonsense reasoning and the semantic underpinnings of natural language (see
                         e.g. Mandler and Hampe [1, 2]). As internally structured gestalts [3], image schemas are composed of
                         spatial primitives (SP) that coalesce into unified wholes of meaning, thereby forming more complex
                         schematic structures [4, 1, 5].
                            The current main IS repository is the Image Schema Catalogue [6, 7]. While valuable, it has
                         two limitations: (i) each sentence is annotated with only one IS, and (ii) the list of IS used is not
                         comprehensive, because the full list is still under debate. As a result, a single annotation often
                         oversimplifies the rich, multi-layered nature of image schematic conceptual structures embedded in
                         everyday language as well as in conceptual metaphors. Consider the sentence “Sally found an idea
                         in the book,” which is annotated solely with the Object schema in the catalogue. This annotation,
                         while not incorrect, fails to capture the full conceptual richness of the expression. The presence of the
                         preposition ‘in’ clearly activates the Containment schema, suggesting that ideas are conceptualized
                         as entities that can be contained within physical or non-physical objects (in this case, a book). This
                         example underscores a critical issue: the overlapping and often inseparable nature of image schemas
                         in natural language. The Object schema (applied to both ‘idea’ and ‘book’) and the Containment
                         schema are not merely co-present but fundamentally intertwined in conveying the sentence’s meaning.
                            A further problem (iii) is the scarcity of data. More specifically, the sentences gathered in the
                         Image Schema Catalogue mostly exemplify metaphorical uses of image schemas (see

                          The Eighth Image Schema Day (ISD8), 25–28 November 2024, Bozen-Bolzano, Italy
                          *
                            Corresponding authors.
                          †
                            These authors contributed equally to this work.
                          $ stefano.degiorgis@cnr.it (S. De Giorgis*,† ); guendalina.righetti@ifikk.uio.no (G. Righetti*,† )
                           0000-0003-4133-3445 (S. De Giorgis*,† ); 0000-0002-4027-5434 (G. Righetti*,† )
                                     © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                          CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
the “Sally found an idea in the book” example above). While this focus is valuable when the catalogue
is used for research in conceptual metaphor or blending, incorporating examples of more concrete
applications would be advantageous for studies involving practical scenarios, such as applying image
schema research to robotics and similar fields. In such cases, providing concrete examples could also
increase the pedagogical value of the catalogue.
   In this work, we tackle problems (i) and (iii), namely the limited IS Catalogue annotation and the
scarcity of available data, and we do so by exploiting synthetic data generation via the usage of
Generative AI in the form of Large Language Models (LLM).
   We propose the following pipeline. 1. We compare several state-of-the-art LLMs on a classification
task to test their ability to identify image schematic knowledge from natural language; 2. Once the best
model is identified, we pass each sentence of the IS Catalogue to the LLM, asking it to indicate the image
schema(s) evoked by the sentence, ranked in descending order from the most relevant to the
least; 3. We ask the model to produce, for each metaphoric sentence contained in the catalogue, a
corresponding literal sentence replicating the same image-schematic pattern found in the metaphorical
one (with results like this: “The time for action has arrived” → “The train has arrived to the station”). We
conducted several analyses to shed light on Claude’s actual ability to identify and reuse
image schematic content in natural language sentences, including annotation matching,
IS distribution analysis, and co-occurrence and confusion matrix analyses. Overall, we achieve 81%
accuracy when comparing the newly produced annotations with the original ones, and we manually validate
(part of) the extended IS Catalogue with the help of domain experts.
   The paper is organised as follows: Section 2 provides some useful references on image schemas and
IS Catalogue usage; Section 3 details the methodology, prompting strategy and technical details of
the approach; in Section 4 we present a quantitative analysis of the IS Catalogue extension and
reflect on interesting findings of these experiments. Finally, Section 5 concludes the paper and outlines
possible future work.


2. Background and Related Work
Image schemas, first introduced by Lakoff and Johnson [3, 8, 9], are now recognised as sensorimotor
cognitive patterns that shape our way of conceiving the world and establishing semantic relations, based on our
bodily perception [10, 11, 12, 13, 14]. Recent significant efforts to investigate image schemas and their
compositional nature include the development of the Image Schema Logic ISL^FOL [15], work on their role in
conceptual blending [16, 17], and the ImageSchemaNet ontology [18]. These frameworks provide robust
tools for analyzing phenomena such as conceptual blending and cognitive metaphors, as well as image
schematic analysis of complex events [19]. Image schemas can be represented also via a diagrammatic
image schema language [20].
  While corpus-based studies [21, 22] and machine learning approaches [23, 24, 25] have explored the
presence of image schemas in natural language, the complexity of image schema annotation in natural
language still presents a significant challenge in the field of cognitive linguistics and computational
semantics.
  We refer to the IS Catalogue in its version enriched with the MetaNet source and target domain
alignment (available here: https://github.com/dgromann/ImageSchemaRepository). The dataset includes
linguistic examples taken from several resources (MetaNet, Lakoff and Johnson's works, Dodge [12], etc.)
and, for each example, it identifies the corresponding conceptual metaphor, the source and target domain,
the “sensorimotor” source domain (namely the embodied grounding of the sentence, which is sometimes a
single spatial primitive and sometimes a full image schema), and, in a dedicated column, the image schema
evoked by the sentence. The IS Catalogue has previously been used in several works with different
purposes: for example, in ODIN [26] for identifying image schematic grounding in Ontology Design Patterns;
as a support tool for User Interface Design tasks [7]; and in linguistic tasks, to train a supervised
classifier that detects image schemas in natural language expressions from multilingual inputs [25].
   In this context, we employ the term “image schema profile” as defined by [27] and [28]. This concept
refers to the collective set of activated image schemas associated with a particular entity, sentence,
situation, or event, providing a comprehensive framework for analyzing the schematic underpinnings
of linguistic expressions. Specifically, we refer to the set of IS annotated by Claude for a sentence as
the “IS Profile” of that sentence. Note that, while in the original definition of “IS Profile” the order of IS
is not relevant, in our case it is: as detailed in Section 3, we instructed the LLM to list the IS in
descending order, from the most relevant (the one that best fits the analysed sentence) to
the least relevant.


3. Methodology
In this section we provide details about the pipeline adopted to perform the LLM annotation and dataset
extension. The four main steps are: 1. choice of the best LLM in terms of competence in the
image schematic domain; 2. classification task, passing each sentence of the IS Catalogue for multi-label
annotation; 3. generation of literal twin sentences for the original metaphorical ones; and 4. evaluation
of the results.

Choice of the Model We conducted a comparative analysis of three leading language models:
Anthropic’s Claude 3.5 Sonnet, OpenAI’s GPT-4o, and Google’s Gemini. This evaluation aimed to assess
their capability in identifying image schemas with minimal context and instruction. We employed a
bulk, zero-shot approach, presenting each model with the simple prompt: “Which image schemas can
you identify in the following sentence: What did you have in mind?”. We prompted the models with 5 different sentences
taken from the catalogue and annotated with different image schemas. This method allowed us to
gauge the models’ innate understanding and retrieval of image schema concepts without additional
training or context. We illustrate the performance of the models by discussing the example above, as
the other cases were similar. Full answers are available on the dedicated GitHub repository: https:
//github.com/StenDoipanni/ISD8.
   Gemini demonstrated the weakest performance, identifying only two schemas, one of which (Con-
tainer) is more accurately classified as a spatial primitive rather than a full image schema, while the
other (Possession) appeared to be a hallucination not typically included in standard image schema
listings. GPT-4o showed similar results on this specific example (although, in other cases, it
generally performed better than Gemini, giving more accurate results and credible
justifications). Claude 3.5 Sonnet, however, emerged as the top performer in this task. Not only did it
correctly identify the highest number of plausible image schemas, but it also presented them in the most
conventional format, often selecting the correct names and using the all-caps notation that is standard
in the field. For this reason we selected Claude 3.5 Sonnet as the primary model for our classification
and generation tasks.

Classification Task Our approach leveraged the Claude 3.5 Sonnet model via the Anthropic API to
annotate sentences with image schemas. We developed a Python script that processes sentences from
the IS Catalogue in the form of a CSV file, sending each sentence to the Claude model for analysis. We
kept all the original sentences included in the catalogue, including both English and German sentences.
The model was prompted to perform the annotation task as follows: (i) annotate the sentence
with relevant image schemas from a predefined list, taken from the IS Catalogue, and (ii) order them by
relevance. The prompt adopted a few-shot technique, providing three examples, presented in pseudo-JSON
syntax as key-value pairs, annotated respectively with three, two, and four IS.
The script incorporates error handling and retry logic to manage potential API issues, with a maximum
of 5 retries and a 5-second delay between attempts. The results, including the original sentence and
image schema annotations, were output in JSON format. To ensure reproducibility, the full prompt used
for the Claude model is available in our GitHub repository.
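
The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the
authors' script: the helper name call_with_retries and the request_fn callable (standing in for whatever
function sends one sentence to the API) are hypothetical, and the sketch assumes that "5 retries" means
one initial attempt plus up to five further attempts.

```python
import time


def call_with_retries(request_fn, max_retries=5, delay_seconds=5.0, sleep=time.sleep):
    """Call request_fn() until it succeeds or the retries are exhausted.

    request_fn is a hypothetical zero-argument callable that performs one API
    request and raises an exception on failure. The sleep argument is
    injectable so the waiting can be skipped in tests.
    """
    last_error = None
    # One initial attempt plus max_retries retries (an assumption, see above).
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                sleep(delay_seconds)  # fixed delay between attempts
    raise RuntimeError(f"request failed after {max_retries} retries") from last_error
```

A fixed delay is the simplest reading of the description; an exponential back-off would be a possible
alternative, but the text gives no indication that one was used.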

Generation Task In parallel with sentence classification, the second part of the prompt was refined to
generate literal transpositions of metaphorical expressions, asking for a more literal sentence that preserves
the same image schematic pattern. Again, the prompting technique adopted is few-shot, with three
examples in pseudo-JSON syntax, presented as key-value pairs. To provide an example: Metaphorical
Example: “Our agenda is packed with events.” Literal Sentence: “The bag is packed with clothes.”
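
To make the key-value layout concrete, the following sketch assembles a few-shot generation prompt. It is
only a hypothetical layout: the instruction wording, the key names and the function build_generation_prompt
are assumptions made for illustration, and the actual prompt is the one published in the authors'
repository. Only the example pair is taken from the text above.

```python
import json

# Hypothetical instruction text; the real wording is in the authors' repository.
INSTRUCTION = (
    "For the metaphorical sentence, write a literal sentence that preserves "
    "the same image-schematic pattern."
)

# One example pair quoted in the paper; the paper reports using three examples.
FEW_SHOT_EXAMPLES = [
    {
        "metaphorical_example": "Our agenda is packed with events.",
        "literal_sentence": "The bag is packed with clothes.",
    },
]


def build_generation_prompt(sentence, examples=FEW_SHOT_EXAMPLES):
    """Return a prompt with the examples as JSON-like key-value pairs."""
    lines = [INSTRUCTION, ""]
    for example in examples:
        lines.append(json.dumps(example, ensure_ascii=False))
    # The final entry leaves the literal sentence empty for the model to fill in.
    query = {"metaphorical_example": sentence, "literal_sentence": ""}
    lines.append(json.dumps(query, ensure_ascii=False))
    return "\n".join(lines)
```

Calling build_generation_prompt("The time for action has arrived") yields the instruction, one line per
example, and a final line with an empty literal_sentence slot.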

Evaluation We performed a number of quantitative analyses to assess Claude’s ability to identify
and reproduce image-schematic patterns, including (i) accuracy ratings, (ii) IS distribution analysis,
(iii) co-occurrence and (iv) confusion matrices. Each analysis is described in detail in the following
sections. The primary evaluation focuses on the accuracy of the proposed annotations. Specifically, we
assess (i) whether the image schema annotated in the catalogue matches the one identified by the LLM,
and (ii) the position of the correct schema within the LLM’s proposed ranking of preferences. Given
the metaphorical and complex nature of the sentences in the catalogue, we consider the presence of
the human-annotated IS among the profiles proposed by the LLM as a strong indicator of the model’s
performance. A thorough validation of Claude’s annotations would require analysing all IS profiles and
identifying potential errors, which is planned for future work.
    The accuracy rating provides a direct evaluation of the Classification Task but serves only as an
implicit and secondary measure for the Generation Task. A more direct assessment would require
manual validation of the new dataset by domain experts, which has only been started here and is
planned for future work.


4. Analysis and Discussion
In this section we provide quantitative and qualitative analyses of the Image Schema Catalogue and its
LLM enrichment, together with plausible interpretations of their output, and we explain the charts and
matrices shown below.
   These visualizations contribute to a comprehensive understanding of the relationship between
human-generated and LLM-generated annotations, highlighting the strengths and limitations of the
annotation model.

Match Counts Analysis The most immediate and primary analysis we conducted was a bulk comparison
between the original annotations and those generated by Claude 3.5.
   This aims to examine the congruence between the initial human-assigned labels and the primary
predictions of the LLM annotation system, assessing the model’s capacity to accurately replicate original
annotations. Furthermore, this allows us to quantify the occurrence rate of original annotations within
the ranked LLM predictions, as shown in Table 1, and subsequently visualizing this distribution via a bar
chart representation, shown in Figure 1.

   Total Number of Entries            2559
   Correct Annotations                2076
   Correct Annotations in Pos. 1      1258
   Correct Annotations in Pos. 2       489
   Correct Annotations in Pos. 3       259
   Correct Annotations in Pos. 4        64
   Correct Annotations in Pos. 5         5
   No Match                            483

Table 1 Number of correct annotations and their position in Claude’s relevance order.

   The main goal of this work, as stated in the Introduction, is to enrich the IS Catalogue with more
than one annotation per sentence. For this reason, in the analysis, we count an annotation as a “Correct
Annotation” when the original annotation is present in the Image Schema Profile (the set of annotations)
provided by Claude. Overall, Claude correctly classifies 2075 sentences out of 2559, achieving 81%
accuracy. In more than half of these cases (1258 sentences), the original classification is also Claude’s top
suggestion in the order of preference. Table 1 summarises the number of correct annotations and their
position in the relevance orders provided by Claude.
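
The matching criterion described above can be sketched as follows. This is a minimal illustration rather
than the authors' code: the function names are hypothetical, and each record is assumed to be a pair
(original annotation, ranked IS profile). The first three sample records reuse sentences annotated in the
catalogue; the fourth is made up to show a non-match.

```python
from collections import Counter


def match_position(original, profile):
    """Return the 1-based rank of the original label in the profile, or None."""
    for rank, label in enumerate(profile, start=1):
        if label == original:
            return rank
    return None


def match_counts(records):
    """Count how often the original label appears at each rank, or not at all."""
    counts = Counter()
    for original, profile in records:
        rank = match_position(original, profile)
        counts[rank if rank is not None else "no_match"] += 1
    return counts


records = [
    ("LINK", ["SPLITTING", "LINK", "FORCE"]),
    ("FORCE", ["FORCE", "CONTAINMENT", "OBJECT"]),
    ("CONTAINMENT", ["CONTAINMENT", "SOURCE_PATH_GOAL", "FORCE"]),
    ("OBJECT", ["CONTAINMENT"]),  # made-up record illustrating "No Match"
]
counts = match_counts(records)
# A record is "correct" when the original label occurs anywhere in the profile.
accuracy = 1 - counts["no_match"] / len(records)
```

On this toy sample the original label is found in first position twice, in second position once, and
is missing once, so three of the four records count as correct.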


Figure 1: Match Counts Plot for each Image Schema included in the catalogue.


   We also analysed the distribution of correct annotations across the different image schemas collected
in the catalogue. The results are summarised in Figure 1: the highest accuracy rates are obtained with
the image schemas Link and Support, both reaching 100% accuracy. In contrast, the lowest performance
is obtained with the image schema Center_Periphery, which is correctly identified in only 22% of cases.
Considering Claude’s preference orders, the best performance is reached with Support and Link, whereas
the lowest performance is related to Center_Periphery and Object, the latter in particular being correctly
identified in around 75% of cases, but in the first position in only 10% of cases.

Comparative IS Distribution Analysis The methodology involves calculating the frequency distri-
bution of each annotation type in the dataset. This comparative analysis offers insights into potential
shifts in frequency and prioritization of specific annotations between human-generated and LLM-
predicted labels. The bar charts in Figure 2 are employed to juxtapose the distribution of annotations
between the original human-assigned labels (top bar chart) and the LLM annotations (center and bottom
bar charts), identifying trends and discrepancies in annotation priorities between human-labeled and
machine-labeled data.
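
The three frequency distributions compared in this analysis can be computed along the following lines.
This is a sketch under assumptions: the function name and the (original, profile) record layout are
hypothetical, and the sample records are only a small illustration, not the catalogue itself.

```python
from collections import Counter


def annotation_distributions(records):
    """Return label frequencies for originals, first elements and all annotations."""
    original = Counter(orig for orig, _ in records)
    # First element of each ranked profile (skipping empty profiles).
    first = Counter(profile[0] for _, profile in records if profile)
    # Every label of every profile.
    everything = Counter(label for _, profile in records for label in profile)
    return original, first, everything


records = [
    ("LINK", ["SPLITTING", "LINK", "FORCE"]),
    ("FORCE", ["FORCE", "CONTAINMENT", "OBJECT"]),
    ("CONTAINMENT", ["CONTAINMENT", "SOURCE_PATH_GOAL", "FORCE"]),
]
original, first, everything = annotation_distributions(records)
```

Each Counter can then be passed to a plotting library to draw one bar chart per distribution.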
   In the case of the LLM annotations, we analysed both the IS distribution considering only the
first element annotated by Claude (center) and the distribution of IS considering all annotations by
Claude (bottom). As shown in Figure 2, the most used image schemas (both in the first position and
overall, although in inverted order) are Containment and Source_Path_Goal. This finding is
consistent with the original human annotation of the catalogue. Looking at the distribution of the
human annotations, aside from Containment and Source_Path_Goal, the most frequent annotations
are, in order, Object, Verticality, Force, and Center_Periphery.
   It is important to clarify that these 6 IS account for 91% of the entire catalogue. Notably, Force
and Verticality are also among the most common first annotations made by Claude, while Object
and Center_Periphery are much less frequently used in the first position, ranking 7th and last in the
distribution, respectively. Considering the totality of Claude’s annotations instead (Figure 2, bottom
chart), Object rises to the fourth position, showing that it is frequently used by Claude,
but not as its first preference.
   The case of Center_Periphery is different, as its distribution is the lowest (excluding Claude’s
hallucinations) even when all of Claude’s annotations are considered. These findings align with the
accuracy analysis, which highlights Claude’s difficulty in recognising the Center_Periphery image schema.
More insights and data interpretations are provided in Section 4.1.

Figure 2: Top: Original Annotations; Center: First Element Annotations; Bottom: All Annotations

Co-Occurrence Matrix Analyses The co-occurrence matrices are used to elucidate the interrelationships
between annotation labels that frequently appear together. These matrices facilitate the identification
of patterns and correlations within the LLM annotation sets that might otherwise remain obscured in
separate analyses. Each matrix cell quantifies the co-occurrence frequency of an annotation pair within a
given set, revealing potentially significant associations. The procedure involves iterating through all LLM
annotation lists, counting co-occurrences between annotation pairs, and presenting the results in a
heatmap format.
   We conducted two kinds of co-occurrence analysis. The first (cf. Figure 3) analyses the co-occurrences
of image schemas across all of Claude’s annotations; the second (cf. Figure 4) repeats the analysis
considering only those cases in which the original annotation matches the first element of the LLM IS
profile.
   As shown in Figure 3, the most frequent co-occurrences (over 500 instances) are Source_Path_Goal and
Force, followed by Source_Path_Goal and Containment, Force and Containment, Source_Path_Goal and
Object, and Containment and Object. The matrix also accounts for Claude’s hallucinations. Figure 4 shows
that the results for correct annotations align with those observed across all annotations.
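
The counting procedure described above (iterate over the annotation lists and count every pair of
labels) can be sketched as follows. The function name is hypothetical, and the sketch assumes that a
pair is counted once per profile and that the order within a pair does not matter; the heatmap rendering
is left out.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(profiles):
    """Count, for each unordered pair of labels, the profiles containing both."""
    counts = Counter()
    for profile in profiles:
        # Sorting the distinct labels gives each pair a canonical order.
        for first, second in combinations(sorted(set(profile)), 2):
            counts[(first, second)] += 1
    return counts


profiles = [
    ["SPLITTING", "LINK", "FORCE"],
    ["FORCE", "CONTAINMENT", "OBJECT"],
    ["CONTAINMENT", "SOURCE_PATH_GOAL", "FORCE"],
]
pair_counts = cooccurrence_counts(profiles)
```

In this small sample the pair (CONTAINMENT, FORCE) occurs in two profiles and every other pair in one.
Restricting the input to the profiles whose first element matches the original annotation would give
the second kind of analysis.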


Figure 3: Co-occurrence matrix on all Claude’s annotations.


Confusion Matrix for Exact Match Accuracy The confusion matrix serves as a critical evaluation
tool for assessing the exact match accuracy between the original annotations and the first elements of
the LLM annotations. This matrix quantifies both misclassifications and correctly predicted annotations.
The rows represent the ground truth labels (original annotations), while the columns denote the
predicted labels (primary elements of LLM annotations). By constructing and analyzing this confusion
matrix, we evaluate the system’s discriminative capabilities across annotation categories.
   The Confusion matrix is shown in Figure 5: the most common misclassifications (>65 times) are
between Object and Containment (113 times), Source_Path_Goal and Force (110 times), Con-
tainment and Source_Path_Goal (81 times) and Source_Path_Goal and Center_Periphery (69
times).
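
A confusion matrix of this kind, with original annotations as rows and the first element of each LLM
profile as columns, can be built as in the following sketch. The function names and the record layout
are assumptions made for illustration; the three sample records are far too few to say anything about
the actual results.

```python
from collections import Counter


def confusion_counts(records):
    """Count (original label, first predicted label) pairs."""
    return Counter((orig, profile[0]) for orig, profile in records if profile)


def as_matrix(counts):
    """Arrange the pair counts as a square matrix: rows = truth, columns = prediction."""
    labels = sorted({label for pair in counts for label in pair})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for (truth, predicted), n in counts.items():
        matrix[index[truth]][index[predicted]] = n
    return labels, matrix


records = [
    ("LINK", ["SPLITTING", "LINK", "FORCE"]),
    ("FORCE", ["FORCE", "CONTAINMENT", "OBJECT"]),
    ("CONTAINMENT", ["CONTAINMENT", "SOURCE_PATH_GOAL", "FORCE"]),
]
labels, matrix = as_matrix(confusion_counts(records))
# Exact-match accuracy is the diagonal total divided by the number of records.
exact_match = sum(matrix[i][i] for i in range(len(labels))) / len(records)
```

Off-diagonal cells are the misclassifications: here the single LINK sentence lands in the SPLITTING
column because SPLITTING was ranked first.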

4.1. Data Interpretation
Image Schemas classification Overall, Claude performed relatively well on the Image Schema
Catalogue, achieving an accuracy rate of 81%. Although Claude frequently identified the correct image
schema among those composing the sentence IS profile, it selected that schema as the most relevant
choice in fewer than 50% of all cases.
   The analysis we conducted in terms of accuracy shows the LLM’s ability to identify the correct image
schema with respect to human annotation. At the same time, when Claude is asked to extend the annotation
to more than one image schema, its chances of hitting the correct image schema by guessing
increase, given the limited number of image schemas to choose from. A complete and qualitative
validation of Claude’s annotations for each entry would require (i) analysing all the IS profiles and (ii)
checking for possible misinterpretations, and this is a matter of future work. Some examples, collected
in Table 2, may however give some insight into the relevance and necessity of this work as well as the
quality of the model in identifying meaningful image schemas.

Figure 4: Co-occurrence matrix on first Claude’s annotation.

Figure 5: Confusion Matrix for Exact Match Accuracy

  ISC Sentence                            Original Annotation   Claude’s Annotations
  Breaking social ties                    Link                  Splitting, Link, Force
  Put more force into your punches.       Force                 Force, Containment, Object
  There’s no way out, I have to do it.    Containment           Containment, Source_Path_Goal, Force

Table 2 Examples of Claude’s annotations

    The distribution of image schemas in the catalogue is uneven. For example, there are only 4 sentences
for Link but 602 for Containment. Some results from the analysis (e.g. cf. Figure 1) should therefore
be adjusted based on the frequency of each schema in the catalogue. For instance, Link and Support
were correctly identified 100% of the time, but they appeared only 4 and 9 times, respectively, which
distorts the sample to some extent. A similar case applies to Covering, which has only 6 entries in the
catalogue.
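
One simple way to keep the uneven distribution in view is to report each schema’s accuracy together with
the number of catalogue entries it is based on. The sketch below is an illustration under assumptions:
the function name is hypothetical and the sample records are made up, so the resulting figures say
nothing about the actual catalogue.

```python
from collections import defaultdict


def per_schema_accuracy(records):
    """Map each original label to (accuracy, number of entries with that label)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for original, profile in records:
        totals[original] += 1
        if original in profile:  # correct if found anywhere in the profile
            hits[original] += 1
    return {label: (hits[label] / totals[label], totals[label]) for label in totals}


# Made-up records: one LINK entry, two OBJECT entries.
records = [
    ("LINK", ["SPLITTING", "LINK", "FORCE"]),
    ("OBJECT", ["CONTAINMENT", "OBJECT"]),
    ("OBJECT", ["SOURCE_PATH_GOAL"]),
]
report = per_schema_accuracy(records)
```

Reading the accuracy next to its support makes it clear, for instance, that a 100% score obtained on a
handful of sentences carries less weight than a lower score obtained on several hundred.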
    More interesting cases are Object and Center_Periphery. As mentioned earlier, Object was
correctly identified in around 75% of cases, though only in 10% of cases was it in the correct position
(i.e., as the first choice). According to the frequency analysis, Object is also one of the most frequently
used image schemas, annotated by Claude 1,014 times, but selected as the first preference in only 6% of
cases. This pattern may suggest some uncertainty on Claude’s part in applying the schema, although it
could also reflect its broad role in conceptualising entities (see also below).
    Claude struggled the most with the Center_Periphery schema, correctly identifying it in only about
20% of cases. Considering the frequency analysis, it emerges that it is also the least-used Image Schema,
showing Claude’s difficulty in recognising its pattern. Despite this challenge in the classification task,
the performance in the generation task remained quite strong. A few examples follow.
Metaphorical Example: “She put the idea to the back of her mind.” Literal Sentence: “She placed the book at
the back of the shelf.”
Metaphorical Example: “These colors aren’t quite the same, but they’re close.” Literal Sentence: “These
buildings aren’t quite adjacent, but they’re near each other.”
Metaphorical Example: “Stands to reason nobody’d dare to bomb us, because, we’d do the same to them.”
Literal Sentence: “The ball bounces back when it hits the wall.”

Co-occurrences and confusion matrices The insights provided by the co-occurrence matrices and the
confusion matrix are relevant under the assumption that LLMs have ingested an enormous amount of data
and are, for this reason, the largest approximate repositories of commonsense knowledge we have ever had.
Some approaches [29] suggest that, although large language models are not embodied per se, they have
processed enough textual material to manifest, in their inductive reasoning (derived from statistical
generalisation), a certain spark of embodiment.
   From Claude’s IS profile extraction several notable patterns emerge: Force and Source_Path_Goal
show the highest co-occurrence rates, suggesting a strong conceptual link between force dynamics and
directed motion or processes. Containment also frequently co-occurs with these schemas, indicating
that bounded spaces often interact with forces and paths. Object schema has significant overlap with
many others, which is unsurprising, given its fundamental nature in conceptualizing entities and the
observations above. In fact, many spatial primitives that co-participate in the realization of image schemas
can be conceived as instantiations of Object. Scale and Verticality show moderate co-occurrence,
reflecting how vertical orientation often relates to scalar concepts. Interestingly, hallucinated image
schemas like *Sound and *Possession have very low co-occurrence rates, suggesting they may represent
more specialized or isolated conceptual domains. The Substance schema shows notable co-occurrence
with Containment, hinting at the frequent conceptualization of substances within containers.
   It is also intriguing to note cases where co-occurrences are surprisingly low despite the
intuitive connections between certain image schemas. A notable example is the relationship between
Support and Contact, which are instead prominent in knowledge representation formalizations of
image schematic approaches to cognitive robotic applications [30, 31].
   In many concrete scenarios, which usually adopt naive physics and reduced-complexity spatio-relational
representations (such as “an apple placed on a table”), these two schemas would typically
appear together: the apple is in Contact with the table, and the table Supports it. However, when
examining the co-occurrence matrix, the association between these two schemas is much lower than
anticipated (they only co-occur 11 times), perhaps also due to the metaphorical nature of many of the
phrases in the catalogue. The case of Link and Source_Path_Goal, as well as that of Link and
Contact, is analogous. Arguably, when a Link is present, one can often envision a Path between two objects
or situations (or vice versa), or at least some sort of connection that brings the two linked
entities into Contact. However, within the catalogue, this correlation between image schemas occurs
relatively few times: only around a hundred times for Link and Source_Path_Goal and 52 for Link and Contact.
   Co-occurrence and confusion measures interact with each other. The image schemas that are most
frequently confused, according to the Confusion Matrix (Figure 5), also tend to co-occur frequently
and are among the most commonly used across all annotations. This is true for pairs like Object and
Containment, Source_Path_Goal and Force, and Containment and Source_Path_Goal, which
are often confused but also co-occur frequently. In many of these cases, the correct schema is likely
included in the annotations, just not in the first position.
   A different scenario arises with Center_Periphery, which is often confused with
Source_Path_Goal but co-occurs with it only in about one-third of cases. This suggests a
genuine confusion between the two schemas. As mentioned above, Center_Periphery proved to be
the most challenging schema for Claude. In addition to being confused with Source_Path_Goal, it
was also frequently confused with Link (with which it co-occurs in nearly 50% of cases) and, more
unexpectedly, with Contact (despite never co-occurring with it).
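
   One way to read figures such as "one-third of cases" is as a conditional rate: among the sentences
whose profile contains one schema, the share that also contains the other. The sketch below makes
that assumption explicit. The text does not state how the rates were normalised, so the function
cooccurrence_rate and the toy data are hypothetical illustrations, not the authors' procedure.

```python
def cooccurrence_rate(profiles, schema, other):
    """Share of profiles containing `schema` that also contain `other`."""
    with_schema = [set(p) for p in profiles if schema in p]
    if not with_schema:
        return 0.0  # Avoid division by zero when `schema` never occurs.
    return sum(other in p for p in with_schema) / len(with_schema)


# Hypothetical toy profiles, for illustration only.
toy_profiles = [
    ["CENTER_PERIPHERY", "SOURCE_PATH_GOAL"],
    ["CENTER_PERIPHERY", "LINK"],
    ["CENTER_PERIPHERY", "OBJECT"],
    ["SOURCE_PATH_GOAL", "FORCE"],
]

rate_spg = cooccurrence_rate(toy_profiles, "CENTER_PERIPHERY", "SOURCE_PATH_GOAL")
rate_contact = cooccurrence_rate(toy_profiles, "CENTER_PERIPHERY", "CONTACT")
```

Under this reading the rate is asymmetric: conditioning on Source_Path_Goal instead of
Center_Periphery would generally give a different value, so the direction should be reported.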


5. Conclusions and Future Work
In this paper, we have presented a novel approach to extending the Image Schema (IS) catalogue using
the Claude 3.5 Sonnet large language model. The method proved effective: the model retrieved the
original annotation for 81% of the annotated sentences when the whole set of Image Schemas in the
profile extracted by the LLM is considered. This promising retrieval rate underscores the potential
of advanced AI models for this kind of specialized linguistic annotation task.
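   The distinction between retrieving the original annotation anywhere in the profile and retrieving
it in first position can be sketched as follows. This is a minimal illustration that assumes each
example is a pair of the original (gold) schema and the ordered profile returned by the model; the
function retrieval_rates and the toy examples are hypothetical and do not reproduce the reported
figures.

```python
def retrieval_rates(examples):
    """Return (profile_hit_rate, first_position_rate) for (gold, profile) pairs."""
    if not examples:
        return 0.0, 0.0
    n = len(examples)
    # Hit if the gold schema appears anywhere in the predicted profile.
    in_profile = sum(gold in profile for gold, profile in examples)
    # Stricter hit: the gold schema is the first schema in the profile.
    first = sum(bool(profile) and profile[0] == gold for gold, profile in examples)
    return in_profile / n, first / n


# Hypothetical toy examples, for illustration only.
toy_examples = [
    ("CONTAINMENT", ["CONTAINMENT", "OBJECT"]),
    ("FORCE", ["FORCE"]),
    ("SOURCE_PATH_GOAL", ["FORCE", "SOURCE_PATH_GOAL"]),
    ("CENTER_PERIPHERY", ["LINK", "CONTACT"]),
]

profile_rate, first_rate = retrieval_rates(toy_examples)
```

The gap between the two rates indicates how often the original schema is present in the profile but
not ranked first.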
   The enrichment of the IS catalogue resulting from this approach has been partially evaluated by
domain experts, who have found it to be both sound and plausible. This preliminary validation lends
credibility to our method and suggests that LLM-assisted extension of linguistic resources can produce
high-quality synthetic data that align with expert knowledge, although further assessment is still needed
to fully confirm this insight.
   Given the current limitations in expanding the Image Schema Catalogue manually, our approach
using large language models appears to be the most promising avenue for overcoming these constraints.
The ability to rapidly process and annotate large volumes of text while maintaining accuracy and
consistency offers a significant advantage over traditional methods.
   However, this work also opens up several paths for future research. The first and most needed
step would be a comprehensive expert evaluation: a full evaluation of the Extended IS Catalogue by
domain experts is necessary to further validate and refine our approach. This will ensure the robustness
and reliability of the expanded catalogue across diverse linguistic contexts. This step is needed both
for the IS Profiles annotated by Claude and for the newly generated sentences. Secondly,
topicalisation of IS: future work should focus on developing methods for annotating the specific textual
chunks that evoke particular Image Schemas within sentences. This finer-grained analysis will provide
deeper insights into how IS are linguistically realised. Finally, we envision a multimodal extension:
with the advent of powerful multimodal models, there is potential to extend our approach beyond
text. Incorporating visual and possibly auditory information could lead to a more comprehensive
understanding of Image Schemas across different modalities of human conceptualization and cognition.


Acknowledgment
This work was supported by the Future Artificial Intelligence Research (FAIR) project, code PE00000013
CUP 53C22003630006.


References
 [1] J. M. Mandler, C. Pagán Cánovas, On defining image schemas, Language and Cognition (2014)
     1–23. doi:10.1017/langcog.2014.14.
 [2] L. Talmy, The fundamental system of spatial schemas in language, in: B. Hampe, J. E. Grady
     (Eds.), From perception to meaning: Image schemas in cognitive linguistics, volume 29 of Cognitive
     Linguistics Research, Walter de Gruyter, 2005, pp. 199–234.
 [3] M. Johnson, The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason, The
     University of Chicago Press, Chicago and London, 1987.
 [4] B. Hampe, Image schemas in cognitive linguistics: Introduction, From perception to meaning:
     Image schemas in cognitive linguistics 29 (2005) 1–14.
 [5] M. M. Hedblom, O. Kutz, F. Neuhaus, Choosing the right path: image schema theory as a foundation
     for concept invention, Journal of Artificial General Intelligence 6 (2015) 21–54.
 [6] J. Hurtienne, J. H. Israel, Image schemas and their metaphorical extensions: intuitive patterns for
     tangible interaction, in: Proceedings of the 1st international conference on Tangible and embedded
     interaction, 2007, pp. 127–134.
 [7] J. Hurtienne, S. Huber, C. Baur, Supporting user interface design with image schemas: The iscat
     database as a research tool, in: ISD, 2022.
 [8] G. Lakoff, M. Johnson, Metaphors we live by, University of Chicago press, 1980.
 [9] G. Lakoff, M. Johnson, et al., Philosophy in the flesh: The embodied mind and its challenge to
     western thought, volume 640, Basic books New York, 1999.
[10] R. W. Langacker, Foundations of cognitive grammar: Theoretical prerequisites, volume 1, Stanford
     university press, 1987.
[11] R. W. Langacker, Cognitive grammar, Basic Readings 29 (2008).
[12] E. Dodge, G. Lakoff, Image schemas: From linguistic analysis to neural grounding, From perception
     to meaning: Image schemas in cognitive linguistics (2005) 57–91.
[13] B. Bennett, C. Cialone, Corpus guided sense cluster analysis: a methodology for ontology
     development (with examples from the spatial domain), in: FOIS, 2014, pp. 213–226.
[14] A. Cienki, Image schemas and gesture, From perception to meaning: Image schemas in cognitive
     linguistics 29 (2005) 421–442.
[15] M. M. Hedblom, O. Kutz, T. Mossakowski, F. Neuhaus, Between contact and support: Introducing
     a logic for image schemas and directed movement, in: Conference of the Italian Association for
     Artificial Intelligence, Springer, 2017, pp. 256–268.
[16] G. Righetti, D. Porello, N. Troquard, O. Kutz, M. M. Hedblom, P. Galliani, Asymmetric hybrids:
     Dialogues for computational concept combination, in: Formal Ontology in Information Systems,
     IOS Press, 2021, pp. 81–96.
[17] G. Righetti, O. Kutz, The moving apple: An image-schematic investigation into the leuven concept
     database, in: Proceedings of The Seventh Image Schema Day co-located with The 20th International
     Conference on Principles of Knowledge Representation and Reasoning (KR 2023), Rhodes, Greece,
     September 2nd, 2023, CEUR-WS, 2023.
[18] S. De Giorgis, A. Gangemi, D. Gromann, Imageschemanet: Formalizing embodied commonsense
     knowledge providing an image-schematic layer to framester, Semantic Web Journal forthcoming
     (2022).
[19] M. Hedblom, O. Kutz, R. Peñaloza, G. Guizzardi, et al., What’s cracking? how image schema
     combinations can model conceptualisations of events, in: CEUR Workshop Proceedings,
     volume 2347, CEUR-WS, 2019.
[20] M. M. Hedblom, F. Neuhaus, T. Mossakowski, The diagrammatic image schema language (disl),
     Spatial Cognition & Computation (2024) 1–38.
[21] A. Papafragou, C. Massey, L. Gleitman, When English proposes what Greek presupposes: The
     cross-linguistic encoding of motion events, Cognition 98 (2006) B75–B87.
[22] J. A. Prieto Velasco, M. Tercedor Sánchez, The embodied nature of medical concepts: image schemas
     and language for pain, Cognitive Processing (2014). doi:10.1007/s10339-013-0594-9.
[23] D. Gromann, M. M. Hedblom, Body-mind-language: Multilingual knowledge extraction based on
     embodied cognition, in: AIC, 2017, pp. 20–33.
[24] D. Gromann, M. M. Hedblom, Kinesthetic mind reader: A method to identify image schemas in
     natural language, in: Proceedings of Advancements in Cognitive Systems, 2017.
[25] L. Wachowiak, D. Gromann, Systematic analysis of image schemas in natural language through
     explainable multilingual neural language processing, in: Proceedings of the 29th International
     Conference on Computational Linguistics, 2022, pp. 5571–5581.
[26] S. De Giorgis, A. Gangemi, Introducing odin: Ontological design grounded in image-schematic
     knowledge, in: WOP@ISWC, 2022.
[27] M. M. Hedblom, O. Kutz, R. Peñaloza, G. Guizzardi, Image schema combinations and complex
     events, KI-Künstliche Intelligenz 33 (2019) 279–291.
[28] T. Oakley, Image schemas, The Oxford handbook of cognitive linguistics (2007) 214–235.
[29] S. Nolfi, On the unexpected abilities of large language models, Adaptive Behavior (2023)
     10597123241256754.
[30] M. Pomarlan, S. De Giorgis, M. M. Hedblom, M. Diab, N. Tsiogkas, Thinking in front of the
     box: Towards intelligent robotic action selection for navigation in complex environments using
     image-schematic reasoning, in: JOWO, 2022.
[31] M. Pomarlan, S. De Giorgis, R. Ringe, M. M. Hedblom, N. Tsiogkas, Hanging around: Cognitive
     inspired reasoning for reactive robotics, in: Formal Ontologies for Information Systems (FOIS)
     2024, 2024.

</pre>