<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Seventh Image Schema Day, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Does Stable Diffusion Dream of Electric Sheep?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simone Melzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Peñaloza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Raganato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>2</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Stable Diffusion is a text-to-image generation model based on latent diffusion. It works by first translating the textual prompt into a multidimensional latent space, which can be seen as an internal representation of a conceptual space. For other kinds of generative models, it has been argued that relationships between concepts can be deduced from the geometrical properties of the latent space. In this paper we explore this claim for a pre-trained Stable Diffusion model. In particular, we verify its capabilities to produce images that blend two concepts without any fine-tuning.</p>
      </abstract>
      <kwd-group>
        <kwd>conceptual blending</kwd>
        <kwd>conceptual spaces</kwd>
        <kwd>stable diffusion</kwd>
        <kwd>generative models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>[Title-page figure: image generated from the prompt “frog lizard”]</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Conceptual blending refers to the cognitive task of combining the properties of two different concepts to produce a new, distinct concept. A successful conceptual blend is predicated on the existence of a (potentially internal) conceptual representation where the relevant features of each concept can be identified and manipulated.</p>
      <p>
        In recent years, generative methods have gained attention within the AI community. In a nutshell, generative models try to generate an output that is pertinent to a given input. For example, text-to-image models receive a textual prompt and produce (generate) an image which visually represents the prompt. Although they differ greatly in their architecture and implementation details, most modern generative models share the same high-level structure. A given input (e.g. the textual prompt) is first encoded as a high-dimensional vector, which is later decoded into the desired output medium (e.g. the image). The space where all the vectors reside—known as the latent space—can be thought of as a kind of conceptual space [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] where all the information about the concepts and their relationships is encoded. Assuming the manifold hypothesis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], this representation should be able to encode all relevant concepts. In particular, each conceptual primitive has a point or region in this space. This architecture can be seen as a metaphor for cognition, where an individual e.g. reads a piece of text, producing an internal mental representation which can then be externalised in different manners, such as an image.
      </p>
      <p>
        Almost since their conception, text generative models like BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], GPT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and earlier
incarnations have been analysed for their capacity to solve some cognitive tasks. A task that
has gained interest for text generative models is that of analogical reasoning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The question
of analogical reasoning has been studied by analysing the latent space directly [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or using the
model as a black-box generator [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. In the former case, it has been argued that analogical
reasoning has a correspondence with vector operations in the latent space. This means that the
latent space assigns a point or a region to each primitive concept, and image schemas arise from
geometric operations in this space. In other words, these results suggest that one can produce
analogies simply by navigating the latent space. A natural question that arises is whether other
kinds of cognitive tasks exhibit similar geometric properties; and if so, which.
      </p>
      <p>Despite some clear similarities between the tasks from a cognitive point of view—both require a representation and extraction of the relevant features from a class of concepts—conceptual blending has not received the same kind of attention as analogical reasoning. This can be explained in part by the fact that conceptual blends are not easy to verify textually; that is, it is difficult to design an experiment to control whether a text-generative model is producing conceptual blends or not. Indeed, the externalisation of a conceptual blend is of a more graphical nature, at least for concrete concepts. An image representing the blend of two concepts will visually showcase the features of each of the original concepts.</p>
      <p>
        Our goal in this paper is to verify whether conceptual blending also corresponds to vector operations on the latent space. That is, we want to see whether it is possible to obtain the blend of two concepts simply by moving through the latent space, without any additional input to the model. This analysis will provide some insights into the properties of the (usually opaque) intermediate representation space, and its potential similarities to a cognitive conceptual space. To achieve our goal, we use a text-to-image generative model (specifically Stable Diffusion) to produce images that blend two different concepts through an interpolation of the two separate encodings; that is, a point in the line segment connecting them in the latent space. Our work differs from previous suggestions like [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] in that we do not generate different prompts to describe the blended concept, but rather manipulate the latent space directly. As we argue, this is akin to analysing and manipulating the encoding of conceptual primitives in the internal space of the model. For comparison purposes, we also generate an image using a natural textual prompt for the conceptual blend.
      </p>
      <p>As a first empirical study, we consider eight different concept pairs, selected with different criteria in mind. In particular, half of them are decompositions of common compound words, while the other half refers to novel notions which we do not expect to observe in the training set. Although further analysis is needed, our results suggest that both the interpolation method and a direct simple prompt provide simple and cheap ways to obtain images of blended concepts. An analysis of the resulting outputs also provides some further insights on bias, ambiguity, and abstraction in the generative model.</p>
      <p>Importantly, we do not blend images, but rather attempt to produce an image representing
the blend of two concepts (given in a textual prompt). By our approach, we try to answer
whether concept blends (and by extension, image schemas) arise naturally in models not
trained to produce them. This study can shed light on the behaviour of abstract image schema
representations. By the same token, we are not interested in prompt engineering for finding the
best prompts to produce an adequate conceptual blend.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Stable Diffusion</title>
      <p>
        We briefly introduce the main components of diffusion models and Stable Diffusion which are relevant for understanding this work. A full-fledged description is beyond the scope of this work; we refer the interested reader to the main source material [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <p>
        From an abstract point of view, diffusion models belong to a class of encoder/decoder architectures which use two separate neural models. The first model (the encoder) translates the input into a point in the high-dimensional space R^n (typically with a large n), also known as the latent space. The decoder model, on the other hand, transforms each point of the latent space into an element of the output space. Hence, for text-to-image systems like Stable Diffusion and Dall-E [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the encoder takes sentences (or, more generally, strings) as input and the decoder generates an image from each latent space element.
      </p>
      <p>Different approaches are distinguished by the ways they manipulate the various elements throughout the two translation steps. In particular, some modern architectures which handle text as input use a so-called attention mechanism, which takes advantage of the context provided by the whole sentence to disambiguate and better characterise the purpose of each word in the input during the encoding phase. What characterises diffusion models in general is that they are trained to remove the noise from a randomly generated base until the output (in our case, a picture) is obtained. Since the starting noise is randomly generated, one single (unchanged) prompt—and in particular, one single point from the latent space—may yield many different outputs. It is worth noting that this latter feature makes it difficult to systematically evaluate the behaviour of diffusion models.</p>
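      <p>The role of the random starting noise can be illustrated with a small sketch; the NumPy arrays below stand in for the model’s actual latent-noise tensors, so the shapes and values are purely illustrative.</p>
      <preformat>
```python
import numpy as np

SHAPE = (4, 4)  # stand-in for the latent-noise tensor shape

# Same seed: the denoising process would start from identical noise,
# making a fixed prompt reproducible.
noise_a = np.random.default_rng(seed=0).standard_normal(SHAPE)
noise_b = np.random.default_rng(seed=0).standard_normal(SHAPE)

# Different seed: a different starting point, and hence possibly a very
# different output image for the exact same latent-space point.
noise_c = np.random.default_rng(seed=1).standard_normal(SHAPE)

print(np.array_equal(noise_a, noise_b))  # True
print(np.array_equal(noise_a, noise_c))  # False
```
      </preformat>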
      <p>
        One can view the latent space—that is, the intermediate, internal representation that links
the input text to the output picture—of such an architecture as a conceptual space in which all
the possible concepts recognisable by the model are encoded. Indeed, the point in the latent
space that connects the input (sentence) with the output (picture) is often considered an abstract
representation of the concept they refer to; see Figure 2. Intuitively, every point in the latent
space represents a potentially complex concept, and nearby points are expected to represent
similar notions. Thus, it is expected to satisfy the general properties of conceptual spaces à la
Gärdenfors [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The idea of translating textual concepts to a latent space is not exclusive to diffusion models.</p>
      <p>[Figure: image generated from the prompt “walking on water wetplate”]</p>
      <p>
        It has been successfully used in natural language processing since the development of word embeddings [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Already from early vector space representations of text, it was argued that the geometric properties of the latent space allow for operation-based reasoning. Mikolov et al. argue that spatial offsets can be understood as abstract relationships between concepts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This insight provided the basis for performing analogical reasoning based on vector operations.
      </p>
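      <p>The vector-offset view of analogies can be sketched with a toy example; the three-dimensional “embeddings” below are hand-picked for illustration and are not taken from any real model.</p>
      <preformat>
```python
import numpy as np

# Hand-picked toy embeddings in which the offset (man -> woman) matches
# the offset (king -> queen); real embeddings only satisfy this roughly.
emb = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
}

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the offset emb[b] - emb[a]."""
    target = emb[b] - emb[a] + emb[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    # Return the nearest remaining word by cosine similarity.
    candidates = {w: v for w, v in emb.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "woman", "king"))  # queen
```
      </preformat>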
    </sec>
    <sec id="sec-4">
      <title>3. Conceptual Blending</title>
      <p>
        Conceptual blending [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] is a reasoning task in which different concepts are combined (or blended) to form a new concept which keeps the defining characteristics of its parts. As argued in the work first introducing it [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], blending belongs to the same class of cognitive operations as analogy and mental modelling, among many others. It can be seen as a task of invention from knowledge: starting from two distinct concepts, produce one that is sufficiently distinct to be considered a new concept, but whose properties can be traced to its composing concepts.
      </p>
      <p>
        At the moment, there is no consensus on how conceptual blending, as a cognitive task, actually works, but there is no question about the capacity of humans to perform it. It is commonly observed in comic book characters [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] or, more generally, in fantasy, but also as a metaphorical means to elicit certain mental images—like in the expression “sausage dog” for describing a dachshund. Linguistically, at least in English, concept blends can be produced by applying one concept as a modifier of the other. Hence a “spider man” is a man who has some characteristic borrowed from a spider. From this we can readily see that conceptual blending is not symmetric: your friendly neighbourhood spider man is not the same as the terrifying man spider. Our goal here is to verify whether concept blending capabilities can arise from the latent space of an architecture such as the ones from the previous section.
      </p>
      <p>It is not straightforward to analyse the availability of concept blending through text; the result tends to be a (mental) image, and not easily verbalisable. One possible approach for showcasing blending capabilities is to use a text-to-image system. Specifically, given one such system—we focus on Stable Diffusion in this work—we want to verify whether it can produce imagery that depicts the blend of two concepts. We argue that it is easier to (subjectively) observe whether an image depicts a blended concept than to do so for a long textual description. Importantly, we consider Stable Diffusion as is, with no fine-tuning or further training, to analyse its intrinsic capabilities.</p>
      <p>For comparison purposes, we consider two approaches to concept blending in Stable Diffusion. The first approach takes advantage of the linguistic capabilities of the text-to-latent-space encoder, and provides a prompt (henceforth called the blended prompt) which describes the blend. Hence, for instance, to produce a blend of a snake and a horse, we introduce the blended prompt “snake horse.” If the latent space of Stable Diffusion has geometric properties akin to those observed in language models, then it should be possible to blend two concepts through shifts in the latent space itself. Intuitively, each concept should be represented by a region in the latent space. When the encoder maps one prompt to a point p<sub>1</sub> in the latent space, all points that are close to p<sub>1</sub> should represent similar notions, which become more distinct as the distance increases. Thus, as we move from p<sub>1</sub> to the encoding p<sub>2</sub> of a second prompt, the concept shifts from the first to the second prompt. This motivates the second approach.</p>
      <p>Our second method is based on interpolation in the latent space: if p<sub>1</sub> and p<sub>2</sub> are the latent space encodings of two concepts, then all the points on the line connecting p<sub>1</sub> and p<sub>2</sub> represent blends of the two concepts, giving more or less weight to each of the original concepts. For the scope of this paper we consider the point exactly midway between the two encodings and, to go beyond symmetrical blending, also the points at 1/4 and 3/4 of the way. Intuitively, if the prompts are “snake” and “horse,” the three interpolation points should construct a “snake horse” (25% snake, 75% horse), a “horse snake,” and a mix half horse and half snake. A comparison of both approaches is depicted in Figure 3. Note that by the nature of the encoder, the latent space point corresponding to the blended prompt may be quite far away from those of the two individual prompts being blended. We emphasise once again that we are not interested in blending two images, but rather in producing a picture representation of the blend of two concepts. The limit images (in the example, “snake” and “horse”) are only provided as reference points. Our input is not these images, but rather the textual prompts, which we use to analyse the capabilities of the latent space to produce conceptual blends.</p>
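      <p>The interpolation described above reduces to a convex combination of the two encodings. A minimal sketch, with small NumPy vectors standing in for the actual prompt encodings:</p>
      <preformat>
```python
import numpy as np

def lerp(p1, p2, t):
    """Point a fraction t of the way along the segment from p1 to p2."""
    return (1.0 - t) * p1 + t * p2

# Stand-ins for the latent encodings of "snake" and "horse"; in the real
# pipeline these would come from the text encoder of the model.
p_snake = np.array([0.0, 2.0, 4.0])
p_horse = np.array([4.0, 2.0, 0.0])

# The three interpolation points considered in the paper:
# 1/4, 1/2 (the symmetric blend), and 3/4 of the way.
blends = [lerp(p_snake, p_horse, t) for t in (0.25, 0.5, 0.75)]
print(blends[1])  # the midway point: [2. 2. 2.]
```
      </preformat>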
    </sec>
    <sec id="sec-5">
      <title>4. Imagine an Elephant Duck</title>
      <p>In order to test the capacity of Stable Diffusion to produce blends through the two approaches described before, we constructed the blends for eight different pairs of concepts. The choice of concepts responds to several needs. First of all, to avoid artefacts caused by complex prompting, each concept should be describable with a single word. Second, all concepts must be concrete, with intuitive graphical representations. Third, the blended prompt must conjure a plausible (if not necessarily existing) conceptual image.</p>
      <p>
        In addition to these constraints, we wanted to verify that Stable Diffusion had not previously “learned” the blended concept, and see how it behaves when it has. Thus, four pairs of words were constructed by separating common compound words into their components, and the remaining four were chosen from previous blending attempts [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and popular culture, aiming for seldom (if at all) represented imagery. The eight pairs are shown in Table 1. Note that the blended prompts for compound words use their elements as distinct entities, and hence we ask for e.g. a “jelly fish” rather than a “jellyfish.” An adequate blend should produce a fish with the properties of (or made of) jelly, rather than the well-known medusozoan.
      </p>
      <p>
        We generate all images using Stable Diffusion 2-1 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (https://huggingface.co/stabilityai/stable-diffusion-2-1). For each of the eight pairs, we generate six images: one corresponding to the blended prompt, and five corresponding to the interpolation from the first individual prompt (e.g. “snake”) to the second prompt (“horse” in the example) at 25% intervals, as in Figure 3. Recall that in diffusion models, the decoding phase going from the latent space to the generated image depends on a random noise initialisation. To avoid differences caused by the random noise generator, we fix a common seed for all six images. In this way, changes are attributable to diffusion (i.e., the decoding step) and not to the random initialisation. The risk of fixing a seed is that the quality of the resulting blend may depend on the seed choice. We deal with this issue by generating 10 different sets of images for each pair, each one with a different random seed. Thus, overall we produce 60 images for each concept pair, for a total of 480 pictures in the experiment. All figures are publicly accessible at https://git-ricerca.unimib.it/rafael.penalozanyssen/isd7-images/.
      </p>
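      <p>The bookkeeping of this experimental grid can be sketched as follows; the pair and seed identifiers are placeholders, and no images are actually generated here.</p>
      <preformat>
```python
N_PAIRS = 8        # eight concept pairs
SEEDS = range(10)  # ten fixed random seeds per pair

# Interpolation weights at 25% intervals, from the first prompt (1, 0)
# to the second prompt (0, 1).
WEIGHTS = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)]

images = []
for pair_id in range(N_PAIRS):
    for seed in SEEDS:
        for w in WEIGHTS:  # five interpolation images
            images.append((pair_id, seed, "interpolation", w))
        images.append((pair_id, seed, "blended_prompt", None))  # plus one

print(len(images))  # 8 pairs x 10 seeds x 6 images = 480
```
      </preformat>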
      <sec id="sec-5-1">
        <title>4.1. Novel Blends</title>
        <p>Figure 4 shows the results for the blended prompt “elephant duck.” As can be seen, the quality of the resulting blends is highly variable. Considering the exact mid-point of the interpolation (third column), it is possible to observe cases which can be thought of as “blends.” In particular, rows 3, 7, and 10 showcase animals combining the properties of elephants and ducks. The other partial interpolations (second and fourth columns), on the other hand, mainly represent the limit concept (a duck or an elephant, respectively), except for rows 7 and 10, where one could plausibly interpret the image as an elephant duck. The blended prompt (last column) generates several positive instances, but has the issue that “elephant” dominates over “duck” (despite duck being the main noun in the prompt). Nonetheless, the results do showcase blended concepts.</p>
        <p>[Figure 4: latent space interpolation from “duck” (1,0) to “elephant” (0,1) at weights (0.75,0.25), (0.5,0.5), and (0.25,0.75), together with the blended prompt “elephant duck,” over ten random seeds (rows 1–10)]</p>
        <p>Not all pairs show this kind of behaviour. The blended prompt for “bumblebee lion” produces a clearly distinguishable lion, typically a lion face. In general, the middle interpolation also produces lions, although in this case a few more bumblebee features become visible—albeit only when explicitly searching for them. Regardless, the concept lion seems to be dominant w.r.t. bumblebee. This gives us some insights about the relative weight of primitive concepts within the latent space.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Compound Words</title>
        <p>Unsurprisingly, the blended prompts for pairs derived from compound words produce images of the compound word (despite the space separating the components) rather than the expected blend. This behaviour showcases the presence of bias in the training data, where the blended prompt is likely treated as a misspelling of the original compound word. An extreme example is given by the pair “spider” “man,” where the blended prompt invariably produces the well-known comic-book superhero dressed in red and blue. The extent of this bias is apparent from the fact that neither spiders nor men are associated with the bright colouring of Spider-man, which only exists due to printing limitations and artistic whims.</p>
        <p>According to the geometric interpretation of the latent space, the interpolation-based
approach should bypass this bias, and produce a higher variability in the blends. In general, the
interpolation approach did not produce any clearly recognisable blends, with the only exception
being some conceptual fish imagery which may be interpreted as being made of jelly. We
speculate that this could be partially caused by the choice of relatively abstract or ambiguous
concepts in the pairs. This can be seen by analysing the pictures generated for the concept
“bow” which include a knot, the action of bowing, but also nature scenery.</p>
        <p>We found a surprising result through interpolation, though. Figure 5 depicts one full
interpolation chain from “butter” (left) to “fly” (right). The interpolation makes a natural transition
from a depiction of butter, to a depiction of a fly while, interestingly, producing a depiction of a
butterfly in the process. We believe that it is worth investigating this behaviour further.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Further Insights</title>
        <p>A cursory analysis of the generated images showcases bias in unexpected places. We have mentioned the not-so-amazing spider man results. More surprising are the results for the prompt “bumblebee.” Where one may expect a chubby striped insect, Stable Diffusion returns, in 10 out of 10 calls, the modular self-configuring autobot from the Transformers franchise. Figure 6 shows six of those ten results; the remaining four fall into the same class.</p>
        <p>From a different perspective, we find that Stable Diffusion struggles with concepts which are too general or abstract. One example was the notion of bow. Another one is the concept “man,” for which the model produces everything from a crowd, to a street, to a city, to a grid of houses (along with two pictures classifiable as men). In hindsight, we should have expected these results, as “man” can have many different interpretations depending on the context.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>We studied the capacity of Stable Diffusion to generate blended concepts through two techniques: one by making a direct prompt, and the other by manipulating points in the latent space. Our work uses a pre-trained Stable Diffusion model and requires no additional training or fine-tuning. The results are promising, although they still leave some space for improvement. In particular, the output evaluation relies on a subjective observation of the generated images. From a conceptual space point of view, our results provide evidence that the latent space is capable of encoding primitive concepts, and that manipulations in this space provide the basis for more complex image schemas.</p>
      <p>
        The blended prompt approach is limited by the impact of bias from the training corpus. This is particularly relevant given the simplicity of the prompts that we chose, where each concept is described by a single word. More complex and detailed prompts could alleviate this issue. This idea was explored in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], where a large language model (LLM) is used to generate prompts. We chose not to follow this path because, first, it becomes difficult to separate the capacity of the LLM to write a good prompt from the capacity of Stable Diffusion to generate a good blend; and second, we are more interested in the properties of the latent space, and the blended prompt was chosen for comparison purposes.
      </p>
      <p>For the interpolation method, we used three intermediate points to generate the images. The results suggest that moving only one quarter of the way from one latent space point to another does not, in general, produce noticeable changes. It may be interesting to analyse how the behaviour changes at different interpolation distances. From our observations, the latent space seems to partially encode conceptual blends without any special training. It would be interesting to analyse the possibilities of using blends during training to improve the latent representation. Another avenue for future research is to analyse different geometric or space manipulation methods for conceptual blending beyond the two presented here. To emphasise: the Stable Diffusion generator model serves here as a means to understand, graphically, the abstract concept encoded at different points of the latent space.</p>
      <p>
        Our view on conceptual blending does not follow the classical view by Fauconnier and Turner [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], where different spaces are constructed. The reason is that our goals differ. While Fauconnier and Turner try to explain how conceptual blends are constructed (and how they may be generated automatically), we are only exploring whether blended concepts are represented in the latent space at a position discoverable through simple geometric manipulations.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the MUR for the Department of Excellence DISCo at the
University of Milano-Bicocca and under the PRIN project PINPOINT Prot. 2020FNEB27, CUP
H45E21000210001; and by the NVIDIA Corporation with the RTX A5000 GPUs granted through
the Academic Hardware Grant Program to the University of Milano-Bicocca for the project
“Learned representations for implicit binary operations on real-world 2D-3D data.” The authors
also wish to acknowledge CSC–IT Center for Science, Finland, for computational resources
provided.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gärdenfors</surname>
          </string-name>
          ,
          <article-title>Conceptual spaces - the geometry of thought</article-title>
          , MIT Press,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. C. A.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Caterini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Cresswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Loaiza-Ganem</surname>
          </string-name>
          ,
          <article-title>The union of manifolds hypothesis and its implications for deep generative modelling</article-title>
          ,
          <source>CoRR abs/2207.02862</source>
          (
          <year>2022</year>
          ). doi:10.48550/arXiv.2207.02862.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , CoRR abs/1810.04805 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Narasimhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Improving language understanding with unsupervised learning</article-title>
          (
          <year>2018</year>
          ). URL: https://openai.com/research/language-unsupervised.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <article-title>Abstraction and analogy-making in artificial intelligence</article-title>
          ,
          <source>Annals of the New York Academy of Sciences</source>
          <volume>1505</volume>
          (
          <year>2021</year>
          )
          <fpage>79</fpage>
          -
          <lpage>101</lpage>
          . doi:10.1111/nyas.14619.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , W.-t. Yih, G. Zweig,
          <article-title>Linguistic regularities in continuous space word representations</article-title>
          ,
          <source>in: Proc. of the 2013 Conf. of the North American ACL: Human Language Technologies</source>
          ,
          ACL
          ,
          <year>2013</year>
          , pp.
          <fpage>746</fpage>
          -
          <lpage>751</lpage>
          . URL: https://aclanthology.org/N13-1090.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Holyoak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Emergent analogical reasoning in large language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2212.09196.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ushio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Espinosa</given-names>
            <surname>Anke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schockaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          ,
          <article-title>BERT is to NLP what AlexNet is to CV: Can pre-trained language models identify analogies?</article-title>
          ,
          <source>in: Proc. of the 59th Annual Meeting of the ACL (Volume 1: Long Papers)</source>
          ,
          ACL
          ,
          <year>2021</year>
          , pp.
          <fpage>3609</fpage>
          -
          <lpage>3624</lpage>
          . doi:10.18653/v1/2021.acl-long.280.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <article-title>Visual conceptual blending with large-scale language and vision models</article-title>
          ,
          <source>in: Proc. of the 12th Intern. Conf. on Computational Creativity</source>
          ,
          ACC
          ,
          <year>2021</year>
          , pp.
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          . URL: https://computationalcreativity.net/iccc21/wp-content/uploads/2021/09/ICCC_2021_paper_90.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Chilton</surname>
          </string-name>
          ,
          <article-title>PopBlends: Strategies for conceptual blending with large language models</article-title>
          ,
          <source>in: Proc. of 2023 CHI Conference on Human Factors in Computing Systems, CHI '23</source>
          ,
          ACM
          ,
          <year>2023</year>
          . doi:10.1145/3544548.3580948.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <source>in: IEEE/CVF Conf. on Computer Vision and Pattern Recognition, CVPR 2022</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>10674</fpage>
          -
          <lpage>10685</lpage>
          . doi:10.1109/CVPR52688.2022.01042.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sohl-Dickstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maheswaranathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ganguli</surname>
          </string-name>
          ,
          <article-title>Deep unsupervised learning using nonequilibrium thermodynamics</article-title>
          ,
          <source>in: Proc. of the 32nd Intern. Conf. on ML, ICML 2015</source>
          , volume
          <volume>37</volume>
          <source>of JMLR Workshop and Conf. Proceedings, JMLR.org</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>2256</fpage>
          -
          <lpage>2265</lpage>
          . URL: http://proceedings.mlr.press/v37/sohl-dickstein15.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pavlov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Zero-shot text-to-image generation</article-title>
          ,
          <source>in: Proc. of the 38th Intern. Conf. on Machine Learning, ICML 2021</source>
          , volume
          <volume>139</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8821</fpage>
          -
          <lpage>8831</lpage>
          . URL: http://proceedings.mlr.press/v139/ramesh21a.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          ,
          <source>in: Proc. of the 27th Annual Conf. on Neural Information Processing Systems</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . URL: https://proceedings.neurips.cc/paper/2013/hash/9aa42b31882ec039965f3c4923ce901b-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Fauconnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <article-title>Conceptual integration networks</article-title>
          ,
          <source>Cognitive Science</source>
          <volume>22</volume>
          (
          <year>1998</year>
          )
          <fpage>133</fpage>
          -
          <lpage>187</lpage>
          . doi:10.1207/s15516709cog2202_1.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Ritchie</surname>
          </string-name>
          ,
          <article-title>Lost in “conceptual space”: Metaphors of conceptual integration</article-title>
          ,
          <source>Metaphor and Symbol</source>
          <volume>19</volume>
          (
          <year>2004</year>
          )
          <fpage>31</fpage>
          -
          <lpage>50</lpage>
          . doi:10.1207/S15327868MS1901_2.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Peñaloza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Hedblom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kutz</surname>
          </string-name>
          ,
          <article-title>Under the super-suit: What superheroes can reveal about inherited properties in conceptual blending</article-title>
          ,
          <source>in: Proc. of the 9th Intern. Conf. on Computational Creativity</source>
          ,
          ACC
          ,
          <year>2018</year>
          , pp.
          <fpage>216</fpage>
          -
          <lpage>223</lpage>
          . URL: http://computationalcreativity.net/iccc2018/sites/default/files/papers/ICCC_2018_paper_56.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Urbancic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pollak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lavrac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cardoso</surname>
          </string-name>
          ,
          <article-title>The good, the bad, and the AHA! blends</article-title>
          ,
          <source>in: Proc. of the 6th Intern. Conf. on Computational Creativity, ICCC 2015</source>
          , computationalcreativity.net,
          <year>2015</year>
          , pp.
          <fpage>166</fpage>
          -
          <lpage>173</lpage>
          . URL: http://computationalcreativity.net/iccc2015/proceedings/7_3Martins.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>