<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Crowdsourcing Image Schemas</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dagmar GROMANN</string-name>
          <email>dagmar.gromann@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jamie C. MACBETH</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Smith College</institution>
          ,
          <addr-line>Northampton, Massachusetts</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Dresden, International Center for Computational Logic</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>With their potential to map experiential structures from the sensorimotor to the abstract cognitive realm, image schemas are believed to provide an embodied grounding to our cognitive conceptual system, including natural language. Few empirical studies have evaluated humans' intuitive understanding of image schemas or the coherence of image-schematic annotations of natural language. In this paper we present the results of a human-subjects study in which 100 participants annotate 12 simple English sentences with one or more image schemas. We find that human subjects recruited from a crowdsourcing platform can understand image schema descriptions and use them to perform annotations of texts, but also that in many cases multiple image schema annotations apply to the same simple sentence, a phenomenon we call image schema collocations. This study carries implications both for methodologies of future studies of image schemas, and for the inexpensive and efficient creation of large text corpora with image schema annotations.</p>
      </abstract>
      <kwd-group>
        <kwd>Crowdsourcing</kwd>
        <kwd>natural language annotation</kwd>
        <kwd>image schema</kwd>
        <kwd>natural language understanding</kwd>
        <kwd>cognitive linguistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Image schemas offer one possible explanation for the transition from perception to
meaning [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Studies have shown that even abstract concepts are grounded in our
sensorimotor experiences (e.g. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]). When people talk or think about a “chair” it is associated
with a simulation of the movement of “easing into a chair” and its associated multimodal
representations (e.g. how a chair looks and feels, etc.), which leads to a slight neural
activation in the respective motion areas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A similar activation can be observed when we
think about a more abstract idiom, e.g. “playing first chair”. The main assumption here
is that natural language, whether abstract or concrete, is grounded in image schemas.
      </p>
      <p>
A significant segment of the cognitive linguistics community has argued that the
conceptual structure derived from our sensorimotor experiences and episodic memories shapes
the semantic structure of natural language (see e.g. [
        <xref ref-type="bibr" rid="ref22 ref26">22,26</xref>
        ]). One theory that aims at
capturing this conceptual structure arising from our bodily sensations is the theory of
image schemas, introduced by Lakoff [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Johnson [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] within the paradigm of
embodied cognition. Image schemas are internally structured, that is, composed of a few
spatial primitives that make up more complex image schemas and schematic
integrations [
        <xref ref-type="bibr" rid="ref10 ref12 ref22">10,12,22</xref>
        ]. While their existence in natural language has been studied by means
of corpus-based (e.g. [
        <xref ref-type="bibr" rid="ref23 ref24">23,24</xref>
        ]) and machine learning methods (e.g. [
        <xref ref-type="bibr" rid="ref8 ref9">8,9</xref>
        ]), few empirical
studies with human subjects have been proposed [
        <xref ref-type="bibr" rid="ref4 ref7">4,7</xref>
        ].
      </p>
      <p>
        The majority of image-schematic experimental setups involved visual stimuli or
materials rather than descriptions of image schemas (e.g. [
        <xref ref-type="bibr" rid="ref23 ref6">6,23</xref>
        ]), asking subjects to narrate
(e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) or visualize what is being narrated (e.g. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). One exception is that of Gibbs et
al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] who asked students to map image schemas to bodily experiences of physical
exercises they performed, which required the description of image schemas to participating
students [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The work closest to ours is Cienki’s [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], who asked participants to annotate
videos with image schemas. We draw inspiration from these previous descriptions of
image schemas in our annotation task [
        <xref ref-type="bibr" rid="ref4 ref7">4,7</xref>
        ].
      </p>
      <p>In this paper, we contribute a study to test image schemas on a significant number
of human participants, that is, one hundred, to determine the coherence between
imagistic aspects and lexical representations, and to study methods for future investigations on
connecting image schemas to language. Our study presents natural language sentences to
human subjects and gives them the tasks of identifying image schemas within their
understanding of the sentences and annotating the sentences with specific image schemas.
We use crowdsourcing as a time-efficient and low-cost method of obtaining large
number of these image schema annotations. We evaluate its utility for image schema
annotations of natural language sentences that may be used as data in future studies, or for
machine learning-based natural language processing with image schemas. This evaluation
includes the analysis of inter-rater reliabilities as well as natural language justifications
of selections made by participants in the study. We found that human subjects recruited
from a crowdsourcing platform could perform the annotation task, but the annotation
cannot be treated as a simple classification, since annotations for many sentences resulted
in collocations of image schemas.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Crowdsourced Human Participants Study</title>
      <p>
        We performed a study of human subjects in which they read simple English sentences
and matched their conceptualizations of the meanings of the sentences to a number of
image schemas. To the best of our knowledge, this is the first study of this kind on
image schemas, and the first using crowdsourcing. Thus, we had to look to other studies of
highly-abstract cognitive building blocks for guidance on how to set up such a task. To
this end, we consulted a study on crowdsourcing the conceptual primitives of Schank’s Conceptual
Dependency theory [
        <xref ref-type="bibr" rid="ref20 ref25">20,25</xref>
        ], from which we took the sentences used in this
experiment, provided in Table 2. A similarity between Conceptual Dependency primitives
and image schemas has been shown in previous comparative studies [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>2.1. Selecting Image Schemas</title>
        <p>
          In the classic exposition of image schemas, Johnson [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] provided a total of 29, to which
Lakoff [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] added several more, and, over the years, many additional image schemas,
such as SUPPORT [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], have been proposed. In order not to overwhelm participants of
this study, we selected a subset of image schemas appearing in the literature, restricting
ourselves to those that are well specified and equipped with concise descriptions.
        </p>
        <p>
          Several image schemas initially proposed by Johnson were very general and
difficult to grasp without detailed explanations (e.g. PROCESS), while others, such as
ENABLEMENT, a member of the FORCE family of schemas, were highly specific. In order to
provide a more balanced account of image schemas to participants in this study, we
follow the lead of Cienki [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and select image schemas with a moderate level of specificity.
Furthermore, since this study requires the formulation of self-explanatory descriptions
of utilized image schemas to non-experts, we limited our study to those for which simple
and complete descriptions are available in the literature. In a previous comparison of
image schemas and Conceptual Dependency primitives, the target sentences of this study
were implicitly annotated with image schemas by three experts [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. This availability of
annotations was another factor influencing our selection strategy. The final set of selected
image schemas is provided in Table 1 alongside their descriptions.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explaining Image Schemas</title>
        <p>Crowdsourcing the annotation of sentences with image schemas requires the description
of image schemas to non-expert participants. This experiment assumes and tests a
certain degree of coherence in the imagistic aspects of lexical representations. To this end,
the main question to be answered is whether subjects agree on the mapping of image
schemas to linguistic expressions.</p>
        <p>
          Linguistically motivated analyses of image schemas have rarely involved the
description of image schemas and spatial primitives to human subjects. As one of the few
exceptions, Gibbs et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] presented 22 students with brief descriptions of 12 different
image schemas after the students had performed several bodily exercises related to standing,
and asked them to rate the degree of relatedness between schema and exercise on a scale
of one to seven. The predominant schemas were BALANCE, VERTICALITY,
CENTER-PERIPHERY, RESISTANCE, and LINK. The same descriptions were re-used in two similar
related experiments. The descriptions used are strongly related to the active bodily
experience of motion that subjects undergo within the experiment. For instance,
CONTAINMENT is described as follows: “Container refers to the experience of boundedness and
enclosure. As you stand there, do you feel a sense of container?” [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          Cienki [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] conducted an experiment with 80 students who annotated gestures in
videos with image schemas, which required the description of image schemas to the
participants. We adapted Cienki’s set of image schema descriptions where available to
our purpose and the task of text annotation, and provided them as they are represented
in Table 1 to the participants of this study. In addition, we provided participants with
the option to specify their own category to annotate the sentences by using the option
“OTHER”.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Crowdsourcing the Study</title>
        <p>The study was performed on the Amazon Mechanical Turk crowdsourcing platform.
It was posted as a set of Mechanical Turk human intelligence tasks (HITs) and
advertised as “a survey on language and sentence categorization” for both “masters” level and
“non-masters” workers. Participants performed the study remotely and entirely through
Mechanical Turk by accepting the task and filling out an HTML form on a Web page
on the Turk platform. We conducted two smaller pilot studies (with 12 and 10
participants respectively) to ensure the validity of our task, in particular the intelligibility of our
descriptions. Only workers who had an overall HIT approval rate greater than or equal
to 95% and more than 1000 approved HITs were allowed to perform the task. The HIT
was set up such that any unique Turk worker could only perform it once. A total of 100
participants took part in the study.</p>
        <table-wrap id="tbl1">
          <label>Table 1.</label>
          <caption><p>Selected image schemas and their descriptions as presented to participants</p></caption>
          <table>
            <thead>
              <tr><th>Image schema</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>CONTAINER</td><td>A container has a boundary that separates an inside from an outside. It can hold things. We can be contained (for example, in a room), and our own bodies are containers [<xref ref-type="bibr" rid="ref4">4</xref>].</td></tr>
              <tr><td>PATH</td><td>A path is a route for moving from a starting point to an end point. We or a thing can follow an existing path, or make a path with our or its own movement (adapted from [<xref ref-type="bibr" rid="ref4">4</xref>]).</td></tr>
              <tr><td>SUPPORT</td><td>Contact between two objects in the vertical dimension [<xref ref-type="bibr" rid="ref21">21</xref>]. For instance, a book can be supported by a table when the book is in contact with the table’s surface.</td></tr>
              <tr><td>FORCE</td><td>Force usually implies the exertion of physical strength in one or more directions. We can experience force in terms of compulsion, blockage, or enablement [<xref ref-type="bibr" rid="ref4">4</xref>].</td></tr>
              <tr><td>PART-WHOLE</td><td>Part-whole describes whole(s) consisting of parts and a configuration of the parts. Our bodies can be seen as a whole with several parts. An object can be a whole with many parts (adapted from [<xref ref-type="bibr" rid="ref17">17</xref>]).</td></tr>
              <tr><td>OTHER</td><td>If none of the categories seems appropriate, select this category (6) to signify “other” and detail in your explanation a new category that you think would better fit the sentence.</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The HTML crowdsourcing interface to the study had descriptions of the five image
schemas and the “Other” category. The image schema descriptions were followed by
the texts of the twelve target sentences (see Table 2). Along with each target sentence
were six checkboxes and six text input areas, each corresponding to the image schemas
and “Other”. The interface instructed participants to check one or more boxes to identify
the image schemas (described above) that matched the meaning of the sentence. For
each image schema box that participants checked, they were asked to provide a short
explanation in the corresponding text input area (of at least one sentence) for why the
image schema was a match to the sentence. A “submit” button in the HTML interface
uploaded participants’ answers to Mechanical Turk.</p>
        <p>Because the study was performed entirely online without any in-person interactions,
one challenge was to determine whether participants were honestly and sincerely
performing the requested task or just filling in the form randomly in exchange for
payment. We rejected HIT submissions from several participants based on the comments
that they made in explanation of their answers. For example, for the sentence “Jim held
on to the railing”, one participant checked the boxes for PATH, SUPPORT, and FORCE,
and left the explanations “he was going up the stairs”, “he was going down the stairs”,
and “he thought he was going to fall” respectively; this subject and subjects who gave
similar nonsensical answers were rejected. Others were rejected for leaving nonsensical
comments that we determined were not in any way connected to the image schema and
conceptualizing the sentence. They may have been due to not understanding the task or
not understanding the explanations of the categories. For example, a different participant
left explanations such as “one idea here”, and “he did one entire thing” to justify
PARTWHOLE for different sentences. Submissions were also rejected when it was clear that
their comments and selected image schemas did not match. Still others were rejected for
having repeated identical answers for each sentence (e.g. one participant checked
PARTWHOLE for each sentence and answered “humans have many parts” repeatedly for each,
including a sentence about a gecko). Several incomplete submissions and submissions
that were obviously computer generated were also rejected. Finally, we rejected seven
submissions which were perfect duplicates in terms of their answers but were submitted
under different Turk worker IDs. This evaluation process of participants’ answers left us
with a total of 100 participants, as opposed to the 120 initially submitting participants,
whose answers we analyze more closely in the results section.</p>
        <p>
          As a further evaluation step, we were interested in the agreement among
respondents. Each respondent of this crowdsourcing task was presented with multiple
possible categories representing image schemas to annotate a specific given sentence, which
means answers do not fall into one of several mutually exclusive categories. Resulting
multi-response data cannot be analyzed with the traditional Pearson Chi-squared tests
for independence due to within-subject dependence among responses. As a result, we
opted for a kappa-based inter-rater reliability measure on the participants’ responses. In
general, statistics such as Krippendorff’s alpha [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] depend on mutually exclusive
categorical ratings. Kraemer [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] relaxes this constraint and uses rank order statistics to deal
with the case where a rater can mark a subject with more than one category [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. This,
however, requires converting our multi-response data to rank-ordered data. Fleiss suggests a
relaxation of the original kappa measure to allow for multiple raters of categorical data,
which can be calculated on a per-category basis for non-mutually exclusive classification
tasks such as ours [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. This per-category calculation of inter-rater reliability could in our study
first be applied to image schemas and second to sentences.
        </p>
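        <p>As a concrete illustration of such a per-category calculation (our sketch with hypothetical counts, not the study’s actual code), a binary Fleiss’ kappa for a single image schema can be computed by treating each sentence as a subject and each rater’s choice as “schema checked” vs. “schema not checked”:</p>

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] is the number of raters assigning subject i
    (here: a sentence) to category j.  For a per-image-schema
    kappa the two categories are "schema checked" and
    "schema not checked".
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])       # raters per subject (assumed constant)
    n_categories = len(counts[0])

    # Proportion of all assignments that fall into each category.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]

    # Agreement among raters for each subject.
    P_i = [(sum(c * c for c in row) - n_raters)
           / (n_raters * (n_raters - 1))
           for row in counts]

    P_bar = sum(P_i) / n_subjects   # mean observed agreement
    P_e = sum(p * p for p in p_j)   # expected agreement by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical [checked, not checked] counts per sentence for one
# schema, with 100 raters per sentence:
example = [[92, 8], [5, 95], [10, 90], [88, 12]]
kappa = fleiss_kappa(example)
```

        <p>Running this once per image schema column yields the kind of per-category values reported in Table 3; the same function applied per sentence yields the per-sentence values of Table 4.</p>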
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>We generally found that study participants were able to perform the task of
comprehending the image schema descriptions and connecting them to their interpretation of the
target sentences. Of the 120 initial participants, we rejected only 5 who were honestly
attempting to perform the task but demonstrated clear difficulties in understanding or doing it; we
rejected 15 other responses which were malicious, computer generated, or clearly not
attempting the task for other reasons.</p>
      <sec id="sec-3-1">
        <title>3.1. Statistical Results</title>
        <p>Statistical results of the answers of the 100 participants are provided in Table 2, which
shows how often each image schema was selected for each sentence; since there are exactly
100 participants, counts correspond directly to percentages. However, a
participant could select more than one image schema per sentence. The highlighted
numbers in bold represent the highest number for each row corresponding to the per-sentence
annotations. For instance, in the first sentence (sentence 1), the image schema SUPPORT
was selected 92 times, which represents the highest number of selections in this row and
thus is highlighted in bold. We consider this highlighted number the predominant image
schema for this sentence. The last two columns represent how many participants selected
one category only or more than one for each sentence. For instance, for sentence 6 a total
of 81 people (81%) selected only one image schema while only 19 participants selected
more than one. Finally, the last row, the total average, presents the total number of times
an image schema was selected across sentences divided by the overall total number of
selections in the task. Here, the most frequently selected image schema in the task turns
out to be FORCE, followed by PATH. The numbering of the sentences represents the order
of presentation to the study’s participants.</p>
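        <p>The tallying behind such a table can be sketched as follows (an illustration with hypothetical responses, not the study’s actual pipeline); because annotations are multi-label, each response may contribute to several schema counts:</p>

```python
from collections import Counter

def aggregate(annotations):
    """Tally multi-label schema selections per sentence and find the
    predominant schema, in the style of the per-sentence counts of
    Table 2.

    annotations: iterable of (sentence_id, selected_schemas) pairs,
    one per participant response; a participant may select several
    schemas for one sentence.
    """
    counts = {}
    for sentence_id, selected in annotations:
        tally = counts.setdefault(sentence_id, Counter())
        for schema in selected:
            tally[schema] += 1
    # Map each sentence to (full tally, predominant schema).
    return {sid: (tally, tally.most_common(1)[0][0])
            for sid, tally in counts.items()}

# Hypothetical responses for sentence 1 ("Jim held on to the railing"):
responses = [
    (1, ["SUPPORT"]),
    (1, ["SUPPORT", "FORCE"]),
    (1, ["PART-WHOLE"]),
]
tally, predominant = aggregate(responses)[1]
```

        <p>With 100 participants, each tally value is also the percentage of participants selecting that schema for the sentence.</p>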
        <p>The relative frequencies in Table 2 show that for several sentences one specific image
schema turned out to be predominant (highlighted in bold). For instance, 100% of all
participants selected FORCE for the second sentence, “Lisa kicked the ball”. However,
for some sentences, no particular image schema was dominant; sentences 7, 9, 10, and
11 show a stronger distribution of answers across up to three image schemas. To provide
a deeper understanding of these results, we analyze the participants’ explanations offered
alongside their annotations.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Interpreting Natural Language Explanations</title>
        <p>In this section, we discuss a first interpretation of natural language explanations provided
by participants to justify their selection of image schema(s) for a specific sentence. Our
discussion is guided by the grouping of sentences by image schema in Table 2 and based
on a codification of the results into categories generated in the annotation process. For
instance, we used categories such as “body is a container” to classify all explanations
that justify their CONTAINMENT selection in this way, e.g. “Amy’s body is a container
for the air she inhaled” in reference to sentence 9. Our focus is on the most frequently
selected image schemas in each group.</p>
        <p>Explanations in the first group, SUPPORT, are highly homogeneous, with all
participants in both sentences agreeing that the predominant schema is SUPPORT because there
is contact with an object (railing, wall) that provides support. The type of FORCE applied
is considered physical strength of the body, human or gecko, by all participants. The
PART-WHOLE explanations are the most varied: half of the participants consider
body parts (hand of Jim or the gecko) as part of the whole, the body. The remaining half
in sentence 1 refers to the railing being part of a path or a ship, while the remaining half
in sentence 5 refers to the gecko as a complex being composed of parts. For PATH in
sentence 1, the majority considered the railing itself to constitute some kind of path and
only four stated it is a person moving along a path.</p>
        <p>In the second group the predominant schema is FORCE, for which the explanations
in each of the four sentences correspond. In sentences 2, 4, and 12 the selection is
attributed to physical strength applied by the actor or their body parts in the described
situation, whereas in sentence 8 the strength is assigned to the car that hit the actor. The
second significant schema here is PATH, which for sentences 2, 4, and 8 is attributed to
an object moving (ball, lunch, car) along a path, while in sentence 12 it is a body part
that moves (the fist). The frequently selected CONTAINER in sentence 4 is justified with
Michelle’s body or her stomach.</p>
        <p>As regards the group of predominantly PATH-annotated sentences, the explanations
mainly refer to a person moving along a path with some exceptions that describe the
path as an abstract entity without detailing the object or person moving. The CONTAINER
in sentence 3 is in all but 2 cases the airplane, where the two cases refer to the actor
being a container himself. The agreement in explanation for this group is reflected in the
kappa values reported below.</p>
        <p>
          In the last group the most varied selection of categories and explanations can be
found, which requires a more detailed analysis. For sentences 7, 9, and 10 annotators
agree that the body (or its parts, such as the lungs) functions as CONTAINER. This act
of becoming contained (hamburger, air) is associated with FORCE. This association with
force is considerably more frequent with the expulsion of the containee in sentence 4. The
annotators selecting PATH in sentences 7, 9, and 10 considered the way from the outside to the
inside of the container as traveling along a PATH, much in line with a formalization of
the CONTAINMENT scheme [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. For PART-WHOLE in Sentence 7, the annotators
provide different answers, that is, 3 stating the hamburger has parts, 7 considering the
hamburger to become part of the whole of Charles, and 9 stating that Charles used parts of
his body (mostly mouth, 2 say arms) to eat the hamburger. In the explanations of the 14
times that “Other” was selected, people mostly suggested an additional category called
“consumption” or “energy”.
        </p>
        <p>When we examine which image schemas took second place in the sentence
annotations, “Amy took a deep breath” had the highest second-place percentage. In this case the
FORCE image schema, with 47%, came in second place only to CONTAINMENT, with
57%. All FORCE explanations are related to physical strength, where a small proportion
of subjects explicitly assigns the force of inhaling to the body part lungs. All
annotators selecting CONTAINMENT referred to the body or body part as CONTAINER. Finally,
annotators differentiated between body parts being part of a whole or the air becoming
part of the body when selecting PART-WHOLE. For “Other” people mostly suggested a
new category of “living being” or only stated that none of the other categories fit in their
mind.</p>
        <p>A majority of participants marked PART-WHOLE as a match for the sentence
“Stephanie bled from a cut on her leg”, while nearly two-thirds (69) did for “Kevin
crossed his arms”. For sentence 10 the predominant justification is that the leg is a part
of the body, whereas in sentence 11 the predominant argumentation is that a person is a
whole with many parts. While describing two different perspectives, the underlying idea
matches. For 19 of 33 participants, CONTAINMENT and PATH belong together in
sentence 10 since the blood leaves the CONTAINER along a path. The remaining 14
participants selected PATH but not CONTAINMENT and state that an object (blood) travels along
a PATH leaving. The 25 annotators selecting FORCE in Sentence 10 explained that it is
outer forces pulling blood out (3), such as gravity, that bodily functions force blood out
(11), or that FORCE had to be applied in order for the cut to come into existence (11). To
summarize, participants to some degree agree that the container of her body/leg is
forcefully disrupted by a cut that causes parts of the containee to leave the container along
a path. In the explanations of “Other” people suggest the introduction of an additional
category of “involuntary action” or “injury” because of the cut.</p>
        <p>In the case of sentence 11, arms are considered to move along a path (17) and by
interlinking them they create a support structure with the chest (19). In addition, 30 of 32
participants stated that it takes physical FORCE to cross one’s arms, while 2 stated that
the resulting posture is a defiant and forceful one.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Inter-Rater Reliability</title>
        <p>
          To statistically measure agreement in our data, we calculate the inter-rater reliability
based on the kappa proposed by Fleiss [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Since more than one category can be selected
for each sentence and we have multiple raters, we decided to calculate this kappa value
on a per-image-schema basis, as represented in Table 3. Each column represents one image
schema, and the single row gives the Fleiss kappa calculated over all sentences. The highest agreement
can be found for SUPPORT, followed by FORCE, and the lowest for PART-WHOLE.
        </p>
        <table-wrap id="tbl3">
          <label>Table 3.</label>
          <caption><p>Fleiss’ kappa per image schema</p></caption>
          <table>
            <thead>
              <tr><th /><th>CONTAINER</th><th>PATH</th><th>SUPPORT</th><th>FORCE</th><th>PART-WHOLE</th><th>all</th></tr>
            </thead>
            <tbody>
              <tr><td>k</td><td>0.268</td><td>0.357</td><td>0.639</td><td>0.392</td><td>0.223</td><td>0.266</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          While we believe that those results are interesting in terms of the understandability
of individual image schemas, it might also be the case that the design as a
multi-class classification problem negatively impacts the results, since one rater could select more
than one category. To account for this problem we apply Kraemer’s method [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] to turn
our multi-class classifications into a ranked ordinal set and then calculate the Kendall rank
correlation coefficient [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], which returns a correlation of 0.405 as a second measure for
comparison, indicating moderate agreement for the whole dataset.
        </p>
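        <p>The Kendall rank correlation itself can be sketched as follows (a simplified tau-a without tie correction, applied to hypothetical rank data for two raters; the study used Kraemer’s full rank-conversion procedure):</p>

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation (tau-a, no tie correction) between
    two equal-length rankings of the same items, e.g. two raters'
    rank orders over the image schema categories."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Pairs ranked in the same order by both raters are
        # concordant; pairs ranked in opposite orders are discordant.
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            concordant += 1
        elif prod != 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical rank orders two raters assign to five categories:
tau = kendall_tau([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])
```

        <p>A value of 1 indicates identical rankings, 0 no correlation, and negative values reversed rankings; 0.405, as obtained for our dataset, sits in the moderate range.</p>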
        <p>To complete the calculation of agreement in our dataset, we adapt the per-category
measurement of Fleiss’ kappa to a sentential level depicted in Table 4. This gives a
considerably higher agreement than the per-image schema basis. Sentences with the lowest
values are sentences 7, 9, 10, and 11. These are all grouped into the CONTAINER and
PART-WHOLE group in Table 2 and are the ones with the strongest indication for image
schema collocations. The highest agreements are achieved for sentences where all (sentence
2) or almost all (sentence 6) participants select a specific image schema. In fact, the four
sentences with the highest agreement (sentences 6, 2, 8, 3 in order of k score) obtain
the highest number of selections for a single image schema (category per sentence) and
are annotated with the most frequently selected image schemas, that is, FORCE (32%)
and PATH (23%). It seems that the sentences in the last group of Table 2 are the most
controversial and most difficult to annotate.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>The design of our crowdsourcing task allowed participants to select more than one
image schema per sentence, which was a frequently utilized option. One possible
explanation for the annotation of a sentence with multiple image schemas is that participants
conceived a conceptual collocation of image schemas, which we can confirm from the
explanations. For instance, “Amy took a deep breath” was perceived as a combination of
CONTAINMENT and FORCE in most cases, where the air enters the container represented
by Amy’s lungs or her body, and moving it into the body requires FORCE. Such
movements into and out of the body are frequently also collocated with PATH, along which the
objects (air, food, or people if the container is an object, etc.) enter or leave the container.
An option to explicitly indicate the semantics of a sentence with a collocation of image
schemas was not considered in this task, but would be important future work.</p>
      <p>To quantify this phenomenon, on average 38% of all annotators selected more than
one image schema per sentence (see Table 2). This suggests that specific sentences are
perceived as grounded in a collocation of image schemas. This has implications for the
utilization of crowdsourcing to obtain large corpora of annotated natural language
sentences, namely that participants of such studies should be given the option to select more
than one image schema for their annotation of natural language sentences. A large-scale
dataset with image schema annotations could be highly useful for advancing the field of
image schemas and for enabling machine learning and deep learning applications.</p>
      <p>
        As an additional point of comparison, we evaluate our results in light of
previous expert annotations of the same sentences provided in Macbeth et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
In that previous study, image schemas were mapped to Conceptual Dependency
primitives using the same set of sentences as the study presented herein, which indirectly
provided an expert annotation of those sentences. Experts and crowd agree on the
sentences primarily annotated with SUPPORT (sentences 1 and 5), PATH (sentences 3 and
6), and FORCE (sentences 2 and 8). Experts and crowd also agree on a stronger image
schema collocation in sentence 4, namely a combined annotation of PATH, CONTAINER,
and FORCE. For sentence 12, experts see PATH and PART-WHOLE as the predominant
schemas, while the crowd mainly annotated the sentence with FORCE, followed by PATH
and PART-WHOLE. Finally, for the last group of sentences in Table 2 the experts
annotate sentences 7, 9, and 10 with CONTAINER, PATH, and FORCE and sentence 11 with
PATH and PART-WHOLE. For sentences 9 and 10, the crowd assigns less weight to PATH, and
for sentence 10 the predominant image schema is PART-WHOLE. The schemas selected
by the experts are also chosen by the crowd, but with less importance. Sentence 11
shows an agreement in PART-WHOLE, but not in PATH, which is assigned considerably
less importance by the crowd than FORCE. To sum up this comparison, the biggest
discrepancies can be seen in sentence 12, where the experts ignore FORCE, and in the
sentences in the CONTAINMENT and PART-WHOLE group in Table 2, where PATH and
PART-WHOLE are weighted differently by the two groups. Nevertheless, the
overall agreement of experts and crowd lends further validity to the task and setup of
this crowdsourcing study, even though this finding should be confirmed by further large-scale
crowdsourcing experiments with more, and more varied, sentences.
      </p>
      <p>
        A discussion on crowdsourcing image schemas would not be complete without
explicitly addressing several lessons learned. In this study, a comparatively small corpus of
sentences was annotated to ensure a sufficient number of annotations per sentence. This
may limit the generalizability of our findings. Furthermore, the chosen set of
sentences mainly describes concrete sensorimotor experiences that participants might have
experienced themselves at one point or another. It is desirable to repeat the experiment
with a larger and more abstract set of sentences. The task setup could be improved with
a view to facilitating the evaluation of the obtained results. Instead of allowing for
multiple selections, a ranking of the answers from the most to the least important schema for
a specific sentence could considerably facilitate statistical processing and provide more
expressive annotations. One potential method to this end could be best-worst scaling [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
or a simple ranking of the chosen image schemas for each individual sentence. However, we
believe that a more substantial change to the task is needed to truly account for an
expressive annotation of perceived image schema collocations. This could be achieved by
explicitly allowing participants to describe the semantics of a sentence as a combination of
image schemas, which could yield very interesting results.
      </p>
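<p>Best-worst scaling as suggested above is typically scored with a simple counting rule: the number of times a schema is chosen as best, minus the times it is chosen as worst, normalized by how often it was shown. The sketch below uses hypothetical trials and schema sets, not a procedure from this study:</p>

```python
from collections import defaultdict

# Simple best-worst scaling scores: each trial shows a set of image schemas
# for a sentence and records the best- and worst-fitting choice.
def bws_scores(trials):
    best = defaultdict(int)
    worst = defaultdict(int)
    shown_count = defaultdict(int)
    for shown, b, w in trials:
        for schema in shown:
            shown_count[schema] += 1
        best[b] += 1
        worst[w] += 1
    return {s: (best[s] - worst[s]) / shown_count[s] for s in shown_count}

# Hypothetical trials for one sentence
trials = [
    (("FORCE", "PATH", "CONTAINER"), "CONTAINER", "PATH"),
    (("FORCE", "PATH", "CONTAINER"), "FORCE", "PATH"),
]
print(bws_scores(trials))  # CONTAINER and FORCE score 0.5, PATH scores -1.0
```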
      <p>Several implications follow from the detailed analysis of the results of this study.
Image schemas prove to be useful heuristics for the
interpretation of natural language sentences, for experts and non-experts alike. A certain
degree of agreement in annotations (between crowd workers and between crowd and
experts) shows that image schema annotations of natural language can also be performed
without a cognitive linguistic background. This agreement also implicitly validates our
natural language descriptions of image schemas, since a sufficient and homogeneous
understanding of these descriptions is required to reach any agreement on the annotations.
Those validated descriptions mark one major contribution of this paper, since it is
generally perceived as difficult to describe abstract cognitive building blocks to non-experts.
Nevertheless, the proposed set of descriptions could benefit from further experiments and
especially extensions, since it only covers five image schemas in its current state.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Empirical studies of image schemas involving human subjects have been challenging
due to their highly abstract nature, and only a few studies have attempted to explain image
schemas to non-experts, a prerequisite for such tasks. To the best of our
knowledge, this is the first study that proposes the use of crowdsourcing to annotate natural
language sentences with image schemas, which also required the explanation of image
schemas to naive subjects. The high agreement between expert and crowd annotations
supports the proposed method for image schema annotation of natural language.</p>
      <p>Our results show collocations of image schemas for individual sentences, that is,
multiple image schemas are chosen for each sentence, which has ramifications for using
crowdsourcing to gather labeled training data for machine learning. While the current set
of sentences is restricted to ensure sufficient annotations for each sentence, it still shows
a high agreement of annotators regarding the image-schematic content of sentences. The
results also hint at certain common combinations of image schemas, such as SUPPORT
and FORCE, which, however, require further large-scale investigation to allow for
generalization. In addition, we envision extending the type of sentences to be annotated from
strictly physical movements to more abstract ones and testing the annotation task with crowds
speaking different languages.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
<string-name>
<given-names>Mousumi</given-names>
<surname>Banerjee</surname>
</string-name>
,
<string-name>
<given-names>Michelle</given-names>
<surname>Capozzoli</surname>
</string-name>
,
<string-name>
<given-names>Laura</given-names>
<surname>McSweeney</surname>
</string-name>
, and
<string-name>
<given-names>Debajyoti</given-names>
<surname>Sinha</surname>
</string-name>
.
          <article-title>Beyond kappa: A review of interrater agreement measures</article-title>
          .
          <source>Canadian Journal of Statistics</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>23</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
<string-name>
<given-names>Lawrence W.</given-names>
<surname>Barsalou</surname>
</string-name>
.
          <article-title>Grounded cognition</article-title>
          .
          <source>Annu. Rev. Psychol</source>
          .,
          <volume>59</volume>
          :
          <fpage>617</fpage>
          -
          <lpage>645</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Casasanto</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sandra</given-names>
            <surname>Lozano</surname>
          </string-name>
          .
          <article-title>The meaning of metaphorical gestures</article-title>
          .
          <source>Metaphor and Gesture</source>
          .
<source>Gesture Studies</source>
. Amsterdam, the Netherlands: John Benjamins Publishing (date of access: 9 Dec. 2012),
<year>2007</year>
.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Alan</given-names>
            <surname>Cienki</surname>
          </string-name>
          .
<article-title>Image schemas and gesture</article-title>
. In
<source>From Perception to Meaning: Image Schemas in Cognitive Linguistics</source>
          ,
          <volume>29</volume>
          :
          <fpage>421</fpage>
          -
          <lpage>442</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
<string-name>
<given-names>Joseph L.</given-names>
<surname>Fleiss</surname>
</string-name>
          .
          <article-title>Measuring nominal scale agreement among many raters</article-title>
          .
          <source>Psychological bulletin</source>
          ,
          <volume>76</volume>
          (
          <issue>5</issue>
          ):
          <fpage>378</fpage>
          ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Orly</given-names>
            <surname>Fuhrman</surname>
          </string-name>
          ,
<string-name>
<given-names>Kelly</given-names>
<surname>McCormick</surname>
</string-name>
          ,
          <string-name>
            <given-names>Eva</given-names>
            <surname>Chen</surname>
          </string-name>
          , Heidi Jiang, Dingfang Shu, Shuaimei Mao, and
          <string-name>
            <given-names>Lera</given-names>
            <surname>Boroditsky</surname>
          </string-name>
          .
          <article-title>How linguistic and cultural forces shape conceptions of time: English and Mandarin time in 3D</article-title>
          .
          <source>Cognitive science</source>
          ,
          <volume>35</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1305</fpage>
          -
          <lpage>1328</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
<string-name>
<given-names>Raymond W.</given-names>
<surname>Gibbs Jr.</surname>
</string-name>
, Dinara A. Beitel,
<string-name>
<given-names>Michael</given-names>
<surname>Harrington</surname>
</string-name>
, and Paul E. Sanders.
          <article-title>Taking a stand on the meanings of stand: Bodily experience as motivation for polysemy</article-title>
          .
          <source>Journal of Semantics</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ):
          <fpage>231</fpage>
          -
          <lpage>251</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
<string-name>
<given-names>Dagmar</given-names>
<surname>Gromann</surname>
</string-name>
and
<string-name>
<given-names>Maria M.</given-names>
<surname>Hedblom</surname>
</string-name>
.
          <article-title>Body-mind-language: Multilingual knowledge extraction based on embodied cognition</article-title>
          .
In
<source>AIC</source>
          , pages
          <fpage>20</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
<string-name>
<given-names>Dagmar</given-names>
<surname>Gromann</surname>
</string-name>
and
<string-name>
<given-names>Maria M.</given-names>
<surname>Hedblom</surname>
</string-name>
          .
          <article-title>Kinesthetic mind reader: A method to identify image schemas in natural language</article-title>
          .
In
<source>Proceedings of Advances in Cognitive Systems</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Beate</given-names>
            <surname>Hampe</surname>
          </string-name>
          .
<article-title>Image schemas in cognitive linguistics: Introduction</article-title>
. In
<source>From Perception to Meaning: Image Schemas in Cognitive Linguistics</source>
          ,
          <volume>29</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
<string-name>
<given-names>Maria M.</given-names>
<surname>Hedblom</surname>
</string-name>
, Dagmar Gromann, and Oliver Kutz.
<article-title>In, out and through: Formalising some dynamic aspects of the image schema containment</article-title>
. In Stefano Bistarelli, Martine Ceberio, Francesco Santini, and Eric Monfroy, editors,
<source>Proceedings of the Knowledge Representation and Reasoning Track (KRR) at the Symposium on Applied Computing (SAC)</source>
          , pages
          <fpage>918</fpage>
          -
          <lpage>925</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
<string-name>
<given-names>Maria M.</given-names>
<surname>Hedblom</surname>
</string-name>
, Oliver Kutz, and
<string-name>
<given-names>Fabian</given-names>
<surname>Neuhaus</surname>
</string-name>
.
          <article-title>Choosing the Right Path: Image Schema Theory as a Foundation for Concept Invention</article-title>
          .
          <source>Journal of Artificial General Intelligence</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>22</fpage>
          -
          <lpage>54</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason</article-title>
          . The University of Chicago Press, Chicago and London,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
<string-name>
<given-names>Maurice G.</given-names>
<surname>Kendall</surname>
</string-name>
          .
          <article-title>A new measure of rank correlation</article-title>
          .
          <source>Biometrika</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          /2):
          <fpage>81</fpage>
          -
          <lpage>93</lpage>
          ,
          <year>1938</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
<string-name>
<given-names>Helena Chmura</given-names>
<surname>Kraemer</surname>
</string-name>
          .
          <article-title>Extension of the kappa coefficient</article-title>
          .
          <source>Biometrics</source>
          , pages
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          ,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          .
          <article-title>Reliability in content analysis: Some common misconceptions and recommendations</article-title>
          .
          <source>Human communication research</source>
          ,
          <volume>30</volume>
          (
          <issue>3</issue>
          ):
          <fpage>411</fpage>
          -
          <lpage>433</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
<string-name>
<given-names>George</given-names>
<surname>Lakoff</surname>
</string-name>
.
<article-title>Women, Fire, and Dangerous Things: What Categories Reveal about the Mind</article-title>
. The University of Chicago Press,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
<string-name>
<given-names>Jordan J.</given-names>
<surname>Louviere</surname>
</string-name>
and
<string-name>
<given-names>George G.</given-names>
<surname>Woodworth</surname>
</string-name>
          .
          <article-title>Best-worst scaling: A model for the largest difference judgments</article-title>
          . University of Alberta: Working Paper,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
<string-name>
<given-names>Jamie</given-names>
<surname>Macbeth</surname>
</string-name>
, Dagmar Gromann, and
<string-name>
<given-names>Maria M.</given-names>
<surname>Hedblom</surname>
</string-name>
          .
          <article-title>Image schemas and conceptual dependency primitives: A comparison</article-title>
          .
In
<source>Proceedings of the Joint Ontology Workshop (JOWO)</source>
          .
          <source>CEUR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
<string-name>
<given-names>Jamie C.</given-names>
<surname>Macbeth</surname>
</string-name>
and
<string-name>
<given-names>Marydjina</given-names>
<surname>Barionnette</surname>
</string-name>
          .
          <article-title>The coherence of conceptual primitives</article-title>
          .
In
<source>Proceedings of the Fourth Annual Conference on Advances in Cognitive Systems</source>
. The Cognitive Systems Foundation
          ,
          <year>June 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
<string-name>
<given-names>Jean M.</given-names>
<surname>Mandler</surname>
</string-name>
.
          <article-title>How to build a baby: II. conceptual primitives</article-title>
          .
          <source>Psychological review</source>
          ,
          <volume>99</volume>
          (
          <issue>4</issue>
          ):
          <fpage>587</fpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
<string-name>
<given-names>Jean M.</given-names>
<surname>Mandler</surname>
</string-name>
and
<string-name>
<given-names>Cristóbal</given-names>
<surname>Pagán Cánovas</surname>
</string-name>
.
          <article-title>On defining image schemas</article-title>
          .
          <source>Language and Cognition</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Anna</surname>
            <given-names>Papafragou</given-names>
          </string-name>
          , Christine Massey, and
          <string-name>
            <given-names>Lila</given-names>
            <surname>Gleitman</surname>
          </string-name>
          .
          <article-title>When English proposes what Greek presupposes: The cross-linguistic encoding of motion events</article-title>
          .
          <source>Cognition</source>
          ,
          <volume>98</volume>
          (
          <issue>3</issue>
          ):
          <fpage>B75</fpage>
          -
          <lpage>B87</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
<string-name>
<given-names>Juan Antonio</given-names>
<surname>Prieto Velasco</surname>
</string-name>
and
<string-name>
<given-names>Maribel</given-names>
<surname>Tercedor Sánchez</surname>
</string-name>
.
<article-title>The embodied nature of medical concepts: image schemas and language for pain</article-title>
.
<source>Cognitive Processing</source>
,
<year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
<string-name>
<given-names>Roger C.</given-names>
<surname>Schank</surname>
</string-name>
.
<source>Conceptual Information Processing</source>
. Elsevier, New York, NY,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Leonard</given-names>
            <surname>Talmy</surname>
          </string-name>
          .
          <article-title>The fundamental system of spatial schemas in language</article-title>
. In Beate Hampe and Joseph E. Grady, editors,
<source>From Perception to Meaning: Image Schemas in Cognitive Linguistics</source>
, volume
<volume>29</volume>
of
<source>Cognitive Linguistics Research</source>
          , pages
          <fpage>199</fpage>
          -
          <lpage>234</lpage>
          . Walter de Gruyter,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>