=Paper=
{{Paper
|id=None
|storemode=property
|title=Iconic Gestures with Spatial Semantics: A Case Study
|pdfUrl=https://ceur-ws.org/Vol-620/paper10.pdf
|volume=Vol-620
}}
==Iconic Gestures with Spatial Semantics: A Case Study==
Elizabeth Hinkelman
Galactic Village Games, Inc., 110 Groton Rd., Westford MA 01886 USA
elizh@galactic-village.com
Abstract. The spontaneous gestures that accompany spoken language are
particularly suited to conveying spatial information, yet their briefness,
individuality, and lack of conventional linguistic structure impede their
integration into NLU systems. The current work characterizes spontaneous size
gestures in a manual task corpus, clarifying their form, discourse role and
representation as a first step toward incorporating them into NLU systems.
Keywords: gesture, spatial language, knowledge representation.
1 Introduction
When gesture carries the primary load of communication, as in the major sign
languages, it develops linguistic properties such as verb subcategorization [1] and
lexicalization [2,3]. The spontaneous hand gestures that accompany speech, in
contrast, do not show linguistic structure [4]. For this reason, computational research
on spontaneous gesture has focused primarily on discourse functions, such as using
long-range video features to signal repair strategies [5] or shifts in topic [6]. Discrete-valued
features extracted from gaze and body orientation have also been used for
discourse functions such as signaling grounding. Much of this work emphasizes
gesture production rather than recognition [7, 8, 9].
Yet the spontaneous hand gestures that accompany speech are increasingly
recognized both as a cognitive aid to the gesturer and as an encoding of meaning [10,
11, 12]. Among the spontaneous gestures that accompany speech, iconic gestures are
those which present “images of concrete entities and actions” [4]. Iconic gestures have
in some cases (though not yet broadly) been shown to be effective in communicating
spatial information between discourse participants [4, 11, 13].
The current work pursues the incorporation of spontaneous gesture into NLU
systems, for which much groundwork must still be laid. Amid the fluidity and abstractness of
spontaneous gesture, we focus on concrete gestures with (relatively) straightforward
spatial interpretations. We seek to answer the questions:
• What is the discourse purpose of the gestures?
• Do the gestures constitute intended communication?
• To what extent are they lexicalized?
• What are their semantics?
• How can they be related to the semantics of the co-occurring speech?
2 Corpus study
We collected a reference corpus for dialogue with intonation and gesture in a physical
task context. The subjects were twelve pairs of University of Chicago undergraduate
and graduate students, who were familiar with each other and had some cooking
experience. They were recorded while performing a 30-45 minute cooking task
(making chocolate truffles), using a single camera and lapel microphones. Some
elements of the task include locating ingredients and equipment, dividing the labor,
choosing flavorings, and activities such as measuring and washing up.
The resulting eight hours of videotape were examined for spatial gestures. These
included pointing, displaying, miming of physical actions and manner [14], and size
gestures. We selected the size gestures as a focus for possible NLU because they are
the simplest and most imagistic of these groupings, and because they were relatively
uniform in form.
All of the size gestures in our corpus stemmed from the recipe step: “Take a hunk
of set ganache and roll into a walnut-sized ball between your palms.” An example can
be seen in Illustration 1, where subject Chris reads the recipe step aloud, envisions the
ball he will roll, and enlists Jason to confirm the ball size. In total he performs the
gesture for about three seconds; Jason eventually turns his head to view it for about
800ms. We will refer to this example and similar gestures as 'the ball size gesture'.
2.1 Results: ball size gesture use and discourse purpose
Of twelve pairs of subjects, two did not communicate about truffle size beyond
reading the recipe. Ten discussed truffle size verbally; of these, three did not use
gestures, and three used displays of ganache (dough). Four used size gestures: three
ball size gestures and one caliper size gesture¹. Gestures were used in two main ways:
to inform the partner of a desired size, or to request confirmation that a size was
correct. In one case, multiple ball gestures were used to explain how an incorrect ball
size leads to difficulties in baking. All gestures were used with co-occurring speech.
¹ A 'caliper gesture' shows the size of a small object using parallel thumb and forefinger.
2.2 Intended communication – ball size and display
We classify five of the seven gestures as intended communication, on the basis that:
in three cases the gesturer used motion or location to attract visual attention; in two
cases the gesturer made a verbal reference to the gesture (e.g. “like this?”), and in one
case both were used. For the seventh gesture (the incorrect ball size explanation) we
have no evidence that the gesture per se was intended communicatively. A further
analysis of gaze and uptake in these cases is in progress. Although this is a very small
sample, most of these gestures showed evidence of communicative intent.
2.3 Form constraints on the ball size gesture
We initially suspected that the ball size gesture was strongly lexicalized in
comparison with spontaneous gesture generally. In all cases the thumb and forefinger
circle to touch each other and embrace a notional ball, and are displayed as the focal
side of the gesture. However, there is notable variation in other parameters. Either
hand could be used, as in ASL. The position of the other three fingers is not
conventionalized (whereas it might or might not be constrained in a sign language).
The location of the gesture relative to the gesturer is not as conventionalized as it
would be in ASL. In the table, we refer to the gesturer as G and the observer as O.
The third column, the explanation of how two balls may melt into each other while
baking, is more typical of spontaneous gesture in showing dynamic configurational
elements with extended duration. The ball size gesture is not as conventionalized as
an ASL gesture – nor can we say what lexicon it would belong to. More work is
needed on this point. The ball size gesture contrasts with the caliper gesture in form.
{| class="wikitable"
! Lexicalized? !! Chris&Jason !! Chris&Trish !! Josh&Naomi
|-
| Hand || left || right || both
|-
| Handform || 'OK' || 'OK' || 'OK', 'OK'
|-
| Fingers || splayed || curled || splayed, splayed
|-
| Orientation || O's visual plane || O's visual plane || Off G's visual plane
|-
| Location || At G's eye level || Near O's focus || Near G's chest
|-
| Path || static || static || Slowly together
|-
| Duration (ASL=250ms) || >3000ms (G), >700ms (O) || 260ms || 1500ms
|}
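The form comparison in the table can be made explicit with a small feature record. The following Python sketch is illustrative only and is not part of the original study: the class name `GestureForm`, the field names, and the helper `invariant_features` are hypothetical, the two-handed Josh&Naomi entries ('OK', 'OK' and splayed, splayed) are collapsed to a single value, and duration is omitted because one of its entries is only a lower bound.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GestureForm:
    """One observed ball size gesture, using the rows of the table as fields."""
    pair: str
    hand: str
    handform: str
    fingers: str
    orientation: str
    location: str
    path: str


# Values transcribed from the table; two-handed entries are collapsed (assumption).
OBSERVATIONS = [
    GestureForm("Chris&Jason", "left", "OK", "splayed",
                "O's visual plane", "At G's eye level", "static"),
    GestureForm("Chris&Trish", "right", "OK", "curled",
                "O's visual plane", "Near O's focus", "static"),
    GestureForm("Josh&Naomi", "both", "OK", "splayed",
                "Off G's visual plane", "Near G's chest", "slowly together"),
]


def invariant_features(observations):
    """Return the form features whose value is identical across all observations."""
    names = [f.name for f in fields(GestureForm) if f.name != "pair"]
    return [
        name for name in names
        if len({getattr(obs, name) for obs in observations}) == 1
    ]


print(invariant_features(OBSERVATIONS))  # ['handform']
```

Under these assumptions only the handform is shared by all three examples, which matches the observation that the thumb-and-forefinger circle is constant while the other parameters vary.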
3 Representing Size
Finally we consider semantic representation. A size is a property of a physical object,
generally represented as a value on a scale, where a scale is a partial ordering on a set
of elements. The majority of verbal size descriptions followed the recipe text: “the
size of a” small object, or simply mentioned a small object: walnut, half a walnut,
meatball. The comparative “...smaller” and the (negated) intensifier “don't make it too
big!” also occurred. The scale in this case seems to be based on the generics (types)
of ball-shaped food items, and the asserted relation is purely qualitative. Qualitative
representations [15, 16] may prove extensible. Gesture's spatial medium, by contrast,
is continuous rather than discrete; the underlying scale is tied to the visual or perhaps
kinesic system. What representation could plausibly be generated by the visual
system? Our preliminary work investigates low level features in the spirit of [17, 18].
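To make the notion of a scale as a partial ordering concrete, here is a minimal Python sketch of a qualitative size scale over reference objects. It is an illustration under stated assumptions, not the representation used in this work: the class `QualitativeScale` and its methods are hypothetical, and the only ordering asserted is that half a walnut is smaller than a walnut. The corpus does not say how a meatball relates to a walnut, so that pair is deliberately left unordered.

```python
class QualitativeScale:
    """A scale modeled as a strict partial order over named reference items."""

    def __init__(self):
        # Maps each item to the set of items asserted to be directly larger.
        self._larger = {}

    def add_item(self, item):
        """Register an item without asserting any ordering for it."""
        self._larger.setdefault(item, set())

    def assert_smaller(self, small, large):
        """Record small < large, rejecting assertions that would create a cycle."""
        if small == large or self.is_smaller(large, small):
            raise ValueError(f"cannot assert {small!r} < {large!r}")
        self.add_item(small)
        self.add_item(large)
        self._larger[small].add(large)

    def is_smaller(self, a, b):
        """True if a < b follows from the recorded assertions by transitivity."""
        stack = list(self._larger.get(a, ()))
        seen = set()
        while stack:
            item = stack.pop()
            if item == b:
                return True
            if item not in seen:
                seen.add(item)
                stack.extend(self._larger.get(item, ()))
        return False

    def compare(self, a, b):
        """Return 'equal', 'smaller', 'larger', or 'incomparable'."""
        if a == b:
            return "equal"
        if self.is_smaller(a, b):
            return "smaller"
        if self.is_smaller(b, a):
            return "larger"
        return "incomparable"


scale = QualitativeScale()
scale.assert_smaller("half a walnut", "walnut")  # the only ordering assumed here
scale.add_item("meatball")                       # mentioned, but not ordered

print(scale.compare("half a walnut", "walnut"))  # smaller
print(scale.compare("meatball", "walnut"))       # incomparable
```

The "incomparable" result is what distinguishes a partial ordering from a total one. A continuous, gesture-derived size would need a different kind of value, which this sketch does not attempt to model.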
Acknowledgments. This work was supported in part by NSF grant no. IRI-9109914.
K-E. McCullough, C. Sidner and R. Jacobs provided valuable discussion.
References
1. Supalla, T.: Serial verbs of motion in American Sign Language. In S. Fischer (Ed.),
Theoretical Issues in Sign Language Research. University of Chicago Press (1990)
2. Hoiting, N., Slobin, D.: From Gestures to Signs in the Acquisition of Sign Language. In
Duncan, S. D., Cassell, J., Levy, E. T. (Eds.), Gesture and the Dynamic Dimension of
Language, pp. 51 - 66. John Benjamins Publishing Company, Philadelphia (2007)
3. Goldin-Meadow, S.: Gesture with Speech and Without It. In Duncan, Cassell, Levy (Eds.), pp. 31-50.
4. McNeill, D. (Ed.): Language and Gesture, pp. 2-7. Cambridge Univ. Press, New York (2000)
5. Chen, L., Harper, M., Quek, F.: Gesture Patterns during Speech Repairs. In Proc. Fourth IEEE
International Conference on Multimodal Interfaces (ICMI'02), pp. 155- (2002)
6. Eisenstein, J., Barzilay, R., Davis, R.: Discourse topic and gestural form. In Cohn,
A. (Ed.): Proceedings of the 23rd NCAI, pp. 836-841. AAAI Press (2008)
7. Cassell, J., Nakano, Y.I., Bickmore, T.W., Sidner, C., Rich, C.: Non-verbal cues for
discourse structure, Proceedings of the 39th Annual Meeting on Association for
Computational Linguistics, p.114-123. Toulouse (2001)
8. Traum, D., Morency, L-P.: Integration of Visual Perception in Dialogue Understanding for
Virtual Humans in Multi-Party Interaction. In Proc. AAMAS (in press). Toronto (2010)
9. Rich, C., Ponsler, B., Holroyd, A., Sidner, C.L.: Recognizing Engagement in Human-Robot
Interaction. In: Proc. Human-robot Interaction. Osaka (2010)
10. McNeill, D.: Gesture and Thought. University of Chicago Press, Chicago (2005)
11. Tversky, B., Lozano, S. C.: Gestures aid both communicators and recipients. In K.
Coventry, J. Bateman, T. Tenbrink (Eds.), Spatial language and dialogue. Oxford: Oxford
University Press (forthcoming)
12. Goldin-Meadow, S.: Hearing gesture: How our hands help us think. Cambridge, MA:
Harvard University Press (2003)
13. Beattie, G., Shovelton, H.: When Size Really Matters. Gesture 6(1), pp. 63-84 (2006)
14. Hinkelman, E.: Spatiomotor Routines as Spontaneous Gestures. Spatial Cognition (2010)
15. Lovett, A., Forbus, K.: Shape is like Space: Modeling Shape Representation as a Set of
Qualitative Spatial Relations. AAAI Spring Symposium Series, North America, Mar. 2010.
16. Bateman, J.A., Hois, J., Ross, R. J., Tenbrink, T.: A Linguistic Ontology of Space for
Natural Language Processing. In Artificial Intelligence, in press (2010)
17. Regier, T., Carlson, L.A.: Grounding Spatial Language in Perception: An Empirical and
Computational Investigation. Journal of Experimental Psychology, Vol. 130, No. 2, pp. 273-298 (2001)
18. Franconeri, S.L., Scimeca, J.M., Roth, J.C., Helseth, S.A.: Visual Spatial Relationship
Representation as a sequence of attentional shifts. Subm. J. Cognitive Science.