=Paper= {{Paper |id=Vol-1419/paper0110 |storemode=property |title=Multimodal Discourse: In Search of Units |pdfUrl=https://ceur-ws.org/Vol-1419/paper0110.pdf |volume=Vol-1419 |dblpUrl=https://dblp.org/rec/conf/eapcogsci/KibrikFN15 }} ==Multimodal Discourse: In Search of Units== https://ceur-ws.org/Vol-1419/paper0110.pdf
                                   Multimodal Discourse: In Search of Units
                                           Andrej A. Kibrik (aakibrik@gmail.com)
                              Institute of Linguistics RAS and Lomonosov Moscow State University
                                            B. Kislovskij per. 1, Moscow, 125009, Russia
                                          Olga V. Fedorova (olga.fedorova@msu.ru)
          Lomonosov Moscow State University, Russian Academy of National Economy and Public Administration,
                                           and Institute of Linguistics RAS
                                    Leninskie Gory 1, Moscow, 119899, Russia
                                          Julia V. Nikolaeva (julianikk@gmail.com)
                              Lomonosov Moscow State University and Institute of Linguistics RAS
                                        Leninskie Gory 1, Moscow, 119899, Russia

                            Abstract
  Human communication is inherently multimodal. In this study
  we focus on three channels of spoken discourse: the verbal
  component, prosody, and gesticulation. We address the
  question of units that can be identified within these
  components and in spoken multimodal discourse as a whole.
  The basic unit of the verbal channel is the clause, reporting an
  event or a state. A set of prosodic criteria helps to define
  elementary discourse units, that is, prosodic units serving as
  quanta of discourse production. The gestural channel consists
  of individual gestures, each defined by a set of features.
  Elementary discourse units are strongly coordinated with both
  clauses and gestures and can thus be considered basic units of
  multimodal discourse. Larger units can also be identified,
  such as prosodic sentences and series of gestures that again
  demonstrate coordination. By identifying units of natural
  discourse, coordinated across various channels, we make a
  step towards multimodal linguistics.
  Keywords: discourse structure; multimodal discourse; clause;
  prosody; gesture; elementary discourse unit; sentence.

                      1. Introduction
In modern linguistics, as well as in other domains of
cognitive science, there is a growing understanding that
human communication is inherently multimodal. When we
communicate orally, we not only produce chains of words,
but also intonate, gesticulate, interact with eye gaze, etc.
(Gibbon et al. eds., 2000; Kress, 2002; Hugot, 2007; So et
al., 2009; Loehr, 2012; Ford, Fox, & Thompson, 2013;
Goldin-Meadow, 2014, inter alia). A research program of
multimodal linguistics is gradually evolving (Kibrik, 2010;
Kress, 2010; Knight, 2011; Adolphs & Carter, 2013; Müller
et al. eds., 2014) that treats the verbal structure on a par with
non-verbal devices. Among non-verbal devices, sometimes
only kinetic-visual behaviors are considered. But we find it
very important to identify prosody (see e.g. Kodzasov,
2009), that is, non-segmental aspects of the vocal signal, as a
distinct communication channel.
   Kibrik and Molchanova (2013) considered three
communication channels employed in multimodal
discourse: the verbal component, prosody, and
kinetic-visual behavior. They found that all three channels play an
important (and comparable) role in the overall process of
conveying a message from a speaker to an addressee.
   In this study we focus on three components of spoken
discourse: the verbal component, prosody, and gesticulation.
These components can be viewed separately to an extent but
they are all interwoven in natural communication. Like any
human behavior, multimodal discourse has structure. If so,
what are its units? We discuss the basic units found within
the three channels considered separately (sections 2–4), and
proceed with suggestions on coordinated basic units of
multimodal discourse (section 5). In section 6 we discuss
larger, more complex units of spoken discourse, and offer
conclusions in section 7. This study is based on a corpus of
Russian discourse, but some English examples are cited
below for ease of exposition.

                  2. The verbal channel
The verbal component of discourse largely consists of
reporting events and states (Chafe, 1994). Languages have
developed a universal syntactic structure for packaging
events and states: the clause. Each clause reports an event or
a state, along with their participants, or referents. For
example, the minimal narrative Veni, vidi, vici consists of
three events, each reported with a clause consisting of a
single word: a verbal predicate, encoding in its inflection the
subject participant. Consider a natural spoken example
(from text SBC032 of the Santa Barbara corpus of spoken
American English, see
www.linguistics.ucsb.edu/research/santa-barbara-corpus),
consisting of two clauses, each reporting an event:

  And then I was forced out,
  because I failed a promotion to commander!

   Clauses may report events of various complexity and with
varying amounts of detail, and they may include additional
elements, especially connectors indicating the semantic
relationships between clauses, such as and then or because
in the example above. In various theories of discourse
structure (e.g. Mann & Thompson, 1988; Carlson, Marcu &
Okurowski, 2003; Wolf & Gibson, 2005) clauses are
organized in a hierarchical network of nodes connected with
discourse-semantic relations. Groups of clauses are often
organized into syntactic units known as sentences, with the


links between clauses being tight to various degrees, see e.g.
Givón, 2009; Laury & Ono, 2014.

               3. The prosodic channel
Prosody directly encodes the dynamics of how thought
unfolds during discourse production. There is a set of
prosodic phenomena, including pausing, intonation
contours, tempo patterns, loudness patterns, and accent
placement, that converge in a unit of speech variously
dubbed syntagm (Shcherba, 1955), intonation unit (Chafe,
1994), prosodic unit (e.g. Genetti & Slater, 2004), etc. We
prefer the term elementary discourse unit (EDU), see Kibrik
& Podlesskaya eds., 2009; Kibrik, 2011. EDUs are building
blocks, or quanta, of spoken discourse. They are coordinated
with breathing: one EDU is normally produced during an
exhalation, and boundary pauses coincide with an
inhalation. EDUs are linguistic representations of successive
cognitive states, termed foci of consciousness in Chafe,
1994. EDU identification in speech is a procedure based on
expert assessment. Well-trained transcribers of spoken
discourse strongly agree in EDU segmentation.
  A remarkable fact about EDUs is their significant
correlation with clauses. In a number of studies of various
languages (Chafe, 1994 for English; Matsumoto, 2003 for
Japanese; Genetti & Slater, 2004 for Newari; Wouk, 2008
for Sasak; Kibrik & Podlesskaya eds., 2009 for Russian,
inter alia) the share of EDUs coinciding with clauses was
found to vary between 50% and 70%. In the following
example (from the same text; see
spokencorpora.ru/showtranshelp.py for transcription
conventions) lines #12 and #14 are clausal EDUs, while line
#13 is a parcellated adjunct semantically belonging to the
preceding clause but expressed with a subclausal EDU:

00:22.9   12 ····(1.0) /My friend stood up /behind his \desk,
00:26.0   13 ··(0.2) in his /\fu-ull \f-four \–stripes,
00:28.0   14 and \said:

   Properties of EDUs have clear parallels in goal-directed
behavior of non-human mammals. The exploratory
movement of rodents in a new environment is organized in
quanta (runs); runs are identified through initial acceleration
and final deceleration, they are targeted at an
informationally rich goal (analog of primary accent in
discourse segments), they are separated by periods of
freezing, etc. (see e.g. Kafkafi et al., 2001; Cherepov &
Anokhin, 2008). These similarities suggest that the
quantized structure of discourse and its specific prosodic
aspects have deep behavioral, neurocognitive, and
evolutionary roots.

                4. The gestural channel
In human kinetic-visual behavior, manual gesticulation
plays a particularly important role. There are two widely
accepted polar kinds of manual gestures. First, “emblems”
(Efron, 1941/1972; Ekman & Friesen, 1969), also named
“autonomous” (Kendon, 1983), or “quotable” gestures
(Kendon, 1986), are manual signs with fixed form and
relatively fixed meaning, widely shared by a given linguistic
community. Second, “illustrative” or “spontaneous”
(McNeill, 1992) gesticulation, also called “co-speech” or
“speech-associated” gestures (for an overview, see Kendon,
2004), consists of less conventional and more
context-sensitive gestures. Illustrative gestures are incomparably
more common in natural discourse (Nikolaeva, 2013). It is
well established that illustrative gestures substantially
participate in conveying a message from the speaker to the
addressee (Cassell et al., 1999; Melinger & Levelt, 2004;
Hostetter, 2011; Hall & Knapp eds., 2013). We posit the
following major kinds of illustrative gestures: depictive
(“iconic” + “metaphoric” in McNeill, 1992; “descriptive” in
Kendon, 2004), metadiscursive (“pragmatic” in Payrató &
Tessendorf, 2014), pointing (“deictic” in McNeill, 1992),
and beats (“batons” in Efron, 1941/1972).
   This study is primarily limited to depictive gestures,
because they are particularly frequent in our corpus (59%)
and contribute semantically (either in a redundant or in a
complementary fashion) to the propositional content
conveyed in the corresponding verbal component. Depictive
gestures represent objects or act out events/states. Consider
two initial EDUs from ex. 3 in the Appendix. EDU #17 tam
derevo ‘there is a tree’ is accompanied by the following
depictive gesture: the right hand palm faces down, fingers
are half curled and widely spaced, the right hand moves up
in front of the speaker’s face, the left hand palm faces up at
the chest level, with fingers half curled. EDU #18 k derevu
prižata lestnica ‘to the tree a ladder is pressed’ is
accompanied by two identical depictive gestures, the first of
which cooccurs with the initial pause, and the second with
the word lestnica ‘ladder’: the right hand faces the listener,
fingers half curled, moves along a slanted line from the
center right and down, the left hand remains at the chest
level, faces up, with half curled fingers. Our dataset also
includes metadiscursive gestures (see ex. 4) that
demonstrate more recurrent properties compared to the
depictive gestures, but still are a lot more variable than the
emblems.
   We use the term gesture to refer to the basic unit of
co-speech gesticulation. A gesture is a communicatively
significant manual movement, characterized by a unified
pattern that includes trajectory, handshape and position, as
well as other features. According to Kendon (1980, 2004)
and McNeill (1992), the gestural structure includes units,
phrases, and phases. The gesture unit (G-unit) “begins the
moment the limb begins to move and ends when it has
reached a rest position again” (McNeill, 1992: 83). A
G-phrase consists of the following phases: a non-obligatory
preparation, a non-obligatory pre-stroke hold, an obligatory
stroke, a non-obligatory post-stroke hold, while a retraction
(or recovery) is a part of the G-unit (Kendon, 1980, 2004).
There can be one or more G-phrases in a G-unit. Our
understanding of “gesture” is close to the G-phrase, but unlike
the latter a gesture may include (though not obligatorily) a



retraction phase. In other words, a gesture ends either when             1994, Genetti & Slater, 2004, Kibrik, 2008, 2011). Spoken
the rest position is resumed or when another gesture begins.             sentence is established on the basis of prosodic criteria, such
                                                                         as target tone level (so-called period intonation), and
            5. Coordination of basic units                               functions as a structural unit larger than an EDU but shorter
A key issue in the research program of multimodal                        than an episode. Cognitively, in Chafe’s (1994: 148) terms,
linguistics is the question of coordination between the                  a sentence is verbalization of a “superfocus of
verbal, prosodic, and gestural channels. If we see discourse             consciousness”. Is there a correlate of prosodic sentence in
as a fundamentally multimodal process, we need to identify               the gestural channel?
a unified basic unit of this process. A possible approach is to            By default, co-speech gestures are independent of each
select one of the already established units as the basic one.            other. However, McNeill et al. (2001) discovered what can
As has been shown in section 3, EDU is a good candidate                  be called gesture assimilation. Some gestures are organized
for this role, particularly because of its close connection              in series with repeated properties. McNeill et al. (2001)
with the quanta of non-linguistic behavior. Also note that               differentiate between the following two phenomena:
prosody, serving as the source of criteria for EDU                        in so-called catchments, formal properties of gestures
identification, is the ontogenetically earliest communication              (such as location in space, handshape and trajectory, etc.)
channel (see e.g. Crystal 1979, Blake 2000), preceding not                 may be repeated from one gesture to another, formal
only segmental speech but also gesticulation. We already                   similarity conveying certain repeated semantic features;
know that EDUs strongly correlate with clauses. How do                    in gesture inertia, formal properties are shared in a series
EDUs relate to gestures?                                                   of gestures, but no semantic relatedness may be observed.
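
The distinction drawn in the list above can be made concrete with a small sketch. It assumes, hypothetically, that each annotated gesture carries a set of formal features (location, handshape, trajectory) and a set of semantic labels. The names `Gesture`, `shared` and `classify_series`, as well as the feature labels in the usage lines, are illustrative only; they are not taken from McNeill et al. (2001) or from the annotation scheme used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gesture:
    """One annotated gesture (hypothetical representation)."""
    formal: frozenset                  # e.g. {"handshape:curled", "path:down"}
    semantic: frozenset = frozenset()  # e.g. {"pears"}


def shared(feature_sets):
    """Return the features present in every set of the series."""
    sets = list(feature_sets)
    return frozenset.intersection(*sets) if sets else frozenset()


def classify_series(gestures):
    """Label a series as 'catchment', 'inertia', or 'independent'."""
    if len(gestures) < 2:
        return "independent"
    if not shared(g.formal for g in gestures):
        return "independent"   # no repeated formal property
    if shared(g.semantic for g in gestures):
        return "catchment"     # repeated form conveys repeated meaning
    return "inertia"           # repeated form, no semantic relatedness


# Made-up feature labels, for illustration only.
same_meaning = [
    Gesture(frozenset({"handshape:curled"}), frozenset({"pears"})),
    Gesture(frozenset({"handshape:curled", "path:down"}), frozenset({"pears"})),
]
same_form_only = [
    Gesture(frozenset({"path:down"}), frozenset({"bicycle"})),
    Gesture(frozenset({"path:down"}), frozenset({"hat"})),
]
print(classify_series(same_meaning))
print(classify_series(same_form_only))
```

Requiring a feature to be shared by every gesture in the series is a deliberately strict simplification; an annotator would normally judge similarity by degree rather than by exact identity of labels.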
   We explored this question on the basis of 14 Russian
retellings of the Pear Film (Chafe, 1980), videorecorded and                Fig. 1 illustrates four gestures, two of which accompany
transcribed. Transcription, including temporal dynamics,                 EDU #9 and two accompany EDUs #10–11 in example 1.
pausing, annotation of EDUs, and other prosodic                          These gestures depict:
phenomena, was done with the help of the PRAAT program                    Fig. 1a — the abundance of pears;
(www.fon.hum.uva.nl/praat). Gesture annotation was done                   Fig. 1b — self-directed movement, putting pears into the
in the ELAN program (www.lat-mpi.eu/tools/elan). A                         apron;
requirement observed in this work was independent                         Fig. 1c — downward movement with the pears;
annotation of clauses, EDUs, and gestures. The corpus                     Fig. 1d — outward movement of the pears,
consists of 37 minutes of videorecording, 1232 EDUs, and                   corresponding to the verb vykladyval ‘was taking out’.
705 gestures (414 of which are depictive).                                  The uniform hand configuration with the slightly curled
   We found that a prototypical EDU cooccurs with one                    fingers depicts pears in the gardener’s hands (Nikolaeva,
depictive gesture; about 20% of EDUs cooccur with more                   2013). This is an instance of catchment.
than one gesture, see ex. 1: 9¹; ex. 3: 18, 19 in the
Appendix. This recalls the well-known generalization:
“A general rule is one gesture, one clause <...> some clauses
have more than one gesture and some gestures cover more
than one clause” (McNeill, 1992: 94).
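
A minimal sketch of how such counts could be obtained from time-aligned annotations. It assumes that each EDU and each gesture has been exported as a (start, end) interval in seconds, for instance from PRAAT and ELAN tiers. The function names and the toy intervals are hypothetical, and treating any temporal overlap as cooccurrence is only one possible criterion.

```python
def overlaps(a, b):
    """True if intervals a=(start, end) and b=(start, end) share some time."""
    return a[0] < b[1] and b[0] < a[1]


def gestures_per_edu(edus, gestures):
    """For each EDU interval, count the gesture intervals that overlap it."""
    return [sum(overlaps(edu, g) for g in gestures) for edu in edus]


def share_with_multiple(counts):
    """Proportion of EDUs that cooccur with more than one gesture."""
    return sum(c > 1 for c in counts) / len(counts) if counts else 0.0


# Toy intervals in seconds, invented for illustration (not corpus data).
edus = [(0.0, 1.5), (1.7, 3.0), (3.2, 4.0), (4.1, 5.0), (5.2, 6.0)]
gestures = [(0.2, 1.4), (1.8, 2.3), (2.4, 2.9), (3.3, 3.9), (4.2, 4.9)]

counts = gestures_per_edu(edus, gestures)
print(counts)                       # gestures found for each EDU
print(share_with_multiple(counts))  # share of EDUs with more than one gesture
```

In the toy data the second EDU overlaps two gestures and the last EDU overlaps none, so one EDU out of five counts as cooccurring with more than one gesture.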
   Typically (approx. 90%), a depictive gesture falls within
the temporal bounds of a single EDU. We also found that
depictive gestures often (approx. 60%) cooccur with a                       a                 b              c              d
whole EDU (ex. 3: 20, 21). When a gesture is shorter than
the corresponding EDU, it is often temporally coordinated                                     Figure 1. Catchment.
with the later part of the EDU, that is the typical locus of               Catchments as series of gestures are possible candidates
rhematic information (ex. 3: 17, 18). We can thus specify                for gestural correlates of prosodic sentences. Our data
McNeill’s claim, positing not just the relatedness of gestures           includes about 150 instances of catchments. They split into
to the vocal part of a message, but also a high degree of                two groups of equal size. In the first group, each gesture
temporal coordination between gestures and EDUs.                         falls within the bounds of the corresponding EDU, and the
                                                                         boundaries of the gesture series coincide with the
           6. Coordination of larger units                               boundaries of the prosodic sentence, cf. ex. 3. These kinds
EDU being the basic unit of talk, there are higher order                 of instances apparently support the coordination between
units, too. In particular, in various languages spoken                   the prosodic and gesture units. In the second group, a
correlates of written sentence have been found (Chafe,                   gesture series is coordinated with a certain part of a prosodic
                                                                         sentence (ex. 1; ex. 4). Looking into the second kind of
                                                                         instances more closely, it turns out that they mark the most
  1
    Here and below, the number after the colon refers to the EDU         informationally rich parts of sentences (ex. 4: 75, 76),
number within the given example. Examples are provided in the            whereas some other EDUs of the sentence are accompanied
Appendix.                                                                by independent gestures — ex. 4: 71 demonstrates two



metadiscursive gestures “palm up, open hand” illustrating
the process of information transfer (conduit metaphor).
Overall, catchments are coordinated with prosodic
sentences. Given that catchments are a special case of
G-units (see section 4 above), we hypothesize that
coordination with prosodic sentences can be extended to
G-units in general. This latter point requires further
investigation.
  Turning to gesture inertia, consider Fig. 2, which illustrates
three gestures accompanying the three EDUs in example 2.
These gestures depict:
 Fig. 2a: the sudden halt;
 Fig. 2b: the falling bicycle;
 Fig. 2c: the falling hat (a gesture similar in
  configuration and trajectory to the previous one but with a
  larger amplitude).
  In this case gesture assimilation is only formal, in contrast
to catchments, in which similar gestures contain shared
semantic features.

   a                     b                      c
                   Figure 2. Gesture inertia.

  In a first approximation, infrequent instances of gesture
inertia appear to be coordinated with the unit of discourse
known as the episode (van Dijk, 1981). We are not aware of
robust methods of episode identification, either semantic or
prosodic, so we have identified episodes intuitively.
Example 2 illustrates a typical situation, in which gesture
inertia is a series of gestures bridging a sentence boundary
and joining a group of EDUs that qualifies as a small
episode.

                      7. Conclusion
We have found that the basic units of the three channels of
multimodal discourse (verbal, prosodic, and gestural)
are coordinated with one another. More specifically, the
prosodically identified elementary discourse unit can be
shown to be coordinated with the verbal channel and with
the gestural channel. We have chosen the prosodic unit as
the central one because it is established on the basis of
general behavioral criteria. Unlike gesture, prosody is
always present in talk. In the studies reported in Kibrik &
Molchanova, 2013 it turned out to be difficult to separate
the verbal channel individually, as talking inevitably involves
prosody.
  Apart from basic units, we have also discussed larger
units of spoken discourse. It appears that prosodically
identified sentences and episodes are coordinated with
gesture series known as catchment and inertia.
  Even though we are looking for structure and units in
discourse, those should not be understood in the sense of
absolute discreteness. Units, or quanta, do exist, but the
boundaries between them are typically less than discrete.
There are many instances of outliers and hybrids that
complicate crisp and neat unit boundaries. As shown by
Kibrik (2015), this property of discourse structure is
shared with other levels of language, as well as with cognition
in general. Non-discrete effects abound both between
syntagmatic units and between paradigmatic types. This
resonates with McNeill’s (2005) suggestion that gestures
may be classified into dimensions rather than discrete
categories, and a given gesture may, for instance, combine
features of a depictive and a pointing gesture.

                    Acknowledgment
  This study is supported by the Russian Science
Foundation (grant #14-18-03819).

                       References
Adolphs, S., & Carter, R. (2013). Spoken corpus linguistics:
  From monomodal to multimodal. N.-Y.: Routledge.
Blake, J. (2000). Routes to Child Language: Evolutionary
  and Developmental Precursors. Cambridge: CUP.
Carlson, L., Marcu, D., & Okurowski, M. E. (2003).
  Building a discourse-tagged corpus in the framework of
  Rhetorical Structure Theory. In J. van Kuppevelt &
  R. Smith (Eds.), Current and new directions in discourse
  and dialogue. Dordrecht: Kluwer.
Cassell, J., McNeill, D., & McCullough, K. E. (1999).
  Speech-gesture mismatches: Evidence for one underlying
  representation of linguistic and non-linguistic
  information. Pragmatics and Cognition, 7(1), 1–33.
Chafe, W. (1994). Discourse, consciousness, and time. The
  flow and displacement of conscious experience in
  speaking and writing. Chicago: University of Chicago
  Press.
Chafe, W. (Ed.) (1980). The pear stories: Cognitive,
  cultural, and linguistic aspects of narrative production.
  Norwood, N.J.: Ablex.
Cherepov, A., & Anokhin, K. (2008). Development of
  automatic analysis and recognition of mouse behavior by
  segmentation and t-pattern method using video tracking.
  Proceedings of Measuring Behavior 2008 (pp. 253–254).
  Maastricht, The Netherlands, August 26–29, 2008.
Crystal, D. (1979). Prosodic development. In P.J. Fletcher &
  M.A. Garman (Eds.), Language acquisition (pp. 33–48).
  Cambridge: CUP. (2nd edn., 1986, pp. 174–97.)
Efron, D. (1941/1972). Gestures, race and culture. The
  Hague: Mouton.
Ekman, P., & Friesen, W. V. (1969). The repertoire of
  nonverbal behavior: Categories, origins, usage, and
  coding. Semiotica, 1, 49–98.
Ford, C. E., Thompson, S. A., & Drake, V. (2012).
  Bodily-visual practices and turn continuation. Discourse
  Processes, 49(3-4), 192–212.



Ford, C. E., Fox, B., & Thompson, S. A. (2013). Units                  Slavic linguistics in a cognitive framework. N.Y.: Peter
  and/or action trajectories? The language of grammatical              Lang.
  categories and the language of social action. In                   Kibrik, A. A. (2015). The problem of non-discreteness and
  B. Szczepek Reed & G. Raymond (Eds.), Units of talk –                spoken discourse structure. Computational Linguistics
  Units of action. Amsterdam: Benjamins.                               and Intelligent Technologies, 14, vol. 1, 225–233.
Genetti, C., & Slater, K. (2004). An analysis of syntax and          Kibrik, A. A., & Podlesskaja, V. I. (Eds.) (2009). Rasskazy
  prosody interactions in a Dolakhā Newar: Rendition of                o snovidenijax: Korpusnoe issledovanie ustnogo russkogo
  the Mahābhārata (with appendices and sound files).                   diskursa [Night Dream Stories: A corpus study of spoken
  Himalayan Linguistics, 3, 1–91.                                      Russian discourse]. Moscow: Jazyki slavjanskix kul'tur.
Gibbon, D., Mertins, I., & Moore, R. K. (Eds.) (2000).               Kibrik, A. A., & Molchanova, N. B. (2013). Channels of
  Handbook of multimodal and spoken dialogue systems:                  multimodal communication: Relative contributions to
  Resources, terminology and product evaluation. Berlin:               discourse understanding. In M. Knauff, M. Pauen,
  Springer.                                                            N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the
Givón, T. (2009). Multiple routes to clause union: The diachrony of
  complex verb phrases. In T. Givón & M. Shibatani (Eds.), Syntactic
  complexity: Diachrony, acquisition, neuro-cognition, evolution.
  Amsterdam: Benjamins.
Goldin-Meadow, S. (2014). Widening the lens: What the manual modality
  reveals about language, learning, and cognition. Philosophical
  Transactions of the Royal Society, 369.
Hall, J. A., & Knapp, M. L. (Eds.) (2013). Handbooks of communication
  science: Nonverbal communication. Berlin: De Gruyter Mouton.
Hostetter, A. B. (2011). When do gestures communicate? A meta-analysis.
  Psychological Bulletin, 137(2), 297–315.
Hugot, V. (2007). Eye gaze analysis in human-human interactions. Master
  of science thesis. Stockholm, Sweden.
Kafkafi, N., Mayo, C. L., Drai, D., Golani, D., & Elmer, G. I. (2001).
  Natural segmentation of the locomotor behavior of drug-induced rats in
  a photobeam cage. Journal of Neuroscience Methods, 109, 111–121.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process
  of utterance. In M. R. Key (Ed.), The relation between verbal and
  nonverbal communication. The Hague: Mouton.
Kendon, A. (1983). Gesture and speech. How they interact. In
  J. M. Wiemann & R. P. Harrison (Eds.), Nonverbal Interaction. Beverly
  Hills: Sage.
Kendon, A. (1986). Some reasons for studying gesture. Semiotica, 62,
  3–28.
Kendon, A. (2004). Gesture. Visible action as utterance. Cambridge:
  Cambridge University Press.
Kibrik, A. A. (2008). Est’ li predloženie v ustnoj reči [Is there a
  sentence in spoken speech]. In A. V. Arxipov, L. V. Zaxarov,
  A. A. Kibrik et al. (Eds.), Fonetika i nefonetika. K 70-letiju Sandro
  V. Kodzasova [Phonetics and non-phonetics. Festschrift for the 70th
  birthday of Sandro V. Kodzasov]. Moscow: Jazyki slavjanskix kul’tur.
Kibrik, A. A. (2010). Mul’timodal’naja lingvistika [Multimodal
  linguistics]. In Yu. I. Aleksandrov & V. D. Solov’jev (Eds.),
  Kognitivnyje issledovanija [Cognitive studies], IV. Moscow: Institute
  of psychology.
Kibrik, A. A. (2011). Cognitive discourse analysis: Local discourse
  structure. In M. Grygiel & L. A. Janda (Eds.),
  [...]
  35th Annual Conference of the Cognitive Science Society
  (pp. 2704–2709). Austin, TX: Cognitive Science Society.
Knight, D. (2011). Multimodality and active listenership: A corpus
  approach. London: Bloomsbury.
Kodzasov, S. V. (2009). Issledovanija v oblasti russkoj prosodii
  [Studies in the field of Russian prosody]. Moscow: Jazyki slavjanskix
  kul’tur.
Kress, G. (2002). The multimodal landscape of communication. Medien
  Journal, 4, 4–19.
Kress, G. (2010). Multimodality: A social semiotic approach to
  communication. London: Routledge Falmer.
Laury, R., & Ono, T. (2014). The limits of grammar: Clause combining in
  Finnish and Japanese conversation. Pragmatics, 24(3), 561–592.
Loehr, D. (2012). Temporal, structural, and pragmatic synchrony between
  intonation and gesture. Laboratory Phonology, 3(1), 71–89.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory:
  Toward a functional theory of text organization. Text, 8(3), 243–281.
Matsumoto, K. (2003). Intonation units in Japanese conversation.
  Amsterdam: John Benjamins.
McNeill, D. (1992). Hand and mind. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago
  Press.
McNeill, D., Quek, F., McCullough, K.-E., Duncan, S., Furuyama, N.,
  Bryll, R., Ma, X.-F., & Ansari, R. (2001). Catchments, prosody, and
  discourse. Gesture, 1, 9–33.
Melinger, A., & Levelt, W. J. M. (2004). Gesture and the communicative
  intention of the speaker. Gesture, 4, 119–141.
Müller, C., Fricke, E., Cienki, A., & McNeill, D. (Eds.) (2014). Body –
  Language – Communication. Berlin: Mouton de Gruyter.
Nikolaeva, Ju. V. (2013). Illustrativnyje žesty v russkom diskurse
  [Gesticulation in Russian discourse]. Diss. cand. philol. science.
  Moscow, Russia.
Payrató, L., & Tessendorf, S. (2014). Pragmatic gestures. In C. Müller,
  E. Fricke, A. Cienki, & D. McNeill (Eds.), Body – Language –
  Communication. Berlin: Mouton de Gruyter.



Shcherba, L. V. (1955). Fonetika francuzskogo jazyka [French phonetics].
  Moscow: Izdatel'stvo literatury na inostrannyx jazykax.
So, W. C., Kita, S., & Goldin-Meadow, S. (2009). Using the hands to
  identify who does what to whom: Gesture and speech go hand-in-hand.
  Cognitive Science, 33, 115–125.
van Dijk, T. (1981). Episodes as units of discourse analysis. In
  D. Tannen (Ed.), Analyzing discourse: Text and talk. Georgetown:
  Georgetown University Press.
Wolf, F., & Gibson, E. (2005). Representing discourse coherence: A
  corpus-based study. Computational Linguistics, 31(2), 249–287.
Wouk, F. (2008). The syntax of intonation units in Sasak. Studies in
  Language, 32, 137–162.

                                                     Appendix. Examples²
 1    time, m:s EDU #      Transcript                                                                   gesture type
      00:16     7          [···(0.5) u nego stojalo tri korziny] s grušami,                             depictive
                           ‘[he had three baskets] with pears,
      00:18     8          i on {[podnimalsja] na lestnicu,                                             depictive
                           and he {[was climbing up] the ladder,
      00:20     9          [··(0.3) sobiral eti gruši] v [əə(0.3) fartuk],                              depictive, depictive
                           [was collecting these pears] into [the apron],
      00:22     10         [··(0.2) spu][skalsja                                                        depictive
                           [was climbing] [down
      00:23     11         i vykladyval]} eti gruši v korzinu.                                          depictive
                           and was taking out]} these pears into the basket.’

 2    time, m:s EDU #      Transcript                                                                   gesture type
      00:59     29         <[···(0.8) i ɯɯɯ(0.8) ego velosiped] vre= vrezalsja v kamen'.                depictive
                           ‘<[and his bicycle] ran into a rock.
      01:02     30         ··(0.4) [on] upal,                                                           depictive
                           [he] fell down,
      01:04     31         ···(0.7) [s nego sletela] šljapa.>                                           depictive
                           his hat fell off. (lit. [from him fell] the hat.>)’

 3    time, m:s EDU #      Transcript                                                                   gesture type
      00:29     17         tam {[derevo],                                                               depictive
                           ‘there is {[a tree],
      00:30     18         [····(1.2)] k derevu prižata [lestnica],                                     depictive, depictive
                           [ ] to the tree [a ladder] is pressed,
      00:32     19         [i vnizu lestnicy stojat] [tri korzinki],                                    depictive, depictive
                           [and under the ladder there are] [three baskets],
      00:34     20         [dve iz kotoryx polnyje gruš],                                               depictive
                           [two of which are full of pears],
      00:36     21         [a vtoraja pustaja].}                                                        depictive
                           [and the second one is empty].}’

 4    time, m:s EDU #      Transcript                                                                   gesture type
      02:24     71         ···(0.6) əəə(0.6) [əəə(0.7) əəə(0.6)] [əəə(0.8) i vdrug] pered nim ··(0.2)   meta, meta
                           okazyvajutsja ··(0.1) neskol’ko ··(0.1) parnej,
                           ‘[ ] [and suddenly] in front of him show up a few guys,
      02:28     72         ···(0.6) troe,
                           three of them,
      02:29     73         ···(0.5) niotkuda,
                           from nowhere,
      02:30     74         neponjatno otkuda vzjavšixsja,
                           not clear where they are coming from,
      02:31     75         i oni {[··(0.2) načinajut sobirat’ eti gruši],                               depictive
                           and they {[begin picking up these pears],
      02:33     76         i [pomogat’ emu skladyvat’]} v korzinu.                                      depictive
                           and [helping him put them]} into the basket.’


  ² Notation in examples: dots followed by a decimal number in parentheses mark absolute pauses and their length in seconds; əə(0.3) and
ɯɯɯ(0.8) mark plain and nasal filled pauses, respectively; the symbol = marks a truncated word; a comma marks a non-sentence-final EDU and
a period a sentence-final EDU; square brackets mark the boundaries of individual gestures, curly brackets mark catchments, and angle
brackets mark gesture inertia.
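
The notation above is regular enough to be decoded mechanically. The following is a minimal sketch of such a decoder, not a tool used in the study: the function name `parse_edu`, the dictionary keys, and the regular expressions are illustrative assumptions derived only from the notation as described here.

```python
import re

# Patterns follow the transcription notation described in the footnote.
# All names below are hypothetical and chosen for illustration only.
ABSOLUTE_PAUSE = re.compile(r"·+\((\d+(?:\.\d+)?)\)")     # e.g. ···(0.5)
FILLED_PAUSE = re.compile(r"([əɯ]+)\((\d+(?:\.\d+)?)\)")  # e.g. əə(0.3), ɯɯɯ(0.8)
MARKUP = re.compile(r"[\[\]{}<>]")                        # gesture, catchment, inertia brackets


def parse_edu(line: str) -> dict:
    """Split one transcript line (one EDU) into pauses, markup counts, and text."""
    absolute = [float(m.group(1)) for m in ABSOLUTE_PAUSE.finditer(line)]
    filled = [
        ("nasal" if "ɯ" in m.group(1) else "plain", float(m.group(2)))
        for m in FILLED_PAUSE.finditer(line)
    ]

    # Remove pauses first, then brackets, then normalise whitespace.
    stripped = ABSOLUTE_PAUSE.sub(" ", line)
    stripped = FILLED_PAUSE.sub(" ", stripped)
    stripped = MARKUP.sub("", stripped)
    text = " ".join(stripped.split())

    return {
        "text": text,
        "absolute_pauses": absolute,             # lengths in seconds
        "filled_pauses": filled,                 # (kind, length in seconds)
        "gestures_started": line.count("["),     # gestures that begin in this EDU
        "catchments_started": line.count("{"),   # catchments that begin in this EDU
        "truncated": [w for w in text.split() if w.endswith("=")],
        "sentence_final": text.endswith("."),    # period = sentence-final EDU
    }
```

Because a gesture may span an EDU boundary (as in EDUs 10 and 11 of example 1), the sketch counts only opening brackets per line rather than trying to pair brackets within a single EDU.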


