Breathing position influences speech perception

Mark Scott (mark.a.j.scott@gmail.com)
Department of Linguistics, United Arab Emirates University, Al Ain, Abu Dhabi

Henny Yeung (henny.yeung@parisdescartes.fr)
Laboratoire Psychologie de la Perception (UMR 8242), CNRS & Université Paris Descartes, Paris, France

Abstract

Participants were asked to breathe through their mouth or their nose, forcing them to adopt a particular position of their velum (up or down). While breathing in each of these positions, they categorized sounds from an /ada/ to /ana/ continuum. The position of the speech articulators, even though adopted for the purposes of breathing, altered participants' perception of external speech sounds: when the velum was down (to breathe through the nose), participants tended to hear the consonant as the nasal /n/ — a sound necessarily produced with a lowered velum — rather than as /d/, for which the velum must be raised.

Keywords: speech perception; articulation; corollary discharge; efference copy; forward model; motor control; breathing; motor theory

Introduction

If the organs of speech production are moved during speech perception, that movement can influence how perception unfolds. For example, a person hearing a sound ambiguous between /aba/ and /ava/ while mouthing (or even imagining) /ava/ will tend to hear the sound as /ava/ (and contrariwise for mouthing /aba/) (Scott, Yeung, Gick, & Werker, 2013). This phenomenon is purportedly due to the perceptual anticipation (in the form of corollary discharge) caused by mouthing or imagining. This experiment explores whether a similar perceptual capture can occur even when the organs of speech production are not engaged in a speech task — specifically, whether the position of the velum (up for breathing through the mouth, down for breathing through the nose) can influence the perception of nasal vs. non-nasal stop consonants.

Motor involvement in speech perception

The question of whether (or to what degree) the motor system influences speech perception is old and strongly contested. The best known version of this idea is the Motor Theory of Speech Perception (Liberman & Mattingly, 1985), which claims that speech perception is achieved via a specialized module that extracts the intended speech gestures from the acoustic signal. A somewhat similar view is proposed by Fowler (1986), who offers a Gibsonian approach to speech perception in which it is speech gestures that are recovered in perception but, in contrast to the Motor Theory, this recovery proceeds through general auditory mechanisms rather than a biologically specialized speech-perception mechanism. The view that the fundamental units of speech perception (and production) are gestures is also shared by the Gestural Phonology approach (Goldstein & Fowler, 2003).

More recent alternatives to the Motor Theory (and its variants) maintain a role for the motor system, but do not claim that it is an obligatory component of speech perception. For example, Pickering and Garrod (2007) propose that when perception is faced with a difficult task, top-down information in the form of motor predictions can be used to 'fill in the gaps'. According to this theory, speech perception would primarily be an auditory process, but when the auditory signal is particularly unclear, the motor system makes predictions about what is to come and, in so doing, constrains the possibilities that the auditory system must entertain, thus easing the computational load. This would be a hybrid motor/auditory view of speech perception. Skipper, Nusbaum, and Small (2006) and Skipper, van Wassenhove, Nusbaum, and Small (2007) have proposed a similar theory, again arguing that speech perception is not necessarily a matter of coding the incoming sound into a gestural code, but that engagement of predictions from the motor system can be used to aid speech perception. Skipper et al. (2007) argue that such predictions are what underlie the influence of vision on speech perception, such as in the McGurk effect (McGurk & MacDonald, 1976).

These motor-helping-hearing theories are quite similar to the Perception-for-Action-Control Theory (Schwartz, Basirat, Ménard, & Sato, 2010). This theory argues that speech perception is not motor-based, but that speech gestures do define equivalence classes for speech sounds. The idea is that the motor system helps establish which sounds count as members of the same category, membership being determined by sharing a common method of production. However, once the sound classes are set, the motor system is normally not used online in the act of perceiving the members of these classes, such online perception being achieved by the auditory system. Schwartz et al. (2010) argue for one exception to the independence of sensory and motor processes — when auditory perception is made difficult because of missing information, the motor system can be used to 'fill in' that missing information.

Corollary Discharge

These recent alternatives to the Motor Theory propose a specific mechanism by which the motor system influences speech perception: corollary discharge.¹ Corollary discharge is an internal sensory signal generated by one's own motor system whenever one acts (Aliu, Houde, & Nagarajan, 2009). One primary function of corollary discharge is to provide pseudo-feedback in situations where regular sensory feedback is too slow to guide one's actions. There is an unavoidable time delay in real sensory feedback — our senses do not operate instantaneously: it takes time for a change in the environment (or in our body) to be transduced by our end-organs, then transmitted to and processed by the central nervous system, and for a motor correction on the basis of this information to be issued. This delay can be quite considerable — for auditory speech perception it has been estimated at around 130 ms (Jones & Munhall, 2002). This means that feedback is not available (or is only minimally available) for speech movements faster than 130 ms (which is in fact many speech sounds). Corollary discharge can be generated by the motor system before sensory feedback is available and thus can serve the feedback role while avoiding the time-lag problem (Wolpert & Flanagan, 2001).

¹ Corollary discharge is the sensory prediction generated by a 'forward model' — a system that takes motor commands as input and predicts the sensory consequences. Efference copy refers to the motor command received by the forward model, but sometimes the terms corollary discharge and efference copy are used interchangeably.

Another role of corollary discharge is to tag concurrent matching external sensations as "self-caused" and so unworthy of intense perceptual processing (Eliades & Wang, 2008). This function means that corollary discharge serves to anticipate perceptions and so can pull ambiguous stimuli into alignment with the anticipated percept. Scott et al. (2013) hypothesized that corollary discharge, generated during mouthing of speech, channels the perception of external speech into matching the corollary discharge prediction, thus performing a perceptual capture function.

Perceptual Capture

Perceptual capture is a shift in perception caused by the fact that corollary discharge is an anticipation and, as such, can pull ambiguous stimuli into alignment with the anticipated percept. Hickok, Houde, and Rong (2011) provide an overview of how corollary discharge influences perception through its role as an anticipation.

Repp and Knoblich (2009) demonstrated perceptual capture from motor-induced anticipations by having pianists perform hand motions for a rising or falling sequence of notes. These hand motions were performed in synchrony with a sequence of notes that could be heard (thanks to a perceptual illusion) as rising or falling. When performing the hand motion consistent with a rising sequence, pianists tended to hear the ambiguous sound as ascending. Schütz-Bosbach and Prinz (2007) review several such perceptual capture effects across sensory modalities.

In terms of hearing, several studies have found evidence of anticipations altering the perception of sounds. A compelling example is the "White Christmas" effect (Merckelbach & Ven, 2001), in which people are induced to hear the song "White Christmas" when presented with white noise, simply by being told that the song might be buried under the noise (but it is not). Such perceptual shifts in speech perception arising from the influence of the motor system have been shown in other studies, such as Sams, Möttönen, and Sihvonen (2005), Ito, Tiede, and Ostry (2009), and Scott et al. (2013).

A consequence of theories which propose a perceptual 'fill in the gap' role for corollary discharge is that the position of the perceiver's own articulators should matter for what information gets filled in. For a prediction of the sensory consequences of an action to be accurate, it must take into account the starting point of the action, as the sensory consequences of an action can be vastly different depending on where the effector starts — think of the tactile difference between slamming your jaw shut when your tongue is in its normal resting position vs. extended out between your teeth (ouch!). Thus corollary discharge is necessarily generated using the current position of the articulators as the basis for prediction (Houde & Nagarajan, 2011; Hickok, 2012).

Prediction

Given that the predictions of corollary discharge take into account the position of the effectors, the perceptual channelling discussed above should be sensitive to the positions of one's own speech articulators (even if one is not speaking). Thus, when a person's velum is down to allow breathing through the nose, the perceptual channelling (or 'filling in') done by the motor system should be biased towards a prediction of nasality, and the person should be more likely to hear an ambiguous external sound as nasal. In the context of this experiment, this means that people should hear an ambiguous sound from an /ada/~/ana/ continuum more often as /ana/ when they are breathing through their nose than when they are breathing through their mouth.

The sounds /d/ and /n/ were chosen because they have the same place of articulation and are both voiced stops, differing almost exclusively in whether the velum is up (for /d/, with no airflow through the nose) or down (for /n/, with airflow through the nose). Thus the primary difference between these sounds is mirrored in the position of the velum for breathing — up for breathing through the mouth, down for breathing through the nose.

The prediction that people should hear more /n/ when their velum is down for breathing through their nose is similar to the perceptual capture effect demonstrated in Sams et al. (2005) and Scott et al. (2013) but, unlike those experiments, in the current experiment the articulators are not being used in a speech task by the perceiver. This experiment is also similar to that of Ito et al. (2009), which showed that dynamic deformation of a perceiver's face (by a robotic device) can alter perception in line with the movement, but only if the movement is timed appropriately with the percept. In contrast, the current experiment asks whether a static articulator position can also induce a shift in perception.

Methods
Participants were asked to breathe through their mouth or nose while they categorized sounds as /ada/ or /ana/. The prediction is that when participants are breathing through their nose, the necessarily lowered velum will influence their perception so that they hear the sounds as more similar to the nasal /ana/. In a control experiment, participants categorized /ada/ vs. /aga/ while breathing through their mouth or nose. No difference was predicted for this control experiment.

Stimuli

A female native speaker of standard European French was recorded saying /ada/ and /ana/. A 10,000-step continuum between these sounds was created using STRAIGHT (Kawahara et al., 2008; Kawahara, Irino, & Morise, 2011). While this may seem like a large number of continuum steps, it should be kept in mind that participants heard only a small subset of these sounds; the large number of steps is simple to generate and allows very fine-grained precision in estimating phoneme boundaries.

Procedures

There were two conditions:

1. Breathing through the mouth (velum up)
2. Breathing through the nose (velum down)

In each condition, stimuli were presented using the staircase method (Cornsweet, 1962). The staircase method presents points along a continuum and shifts the subsequent target for presentation based on previous responses, and it is thus able to 'search out' the perceptual boundary between sounds quite quickly. Two interleaved staircases with random switching (to prevent participants from being able to predict the upcoming sound) were used for each condition, and participants alternated back and forth between conditions, so that each participant performed the interleaved staircase procedure for each condition twice (a total of four staircases per condition). Thus the experiment determined each participant's perceptual boundary between /ada/ and /ana/ while participants breathed through their mouth or nose.

Each staircase consisted of thirteen reversals with decreasing step size after each reversal. The step sizes were: 1250, 1000, 800, 650, 500, 350, 250, 150, 80, 50, 30, 20, 10 (from a 10,000-step continuum). The two interleaved staircases started at points 2400 and 7600 of the continuum.

An abstract example of the layout of a staircase procedure is shown in Figure 1. A sound from one end of the continuum (for example the /ada/ end) is played to the participant. If the participant categorizes it as belonging to the category consistent with that end of the continuum (/ada/ in our example), the computer selects the next sound to be closer to the other (/ana/) end of the continuum. If the participant hears this also as /ada/, the computer chooses the next sound to be even closer to /ana/, and so on until the participant reports hearing /ana/. At this point, the computer reverses the direction of sound selection (hence this is called a 'reversal') and selects the next sound to be closer to the /ada/ end of the continuum; however, the step size along the continuum is made smaller (the computer makes smaller jumps between sound selections, giving more precision). When the participant starts to hear the sound as /ada/ again, the computer reverses again (and again makes the step sizes smaller), moving back toward /ana/. This back and forth continues as the computer homes in on the participant's boundary between /ada/ and /ana/, changing direction of movement along the continuum and gaining precision with each reversal. This is a robust and relatively quick method of estimating a person's perceptual boundary between two sounds.

Figure 1: Abstract Example of a Staircase Procedure

The structure of each trial was very simple: participants were presented with an audio stimulus that was somewhat ambiguous between /ada/ and /ana/, and they pressed (with their right hand) a keyboard button (the right or left arrow key) to indicate their perception of the sound as /ada/ or /ana/.

Participants were given instructions on the task and familiarized with the software before conducting the experiment. The experiment itself took about 20 minutes for each participant to complete.

The order of breathing conditions (half the participants starting with breathing through the mouth, half with breathing through the nose) was counterbalanced across participants, as was the correspondence of response button (left vs. right arrow on the computer keyboard) to sound.
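The staircase logic described in this section can be sketched in a few lines of Python. This is an illustrative reimplementation, not the original experiment code: the `Staircase` class and `run_condition` helper are hypothetical names, but the response rule (heard /ada/ steps toward the /ana/ end, a response flip counts as a reversal and shrinks the step) and the constants (thirteen reversals, step sizes, start points 2400 and 7600) follow the description above.

```python
import random

# Step sizes applied after each successive reversal (indices into the
# 10,000-step /ada/-/ana/ continuum), as listed in the Procedures section.
STEP_SIZES = [1250, 1000, 800, 650, 500, 350, 250, 150, 80, 50, 30, 20, 10]

class Staircase:
    """One adaptive staircase over the continuum (index 0 = /ada/ end)."""

    def __init__(self, start):
        self.level = start          # continuum index of the next stimulus
        self.reversals = 0
        self.last_response = None

    @property
    def finished(self):
        # Each staircase terminates after thirteen reversals.
        return self.reversals >= len(STEP_SIZES)

    def record(self, heard_ana):
        """Update the staircase after a response (True = heard /ana/)."""
        if self.last_response is not None and heard_ana != self.last_response:
            self.reversals += 1     # a response flip is a 'reversal'; steps shrink
        self.last_response = heard_ana
        step = STEP_SIZES[min(self.reversals, len(STEP_SIZES) - 1)]
        # Heard /ada/: move toward the /ana/ end; heard /ana/: move back.
        self.level += -step if heard_ana else step
        self.level = max(0, min(9999, self.level))

def run_condition(respond):
    """Two interleaved staircases with random switching, as described above.
    `respond(level)` models the participant's keypress (True = /ana/)."""
    staircases = [Staircase(2400), Staircase(7600)]
    while any(not s.finished for s in staircases):
        s = random.choice([s for s in staircases if not s.finished])
        s.record(respond(s.level))
    return staircases
```

With a simulated listener whose boundary sits at, say, continuum point 5000, both staircases converge to that neighbourhood within a few dozen trials, which is what makes the method relatively quick.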
The experiment was run on the PsychoPy experiment platform (Peirce, 2007, 2009).

Participants

Thirty-nine native French-speaking participants (31 female, 35 right-handed) were run at Université Paris Descartes (average age 22.28 years, standard deviation 2.37).

Results

For each breathing position, each participant's data from the four staircases were submitted to a logistic regression to determine what point on the continuum corresponded to their perceptual boundary between /ada/ and /ana/ (the point at which they would hear the sound equally often as /ada/ and /ana/). These calculated boundaries were the dependent measure for the experiment. As this was a within-subjects design, the individual variability in perceptual boundaries (which are often highly variable between individuals) is not an issue here — each participant served as his or her own control.

As predicted, participants heard significantly more /ana/ when breathing through their nose than when breathing through their mouth, as determined by a paired t-test [t(38) = 2.08, p = .044, d = .33]. In the control experiment (categorizing /aga/ vs. /ada/), no such difference was found between breathing conditions [t(38) = 1.36, p = .18, d = .21]. However, the interaction between the experimental and control versions did not reach significance. We believe this is due to a lack of power and are creating a new version of the experiment to address this issue.
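The analysis just described can be sketched as follows. This is an illustrative stand-in for the actual analysis (which would typically use a statistics package): a logistic curve is fit to one participant's trials by plain gradient descent, the 50% point of that curve is taken as the /ada/~/ana/ boundary, and a paired t statistic is computed over per-participant boundary differences between the two breathing conditions. The function names are hypothetical.

```python
import math

def fit_boundary(levels, heard_ana, lr=0.5, iters=5000):
    """Fit p(/ana/) = sigmoid(a + b*x) to one participant's trials and
    return the 50% point (-a/b), i.e. the perceptual boundary."""
    scale = 5000.0                        # normalize continuum indices for stability
    xs = [(x - scale) / scale for x in levels]
    a = b = 0.0
    n = len(xs)
    for _ in range(iters):
        ga = gb = 0.0
        for x, y in zip(xs, heard_ana):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += (p - y) / n             # gradient of the logistic loss in a
            gb += (p - y) * x / n         # gradient in b
        a -= lr * ga
        b -= lr * gb
    return (-a / b) * scale + scale       # map back to continuum units

def paired_t(nose_bounds, mouth_bounds):
    """Paired t statistic over per-participant boundary differences
    (compare against a t distribution with n - 1 degrees of freedom)."""
    d = [x - y for x, y in zip(nose_bounds, mouth_bounds)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)
```

With 39 participants, as here, the resulting statistic would be evaluated against a t distribution with 38 degrees of freedom.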
Figure 2: Results — Confidence Intervals are Shown

Discussion & Conclusion

Components of our motor systems, known as forward models, constantly predict the sensory effects of our actions — this prediction is corollary discharge. Corollary discharge serves a variety of crucial roles, including providing feedback for actions performed too quickly to use 'regular' sensory feedback and tagging self-produced sensations as such, thus preventing sensory confusion. It is this second role that allows corollary discharge to influence the concurrent perception of external sensations.

A recent group of theories (Skipper et al., 2006, 2007; Schwartz et al., 2010) has suggested that this function of corollary discharge may regularly be used to supplement perception in cases of perceptual uncertainty, generating a prediction, on the basis of one's own motor system, to guide sensory processing. Forward models necessarily consult the current position of a person's articulators when generating corollary discharge, as sensory consequences are strongly dependent on the starting point of the effectors. This leads to the prediction that the position of one's own articulators should influence the perception of external speech sounds when those sounds are ambiguous and thus draw on the motor system's prediction abilities.

This experiment tested that prediction and has shown that the position of one's own articulators does influence the perception of speech — even when the position of the articulators is adopted for a non-speech activity (breathing). These results support theories which argue for a role of the motor system (and corollary discharge) in speech perception, and they make a unique contribution in showing that the static position of the articulators can have this effect even when that position is not intended to produce speech.

These results are relevant to the ongoing debate about embodied cognition — the degree to which the body and motor-control systems are used in cognition. In the realm of semantic processing of language, a similar debate is ongoing about the degree of motor involvement in processing the meaning of sentences. For example, the Action Sentence Compatibility effect demonstrates that movements of the arm are faster when a person reads a sentence implying arm movements, suggesting that the person's motor plan for arm movements was triggered by reading the sentence (Glenberg & Kaschak, 2002). The current experiment demonstrates a related example of embodied cognition, but at a 'lower', perceptual level of language processing. Ongoing research is exploring the extent of this effect, examining how widespread (in terms of speech sounds) such effects are.

Acknowledgements

Funding for this project was provided by a United Arab Emirates University Research Start-Up Grant (Perceptual-Motor Linkages in Speech and Cognition - 31h060) to Mark Scott.

References

Aliu, S. O., Houde, J. F., & Nagarajan, S. S. (2009). Motor-induced suppression of the auditory cortex. Journal of Cognitive Neuroscience, 21(4), 791–802.

Cornsweet, T. N. (1962). The staircase-method in psychophysics. The American Journal of Psychology, 75(3), 485–491.

Eliades, S. J., & Wang, X. (2008). Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature, 453(7198), 1102–1106.

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14(1), 3–28.

Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin & Review, 9(3), 558–565.

Goldstein, L. M., & Fowler, C. A. (2003). Articulatory phonology: A phonology for public language use. In Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 159–207). Berlin: Mouton de Gruyter.

Hickok, G. (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience, 13(2), 135–145.

Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron, 69(3), 407–422.

Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5.

Ito, T., Tiede, M., & Ostry, D. J. (2009). Somatosensory function in speech perception. Proceedings of the National Academy of Sciences of the United States of America, 106(4), 1245–1248.

Jones, J. A., & Munhall, K. G. (2002). The role of auditory feedback during phonation: Studies of Mandarin tone production (Vol. 30, No. 3).

Kawahara, H., Irino, T., & Morise, M. (2011). An interference-free representation of instantaneous frequency of periodic signals and its application to F0 extraction. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5420–5423).

Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T., & Banno, H. (2008). Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008 (ICASSP 2008) (pp. 3933–3936).

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21(1), 1–36.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

Merckelbach, H., & Ven, V. V. D. (2001). Another White Christmas: Fantasy proneness and reports of 'hallucinatory experiences' in undergraduate students. Journal of Behavior Therapy and Experimental Psychiatry, 32, 137–144.

Peirce, J. W. (2007). PsychoPy—Psychophysics software in Python. Journal of Neuroscience Methods, 162(1–2), 8–13.

Peirce, J. W. (2009). Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics, 2, 10.

Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11(3), 105–110.

Repp, B. H., & Knoblich, G. (2009). Performed or observed keyboard actions affect pianists' judgements of relative pitch. The Quarterly Journal of Experimental Psychology, 62(11), 2156–2170.

Sams, M., Möttönen, R., & Sihvonen, T. (2005). Seeing and hearing others and oneself talk. Cognitive Brain Research, 23(2–3), 429–435.

Schütz-Bosbach, S., & Prinz, W. (2007). Perceptual resonance: Action-induced modulation of perception. Trends in Cognitive Sciences, 11(8), 349–355.

Schwartz, J.-L., Basirat, A., Ménard, L., & Sato, M. (2010). The Perception-for-Action-Control Theory (PACT): A perceptuo-motor theory of speech perception. Journal of Neurolinguistics, 1–19.

Scott, M., Yeung, H. H., Gick, B., & Werker, J. F. (2013). Inner speech captures the perception of external speech. Journal of the Acoustical Society of America Express Letters, 133(4), 286–293.

Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2006). Lending a helping hand to hearing: Another motor theory of speech perception. In M. A. Arbib (Ed.), Action to Language via the Mirror Neuron System (pp. 250–285). Cambridge: Cambridge University Press.

Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17(10), 2387–2399.

Wolpert, D. M., & Flanagan, J. R. (2001). Motor prediction. Current Biology, 11(18), 729–732.