                             Breathing position influences speech perception
                                            Mark Scott (mark.a.j.scott@gmail.com)
                                    Department of Linguistics, United Arab Emirates University
                                                       Al Ain, Abu Dhabi

                                        Henny Yeung (henny.yeung@parisdescartes.fr)
                                        Laboratoire Psychologie de la Perception (UMR 8242)
                                          CNRS & Université Paris Descartes, Paris, France


Abstract

Participants were asked to breathe through their mouth or their nose, forcing them to adopt a particular position of their velum (up or down). While breathing in each of these positions, they categorized sounds from an /ada/ to /ana/ continuum. The position of the speech articulators, even though adopted for the purposes of breathing, altered participants' perception of external speech sounds: when the velum was down (to breathe through the nose), participants tended to hear the consonant as the nasal /n/ (a sound necessarily produced with a lowered velum) rather than as /d/, for which the velum must be raised.

Keywords: speech perception; articulation; corollary discharge; efference copy; forward model; motor control; breathing; motor theory

Introduction

If the organs of speech production are moved during speech perception, that movement can influence how perception unfolds. For example, when hearing a sound ambiguous between /aba/ and /ava/ while mouthing (or even imagining) /ava/, a person will tend to hear the sound as /ava/ (and contrariwise for mouthing /aba/) (Scott, Yeung, Gick, & Werker, 2013). This phenomenon is purportedly due to the perceptual anticipation (in the form of corollary discharge) caused by mouthing or imagining. This experiment explores whether a similar perceptual capture can occur even when the organs of speech production are not engaged in a speech task — specifically, whether the position of the velum (up for breathing through the mouth, down for breathing through the nose) can influence the perception of nasal vs. non-nasal stop consonants.

Motor involvement in speech perception

The question of whether (or to what degree) the motor system influences speech perception is old and strongly contested. The best-known version of this idea is the Motor Theory of Speech Perception (Liberman & Mattingly, 1985), which claims that speech perception is achieved via a specialized module that extracts the intended speech gestures from the acoustic signal.

A somewhat similar view is proposed by Fowler (1986), who offers a Gibsonian approach to speech perception in which it is speech gestures that are recovered in perception but, in contrast to the Motor Theory, this recovery is through general auditory mechanisms rather than a biologically specialized speech-perception mechanism. The view that the fundamental units of speech perception (and production) are gestures is also shared by the Gestural Phonology approach (Goldstein & Fowler, 2003).

More recent alternatives to the Motor Theory (and its variants) maintain a role for the motor system, but do not claim that it is an obligatory component of speech perception. For example, Pickering and Garrod (2007) propose that when perception is faced with a difficult task, top-down information in the form of motor predictions can be used to 'fill in the gaps'. According to this theory, speech perception would primarily be an auditory process, but when the auditory signal is particularly unclear the motor system makes predictions about what is to come and, in so doing, constrains the possibilities that the auditory system must entertain, thus easing the computational load. This would be a hybrid motor/auditory view of speech perception. Skipper, Nusbaum, and Small (2006) and Skipper, van Wassenhove, Nusbaum, and Small (2007) have proposed a similar theory, again arguing that speech perception is not necessarily a matter of coding the incoming sound into a gestural code, but that predictions from the motor system can be engaged to aid speech perception. Skipper et al. (2007) argue that such predictions are what underlie the influence of vision on speech perception, as in the McGurk effect (McGurk & MacDonald, 1976).

These motor-helping-hearing theories are quite similar to the Perception-for-Action-Control Theory (Schwartz, Basirat, Ménard, & Sato, 2010). This theory argues that speech perception is not motor-based, but that speech gestures do define equivalence classes for speech sounds. The idea is that the motor system helps establish which sounds count as members of the same category, membership being determined by sharing a common method of production. However, once the sound classes are set, the motor system is normally not used online in the act of perceiving the members of these classes, such online perception being achieved by the auditory system. Schwartz et al. (2010) argue for one exception to the independence of sensory and motor processes — when auditory perception is made difficult because of missing information, the motor system can be used to 'fill in' that missing information.


Corollary Discharge

These recent alternatives to the Motor Theory propose a specific mechanism by which the motor system influences speech perception: corollary discharge.1

1 Corollary discharge is the sensory prediction generated by a 'forward model', a system that takes motor commands as input and predicts their sensory consequences. Efference copy refers to the motor command received by the forward model, but the terms corollary discharge and efference copy are sometimes used interchangeably.

Corollary discharge is an internal sensory signal generated by one's own motor system whenever one acts (Aliu, Houde, & Nagarajan, 2009). One primary function of corollary discharge is to provide pseudo-feedback in situations where regular sensory feedback is too slow to guide one's actions. There is an unavoidable time delay in real sensory feedback — our senses do not operate instantaneously: it takes time for a change in the environment (or in our body) to be transduced by our end-organs, transmitted to and processed by the central nervous system, and for a motor correction to be issued on the basis of this information. This delay can be quite considerable: for auditory speech perception it has been estimated at around 130 ms (Jones & Munhall, 2002). This means that feedback is not available (or only minimally available) for speech movements faster than 130 ms, which includes many speech sounds. Corollary discharge can be generated by the motor system before sensory feedback is available and thus can serve the feedback role and avoid the time-lag problem (Wolpert & Flanagan, 2001).

Another role of corollary discharge is to tag concurrent matching external sensations as "self-caused" and so unworthy of intense perceptual processing (Eliades & Wang, 2008). In serving this function, corollary discharge anticipates perceptions and so can pull ambiguous stimuli into alignment with the anticipated percept. Scott et al. (2013) hypothesized that corollary discharge, generated during the mouthing of speech, channels the perception of external speech into matching the corollary-discharge prediction, thus performing a perceptual-capture function.

Perceptual Capture

Perceptual capture is a shift in perception caused by the fact that corollary discharge is an anticipation and, as such, can pull ambiguous stimuli into alignment with the anticipated percept. Hickok, Houde, and Rong (2011) provide an overview of how corollary discharge influences perception through its role as an anticipation.

Repp and Knoblich (2009) demonstrated perceptual capture from motor-induced anticipations by having pianists perform hand motions for a rising or falling sequence of notes. These hand motions were performed in synchrony with a sequence of notes that could be heard (thanks to a perceptual illusion) as rising or falling. When performing the hand motion consistent with a rising sequence, pianists tended to hear the ambiguous sound as ascending. Schütz-Bosbach and Prinz (2007) review several such perceptual-capture effects across sensory modalities.

In terms of hearing, several studies have found evidence of anticipations altering the perception of sounds. A compelling example is the "White Christmas" effect (Merckelbach & van de Ven, 2001), in which people are induced to hear the song "White Christmas" when presented with white noise, simply by being told that the song might be buried under the noise (but is not). Perceptual shifts in speech perception arising from the influence of the motor system have been shown in other studies, such as Sams, Möttönen, and Sihvonen (2005), Ito, Tiede, and Ostry (2009), and Scott et al. (2013).

A consequence of theories which propose a perceptual 'fill in the gap' role for corollary discharge is that the position of the perceiver's own articulators should matter for what information gets filled in. For a prediction of the sensory consequences of an action to be accurate, it must take into account the starting point of the action, as the sensory consequences of an action can be vastly different depending on where the effector starts — think of the tactile difference between slamming your jaw shut when your tongue is in its normal resting position vs. extended out between your teeth (ouch!). Thus corollary discharge is necessarily generated using the current position of the articulators as the basis for prediction (Houde & Nagarajan, 2011; Hickok, 2012).

Prediction

Given that the predictions of corollary discharge take into account the position of the effectors, the perceptual channelling discussed above should be sensitive to the positions of one's own speech articulators (even if one is not speaking). Thus, when a person's velum is down in order to allow breathing through the nose, the perceptual channelling (or 'filling in') done by the motor system should be biased towards a prediction of nasality, and the person should be more likely to hear an ambiguous external sound as nasal. In the context of this experiment, this means that people should hear a sound ambiguous between /ada/ and /ana/ more often as /ana/ when they are breathing through their nose than when they are breathing through their mouth.

The sounds /d/ and /n/ were chosen because they have the same place of articulation and are both voiced and both stops, differing almost exclusively in whether the velum is up (for /d/, with no airflow through the nose) or down (for /n/, with airflow through the nose). Thus the primary difference between these sounds is mirrored in the position of the velum for breathing — up for breathing through the mouth, down for breathing through the nose.

The prediction that people should hear more /n/ when their velum is down for breathing through their nose is similar to the perceptual-capture effects demonstrated in Sams et al. (2005) and Scott et al. (2013), but, unlike those experiments, in the current experiment the articulators are not being used in a speech task by the perceiver.

This experiment is also similar to that of Ito et al. (2009), who showed that dynamic deformation of a perceiver's face (by a robotic device) can alter perception in line with the movement, but only if the movement is timed appropriately with the percept. In contrast, the current experiment asks whether a static articulator position can also induce a shift in perception.
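To make the logic of this prediction concrete, the following is a deliberately simple Python sketch of how a motor-based expectation could bias the categorization of an ambiguous sound. It is a toy illustration under assumed numbers, not a model proposed in this paper or elsewhere: the weighting scheme, the 0.8/0.2 expectation values, and the function name p_hear_ana are all hypothetical, chosen only to show that an expectation tied to velum position shifts responses near the category boundary while leaving clear tokens untouched.

# Toy illustration (not a model from the paper) of the hypothesized channelling:
# a motor-based expectation is weighted more heavily the more ambiguous the
# auditory evidence is, shifting categorization on an /ada/-/ana/ continuum
# toward the expectation. All numbers and names here are hypothetical.

def p_hear_ana(p_auditory_ana, velum_down):
    """Probability of reporting /ana/ given auditory evidence and velum position."""
    motor_expectation = 0.8 if velum_down else 0.2   # nasal prediction when velum is down
    ambiguity = 1.0 - abs(p_auditory_ana - 0.5) * 2  # 1 at the boundary, 0 at the endpoints
    weight = 0.3 * ambiguity                         # motor prediction matters only when unclear
    return (1 - weight) * p_auditory_ana + weight * motor_expectation


# A sound right at the auditory boundary (p = .5) is pulled toward /ana/ when
# breathing through the nose and toward /ada/ when breathing through the mouth,
# while unambiguous sounds (p = 0 or 1) are left unchanged:
print(p_hear_ana(0.5, velum_down=True))   # 0.59
print(p_hear_ana(0.5, velum_down=False))  # 0.41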


Methods

Participants were asked to breathe through their mouth or nose while they categorized sounds as /ada/ or /ana/. The prediction is that when participants are breathing through their nose, the necessarily lowered velum will influence their perception so that they hear the sounds as more similar to the nasal /ana/. In a control experiment, participants categorized /ada/ vs. /aga/ while breathing through their mouth or nose. No difference was predicted for this control experiment.

Stimuli

A female native speaker of standard European French was recorded saying /ada/ and /ana/. A 10,000-step continuum between these sounds was created using STRAIGHT (Kawahara et al., 2008; Kawahara, Irino, & Morise, 2011). While this may seem like a large number of continuum steps, participants heard only a small subset of these sounds; the large number of steps is simple to generate and allows for very fine-grained precision in estimating phoneme boundaries.

Procedures

There were two conditions:

1. Breathing through the mouth (velum up)

2. Breathing through the nose (velum down)

In each condition, stimuli were presented using the staircase method (Cornsweet, 1962). The staircase method presents points along a continuum and shifts the subsequent target for presentation based on previous responses, and is thus able to 'search out' the perceptual boundary between sounds quite quickly. Two interleaved staircases with random switching (to prevent participants from being able to predict the upcoming sound) were used for each condition, and participants alternated back and forth between conditions, so that each participant performed the interleaved staircase procedure twice for each condition (a total of four staircases per condition). Thus the experiment determined each participant's perceptual boundary between /ada/ and /ana/ while they breathed through their mouth or nose.

Each staircase consisted of thirteen reversals with decreasing step size after each reversal. The step sizes were 1250, 1000, 800, 650, 500, 350, 250, 150, 80, 50, 30, 20, and 10 (on a 10,000-step continuum). The two interleaved staircases started at points 2400 and 7600 of the continuum.

An abstract example of the layout of a staircase procedure is shown in Figure 1. A sound from one end of the continuum (for example, the /ada/ end) is played to the participant. If the participant categorizes it as belonging to the category consistent with that end of the continuum (/ada/ in our example), the computer selects the next sound to be closer to the other (/ana/) end of the continuum. If the participant hears this also as /ada/, the computer chooses the next sound to be even closer to /ana/, and so on, until the participant reports hearing /ana/. At this point the computer reverses the direction of sound selection (hence a 'reversal') and selects the next sound to be closer to the /ada/ end of the continuum; however, the step size along the continuum is made smaller (the computer makes smaller jumps between sound selections, so that there is more precision). When the participant starts to hear the sound as /ada/ again, the computer reverses again (and again makes the step size smaller), moving back toward /ana/. This back and forth continues as the computer homes in on the participant's boundary between /ada/ and /ana/, changing direction along the continuum and becoming more precise with each reversal. This is a robust and relatively quick method of estimating a person's perceptual boundary between two sounds (a schematic sketch of this procedure is given at the end of this section).

Figure 1: Abstract example of a staircase procedure

The structure of each trial was very simple: participants were presented with an audio stimulus that was somewhat ambiguous between /ada/ and /ana/, and they pressed a keyboard button (right or left arrow key, with their right hand) to indicate their perception of the sound as /ada/ or /ana/.

Participants were given instructions on the task and familiarized with the software before conducting the experiment. The experiment itself took about 20 minutes for each participant to complete.

The order of breathing conditions (half the participants starting with breathing through the mouth, half with breathing through the nose) was counterbalanced across participants, as was the correspondence of response button (left vs. right arrow on the computer keyboard) to sound.

The experiment was run on the PsychoPy experiment platform (Peirce, 2007, 2009).
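The schematic sketch below shows, in minimal Python, one interleaved-staircase condition of the kind described above. It is not the authors' code: the step-size schedule, reversal count, and starting points (2400 and 7600) are taken from the text, while the continuum orientation (higher index treated as more /ana/-like) and the present_and_collect routine that plays a stimulus and returns the response are hypothetical placeholders for the PsychoPy routines actually used.

# Illustrative sketch (not the authors' code) of one condition: two interleaved
# staircases on a 10,000-step /ada/-/ana/ continuum, thirteen reversals each,
# with the step size shrinking after every reversal.
import random

STEP_SIZES = [1250, 1000, 800, 650, 500, 350, 250, 150, 80, 50, 30, 20, 10]


class Staircase:
    def __init__(self, start_point):
        self.position = start_point   # current index on the continuum
        self.reversals = 0            # a reversal = the response category switches
        self.last_response = None

    @property
    def finished(self):
        return self.reversals >= len(STEP_SIZES)

    def next_position(self, response):
        """Update after a response ('ada' or 'ana') and return the next stimulus index."""
        if self.last_response is not None and response != self.last_response:
            self.reversals += 1                        # category switched: count a reversal
        self.last_response = response
        step = STEP_SIZES[min(self.reversals, len(STEP_SIZES) - 1)]
        direction = +1 if response == 'ada' else -1    # move toward the other category
        self.position = max(0, min(9999, self.position + direction * step))
        return self.position


def run_condition(present_and_collect):
    """present_and_collect(position) -> 'ada' or 'ana'; assumed to be supplied by
    the experiment software. Staircases are chosen at random on each trial so the
    upcoming sound cannot be predicted."""
    staircases = [Staircase(2400), Staircase(7600)]
    while any(not s.finished for s in staircases):
        s = random.choice([s for s in staircases if not s.finished])
        s.next_position(present_and_collect(s.position))

Shrinking the step size after each reversal is what lets the procedure converge from coarse jumps to a fine estimate of the /ada/-/ana/ boundary.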


Participants

Thirty-nine native French-speaking participants (31 female, 35 right-handed) were run at Université Paris Descartes (average age 22.28 years, standard deviation 2.37).

Results

For each breathing position, each participant's data from the four staircases were submitted to a logistic regression to determine the point on the continuum corresponding to their perceptual boundary between /ada/ and /ana/ (the point at which they would hear the sound equally often as /ada/ and as /ana/). These calculated boundaries were the dependent measure for the experiment (an illustrative sketch of this analysis is given at the end of this section). As this was a within-subjects design, individual variability in perceptual boundaries (which are often highly variable between individuals) is not an issue here: each participant served as his/her own control.

As predicted, participants heard significantly more /ana/ when breathing through their nose than when breathing through their mouth, as determined by a paired t-test [t(38) = 2.08, p = .044, d = .33].

In the control experiment (categorizing /aga/ vs. /ada/), no such difference was found between breathing conditions [t(38) = 1.36, p = .18, d = .21]. However, the interaction between the experimental and control versions did not reach significance. We believe this is due to a lack of power and are creating a new version of the experiment to address this issue.

Figure 2: Results (confidence intervals are shown)
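The analysis just described can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the authors' analysis script: the data layout (continuum positions and binary /ana/ responses pooled over a participant's four staircases per condition) and the helper names boundary and compare_conditions are hypothetical, and the effect size is computed as Cohen's d for paired samples (mean difference divided by the standard deviation of the differences).

# Illustrative sketch (not the authors' script) of boundary estimation and the
# paired comparison between breathing conditions.
import numpy as np
import statsmodels.api as sm
from scipy import stats


def boundary(positions, heard_ana):
    """50% point of a logistic regression of response (1 = /ana/) on continuum position."""
    X = sm.add_constant(np.asarray(positions, dtype=float))
    fit = sm.Logit(np.asarray(heard_ana), X).fit(disp=0)
    intercept, slope = fit.params
    return -intercept / slope              # position where P(/ana/) = 0.5


def compare_conditions(data):
    """data: list of (nose_trials, mouth_trials) per participant, where each element
    is a (positions, heard_ana) pair. Returns the paired t statistic, p value, and d."""
    nose = np.array([boundary(*nose_trials) for nose_trials, _ in data])
    mouth = np.array([boundary(*mouth_trials) for _, mouth_trials in data])
    t, p = stats.ttest_rel(nose, mouth)
    diff = nose - mouth
    d = diff.mean() / diff.std(ddof=1)     # Cohen's d for paired samples
    return t, p, d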


Discussion & Conclusion

Components of our motor systems, known as forward models, constantly predict the sensory effects of our actions — this prediction is corollary discharge. Corollary discharge serves a variety of crucial roles, including providing feedback for actions performed too quickly to use 'regular' sensory feedback and tagging self-produced sensations as such, thus preventing sensory confusion. It is this second role that allows corollary discharge to influence the concurrent perception of external sensations.

A recent group of theories (Skipper et al., 2006, 2007; Schwartz et al., 2010) has suggested that this function of corollary discharge may regularly be used to supplement perception in cases of perceptual uncertainty, generating a prediction, on the basis of one's own motor system, to guide sensory processing. Forward models necessarily consult the current position of a person's articulators when generating corollary discharge, as sensory consequences are strongly dependent on the starting point of the effectors. This leads to the prediction that the position of one's own articulators should influence the perception of external speech sounds when those sounds are ambiguous and thus draw on the motor system's prediction abilities.

This experiment tested that prediction and has shown that the position of one's own articulators does influence the perception of speech — even when the position of the articulators is adopted for a non-speech activity (breathing). These results support theories which argue for a role of the motor system (and corollary discharge) in speech perception and make a unique contribution in showing that the static position of the articulators can have this effect even when that position is not intended to produce speech.

These results are relevant to the ongoing debate about embodied cognition — the degree to which the body and motor-control systems are used in cognition. In the realm of semantic processing of language, a similar debate is ongoing about the degree of motor involvement in processing the meaning of sentences. For example, the Action-Sentence Compatibility effect demonstrates that movements of the arm are faster when a person reads a sentence implying arm movements, suggesting that the person's motor plan for arm movements was triggered by reading the sentence (Glenberg & Kaschak, 2002). The current experiment demonstrates a related example of embodied cognition, but at a 'lower', perceptual level of language processing.

Ongoing research is exploring the extent of this effect, examining how widespread (in terms of speech sounds) such effects are.

Acknowledgements

Funding for this project was provided by a United Arab Emirates University Research Start-Up Grant (Perceptual-Motor Linkages in Speech and Cognition - 31h060) to Mark Scott.

References

Aliu, S. O., Houde, J. F., & Nagarajan, S. S. (2009). Motor-induced suppression of the auditory cortex. Journal of Cognitive Neuroscience, 21(4), 791–802.

Cornsweet, T. N. (1962). The staircase-method in psychophysics. The American Journal of Psychology, 75(3), 485–491.

Eliades, S. J., & Wang, X. (2008). Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature, 453(7198), 1102–1106.

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14(1), 3–28.

Glenberg, A. M., & Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin & Review, 9(3), 558–565.

Goldstein, L. M., & Fowler, C. A. (2003). Articulatory phonology: A phonology for public language use. In Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 159–207). Berlin: Mouton de Gruyter.

Hickok, G. (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience, 13(2), 135–145.

Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron, 69(3), 407–422.

Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in Human Neuroscience, 5.

Ito, T., Tiede, M., & Ostry, D. J. (2009). Somatosensory function in speech perception. Proceedings of the National Academy of Sciences of the United States of America, 106(4), 1245–1248.

Jones, J. A., & Munhall, K. G. (2002). The role of auditory feedback during phonation: Studies of Mandarin tone production. Journal of Phonetics, 30(3).

Kawahara, H., Irino, T., & Morise, M. (2011). An interference-free representation of instantaneous frequency of periodic signals and its application to F0 extraction. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5420–5423).

Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T., & Banno, H. (2008). Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008 (ICASSP 2008) (pp. 3933–3936).

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21(1), 1–36.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

Merckelbach, H., & van de Ven, V. (2001). Another White Christmas: Fantasy proneness and reports of 'hallucinatory experiences' in undergraduate students. Journal of Behavior Therapy and Experimental Psychiatry, 32, 137–144.

Peirce, J. W. (2007). PsychoPy—Psychophysics software in Python. Journal of Neuroscience Methods, 162(1–2), 8–13.

Peirce, J. W. (2009). Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics, 2, 10.

Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11(3), 105–110.

Repp, B. H., & Knoblich, G. (2009). Performed or observed keyboard actions affect pianists' judgements of relative pitch. The Quarterly Journal of Experimental Psychology, 62(11), 2156–2170.

Sams, M., Möttönen, R., & Sihvonen, T. (2005). Seeing and hearing others and oneself talk. Cognitive Brain Research, 23(2–3), 429–435.

Schwartz, J.-L., Basirat, A., Ménard, L., & Sato, M. (2010). The Perception-for-Action-Control Theory (PACT): A perceptuo-motor theory of speech perception. Journal of Neurolinguistics, 1–19.

Schütz-Bosbach, S., & Prinz, W. (2007). Perceptual resonance: Action-induced modulation of perception. Trends in Cognitive Sciences, 11(8), 349–355.

Scott, M., Yeung, H. H., Gick, B., & Werker, J. F. (2013). Inner speech captures the perception of external speech. Journal of the Acoustical Society of America Express Letters, 133(4), 286–293.

Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2006). Lending a helping hand to hearing: Another motor theory of speech perception. In M. A. Arbib (Ed.), Action to Language via the Mirror Neuron System (pp. 250–285). Cambridge: Cambridge University Press.

Skipper, J. I., van Wassenhove, V., Nusbaum, H. C., & Small, S. L. (2007). Hearing lips and seeing voices: How cortical areas supporting speech production mediate audiovisual speech perception. Cerebral Cortex, 17(10), 2387–2399.

Wolpert, D. M., & Flanagan, J. R. (2001). Motor prediction. Current Biology, 11(18), 729–732.

