=Paper=
{{Paper
|id=Vol-1347/paper03
|storemode=property
|title=Perception of gesturally distinct consonants in Persian
|pdfUrl=https://ceur-ws.org/Vol-1347/paper03.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/FalahatiB15
}}
==Perception of gesturally distinct consonants in Persian==
Reza Falahati and Chiara Bertini
Laboratorio di Linguistica, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy
reza.falahati@sns.it, chiara.bertini@sns.it

===Abstract===
This study explores the sensitivity of individuals to the residual gestures remaining after the simplification of consonant clusters. Three sets of target stimuli having full, reduced, and zero alveolar gestures, along with a control set, were used in a perceptual identification task. The results of the experiment showed that subjects reliably distinguished the three target sets with varying residual gestures from the control. The results also showed that the degree of residual gesture affects the rate of [t] perception by the subjects; however, this effect was not statistically significant. The results are discussed in the context of different theories of speech perception.

===1 Introduction===
This study investigates the perception of three categories of consonant clusters that are perceptually similar but gesturally distinct. In Persian, word-final coronal stops are optionally deleted when they are preceded by obstruents or the homorganic nasal /n/. For example, the final clusters in the words /ræft/ "went", /duχt/ "sewed" and /qæsd/ "intention" are optionally simplified¹ in fast/casual speech, resulting in [ræf], [duχ], and [qæs], respectively. The articulatory study conducted on this process in Persian by Falahati (2013) has shown that the gestures of the deleted segments are often still present. More specifically, the findings showed that, of the clusters that sounded simplified, some had no alveolar gesture, some had gestural overlap that masked at least part of the acoustic information for [t], and some had reduced alveolar gestures. The current study tests listeners' sensitivity to these three types of /t/ realizations.

¹ The term "simplification" is used here for the acoustic and perceptual consequence of apparent coronal consonant deletion, regardless of whether there is a residual articulatory gesture.

===2 Background===
Choosing the basic units or building blocks by which the phenomena in a discipline can be explained is fundamentally important. Due to the "complex" nature of language, there is no consensus among linguists as to the nature of this basic unit in the field. The controversy over choosing the building blocks extends to the domain of speech perception, where different models have postulated various basic units of processing and storage.

In general, there are two major theoretical approaches to speech perception: gesturalist theories versus auditory and exemplar theories. The two main gestural theories of speech perception are Motor Theory and Direct Realism (MT and DR, henceforth). In motor theories, the intended phonetic gestures of the speaker are considered to be the objects of speech perception. These gestures are "represented in the brain as invariant motor commands that call for movements of the articulators through certain linguistically significant configurations" (Liberman and Mattingly 1985, p. 2). The main motivation for MT's choice of this basic unit is, among other factors, the existence of patterns where different acoustic cues can give rise to the same phonetic percept, or where variant phonetic percepts were found for the same synthetic speech across different contexts (Delattre et al., 1955, 1964; Liberman 1957; Liberman and Mattingly 1985).
Despite the fact that this theory has gone through significant changes since its inception, all its versions share the idea that the objects of speech perception are articulatory events rather than acoustic or auditory events.

An intended gesture is produced by a number of muscles that act in concert, sometimes ranging over more than one articulator. For instance, the constriction needed for producing coronal stops involves the action of the tip/blade of the tongue and the jaw; however, such a constriction is considered one gesture. According to MT, the orchestration among gestures is quite systematic, and listeners can use the systematically varying acoustic cues for coronal stops as information to detect the related consonant gestures.

MT assumes a biological link between perception and production. According to this perspective, both speech perception and speech production share the same set of invariants and are governed by auditory principles. "The motivation for articulatory and coarticulatory maneuvers is to produce just those acoustic patterns that fit the language-independent characteristics of the auditory system" (Liberman and Mattingly, 1985, p. 6). The acoustic signal only serves as a source of information about the gestures; it is the gestures which define the phonetic category.

The other main gestural theory of speech perception is Direct Realism. Both DR and MT share the claim that listeners to speech perceive vocal tract gestures. However, in DR it is the phonological gestures of the vocal tract, rather than the intended gestures, which are the perceptual objects (Fowler 1981, 1984, 1996). According to DR, "the temporal overlap of vowels and consonants does not result in a physical merging or assimilation of gestures; instead, the vowel and consonant gestures are coproduced. That is, they remain, to a considerable extent, separate and independent events..." (Diehl et al., 2004, p. 153). If we extend this to the gestures of two adjacent consonants, one should expect that their gestures also remain separate and distinct from each other.

In contrast to gestural theories, auditory theories assume that speech sounds are perceived via general cognitive and learning mechanisms. In this view, speech is not special and listeners do not perceive gestures. The auditory approach mainly considers general auditory mechanisms to be responsible for perceptual performance. According to this view, speech and nonspeech stimuli do not invoke a special or speech-specific module, and gestures have no mediating role in the perception of speech sounds. Listeners use multiple imperfect acoustic cues in order to categorize complex stimuli with structured variance (Diehl et al., 2004). According to this approach, phonological representations are assumed to be speaker independent and are associated with each word in the listener's mental lexicon. The proponents of this approach take, for example, categorical perception of non-speech sounds or categorical-like perception by non-human animals as evidence for their argument. They also consider some cross-linguistic sound patterns and the "maximal auditory dispersion" in vowel systems as further support for their claim (Ohala 1990, 1995).

Exemplar theories form another approach to speech perception, in which words and frequently used grammatical constructions are represented in memory as large sets of exemplars containing fine phonetic information, and listeners are sensitive to the phonetic details present in the speech signal. In such a speech perception model, a mechanism is needed for gradiently changing the lexical representations over time. In order to do so, the perceptual system must be capable of making fine phonetic distinctions (Johnson 1997).
These different approaches to speech perception have been tested in a number of studies. Beddor et al. (2013), for example, used eye-tracking to assess listeners' use of coarticulatory vowel nasalization as that information unfolded in real time. In the experiment, subjects heard the nasalized vowels with two different time latencies. The prediction was that subjects would fixate on the related image sooner when they heard the nasalized vowel earlier. The results showed that listeners use the relevant acoustic cues, which was argued to allow them to track the gestural information. Nolan (1992), in an identification task, tested whether participants could identify different degrees of velar assimilation. He used four different articulation types, called full alveolar, residual alveolar, zero alveolar (i.e., full assimilation to the following velar), and nonalveolar (i.e., velar in the underlying representation). The results of his study showed that participants identified the full alveolar tokens as /d/ with 100% accuracy, while fewer than half of the residual alveolar tokens were identified as /d/. In another study, Pisoni showed that nonspeech analogs of VOT stimuli are perceived categorically. Studies like this were taken as evidence against MT, which claimed categorical perception to be a specific feature of the speech mode of perception.

In this study, three sets of simplified consonant clusters are used which are auditorily similar but gesturally different. The consonant clusters (i.e., C₁C₂#) occur in the coda of words followed by another word which also starts with a consonant, giving three consonants in a row in an intervocalic environment (i.e., V₁C₁C₂#C₃V₂). The prediction is that if subjects are sensitive to the residual gestures, they should judge the stimuli differently. The stimulus set with no coronal gesture is expected to show the same pattern as the control (which has no coronal gesture in the underlying representation). The stimuli with overlapped gestures and with reduced gestures are predicted to show a pattern different both from the control and from the stimuli with zero residual gestures. The following section introduces the methodology of the study.

===3 Methodology===

===3.1 Participants===
Thirty-two Persian-speaking students from the Università di Pisa and Sant'Anna, seventeen females and fifteen males, aged 18-38, participated in this study. The results of eight of them were not considered for analysis because they reported being bilingual and mainly using a language other than Persian at home or with their close friends. This left twenty-four participants, twelve females and twelve males. None of them reported any hearing problem.

===3.2 Stimuli===
Three sets of target words, varying only in the degree of residual alveolar gesture, and one control stimulus set were used in the experiment. The three target categories are essentially the same except for the degree of residual alveolar gesture. The Target Full_G category has a full coronal gesture which overlaps with the following consonant, and is therefore marked here with superscript [ᵗᵗ]. The Target Partial_G category has a partial residual gesture, marked with superscript [ᵗ], whereas Target Zero_G has no gestural residue. The stimuli in the control set are used as the baseline since they have no underlying coronal stop in the coda position of the first word. Some examples of the target and control words are given below:

Target Full_G: [æχᵗᵗ kɑ], [æfᵗᵗ bæ], [ufᵗᵗ bɑ]
Target Partial_G: [æχᵗ kɑ], [æfᵗ bæ], [ufᵗ bɑ]
Target Zero_G: [æχ kɑ], [æf bæ], [uf bɑ]
Control: [æχ ke], [æf bæ], [uf bɑ]

The four sets of target and control nonwords presented above are tokens excised from the full words presented below:

Target: /sæχt kɑr/ "hard-working", /næft bærɑje/ "oil for", /kuft bɑʃeh/ "be cheap"
Control: /næχ ke/ "thread that", /sæf bærɑje/ "queue for", /mæruf bɑʃeh/ "be famous"

===3.3 Procedure===
All the participants listened to forty stimuli (ten stimuli in each category) with eight repetitions (a total of 320 tokens) in a sound booth located at the linguistics laboratory of the Scuola Normale Superiore. The software Presentation was used to present the stimuli to the listeners as an identification task. The participants were asked to listen very carefully and to decide as quickly as possible whether it was likely that there had been a [t] at the end of the first part of each stimulus. For each stimulus, the participants pressed either the green or the blue button on a Cedrus response pad. On a computer screen, listeners could also see "T" or "NO T" corresponding to the response buttons. The stimuli were shuffled and presented in blocks in such a way that participants began by hearing either all the tokens containing [f] or all the tokens containing [χ]. They also had the option of taking a break after every 80 tokens. All the participants received a short training session before the start of the experiment. The following section contains the results of the study.
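As a rough illustration of the design arithmetic just described (40 stimuli × 8 repetitions = 320 tokens, presented in [f]- and [χ]-blocks with a break offered every 80 tokens), the following minimal sketch builds such a blocked trial list. It is not the authors' Presentation script; the stimulus identifiers and the even split of items across the two fricatives are assumptions made purely for illustration.

```python
import random

CONDITIONS = ["Control", "Full_G", "Partial_G", "Zero_G"]
FRICATIVES = ["f", "x"]  # tokens containing [f] vs. [x] form the two presentation blocks

# 40 stimuli: ten per condition, assumed here to be split evenly across the two fricatives.
stimuli = [
    {"id": f"{cond}_{fric}_{i}", "condition": cond, "fricative": fric}
    for cond in CONDITIONS
    for fric in FRICATIVES
    for i in range(1, 6)          # 5 items per condition and fricative -> 10 per condition
]
assert len(stimuli) == 40

REPETITIONS = 8                   # 40 stimuli x 8 repetitions = 320 tokens per listener

block_order = FRICATIVES[:]
random.shuffle(block_order)       # which fricative block comes first can vary per listener

trials = []
for fric in block_order:
    block = [s for s in stimuli if s["fricative"] == fric] * REPETITIONS  # 20 x 8 = 160
    random.shuffle(block)         # shuffle presentation order within the block
    trials.extend(block)
assert len(trials) == 320

# A self-paced break is offered after every 80 tokens.
break_points = set(range(80, len(trials), 80))
```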
===4 Results===
The main goal of this study is to test listeners' sensitivity to different degrees of residual gesture remaining after the simplification of consonant clusters. Response type and reaction time are the dependent variables in this study; however, only the results related to response type are presented here. Figure 1 below shows the rate of [t] perception by all subjects across the four conditions. According to this figure, the subjects show the highest rate of [t] perception for tokens with a full alveolar gesture (59.69%) and the lowest for the control tokens (36.09%). The condition with partial alveolar gestures shows a rate of 56.20%, which is very close to the full condition. The stimuli in the zero alveolar condition show an intermediate level between the control and the other two target conditions, with a rate of 49.84%. This amounts to a very similar pattern for the two target conditions with full and partial gestures, an intermediate position for the target condition with zero gesture, and a pattern for the control which is different from the three target conditions.

[Figure 1: The rate of [t] perception by all subjects. Bar chart over the four conditions (Control, Full_G, Partial_G, Zero_G), with the mean perception rate on the y-axis (0-80%).]

In order to examine the relation between the two categorical variables in the study, namely response type and stimulus condition, a Pearson chi-square test was run. The null hypothesis is that there is no relation between [t] perception and the four conditions. The test, with [t] perception as the dependent variable, found a significant main effect of condition, χ²(3, N = 960) = 46.2, p < 0.001. This shows that there is a significant relation between stimulus condition and response type. In order to determine whether the differences in the perception of [t] across the four categories are genuine or due to chance variation, a column proportions test was performed; this test uses z-tests to make the pairwise comparisons. The result showed that the perception of [t] in the control condition was significantly different from all the target categories.
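The analysis just described (a Pearson chi-square test over the response-by-condition table, followed by pairwise proportion comparisons) can be sketched as follows. This is not the authors' analysis script: the counts are hypothetical, reconstructed from the reported percentages under the assumption of 240 responses per condition (N = 960), so the output only approximates the reported statistics.

```python
import numpy as np
from scipy import stats

conditions = ["Control", "Full_G", "Partial_G", "Zero_G"]
t_yes = np.array([87, 143, 135, 120])        # "T" responses per condition (illustrative only)
n_obs = np.array([240, 240, 240, 240])       # assumed responses per condition, N = 960
t_no = n_obs - t_yes

# Pearson chi-square test of independence: response type (T / NO T) x stimulus condition.
chi2, p, dof, expected = stats.chi2_contingency(np.vstack([t_yes, t_no]))
print(f"chi2({dof}, N = {n_obs.sum()}) = {chi2:.1f}, p = {p:.3g}")

# Pairwise two-proportion z-tests of the control against each target condition,
# in the spirit of the column-proportions comparison reported above.
def two_prop_z(x1, n1, x2, n2):
    p_pool = (x1 + x2) / (n1 + n2)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 2 * stats.norm.sf(abs(z))      # two-sided p-value

for i, cond in enumerate(conditions[1:], start=1):
    z, p_val = two_prop_z(t_yes[0], n_obs[0], t_yes[i], n_obs[i])
    print(f"Control vs {cond}: z = {z:.2f}, p = {p_val:.3g}")
```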
The next section presents the discussion and concluding remarks of the study.

===5 Discussion and Conclusion===
This research investigated listeners' sensitivity to three types of /t/ realizations as targets and compared the results with a control. The target categories included simplified consonant clusters with full, partial, and zero alveolar gestures. The stimuli used as the baseline in the control had no alveolar gesture in the underlying form. The general results of the study showed that subjects reliably distinguished the three target sets with varying residual gestures from the control. This could be due to greater similarity in tongue configuration across these varying degrees of coronal stop articulation compared to the control condition, where there is no alveolar gesture in the underlying form.

Any articulatory modification is expected to trigger acoustic changes. The acoustic analysis of the stimuli used in this study by Falahati (2013) showed no significant difference between the simplified tokens (i.e., the three target sets with varying degrees of residual gesture, pooled together as "simplified") and the control tokens. The acoustic parameters used in the analysis were V₁ duration, consonant cluster duration, and formant transitions. Although the results did not show any significant difference between the simplified and control conditions, the durations of V₁ and of the consonant clusters in the simplified condition were always longer than in the control condition. It could be the case that these acoustic cues, although not very strong, are enough for the human auditory system to signal the presence of a segment.

The results of the current study also showed that participants perceived almost 36% of the tokens with no underlying coronal stop as having a [t]. This is very similar to the results reported by Nolan (1992), where 20% of the control nonalveolar tokens were perceived as having a [d]. In his study, however, the control tokens showed a pattern similar to that of the target with zero alveolar gesture (i.e., full assimilation). He attributes this both to the subjects' natural language experience and to the inherent ambiguity of the stimuli. He states that subjects are willing to "undo" its effects and therefore, in the case of the current study, report coronal stops even where there is no evidence for them.

The results of our study also showed that participants perceived more [t] in the tokens with full and partial alveolar gestures than in the ones with zero alveolar gesture. The difference between the three target categories, however, did not reach significance. Such a result could shed more light on the theories of speech perception discussed earlier in this paper. In order to discuss this issue, we first need to further explore the nature of the three categories in the target stimuli. Of the three groups in the target stimuli, one group categorically had no alveolar gesture, while the other two had different degrees of the gesture, either as a result of overlap or of reduction. We argue that gradient gestural reduction and overlap are due to low-level phonetic and mechanical factors, while categorical deletion, which results in tokens with zero gestures, is caused by the cognitive system. In the former groups, speakers neither intend to reduce nor plan to overlap gestures, whereas the latter process is intended by the speaker. According to MT and DR, the listener's target in speech perception is the intended or phonological gesture. Therefore, the overlapped and reduced stimuli should show a perceptual pattern different from the stimuli with no residual gesture. The results of this study did not show a striking difference between these three target sets. The existence of acoustic cues pertaining to the presence of gestures is a prerequisite to their perception by the listener. If distinguishing acoustic details could be found between these three categories while listeners still failed to separate them perceptually, this would not support the gesturalist approach to speech perception. However, with the current results, such a claim cannot be made. Further acoustic analysis of these three target sets is needed to examine this idea.

The findings of our experiment could be best explained by referring to exemplar models of speech perception. In such models, the lexical representations of words change in a gradient way over time; this reflects the nature of some phonological processes in languages, which are not categorical. According to this view, the perceptual mechanism is capable of making fine phonetic distinctions. However, it is the mapping between the gradient stimuli and the auditory system which fails to yield invariant forms. The lack of such a one-to-one mapping brings about variation across subjects in the speech community, and the degree of such variation is determined by the amount of an individual's exposure to the specific variants.
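To make the mechanism appealed to here concrete, the sketch below implements a toy exemplar categorizer in the spirit of Johnson (1997): every encountered token is stored with its fine phonetic detail, category activation is a sum of similarities to stored exemplars, and newly perceived tokens feed back into the store, letting representations drift gradiently. The feature dimensions, similarity parameter, and numeric values are purely hypothetical and are not drawn from the present study.

```python
import math
from collections import defaultdict

def similarity(x, y, c=2.0):
    """Exponential similarity that decays with distance in a phonetic feature space."""
    return math.exp(-c * math.dist(x, y))

class ExemplarLexicon:
    def __init__(self):
        self.store = defaultdict(list)        # category label -> stored exemplar tokens

    def add(self, label, features):
        # Storing every encountered token is what lets representations drift gradiently.
        self.store[label].append(features)

    def classify(self, features):
        # Category activation = summed similarity of the incoming token to all exemplars.
        activation = {
            label: sum(similarity(features, ex) for ex in exemplars)
            for label, exemplars in self.store.items()
        }
        total = sum(activation.values())
        return {label: act / total for label, act in activation.items()}

# Toy usage: each token is (cluster duration, V1 duration) in arbitrary normalized units.
lexicon = ExemplarLexicon()
for token in [(1.6, 0.9), (1.7, 1.0), (1.5, 0.95)]:
    lexicon.add("cluster with /t/", token)
for token in [(1.1, 0.8), (1.0, 0.85), (1.2, 0.75)]:
    lexicon.add("cluster without /t/", token)

ambiguous = (1.35, 0.85)                      # e.g. a partially reduced cluster
print(lexicon.classify(ambiguous))            # gradient, not categorical, response shares
lexicon.add("cluster with /t/", ambiguous)    # perceived tokens feed back into the store
```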
The variation across reach the significance level. Such result could individuals regarding speech perception could be shed more light on the theories of speech a good source of information for the specialists perception discussed earlier in this paper. In in the field. Moreover, the degree to which an order to discuss this issue, first we need to individual’s speech production could map to further explore the nature of the three categories his/her perception is an interesting topic which in the target stimuli. From the three groups in the remains to be explored. target stimuli, one group categorically had no alveolar gesture while the other two had different Acknowledgments degrees of the gesture either as a result of overlap or reduction. We argue that the gradient We are very grateful to Patrice Beddor for her gestural reduction and overlap are due to low- comments and suggestions on this study. level phonetic and mechanical reasons while the categorical deletion, which results in tokens with zero gestures, is caused by the cognitive system. In the former groups, speakers neither intend to reduce nor plan to overlap gestures while the latter process is intended by the speaker. According to MT and DR, listeners’ target in speech perception is the intended or phonological gestures. Therefore, the overlapped and reduced stimuli should show different perceptual pattern compared to the stimuli with no residual gesture. The results in this study did not show a striking difference between these three target sets. The existence of acoustic cues pertaining to the presence of gestures is a prerequisite to their perception by the listener. If distinguishing acoustic details could be found between these three categories, then this would not support the gesturalist approach to speech perception. However, with the current results, such a claim cannot be made. Further acoustic analysis between these three target sets is needed to examine this idea further. The findings in our experiment could be best explained by referring to exemplar models of speech perception. In such models, the lexical representations of words change in a gradient way over time. This is due to the nature of some phonological processes in languages which are not categorical. According to this view, the perceptual mechanism is capable to make fine phonetic distinctions. However, it is the mapping between the gradient stimuli and the auditory system which fails and does not result in nonvariant forms. The lack of such a one-to-one mapping will bring variation across subjects in the speech community. The degree of such variation is determined by the amount of individual’s exposure to the specific variants. A closer look at the results for individual subjects showed that all twenty-four participants in the study could fall into three or four dominant patterns based on 17 Reference John Ohala. 1990. Respiratory activity in speech. In W. J. Hardcastle & A. Marchal (eds.), Speech Patrice S. Beddor, Kevin B. McGowan, Julie Boland, Production and Speech Modeling, 23-53. Andries W. Coetzee, and Anna Brasher. 2013. Netherlands: Kluwer Academic Publishers. The perceptual time course of coarticulation. Journal of the Acoustical Society of America, 133, John Ohala. 1995. The perceptual basis of some 2350-2366. sound patterns. In B. Connell and A. Arvaniti (eds.), Phonology and phonetic evidence, Papers in Pierre Delattre, Alvin M. Liberman, and Franklin S. Laboratory Phonology IV, 87-92. Cambridge: Cooper. 1955. 
===References===
Patrice S. Beddor, Kevin B. McGowan, Julie Boland, Andries W. Coetzee, and Anna Brasher. 2013. The perceptual time course of coarticulation. Journal of the Acoustical Society of America, 133, 2350-2366.

Pierre Delattre, Alvin M. Liberman, and Franklin S. Cooper. 1955. Acoustic loci and transitional cues for consonants. Journal of the Acoustical Society of America, 27, 769-773.

Pierre Delattre, Alvin M. Liberman, and Franklin S. Cooper. 1964. Formant transitions and loci as acoustic correlates of place of articulation in American fricatives. Studia Linguistica, 18, 104-121.

Randy L. Diehl, Andrew J. Lotto, and Lorri L. Holt. 2004. Speech perception. Annual Review of Psychology, 55, 149-179.

Reza Falahati. 2013. Gradient and Categorical Consonant Cluster Simplification in Persian: An Ultrasound and Acoustic Study. Ph.D. dissertation, University of Ottawa, Ottawa.

Carol A. Fowler. 1981. Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research, 46, 127-139.

Carol A. Fowler. 1984. Segmentation of coarticulated speech in perception. Perception & Psychophysics, 36, 359-368.

Carol A. Fowler. 1996. Listeners do hear sounds, not tongues. Journal of the Acoustical Society of America, 99, 1730-1741.

Keith Johnson. 1997. Speech perception without speaker normalization: an exemplar model. In K. Johnson and J. W. Mullennix (eds.), Talker Variability in Speech Processing, 145-165. San Diego: Academic Press.

Alvin M. Liberman and Ignatius G. Mattingly. 1985. The motor theory of speech perception revised. Cognition, 21, 1-36.

Francis Nolan. 1992. The descriptive role of segments: evidence from assimilation. In G. J. Docherty and D. R. Ladd (eds.), Papers in Laboratory Phonology II, 261-280. Cambridge: Cambridge University Press.

John Ohala. 1990. Respiratory activity in speech. In W. J. Hardcastle and A. Marchal (eds.), Speech Production and Speech Modeling, 23-53. Dordrecht: Kluwer Academic Publishers.

John Ohala. 1995. The perceptual basis of some sound patterns. In B. Connell and A. Arvaniti (eds.), Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV, 87-92. Cambridge: Cambridge University Press.