
Jim Blevins (University of Cambridge), jpb39@cam.ac.uk

Petar Milin (University of Novi Sad; Eberhard Karls Universität Tübingen), petar.milin@uni-tuebingen.de

Michael Ramscar (Eberhard Karls Universität Tübingen), michael.ramscar@uni-tuebingen.de


This talk outlines how form variation can be modelled in terms of equilibria between two dominant communicative pressures. The pressure to discriminate forms of a language enhances differences between expressions. Unchecked, this pressure can in principle lead to suppletion of the kind reported in languages such as Yélî Dnye (Henderson ). However, in most languages, the pressure towards maximally discriminative expressions is countered by the need to extrapolate from sparse input. It has long been known that corpora provide only partial coverage of the forms of a language (inflectional and derivational). This talk presents evidence that the shortfall is far greater and far more systematic than previously appreciated, and that the coverage of form variation remains sparse in corpora of up to one billion words. The sampling reported in this talk suggests that the forms in a corpus, or encountered by a speaker, exhibit a Zipfian distribution at all sample sizes. The interaction of these pressures also accounts for the role of lexical neighbourhoods. Since most paradigms will be only partially attested, the organization of paradigms into neighbourhoods provides an analogical base for extrapolation.
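The sparsity claim can be made concrete with a small simulation. The sketch below is not the sampling procedure used in the talk. It assumes a hypothetical lexicon in which every lemma has the same number of paradigm cells, assigns each form a probability proportional to 1/rank (a simple Zipfian assumption), and reports how much of the form inventory is attested as the sample grows. All names and parameter values (`coverage_by_sample_size`, `n_lemmas`, `n_cells`, `exponent`) are illustrative assumptions.

```python
import random
from itertools import accumulate


def zipf_weights(n_forms, exponent=1.0):
    """Unnormalized Zipfian weights: the form of rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_forms + 1)]


def coverage_by_sample_size(n_lemmas=2000, n_cells=12,
                            sample_sizes=(1_000, 10_000, 100_000),
                            exponent=1.0, seed=0):
    """Sample tokens from a hypothetical Zipfian lexicon and measure coverage.

    Returns {sample_size: {"form_coverage": ..., "full_paradigms": ...}}, where
    form_coverage is the share of all possible forms attested at least once and
    full_paradigms is the share of lemmas with every cell attested.
    """
    rng = random.Random(seed)
    # Every (lemma, cell) pair is one possible inflected form.
    forms = [(lemma, cell) for lemma in range(n_lemmas) for cell in range(n_cells)]
    rng.shuffle(forms)  # assumption: frequency rank is unrelated to lemma or cell
    cum_weights = list(accumulate(zipf_weights(len(forms), exponent)))

    results = {}
    attested = set()
    drawn = 0
    for size in sorted(sample_sizes):
        # Extend the same sample, so larger samples contain the smaller ones.
        attested.update(rng.choices(forms, cum_weights=cum_weights, k=size - drawn))
        drawn = size
        cells_per_lemma = {}
        for lemma, _cell in attested:
            cells_per_lemma[lemma] = cells_per_lemma.get(lemma, 0) + 1
        full = sum(1 for count in cells_per_lemma.values() if count == n_cells)
        results[size] = {
            "form_coverage": len(attested) / len(forms),
            "full_paradigms": full / n_lemmas,
        }
    return results


if __name__ == "__main__":
    for size, stats in coverage_by_sample_size().items():
        print(size, stats)
```

Under these assumptions, enlarging the sample mostly adds further tokens of forms that are already attested, so the share of fully attested paradigms grows much more slowly than the sample itself. The exact figures depend on the chosen lexicon size and exponent and should not be read as estimates for any real corpus.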

It is usually assumed that regularity in a linguistic system is desirable or normative and that suppletion and other irregularities represent deviations from the uniform patterns that systems (or their speakers) strive to maintain. From a discriminative perspective, the situation is exactly reversed. To the extent that patterns like suppletion enhance the discriminability of forms, they contribute to the communicative efficiency of a language. In a discriminative model, such as that of Ramscar et al. (), the only difference between overtly suppletive forms such as mouse/mice and more regular forms such as rat/rats is that the former serve to accelerate the rate at which a speaker's representation of a specific form/meaning contrast becomes discriminated from the form classes that express similar contrasts. Thus all learning serves to increase the level of suppletion in form-meaning mappings.
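The difference in learning rate can be illustrated with a toy error-driven learner. The sketch below uses the Rescorla-Wagner update rule as a stand-in for a discriminative model; the text above does not give the equations of the model it cites, so the rule, the cue coding (`"rat"`, `"-s"`, `"mouse"`, `"mice"`), the single "plural" outcome, and the learning rate are all assumptions made for illustration.

```python
def rescorla_wagner(trials, n_epochs, rate=0.1):
    """Train cue weights for one outcome with the Rescorla-Wagner rule.

    trials: list of (cues, outcome_present) pairs, presented in order and
    cycled n_epochs times. Returns a dict mapping each cue to its weight.
    """
    weights = {}
    for _ in range(n_epochs):
        for cues, outcome_present in trials:
            prediction = sum(weights.get(cue, 0.0) for cue in cues)
            target = 1.0 if outcome_present else 0.0
            delta = rate * (target - prediction)  # shared prediction error
            for cue in cues:
                weights[cue] = weights.get(cue, 0.0) + delta
    return weights


def activation(weights, cues):
    """Summed support that a set of cues gives to the outcome."""
    return sum(weights.get(cue, 0.0) for cue in cues)


def discrimination(trials, n_epochs, rate=0.1):
    """Support for 'plural' from the plural form minus that from the singular form."""
    weights = rescorla_wagner(trials, n_epochs, rate)
    (singular_cues, _), (plural_cues, _) = trials
    return activation(weights, plural_cues) - activation(weights, singular_cues)


# Hypothetical cue coding: the regular pair shares its stem cue,
# the suppletive pair shares nothing.
REGULAR = [({"rat"}, False), ({"rat", "-s"}, True)]
SUPPLETIVE = [({"mouse"}, False), ({"mice"}, True)]

if __name__ == "__main__":
    for epochs in (2, 5, 10, 20):
        print(epochs,
              round(discrimination(REGULAR, epochs), 3),
              round(discrimination(SUPPLETIVE, epochs), 3))
```

In this toy setting the suppletive pair is separated faster because the plural form shares no cue with the singular. In the regular pair the stem cue absorbs part of the prediction error on plural trials and has to be unlearned on singular trials, which slows the growth of the contrast carried by the affix.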

Moreover, standard cases of ‘suppletion’ are merely extreme instances of discriminative contrasts that seem ubiquitous at the sub-phonemic level. In the domain of word formation, Davis et al. () found suggestive differences in duration and fundamental frequency between a word like captain and a morphologically unrelated onset word such as cap. Of more direct relevance are studies of inflectional formations. Baayen et al. () found that a sample of speakers produced Dutch nouns with a longer mean duration when they occurred as singulars than when they occurred as the stem of the corresponding plural. In a follow-up study, Kemps et al. () tested speakers’ sensitivity to prosodic differences, and concluded that “acoustic differences exist between uninflected and inflected forms and that listeners are sensitive to them” (Kemps et al. : ). Recent studies by Plag et al. () find similar contrasts between phonemically identical affixes in English.

The role of discriminability

From a discriminative perspective, it is regularity that stands in need of explanation. Learning models offer a solution here as well. Unlike derivational processes, inflectional processes are traditionally assumed to be highly productive, defining uniform paradigms within a given class. Lemma size is thus not expected to vary, except where forms are unavailable due to paradigm ‘gaps’ or ‘defectiveness’. Yet corpus studies suggest that this expectation is an idealization. Many potentially available inflected forms are unattested in corpora. As corpora increase in size, they do not converge on uniformly populated paradigms. Instead, they reinforce previously attested forms and classes while introducing progressively fewer new units.
As shown in ...

In order for a collection of partial samples to allow the generation of unattested forms, the forms that speakers do know must be organized into systematic structures that collectively enable the scope of possible variations to be realized. These structures correspond to lexical neighbourhoods, whose effects have been investigated in a wide range of psycholinguistic studies (Baayen et al. ; Gahl et al. ). From the present perspective, neighbourhoods are not independent dimensions of lexical organization but, rather, constitute the creative engine of the morphological system, permitting the extrapolation of the full system from partial patterns. Interesting support for this perspective comes from the study reported in Milin et al. (). In this study, analogical extrapolation from a small set of nearest neighbours allowed a system to model the choice of masculine instrumental singular allomorph by Serbian speakers presented with nonce words. Regular paradigms thus enable language users to generate previously unencountered forms, not because they are the product of an explicit rule, or of any kind of explicit grammatical knowledge, but rather because they are implicit in the distribution of forms and semantics in the language as a system, much as suggested by Hockett (: ):

“in his analogizing … [t]he native user of the language … operates in terms of all sorts of internally stored paradigms, many of them doubtless only partial”

References

Gahl, S., Yao, Y. & Johnson, K. (). Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech. Journal of Memory and Language (), –.

Henderson, J. E. (). Phonology and Grammar of Yele, Papua New Guinea. Pacific Linguistics B, Canberra: Pacific Linguistics.

Hockett, C. F. (). eTh Yawelmani basic verb.

Language , –.

Kemps, R. J. J. K., Ernestus, M., Schreuder, R. & Baayen, R. H. (). Prosodic cues for morphological complexity: The case of Dutch plural nouns. Memory & Cognition (), –.

Milin, P., Keuleers, E. & Filipović Đurdjević, D. (). Allomorphic responses in Serbian pseudo-nouns as a result of analogical learning. Acta Linguistica Hungarica , –.

Plag, I., Homann, J. & Kunter, G. (). Homophony and morphology: The acoustics of word-final S in English. Ms, Heinrich-Heine-Universität, Düsseldorf.

Ramscar, M., Dye, M. & McCauley, S. M. (). Error and expectation in language learning: The curious absence of mouses in adult speech. Language (), –.

[Figure: axes “Number of noun infl. variants” (1 to 12) and “Number of forms”; series by sample size: 1M, 3M, 6M, 9M, 12M, 15M.]

Sample sizes (and number of hapax legomena): 1M (1107), 3M (2305), 6M (3187), 9M (8035), 12M (8633), 15M (7365).