Lexical categories or frequency effects? A feedback from quantitative methods applied to psycholinguistic models in two studies on Italian.

Francesca Franzon*, Giorgio Arcara°, Chiara Zanini*
* Dipartimento di Neuroscienze, Università degli Studi di Padova
° IRCCS Ospedale San Camillo, Lido di Venezia
{francescafranzon7;giorgio.arcara}@gmail.com; chiara.zanini.2@unipd.it

Abstract

English. We examined two issues concerning Italian Number morphology: the phenomena related to mass and count nouns and to plural dominance. By taking into account quantitative data from corpora and subjective frequency ratings in three mixed-effects models, we found that differences in participants’ performance in two lexical decision tasks are better captured as differences in frequency than as effects of lexical categories.

Italiano (in English translation). This study compares two phenomena pertaining to the nominal morphology of Number in Italian: noun countability and plural dominance. Integrating quantitative data from corpora and from two rating studies into a statistical analysis carried out with mixed-effects models, we find that the differences in participants’ performance in two lexical decision studies can be traced back to frequency effects rather than to the presence of categorial lexical features.

Introduction

The role of frequency in lexical retrieval is well established in psycholinguistic studies (since, at least, Forster & Chambers, 1973): the higher the frequency of a word, the faster its retrieval. Generally, the singular form of a noun is more frequent than the corresponding plural, and is thus retrieved faster. However, some nouns (e.g. stelle, ‘stars’) occur more frequently in the plural than in the singular: the phenomenon is known as plural dominance. Plural dominant nouns are in fact accessed faster in the plural form than in the singular. Since such nouns cannot be identified as a homogeneous group by means of semantic features, the phenomenon has been explained as a mere effect of the frequency of occurrence of the forms (Baayen et al., 1996; 1997; 2007; Biedermann et al., 2013).

While plural dominance seems to be unrelated to grammatical constraints, another phenomenon involving Number morphology seems instead to be grammatically grounded, namely the mass-count issue (Borer, 2005; Cheng, 1973; Chierchia, 2010; Jackendoff, 1991). Nouns referring to countable entities are called ‘count nouns’ (anello, ‘ring’); nouns referring to uncountable entities are called ‘mass nouns’ (burro, ‘butter’). Some constraints govern the possibility for the two types of nouns to occur in certain morphosyntactic contexts: for example, count nouns cannot occur in the singular after a quantifier (*molto anello, ‘much ring’), while mass nouns cannot occur with numerals or the indefinite article (*un burro, ‘a butter’). As regards Number morphology, mass nouns should occur only in the singular (but for a deeper discussion, see i.a. Acquaviva, 2013; Marcantonio & Pretto, 2001; Pelletier, 2012).

Previous lexical decision tasks have pointed to some differences between the processing of count nouns and that of mass nouns, with the latter requiring longer response times (RTs) (i.a. Mondini et al. 2009; Gillon et al. 1999). In the light of these results, it has been proposed that an additional lexical feature has to be computed for mass nouns as compared to count nouns.

While psycholinguistic studies on plural dominance have relied on the relative frequency of singular and plural forms both in the selection of stimuli and in the analysis of results, even the most recent experimental studies on the mass-count issue have not quantified the actual occurrence of the experimental stimuli in mass contexts and in count contexts: nouns have rather been assigned to a mass or to a count category on the basis of the experimenters’ judgments. Quantitative data on syntactic contexts can instead provide a better estimate of how frequently nouns are used as countable or uncountable: in the present study we relied on the actual occurrence of nouns in the different syntactic contexts in assigning them to the “mass” or to the “count” experimental list.

We will describe and compare two lexical decision tasks, concerning the phenomena [...] most literature with respect to plural dominance. We hypothesize that the frequency of occurrence of the word form (inflected in the singular or in the plural) will predict the RTs in lexical decision tasks contrasting mass and count nouns, as well as in those concerning the plural dominance issue. The frequency of occurrence will be measured by means of two subjective frequency rating studies and in the ItWaC corpus (Baroni et al. 2009). We will rely on quantitative measures to categorize experimental stimuli. Measures of the plural dominance of nouns will be based on the ratio between their occurrence in the plural and in the singular; the mass and count experimental nouns will be categorized according to their distribution across mass and count morphosyntactic contexts.

1 First study: mass and count nouns

1.1 Rating and corpus analysis

448 concrete nouns, namely 224 nouns inflected both in the singular and in the plural, were selected following the theoretical definitions given in traditional grammars. The list included the plural of 45 nouns for which only singular occurrences would be expected on a normative basis (pure “mass” nouns such as burro ‘butter’ - *burri ‘butters’).

A questionnaire was designed in order to evaluate the subjective frequency of the 448 nouns following the methods used in previous literature (Ferrand et al., 2008). The questionnaire was administered online by means of the SurveyMonkey platform. 126 informants participated in this study (age: range = 22-76 years, mean = 36.2, SD = 12.46; years of education: range = 8-21). Participants were instructed not to express normative judgments, but to focus on how frequently they had heard or read the words; they had to assign a score to the frequency of the nouns on a 7-point Likert scale, ranging from 0 = “never heard or seen” to 6 = “more than once a day”. The nouns in the questionnaires were presented to each participant in a different random order.

Table 1: Distribution of the subjective frequency scores (columns: Score mean, Singular, Plural).

The absolute frequency of the aforementioned nouns was collected from the ItWaC corpus (Baroni et al., 2009). A positive correlation was found between corpus frequency and subjective frequency: r(446) = 0.75, p < .001. In order to disambiguate the mass use from the count use of the nouns presented in the rating questionnaire, we designed queries in CQP syntax following the methods described by Katz & Zamparelli (2012). The occurrence of nouns with determiners such as the indefinite article and quantifiers was used to trace their occurrence in unambiguous count or mass contexts.

1.2 Lexical decision task

From the initial list of 224 nouns, 80 nouns were selected and presented both in the singular and in the plural (160 experimental stimuli in total). These stimuli were selected to span as uniformly as possible the range of possible values of subjective frequency, in order to use subjective frequency as a continuous variable in the analysis. Of the 80 nouns, we classified as “mass” the 18 nouns with the highest mass-context frequencies whose count-context frequencies were not among the top 18; we classified as “count” the 18 nouns with the highest count-context frequencies whose mass-context frequencies were not among the top 18. These nouns were presented both in the singular and in the plural (72 stimuli in total). The remaining stimuli were not categorised in such terms. Experimental stimuli are displayed in table 2. The final list included 240 fillers, consisting of 80 adjectives and 160 phonotactically plausible non-words.

                           N. of   Corpus                Subjective    Length
                           items   Frequency             Frequency
All stimuli                160     11850.32 (27239.65)   3.29 (1.18)   6.41 (1.66)
“Mass” nouns: singular     18      26204.88 (28831.43)   4.36 (0.57)   6.22 (1.89)
“Mass” nouns: plural       18      824 (1187.38)         1.95 (0.72)   6.28 (1.96)
“Count” nouns: singular    18      38570.05 (54194.95)   4.09 (0.84)   5.78 (1.31)
“Count” nouns: plural      18      24365 (36455)         4.07 (0.80)   5.89 (1.27)

Table 2: Psycholinguistic properties of experimental stimuli.

60 Italian native speakers participated in the experiment (mean age = 23.5, SD = 2.37; years of education: mean = 15.16, SD = 1.64). Participants saw a series of letter strings presented at the center of the screen one at a time. They had to press one key if they thought the string was an Italian word, and another key otherwise.

1.3 Results

Results were analyzed by means of mixed-effects models (Baayen, Davidson & Bates, 2008). In model 1, summarized in table 3, we included the 72 stimuli classified as mass and count nouns. We considered as predictors: category (mass/count), Number (singular/plural), corpus frequency, subjective frequency and orthographic length. Results show significant effects of length (longer RTs for longer items), of Number (longer RTs for plurals) and of subjective frequency (longer RTs for low subjective frequency).

Fixed effect           Coefficient   Standard Error   df      t        p-value
Intercept              6.56          0.05             95.18   130.53   <0.001
Number = plural        0.37          0.02             64.33   2.04     0.04
Subjective frequency   -0.04         0.007            74.09   -4.27    <0.001
Orthographic length    0.009         0.004            65.86   2.077    0.04

Table 3: Results of model 1.

In model 2, summarized in table 4, we included all 160 stimuli. We considered as predictors: Number (singular/plural), corpus frequency, subjective frequency and orthographic length. Results show significant effects of length (longer RTs for longer items), of corpus frequency (longer RTs for low corpus frequency) and of subjective frequency (longer RTs for low subjective frequency).

Notably, the predictor category is not significant in model 1 (p = 0.85); corpus frequency is a significant predictor in model 2 (p = 0.03), but it only approached significance in model 1 (p = 0.05). Possibly, Number is a significant predictor in model 1 because the categorised items represent a subset that differs in frequency of occurrence in the plural. In fact, in model 2, in which both categorised and uncategorised items were considered, no effect of Number was found.

Fixed effect           Coefficient   Standard Error   df       t        p-value
Intercept              6.73          0.04             219.42   172.38   <0.001
Corpus frequency       -0.009        0.004            155.55   -2.16    0.03
Subjective frequency   -0.05         0.008            152.19   -5.37    <0.001
Orthographic length    0.008         0.004            2.47     2.11     0.04

Table 4: Results of model 2.

2 Second study: plural dominance

2.1 Rating and corpus analysis

The ItWaC corpus was queried to obtain the frequency of occurrence of the singular and plural forms of nouns displaying the most common inflectional patterns (-o/-i; -a/-e). We excluded from the test material compounds, derived nouns and nouns that differ in orthographic length or phonological form between singular and plural (e.g. occhio - occhi ‘eye - eyes’). The remaining nouns were then ordered on the basis of their plural dominance, defined as the ratio plural frequency/singular frequency. We calculated the stem frequency of the nouns and selected 284 nouns spanning uniformly the range of possible values of frequency.

A questionnaire was created in order to test the subjective frequency of the 284 selected nouns, both in the singular and in the plural (568 experimental items). The questionnaire was administered following the same methods described previously (§1.1). 150 Italian native speakers participated in the study (age: range = 18-69, mean = 29; years of education: range = 8-21). The distribution of the subjective frequency scores is reported in Table 5. A positive correlation was found between the singular and plural forms of nouns within the corpus (r(282) = 0.70, p < .001) and within the rating (r(282) = 0.91, p < .001).

Table 5: Distribution of the subjective frequency scores (columns: Score mean, Singular, Plural).

2.2 Lexical decision task

A lexical decision study was carried out, following the same methods described in §1.2. From the 284 nouns mentioned in §2.1, we chose: the 30 nouns with the highest ratio of plural dominance, the 30 nouns with the lowest ratio of plural dominance, and the 30 nouns whose ratio between singular and plural was the closest to 1 (see table 6). Each noun was presented in the singular and in the plural (180 experimental stimuli in total). The final list included 364 fillers, consisting of 184 adjectives and 180 phonotactically plausible non-words. 43 Italian native speakers participated in the experiment.

Dominance         Morphological   N. of   Corpus                Subjective    Orthographic
(mean Pl/Sg)      Number          items   Frequency             Frequency     Length
Plural (3.61)     Singular        30      5260.3 (7547.43)      3.31 (0.77)   6.33 (1.09)
                  Plural          30      19026.46 (25558.41)   3.48 (0.79)
Singular (0.16)   Singular        30      25596.9 (44944.15)    3.44 (0.91)   6.13 (1.13)
                  Plural          30      4276.3 (7186.03)      3.23 (0.79)
Equal (0.9)       Singular        30      35430.33 (99471.4)    3.13 (0.57)   6.16 (1.17)
                  Plural          30      31921.7 (93584.35)    3.1 (0.59)

Table 6: Psycholinguistic properties of experimental stimuli.

2.3 Results

Results were analysed by means of mixed-effects models (Baayen, Davidson & Bates 2008). In model 3, summarized in table 7, we considered as predictors: category (plural/singular/equal dominant), Number (singular/plural), corpus frequency, subjective frequency and orthographic length. Results show significant effects of length (longer RTs for longer items), of corpus frequency (longer RTs for low corpus frequency) and of subjective frequency (longer RTs for low subjective frequency).

Fixed effect           Coefficient   Standard Error   df       t       p-value
[...]
Subjective frequency   -0.03         0.007            170.17   -4.48   <0.001
Orthographic length    0.009         0.004            165.93   2.03    0.04

Table 7: Results of model 3.

3 Discussion and conclusions

In this study we applied quantitative methods in the selection of the experimental stimuli used in the two lexical decision tasks. In both tasks, results from the three models showed effects of subjective frequency and corpus frequency, but not of category, in written word recognition. As regards the plural dominance issue, this result is in line with previous literature. As regards the mass-count issue, our results are instead unexpected. Recall that frequency of occurrence in mass and count contexts was used to avoid biases in the categorization of stimuli. Nevertheless, we did not observe differences in RTs between the two groups of nouns categorized in this way. Thus, we suggest that there is no need to postulate the computation of a lexical feature related to countability or uncountability in nouns. We propose that the fact that a noun is considered “mass” is better described as an epiphenomenon of the distribution of nouns with respect to syntactic contexts. However, the possibility for a noun to occur in the different syntactic contexts does not predict lexical decision RTs: frequency, as measured in the corpus and by the rating study, is the predictor of lexical access times for words presented in isolation. In this sense, the mass-count issue is similar to the plural dominance phenomenon: in that case too, there is no need to assume the presence of a feature marking plurality, as the frequency of the inflected form is sufficient to account for the observed effects in lexical decision tasks.

The frequency of occurrence of nouns, considered as a continuous variable, is a better predictor of RTs than a distinction attributed to alleged lexical categories, both in the case of phenomena seemingly unrelated to core grammar rules, like plural dominance, and in the case of phenomena that have traditionally been described as grammar-based, like the mass-count issue.

References

Acquaviva, P. (2013). Il nome. Roma: Carocci.

Baayen, H., Burani, C., & Schreuder, R. (1996). Effects of semantic markedness in the processing of regular nominal singulars and plurals in Italian. Yearbook of morphology, Springer Netherlands, 13-33.

Baayen, R. H., Dijkstra, T., & Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1), 94-117.

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412.

Baayen, R., Levelt, W., Schreuder, R., & Ernestus, M. (2007). Paradigmatic structure in speech production. Proceedings from the Annual Meeting of the Chicago Linguistic Society, 43(1), 1-29. Chicago Linguistic Society.

Balota, D. A., Pilotti, M., & Cortese, M. J. (2001). Subjective frequency estimates for 2,938 monosyllabic words. Memory & Cognition 29(4), 639-647.

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3), 209-226.

Biedermann, B., Beyersmann, E., Mason, C., & Nickels, L. (2013). Does plural dominance play a role in spoken picture naming? A comparison of unimpaired and impaired speakers. Journal of Neurolinguistics, 26(6), 712-736.

Borer, H. (2005). In name only. Oxford: OUP.

Cheng, C.-Y. (1973). Response to Moravcsik. J. Hintikka, J.M.E. Moravcsik, & P. Suppes (eds.). Approaches to Natural Language. Dordrecht: Reidel, 286-288.

Chierchia, G. (2010). Mass nouns, vagueness and semantic variation. Synthese 174, 99-149.

Ferrand, L., Bonin, P., Méot, A., Augustinova, M., New, B., Pallier, C., & Brysbaert, M. (2008). Age-of-acquisition and subjective frequency estimates for all generally known monosyllabic French words and their relation with other psycholinguistic variables. Behavior Research Methods 40(4), 1049-1054.

Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time. Journal of Verbal Learning and Verbal Behavior, 12(6), 627-635.

Gillon, B., Kehayia, E., & Taler, V. (1999). The mass/count distinction: Evidence from on-line psycholinguistic performance. Brain and Language 68, 205-211.

Jackendoff, R. (1991). Parts and boundaries. Cognition 41, 9-45.

Katz, G. & Zamparelli, R. (2012). Quantifying Count/Mass Elasticity. Choi, J. et al. (eds.). Proceedings of the 29th West Coast Conference on Formal Linguistics. Somerville, MA: Cascadilla Proceedings Project, 371-379.

Kulkarni, R., Rothstein, S., & Treves, A. (2013). A Statistical Investigation into the Cross-Linguistic Distribution of Mass and Count Nouns: Morphosyntactic and Semantic Perspectives. Biolinguistics 7, 132-168.

Kuperman, V., & Van Dyke, J. A. (2013). Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance 39(3), 802.

Marcantonio, A. & Pretto, A. M. (2001). Il nome. L. Renzi, G. Salvi, & A. Cardinaletti (eds.). Grande grammatica italiana di consultazione. Bologna: Il Mulino, 329-346.

Mondini, S., Kehayia, E., Gillon, B., Arcara, G., & Jarema, G. (2009). Lexical access of mass and count nouns. How word recognition reaction times correlate with lexical and morpho-syntactic processing. The Mental Lexicon 4, 354-379.

Pelletier, F. J. (2012a). Lexical Nouns are Neither Mass nor Count, but they are Both Mass and Count. D. Massam (ed.). A Cross-Linguistic Exploration of the Count-Mass Distinction. Oxford: OUP, 9-26.

Williams, R., & Morris, R. (2004). Eye movements, word familiarity, and vocabulary acquisition. European Journal of Cognitive Psychology 16(1/2), 312-339.
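
As an illustration of the plural dominance measure used in the second study (the ratio plural frequency/singular frequency) and of the three-way choice of stimuli (highest ratio, lowest ratio, ratio closest to 1), the following is a minimal sketch rather than the authors' procedure. The names `NounCounts` and `select_groups` are hypothetical, the counts are invented, and two details are assumptions: the three groups are kept disjoint, and "closest to 1" is measured as the absolute difference |ratio - 1|.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounCounts:
    lemma: str      # hypothetical identifier for the noun
    singular: int   # corpus frequency of the singular form
    plural: int     # corpus frequency of the plural form

    @property
    def dominance(self) -> float:
        """Plural dominance: plural frequency / singular frequency."""
        if self.singular <= 0:
            raise ValueError(f"{self.lemma}: singular frequency must be positive")
        return self.plural / self.singular


def select_groups(nouns, k):
    """Return (plural_dominant, singular_dominant, equal), k nouns each.

    Assumptions (not stated in the paper): groups do not overlap, and
    closeness to 1 is the absolute difference |ratio - 1|.
    """
    if k < 1 or len(nouns) < 3 * k:
        raise ValueError("need k >= 1 and at least 3 * k nouns")
    ranked = sorted(nouns, key=lambda n: n.dominance, reverse=True)
    plural_dominant = ranked[:k]       # highest plural/singular ratios
    singular_dominant = ranked[-k:]    # lowest plural/singular ratios
    middle = ranked[k:-k]              # candidates for the "equal" group
    equal = sorted(middle, key=lambda n: abs(n.dominance - 1.0))[:k]
    return plural_dominant, singular_dominant, equal


# Invented counts, for illustration only (not taken from ItWaC).
toy = [
    NounCounts("stella", 100, 400),
    NounCounts("anello", 300, 60),
    NounCounts("tavolo", 200, 190),
    NounCounts("libro", 500, 450),
    NounCounts("sole", 1000, 10),
    NounCounts("capello", 50, 300),
]
plural_dom, singular_dom, equal_dom = select_groups(toy, k=2)
for name, group in [("plural", plural_dom), ("singular", singular_dom), ("equal", equal_dom)]:
    print(name, [(n.lemma, round(n.dominance, 2)) for n in group])
```

In the study the same selection was made with 30 nouns per group out of 284; a log-ratio distance would be an equally plausible way to define "closest to 1" and could change which nouns land in the equal group.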
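
The frequency-based assignment of nouns to the "mass" and "count" lists in the first study can be sketched in the same spirit. This is one possible reading of the rule (a noun is "mass" if its mass-context frequency is among the top k and its count-context frequency is not, and vice versa), not a reconstruction of the authors' script. The function name `classify_mass_count` and the counts are hypothetical; under this strict reading a class may end up with fewer than k nouns, whereas the study reports 18 per class with k = 18.

```python
def classify_mass_count(context_counts, k):
    """Label nouns as "mass", "count" or None from context frequencies.

    context_counts maps each noun to a pair:
    (occurrences in mass contexts, occurrences in count contexts).
    """
    # Rank nouns separately by mass-context and by count-context frequency.
    by_mass = sorted(context_counts, key=lambda n: context_counts[n][0], reverse=True)
    by_count = sorted(context_counts, key=lambda n: context_counts[n][1], reverse=True)
    top_mass, top_count = set(by_mass[:k]), set(by_count[:k])

    labels = {}
    for noun in context_counts:
        if noun in top_mass and noun not in top_count:
            labels[noun] = "mass"
        elif noun in top_count and noun not in top_mass:
            labels[noun] = "count"
        else:
            labels[noun] = None  # left uncategorised, as for the remaining stimuli
    return labels


# Invented context counts, for illustration only.
toy_counts = {
    "burro": (90, 2),
    "sabbia": (80, 5),
    "anello": (1, 70),
    "libro": (3, 95),
    "pane": (60, 50),   # frequent in both contexts, so it stays uncategorised
}
labels = classify_mass_count(toy_counts, k=3)
print(labels)
```

The point of the design is that category membership is derived from observed distributions rather than from the experimenters' intuitions, so a noun that is frequent in both contexts is simply left out of both lists.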