Lexical categories or frequency effects? A feedback from quantitative
methods applied to psycholinguistic models in two studies on Italian.

Francesca Franzon*, Giorgio Arcara°, Chiara Zanini*
* Dipartimento di Neuroscienze, Università degli Studi di Padova
° IRCSS Ospedale san Camillo, Lido di Venezia
{francescafranzon7;giorgio.arcara}@gmail.com; chiara.zanini.2@unipd.it


Abstract

English. We examined two issues concerning Italian Number morphology: the phenomena related to mass and count nouns and to plural dominance. By taking into account quantitative data from corpora and subjective frequency ratings in three mixed effect models, we found that differences in participants’ performance in two lexical decision tasks were better captured as differences in frequency than as effects of lexical categories.

Italiano (translated). In this study we compared two phenomena pertaining to nominal Number morphology in Italian: noun countability and plural dominance. Integrating quantitative data from corpora and from two rating studies in a statistical analysis based on mixed effect models, we found that differences in participants’ performance in two lexical decision studies can be traced back to frequency effects rather than to the presence of categorial lexical features.

Introduction

The role of frequency in lexical retrieval is well established in psycholinguistic studies (since, at least, Forster & Chambers, 1973): the higher the frequency of a word, the faster its retrieval. Generally, the singular form of a noun is more frequent than the corresponding plural, and is thus retrieved faster. However, some nouns (e.g. stelle, ‘stars’) occur more frequently in the plural than in the singular: this phenomenon is known as plural dominance. Plural dominant nouns are in fact accessed faster in the plural form than in the singular. Since such nouns are not identifiable as a homogeneous group by means of semantic features, the phenomenon has been explained as a mere effect of the frequency of occurrence of the forms (Baayen et al., 1996; 1997; 2007; Biedermann et al., 2013).

While plural dominance seems to be unrelated to grammatical constraints, another phenomenon involving Number morphology seems to be grammatically grounded instead, namely the mass-count issue (Borer, 2005; Cheng, 1973; Chierchia, 2010; Jackendoff, 1991). Nouns referring to countable entities are called ‘count nouns’ (anello, ‘ring’); nouns referring to uncountable entities are called ‘mass nouns’ (burro, ‘butter’). Some constraints govern the possibility for the two types of nouns to occur in certain morphosyntactic contexts: for example, count nouns cannot occur in the singular after a quantifier (*molto anello, ‘much ring’), while mass nouns cannot occur with numerals or the indefinite article (*un burro, ‘a butter’). As far as Number morphology is concerned, mass nouns should occur only in the singular (but for a deeper discussion, see i.a. Acquaviva, 2013; Marcantonio & Pretto, 2001; Pelletier, 2012).

Previous lexical decision tasks have pointed to some differences in the processing of count nouns with respect to mass nouns, with mass nouns requiring longer response times (RTs) (i.a. Mondini et al. 2009; Gillon et al. 1999). In the light of these results, it has been proposed that an additional lexical feature has to be computed for mass nouns as compared to count nouns.

While psycholinguistic studies on plural dominance have relied on the relative frequency of singular and plural forms in the selection of stimuli and in the analysis of results, even the most recent experimental studies on the mass-count issue have not quantified the actual occurrence of the experimental stimuli in mass contexts and in count contexts: nouns have instead been assigned to a mass or to a count category on the basis of the experimenters’ judgments. Quantitative data on syntactic contexts can provide a better estimate of the frequency of use of nouns as countable or uncountable: in the present study we relied on the actual occurrence of nouns in the different syntactic contexts in assigning them to the “mass” or to the “count” experimental list.

We will describe and compare two lexical decision tasks, concerning the phenomena of mass-count and of plural dominance respectively. We will explore the possibility that the mass-count effects described in the psycholinguistic literature could be better explained in terms of frequency of occurrence, as is recognized by most of the literature with respect to plural dominance. We hypothesize that the frequency of occurrence of the word form (inflected in the singular or in the plural) will predict the RTs in lexical decision tasks contrasting mass and count nouns, as well as in those concerning plural dominance. The frequency of occurrence will be measured by means of two subjective frequency rating studies and in the ItWaC corpus (Baroni et al. 2009). We will rely on quantitative measures to categorize experimental stimuli. Measures of plural dominance of nouns will be based on the ratio between their occurrence in the plural and in the singular; the mass and count experimental nouns will be categorized considering their distribution with respect to mass and count morphosyntactic contexts.

1     First study: mass and count nouns

1.1    Rating and corpus analysis

448 concrete noun forms, namely 224 nouns each inflected in both the singular and the plural, were selected following the theoretical definitions given in traditional grammars. The list included the plural of 45 nouns for which only singular occurrences would be expected on a normative basis (pure “mass” nouns such as burro ‘butter’ - *burri ‘butters’).

A questionnaire was designed in order to evaluate the subjective frequency of the 448 nouns following the methods used in previous literature (Ferrand et al., 2008). The questionnaire was administered online by means of the SurveyMonkey platform. 126 informants participated in this study (age: range = 22 - 76 years, mean = 36.2, SD = 12.46; years of education: range = 8-21). Participants were instructed not to express normative judgments, but to focus on how frequently they had heard or read the words; they had to assign a score to the frequency of the nouns on a 7-point Likert scale, ranging from 0 = “never heard or seen” to 6 = “more than once a day”. The nouns in the questionnaires were presented to each participant in a different random order.

      Score mean       Singular       Plural
         n=0               0             0
        0<n≤1              0             7
        1≤n≤2              3            47
        2≤n≤3             45            60
        3≤n≤4             88            63
        4≤n≤5             70            36
         n>5              14             7

Table 1: Distribution of the subjective frequency scores.

Absolute frequency of the aforementioned nouns was collected from the ItWaC corpus (Baroni et al., 2009). A positive correlation was found between corpus frequency and subjective frequency: r(446) = 0.75, p < .001. In order to disambiguate the mass use from the count use of the nouns presented in the rating questionnaire, we designed queries in CQP syntax following the methods described by Katz & Zamparelli (2012). The occurrence of nouns with determiners such as the indefinite article and quantifiers was used to trace their occurrence in unambiguous count or mass contexts.

1.2     Lexical decision task

From the initial list of 224 nouns, 80 nouns were selected and presented both in the singular and in the plural (160 experimental stimuli in total). These stimuli were selected to span as uniformly as possible the range of possible values of subjective frequency, in order to use subjective frequency as a continuous variable in the analysis. From the 80 nouns we classified as “mass” the 18 nouns with the highest mass frequencies and with count frequencies that were not among the top 18; we classified as “count” the 18 nouns with the highest count frequencies and with mass frequencies that were not among the top 18. These nouns were presented both in the singular and in the plural (72 stimuli in total). The remaining stimuli were not categorised in such terms. Experimental stimuli are displayed in table 2. The
final list included 240 filler words, consisting of 80 adjectives and 160 phonotactically plausible non-words.

                           N. of    Corpus                 Subjective     Length
                           items    Frequency              Frequency
 All stimuli               160      11850.32 (27239.65)    3.29 (1.18)    6.41 (1.66)
 “Mass” nouns: singular    18       26204.88 (28831.43)    4.36 (0.57)    6.22 (1.89)
 “Mass” nouns: plural      18       824 (1187.38)          1.95 (0.72)    6.28 (1.96)
 “Count” nouns: singular   18       38570.05 (54194.95)    4.09 (0.84)    5.78 (1.31)
 “Count” nouns: plural     18       24365 (36455)          4.07 (0.80)    5.89 (1.27)

Table 2: Psycholinguistic properties of experimental stimuli.

60 Italian native speakers participated in the experiment (mean age = 23.5, SD = 2.37; years of education: mean = 15.16, SD = 1.64). Participants saw a series of letter strings presented at the center of the screen one at a time. They had to press one key if they thought the string was an Italian word, and another key otherwise.

1.3      Results

Results were analyzed by means of mixed effect models (Baayen, Davidson & Bates, 2008). In model 1, summarized in table 3, we included the 72 stimuli classified as mass and count nouns. We considered as predictors: category (mass/count), Number (singular/plural), corpus frequency, subjective frequency and orthographic length. Results show significant effects of length (longer RTs for longer items), of Number (longer RTs for plurals) and of subjective frequency (longer RTs for low subjective frequency).

 Fixed effect            Coefficient   Standard Error   df       t        p-value
 Intercept               6.56          0.05             95.18    130.53   <0.001
 Number = plural         0.37          0.02             64.33    2.04     0.04
 Subjective frequency    -0.04         0.007            74.09    -4.27    <0.001
 Orthographic length     0.009         0.004            65.86    2.077    0.04

Table 3: Results of model 1.

In model 2, summarized in table 4, we included all the 160 stimuli. We considered as predictors: Number (singular/plural), corpus frequency, subjective frequency and orthographic length. Results show significant effects of length (longer RTs for longer items), of corpus frequency (longer RTs for low corpus frequency) and of subjective frequency (longer RTs for low subjective frequency).

Notably, the predictor category is not significant (p = 0.85); corpus frequency is a significant predictor in model 2 (p = 0.03), but it only approached significance in model 1 (p = 0.05). Possibly, in model 1 Number is a significant predictor because the categorised items represent a subset that differs in frequency of occurrence in the plural. In fact, in model 2, in which both categorised and uncategorised items were considered, no effect of Number was found.

 Fixed effect            Coefficient   Standard Error   df       t        p-value
 Intercept               6.73          0.04             219.42   172.38   <0.001
 Corpus frequency        -0.009        0.004            155.55   -2.16    0.03
 Subjective frequency    -0.05         0.008            152.19   -5.37    <0.001
 Orthographic length     0.008         0.004            2.47     2.11     0.04

Table 4: Results of model 2.

2      Second study: plural dominance

2.1       Rating and corpus analysis

The ItWaC corpus was queried to obtain the frequency of occurrence of the singular and plural forms of nouns displaying the most common inflectional patterns (-o/-i; -a/-e). We discarded from the testing material compounds, derived nouns and nouns that differ in orthographic length or phonological form between singular and plural (e.g. occhio - occhi ‘eye - eyes’). The remaining nouns were then ordered on the basis of their plural dominance, defined as the ratio plural frequency/singular frequency. We calculated the stem frequency of the nouns and selected 284 nouns spanning uniformly the range of possible values of frequency.

A questionnaire was created in order to test the subjective frequency of the 284 selected nouns, both in the singular and the plural (568 experimental items). The questionnaire was administered following the same methods described previously (§1.1). 150 Italian native speakers participated in the study (age: range = 18 - 69, mean = 29; years of education: range = 8-21). The distribution of the subjective frequency is reported in Table 5. A positive correlation was found between the singular and plural forms
of nouns within the corpus (r(282) = 0.70, p < .001) and within the rating (r(282) = 0.91, p < .001).

      Score mean       Singular       Plural
         n=0               0             0
        0<n≤1              0             1
        1≤n≤2             19            20
        2≤n≤3             88           100
        3≤n≤4            139           131
        4≤n≤5             31            27
         n>5               7             5

Table 5: Distribution of the subjective frequency scores.

2.2       Lexical decision task

A lexical decision study was carried out, following the same methods described in §1.2. From the 284 nouns mentioned in §2.1, we chose: the 30 nouns with the highest ratio of plural dominance, the 30 nouns with the lowest ratio of plural dominance, and the 30 nouns whose ratio between plural and singular was the closest to 1 (see table 6). Each noun was presented in the singular and in the plural (180 experimental stimuli in total). The final list included 364 filler words, consisting of 184 adjectives and 180 phonotactically plausible non-words.

43 Italian native speakers participated in the experiment.

 Dominance       Morphological   N. of    Corpus                 Subjective     Orthographic
 (mean Pl/Sg)    Number          items    Frequency              Frequency      Length
 Plural          Singular        30       5260.3 (7547.43)       3.31 (0.77)    6.33 (1.09)
 (3.61)          Plural          30       19026.46 (25558.41)    3.48 (0.79)
 Singular        Singular        30       25596.9 (44944.15)     3.44 (0.91)    6.13 (1.13)
 (0.16)          Plural          30       4276.3 (7186.03)       3.23 (0.79)
 Equal           Singular        30       35430.33 (99471.4)     3.13 (0.57)    6.16 (1.17)
 (0.9)           Plural          30       31921.7 (93584.35)     3.1 (0.59)

Table 6: Psycholinguistic properties of experimental stimuli.

2.3       Results

Results were analysed by means of mixed effect models (Baayen, Davidson & Bates 2008). In model 3, summarized in table 7, we considered as predictors: category (plural/singular/equal dominant), Number (singular/plural), corpus frequency, subjective frequency and orthographic length. Results show significant effects of length (longer RTs for longer items), of corpus frequency (longer RTs for low corpus frequency) and of subjective frequency (longer RTs for low subjective frequency).

 Fixed effect            Coefficient   Standard Error   df       t        p-value
 Intercept               6.79          0.04             211.64   137.79   <0.001
 Corpus frequency        -0.02         0.003            171.55   -7.23    <0.001
 Subjective frequency    -0.03         0.007            170.17   -4.48    <0.001
 Orthographic length     0.009         0.004            165.93   2.03     0.04

Table 7: Results of model 3.

3       Discussion and conclusions

In this study we applied quantitative methods in the selection of experimental stimuli used in the two lexical decision tasks. In both tasks, results from the three models showed effects of subjective frequency and corpus frequency but not of category in written word recognition. As far as plural dominance is concerned, this result is in line with previous literature. As far as the mass-count issue is concerned, our results are unexpected instead. Recall that frequency of occurrence in mass and count contexts was used to avoid biases in the categorization of stimuli. Nevertheless, we did not observe differences in RTs between the two groups of nouns so categorized. Thus, we suggest that there is no need to postulate the computation of a lexical feature related to countability or uncountability in nouns. We propose that the fact that a noun is considered “mass” is better described as an epiphenomenon of the distribution of nouns with respect to syntactic contexts. However, the possibility for a noun to occur in the different syntactic contexts does not predict lexical decision RTs: frequency, as measured in the corpus and by the rating study, is the predictor of lexical access times for words presented in isolation. In this sense, the mass-count issue is similar to the plural dominance phenomenon: even in that case, there is no need to assume the presence of a feature marking plurality, as the frequency of the inflected form is sufficient to account for the observed effects in lexical decision tasks.

The frequency of occurrence of nouns considered as a continuous variable is a better predictor of RTs than a distinction attributed to alleged lexical categories both in the case of phenomena
seemingly unrelated to core grammar rules, like plural dominance, and in phenomena that have traditionally been described as grammar-based, like the mass-count issue.

References

Acquaviva, P. (2013). Il nome. Roma: Carocci.

Baayen, H., Burani, C., & Schreuder, R. (1996). Effects of semantic markedness in the processing of regular nominal singulars and plurals in Italian. Yearbook of morphology, Springer Netherlands, 13-33.

Baayen, R. H., Dijkstra, T., & Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1), 94-117.

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412.

Baayen, R., Levelt, W., Schreuder, R., & Ernestus, M. (2007). Paradigmatic structure in speech production. Proceedings from the Annual Meeting of the Chicago Linguistic Society, 43(1): 1-29. Chicago Linguistic Society.

Balota, D. A., Pilotti, M., & Cortese, M. J. (2001). Subjective frequency estimates for 2,938 monosyllabic words. Memory & Cognition 29(4), 639-647.

Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E. (2009). The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43 (3), 209-226.

Biedermann, B., Beyersmann, E., Mason, C., & Nickels, L. (2013). Does plural dominance play a role in spoken picture naming? A comparison of unimpaired and impaired speakers. Journal of Neurolinguistics, 26(6), 712-736.

Borer, H. (2005). In name only. Oxford: OUP.

Cheng, C.-Y. (1973). Response to Moravcsik. J. Hintikka, J.M.E. Moravcsik, & P. Suppes (eds.). Approaches to Natural Language. Dordrecht: Reidel, 286-288.

Chierchia, G. (2010). Mass nouns, vagueness and semantic variation. Synthese 174, 99-149.

Ferrand, L., Bonin, P., Méot, A., Augustinova, M., New, B., Pallier, C., & Brysbaert, M. (2008). Age-of-acquisition and subjective frequency estimates for all generally known monosyllabic French words and their relation with other psycholinguistic variables. Behavior Research Methods 40 (4), 1049-1054.

Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time. Journal of verbal learning and verbal behavior, 12(6), 627-635.

Gillon, B., Kehayia, E., & Taler, V. (1999). The mass/count distinction: Evidence from on-line psycholinguistic performance. Brain and Language 68, 205-211.

Jackendoff, R. (1991). Parts and boundaries. Cognition 41, 9-45.

Katz, G. & Zamparelli, R. (2012). Quantifying Count/Mass Elasticity. Choi, J. et al. (eds). Proceedings of the 29th West Coast Conference on Formal Linguistics. Somerville, MA: Cascadilla Proceedings Project, 371-379.

Kulkarni, R., Rothstein, S., & Treves, A. (2013). A Statistical Investigation into the Cross-Linguistic Distribution of Mass and Count Nouns: Morphosyntactic and Semantic Perspectives. Biolinguistics 7, 132-168.

Kuperman, V., & Van Dyke, J. A. (2013). Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance 39(3), 802.

Marcantonio, A. & Pretto, A. M. (2001). Il nome. L. Renzi, G. Salvi, & A. Cardinaletti (eds.). Grande grammatica italiana di consultazione. Bologna: Il Mulino, 329-346.

Mondini, S., Kehaya, E., Gillon, B., Arcara, G., & Jarema, G. (2009). Lexical access of mass and count nouns. How word recognition reaction times correlate with lexical and morphosyntactic processing. The Mental Lexicon 4, 354-379.

Pelletier, F. J. (2012a). Lexical Nouns are Neither Mass nor Count, but they are Both Mass and Count. D. Massam (ed.). A Cross-Linguistic Exploration of the Count-Mass Distinction. Oxford: OUP, 9-26.

Williams, R., & Morris, R. (2004). Eye movements, word familiarity, and vocabulary acquisition. European Journal of Cognitive Psychology 16(1/2), 312-339.
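
As a supplementary illustration, the stimulus selection of the second study (plural dominance defined as the ratio plural frequency/singular frequency, followed by the choice of the most plural-dominant, the most singular-dominant and the most balanced nouns) can be sketched in Python. This is only a minimal sketch under stated assumptions, not the procedure actually run for the study: the names NounCounts, plural_dominance, select_stimuli and TOY_NOUNS are hypothetical, the counts are invented toy values, and "closest to 1" is assumed here to mean the smallest absolute difference between the ratio and 1, computed over the nouns not already assigned to the other two groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounCounts:
    lemma: str     # hypothetical label, not a noun from the study
    singular: int  # corpus frequency of the singular form
    plural: int    # corpus frequency of the plural form


def plural_dominance(noun: NounCounts) -> float:
    """Ratio plural frequency / singular frequency."""
    # Assumption: nouns unattested in the singular were filtered out beforehand.
    return noun.plural / noun.singular


def select_stimuli(nouns: list[NounCounts], k: int) -> dict[str, list[NounCounts]]:
    """Pick k plural-dominant, k singular-dominant and k balanced nouns."""
    if k <= 0 or 3 * k > len(nouns):
        raise ValueError("need k > 0 and at least 3 * k nouns")
    ranked = sorted(nouns, key=plural_dominance)  # ascending ratio
    singular_dominant = ranked[:k]                # lowest ratios
    plural_dominant = ranked[-k:]                 # highest ratios
    remaining = ranked[k:len(ranked) - k]
    # Assumption: "closest to 1" = smallest absolute distance from a ratio of 1.
    balanced = sorted(remaining, key=lambda n: abs(plural_dominance(n) - 1.0))[:k]
    return {
        "plural_dominant": plural_dominant,
        "singular_dominant": singular_dominant,
        "balanced": balanced,
    }


# Invented toy counts, for illustration only.
TOY_NOUNS = [
    NounCounts("noun_1", singular=100, plural=400),
    NounCounts("noun_2", singular=200, plural=500),
    NounCounts("noun_3", singular=300, plural=330),
    NounCounts("noun_4", singular=400, plural=380),
    NounCounts("noun_5", singular=500, plural=100),
    NounCounts("noun_6", singular=1000, plural=100),
]

if __name__ == "__main__":
    for group, members in select_stimuli(TOY_NOUNS, k=1).items():
        print(group, [(n.lemma, round(plural_dominance(n), 2)) for n in members])
```

With k = 30 and the 284 rated nouns as input, the same call would yield three groups of 30 nouns, matching the group sizes reported in table 6. A distance measured on the logarithm of the ratio would treat 0.5 and 2 as equally far from balance and could be swapped in for the absolute difference if that reading of "closest to 1" is preferred.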