=Paper= {{Paper |id=Vol-1347/paper05 |storemode=property |title=Phonotactic probabilities in Italian simplex and complex words: a fragment priming study |pdfUrl=https://ceur-ws.org/Vol-1347/paper05.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/BraccoCC15 }} ==Phonotactic probabilities in Italian simplex and complex words: a fragment priming study== https://ceur-ws.org/Vol-1347/paper05.pdf
Phonotactic probabilities in Italian simplex and complex words: a fragment priming study


Giulia Bracco
Università di Salerno
Via Giovanni Paolo II 132, Fisciano (SA)
gcbracco@unisa.it

Basilio Calderone
CNRS & Université de Toulouse II
5 allées Antonio Machado, Toulouse
basilio.calderone@univ.tlse2.fr

Chiara Celata
Scuola Normale Superiore
P.zza dei Cavalieri 7, Pisa
celata@sns.it



Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

1 Introduction

Phonotactics refers to the sequential organization of phonological units that are legal in a language (Crystal 1992). However, legal sound sequences do not all occur with the same probability in a language. Phonotactic probability is most often measured in terms of transitional probabilities (TPs) of biphones and has been shown to influence a wide range of processes, including infants’ discrimination of native-language sounds, adults’ ratings of the wordlikeness of nonwords (Vitevitch et al. 1997), speech segmentation (Pitt & McQueen 1998, Mattys & Jusczyk 2001), word acquisition (Storkel 2001) and word recognition (Luce & Large 2001). In the domain of word recognition specifically, high TPs facilitate word and nonword identification in speeded same-different matching tasks, but slow down identification in lexical decision tasks, owing to the inhibitory effects of a large neighborhood (e.g. Vitevitch & Luce 1999, Luce & Large 2001). Most studies on the role of TPs in speech production and perception have been conducted on English.
   In this paper we focus on the role of phonotactic probabilities in priming morphologically simplex and complex words in Italian. We investigate whether biphone TPs affect the recognition of word targets after exposure to fragment primes that differ in the probability with which the fragment-final consonant predicts the following segment in the target.
   We opted for a non-factorial regression design, with lexical and sub-lexical frequency and distributional variables as predictors (see Baayen 2010). In this paper, we report the results of the study on simplex words only; we do, however, discuss the implications of the current findings for the processing of complex words.

2 Experiment

2.1 Materials and procedure

Forty-two native Italian speakers participated in a speeded lexical decision task in a fragment priming paradigm. Thirty bi- or tri-syllabic Italian nouns containing a biphonemic consonant cluster in internal position (e.g. borsa ‘bag’) served as targets. Each target was primed by a sequence corresponding to an initial fragment of the target (e.g. bor-borsa). The fragment prime consisted of 3 or 4 phonemes and always ended with the first consonant of the cluster. The average prime/target length ratio was 0.49. The clusters differed across words, and each cluster occurred in only one target (although more than one fragment could end in a given consonant). Twelve clusters were heterosyllabic (e.g. bor-sa ‘bag’), 12 tautosyllabic (e.g. deg-rado ‘decay’) and 6 ambisyllabic (e.g. dis-tanza ‘distance’).
   A second set of 30 Italian nouns, matched for average length, frequency and prime/target length ratio, was also used; in these targets the fragment prime ended in a syllable-onset consonant followed by a vowel (e.g. tuc-tucano ‘toucan’). The same proportion of fragment-final consonants was maintained in the two sets of words.
   Sixty pseudowords, matched for average length and fragment properties, were added. Pseudowords were obtained by changing one letter of existing words (belonging to the same frequency range as the experimental words), for
1/3 in their initial part, 1/3 in their central part and 1/3 in their final part. The 30 clusters used for pseudowords did not appear in the word list.
   In the lexical decision task, participants were asked to press a button with their dominant hand as soon as they judged the orthographically presented target to be a word, and a different button for targets judged to be nonwords. All stimuli appeared in Courier New font, 18-point size, in the center of the computer screen. In order to avoid allographic effects, primes were displayed in uppercase and targets in lowercase. The fixation display lasted 200 ms, followed by a 50 ms pause. Primes appeared for 150 ms, followed by a 50 ms pause. Targets remained on the screen for a maximum of 1 s; if participants did not respond within that time, the feedback Fuori tempo (‘Out of time’) appeared on the screen. Reaction times (RTs) and the number of errors (Nerr) were the dependent variables. RTs were measured from target onset to the subject’s response, and responses given after the deadline were scored as errors.
   The experiment was preceded by a practice session; the experiment proper started once participants reached 70% valid responses.

2.2 Experimental variables

Several statistical and distributional properties of primes, targets and clusters were derived from the CoLFIS corpus (Bertinetto et al., 2005).
   For each prime-target pair, we calculated (i) the token frequency of the target (‘TargetFreq’), (ii) the number of words beginning with the prime fragment (‘PrimeTypeFreq’), (iii) the cumulated frequency of the words in (ii) (‘PrimeTokenFreq’), (iv) the length of the target (in number of graphemes), (v) the length of the prime (in number of graphemes), and (vi) the prime/target length ratio (‘LengthRatio’).
   For each cluster, we calculated (vii) the TP value, i.e. the probability with which the first consonant of the cluster predicts the occurrence of the following consonant, calculated over the corpus word tokens (‘BigramTP’), (viii) the number of words containing the cluster (‘BigramTypeFreq’), (ix) the cumulated frequency of the words in (viii) (‘BigramTokenFreq’), (x) the TP between the fragment prime and the second consonant of the cluster, e.g. P(s|bor) in borsa ‘bag’ (‘SequenceTP’), (xi) the number of words containing the sequence of the prime followed by the second C of the cluster (‘SequenceTypeFreq’), and (xii) the cumulated frequency of the words in (xi) (‘SequenceTokenFreq’).

2.3 Analysis and results

Fixed-effects models and mixed-effects models with subject and prime as random effects were used.
   For the purposes of the present study, we tested two models, both including frequency variables and phonotactic probability variables; they are shown in Table 1. The two models differed in two respects: model II included a measure of prime frequency, which model I did not, and model I focused on sequence and bigram token frequencies, whereas model II focused on sequence and bigram type frequencies. Both models were tested separately for CC items (e.g. bor-sa ‘bag’) and CV items (e.g. tuc-ano ‘toucan’).

                 Model I              Model II
Fixed effects    TargetFreq           TargetFreq
                 LengthRatio          PrimeTokenFreq
                 SequenceTokenFreq    LengthRatio
                 BigramTokenFreq      SequenceTypeFreq
                 SequenceTP           BigramTypeFreq
                 BigramTP             SequenceTP
                                      BigramTP
Random effects   Subject              Subject
                 Fragment prime       Fragment prime

Table 1. Fixed and random effects for the CC and CV items.

   The results of the fixed-effects analyses for the relevant models are summarized in Table 2 (dependent variable: RTs) and Table 3 (dependent variable: Nerr).
   According to model I, with RTs as the dependent variable, the sequence’s TP (i.e., the TP between the fragment prime and the second consonant of the cluster) turned out to be the most significant predictor for CC items, outranking even the contribution of the frequency values (for the target, the sequence and the bigram), which all contributed to the intercept. A different picture emerged, however, for the CV items, for which none of the probability variables significantly predicted the subjects’ response times; instead, target frequency, with a secondary contribution from the frequency of the cluster, appeared to play a role for this subset of items.
   According to model II, for CC items target frequency turned out to be very important, and the only additional effect was generated by the sequence’s TP. Thus the two models were similar in emphasizing the role of the probability with which a given C follows the prime sequence. As for CV items, model II returned a picture very similar to that of model I, with target frequency and bigram type frequency as the only significant predictors.
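The probability and frequency predictors that recur in these results (BigramTP, SequenceTP and the associated type and token counts defined in Section 2.2) can be illustrated with a short Python sketch. It is not the authors’ procedure: the paper derives these values from CoLFIS, whereas the sketch uses a hypothetical toy lexicon with invented frequencies, counts letters rather than phonemes, and assumes that the prime and prime + C2 counts are restricted to word-initial position, so that SequenceTP is the cumulated frequency of words beginning with prime + C2 divided by the cumulated frequency of words beginning with the prime. The names `TOY_LEXICON`, `Predictors` and `cluster_predictors` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: word -> token frequency.
# Invented numbers for illustration only; these are NOT CoLFIS counts.
TOY_LEXICON = {
    "borsa": 120,
    "borse": 60,
    "bordo": 80,
    "borgo": 40,
    "morso": 10,
    "tucano": 5,
}


@dataclass
class Predictors:
    bigram_tp: float           # BigramTP: P(C2 | C1), token-weighted
    bigram_type_freq: int      # BigramTypeFreq: words containing C1C2
    bigram_token_freq: int     # BigramTokenFreq: summed frequency of those words
    prime_type_freq: int       # PrimeTypeFreq: words beginning with the prime
    prime_token_freq: int      # PrimeTokenFreq: summed frequency of those words
    sequence_type_freq: int    # SequenceTypeFreq: words beginning with prime + C2
    sequence_token_freq: int   # SequenceTokenFreq: summed frequency of those words
    sequence_tp: float         # SequenceTP: P(C2 | prime), e.g. P(s | bor)
    length_ratio: float        # LengthRatio: prime length / target length


def _ratio(num, den):
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def cluster_predictors(lexicon, prime, target):
    """Illustrative predictor computation for one prime-target pair.

    Assumptions (not from the paper): letters stand in for phonemes, the
    letter after the prime in `target` is C2, and prime / prime + C2 counts
    are restricted to word-initial position."""
    if not target.startswith(prime) or len(target) <= len(prime):
        raise ValueError("prime must be a proper initial fragment of target")
    c1 = prime[-1]              # fragment-final consonant (C1)
    c2 = target[len(prime)]     # segment that follows the prime (C2)
    bigram, sequence = c1 + c2, prime + c2

    # Token-weighted counts of C1 followed by any letter, and by C2 specifically.
    c1_tokens = pair_tokens = 0
    for word, freq in lexicon.items():
        for a, b in zip(word, word[1:]):
            if a == c1:
                c1_tokens += freq
                if b == c2:
                    pair_tokens += freq

    with_bigram = [f for w, f in lexicon.items() if bigram in w]
    with_prime = [f for w, f in lexicon.items() if w.startswith(prime)]
    with_sequence = [f for w, f in lexicon.items() if w.startswith(sequence)]

    return Predictors(
        bigram_tp=_ratio(pair_tokens, c1_tokens),
        bigram_type_freq=len(with_bigram),
        bigram_token_freq=sum(with_bigram),
        prime_type_freq=len(with_prime),
        prime_token_freq=sum(with_prime),
        sequence_type_freq=len(with_sequence),
        sequence_token_freq=sum(with_sequence),
        sequence_tp=_ratio(sum(with_sequence), sum(with_prime)),
        length_ratio=len(prime) / len(target),
    )


if __name__ == "__main__":
    p = cluster_predictors(TOY_LEXICON, "bor", "borsa")
    print(p.sequence_tp, p.bigram_tp, p.length_ratio)
```

For the CC pair bor-borsa, the toy counts give SequenceTP = 180/300 = 0.6 (borsa and borse out of all bor- words) and LengthRatio = 3/5; real corpus counts would of course differ. The same function also runs on a CV pair such as tuc-tucano, where the slot after the prime is filled by a vowel.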




Table 2. Fixed-effects coefficients for the two models, CC and CV items (RTs = dependent variable).

   When subject and prime were included as random factors, the pairwise comparison in the likelihood ratio test confirmed that the sequence’s TP significantly increased the predictability of the RT patterns: χ²(1) = 11.184, p = 0.0008 in model I; χ²(1) = 5.4403, p = 0.019 in model II.
   Mean reaction times and the number of errors were positively and significantly correlated, though the correlation coefficient was only moderate (r = .648, p < .01). We therefore also tested the two models with Nerr as the dependent variable, in order to determine whether the error rate was influenced by frequencies and probabilities to a different extent than response latencies.
   With Nerr as the dependent variable, R² values were consistently lower than in the RT models (Table 3), indicating that our frequency and probability variables accounted for the error patterns to a more limited extent. In particular, for the CC items both model I and model II singled out target frequency as the only significant predictor of errors, while for CV items an additional role of bigram frequencies (by token and by type, respectively) was found. Thus for the CV items, RTs and error rate produced consistent results.

Table 3. Fixed-effects coefficients for the two models, CC and CV items (Nerr = dependent variable).

3 Discussion

This work aimed to shed light on the role of TPs in a so far unstudied experimental setting, i.e., a lexical decision task with fragment priming. As most studies on phonotactic probabilities have focused on English, this work also contributes evidence from a poorly investigated language, Italian.
   Fragment priming is known to be modulated not only by word frequency and the frequencies of words matching the fragment, but also by top-down information conveyed by the prime: a fragment prime matching a unique morpho-lexical family is as effective as a stem prime, showing that the prime acts as a cue to the properties displayed in the target (see e.g. Laudanna & Bracco, 2006, for Italian).
   This study has shown that, when an initial fragment is available, the priming effect is also influenced by bottom-up variables; in particular, it depends on the probability with which the segments composing the fragment, or the fragment-final consonant, predict the occurrence of the following consonant. To a lesser extent, the frequency with which bigrams and sequences occur (as types or tokens) in the lexicon also predicts the subjects’ behavior. Phonotactic probabilities thus turned out to predict the subjects’ responses to a large degree for many of the phonological environments tested in the current experiment, sometimes outperforming target frequency, and consistently outweighing the contribution of the prime/target length ratio and of prime frequency.
   The results, however, suggested that phonotactic probabilities were overall more important for consonant clusters than for consonant-vowel sequences; thus it must be
concluded that the contribution of TPs to lexical recognition is not the same across phonological environments. Consonant clusters might play a particularly relevant role in lexical access compared to CV sequences, as contemporary theories based on the principles of phonological and morphological naturalness also seem to predict (see e.g. Dressler & Dziubalska-Kolaczyk, 2006; Korecky-Kroell et al. 2014).
   Additionally, for CC sequences the token frequencies (of the bigram and of the prime + C sequence) turned out to be relatively more important than the corresponding type frequencies, suggesting that exposure to the number of occurrences of a cluster or of a segment sequence may matter more in lexical access than exposure to the individual items containing them.
   An additional issue concerns the role of TPs in morphologically complex words. According to some models, morphological parsing is necessary for lexical access, and the prefix (in the case of prefixed words) has to be stripped away for the word to be recognized (from Taft & Forster, 1975 onwards). Assuming a condition in which the fragment prime coincides with a prefix, TPs would play the additional role of marking the morphological boundary during the priming event. Given the results of the current study, it appears to be of utmost importance to verify further whether prefixed and pseudo-prefixed words behave in the same way. Models postulating morphological pre-parsing (e.g. Schreuder & Baayen, 1995) would predict that high TPs codetermine latencies for prefixed targets only, whereas if morphology does not affect word recognition, the TPs between the fragment prime and the following segment of the target will modulate latencies in prefixed and pseudo-prefixed words to the same extent.
   A follow-up experiment will therefore test the contribution of phonotactic statistical knowledge to native speakers’ access to complex word forms (specifically, prefixed nouns), using prefixed and pseudo-prefixed words. In particular, fragment primes will be selected according to two different conditions: in condition a) the targets are prefixed words and the fragment prime coincides with the prefix (e.g. bis-bisnonna ‘great-grandmother’); in condition b) the targets are pseudo-prefixed words and no morphological boundary occurs between the initial fragment and the second part of the word (e.g. per-perdente ‘loser’). Together with the current experiment, the experiment on prefixed and pseudo-prefixed words will determine whether the role of TPs differs when the target is a simplex word, a prefixed word or a pseudo-prefixed word. Different hypotheses may be put forward here, according to whether or not morphological boundaries affect the processing of consonant clusters (e.g., Calderone et al. 2014, Celata et al. 2015 in press), and according to the likelihood that a given sequence occurs as a morpheme or as a homographic non-morphological string (see Laudanna et al., 1994).
   By describing phonotactic probability and frequency effects during word recognition, this study offers arguments in favor of models of lexical access based on bottom-up processes, such as cohort models for orthographic stimuli (see e.g. Johnson & Pugh, 1994). The capacity of single consonants to predict the following segment, thereby speeding up recognition of the whole word, as an additional if not independent way to access words and their subparts, might also be discussed with reference to models that associate orthographic input units with semantic and lexical knowledge (from connectionist models such as Harm & Seidenberg, 1999, to amorphous models such as Baayen et al. 2011).

References

Harald R. Baayen. 2010. A real experiment is a factorial experiment? The Mental Lexicon, 5(1): 149-157.
Harald R. Baayen, Petar Milin, Dusica Filipovic Durdevic, Peter Hendrix and Marco Marelli. 2011. An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118: 438-482.
Pier Marco Bertinetto, Cristina Burani, Alessandro Laudanna, Lucia Marconi, Daniela Ratti, C. Rolando and Anna Maria Thornton. 2005. Corpus e Lessico di Frequenza dell’Italiano Scritto (CoLFIS). http://linguistica.sns.it/CoLFIS/Home.net
Basilio Calderone, Chiara Celata, Katharina Korecky-Kroell and Wolfgang U. Dressler. 2014. A computational approach to (mor)phonotactics: Evidence from German. Language Sciences, 46 (part A): 59-70.
Chiara Celata, Katharina Korecky-Kroell, Irene Ricci and Wolfgang U. Dressler. 2015 (in press). Online processing of German (mor)phonotactic clusters by adults and adolescents. Italian Journal of Linguistics, 27(1).
Wolfgang U. Dressler and Katarzyna Dziubalska-Kolaczyk. 2006. Proposing Morphonotactics. Italian Journal of Linguistics, 18: 249-266.
N.F. Johnson and K.R. Pugh. 1994. A cohort model of visual word recognition. Cognitive Psychology, 26: 240-346.
Katharina Korecky-Kroell, Wolfgang U. Dressler, Eva Maria Freiberger, Eva Reinisch, Karlheinz Moerth and Gary Libben. 2014. Phonotactic and morphonotactic processing in German-speaking adults. Language Sciences, 46 (part A): 48-58.
Alessandro Laudanna, Cristina Burani and Antonella Cermele. 1994. Prefixes as processing units. Language and Cognitive Processes, 9: 295-316.
Alessandro Laudanna and Giulia Bracco. 2006. Stem and fragment priming on verbal forms of Italian. In Proceedings of the 5th International Conference on the Mental Lexicon (Montreal, Canada, 11-13 October 2006): 26.
Paul A. Luce and Nathan R. Large. 2001. Phonotactics, density, and entropy in spoken word recognition. Language and Cognitive Processes, 16: 565-581.
Sven L. Mattys and Peter W. Jusczyk. 2001. Phonotactic cues for segmentation of fluent speech by infants. Cognition, 78: 91-121.
Mark Pitt and James McQueen. 1998. Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language, 39: 347-370.
Robert Schreuder and Harald R. Baayen. 1997. How simplex complex words can be. Journal of Memory and Language, 37: 118-139.
Holly L. Storkel. 2001. Learning new words: Phonotactic probability in language development. Journal of Speech, Language, and Hearing Research, 44: 1321-1337.
Marcus Taft and Kenneth I. Forster. 1975. Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14: 638-647.
Michael Vitevitch, Paul Luce, J. Charles-Luce and D. Kemmerer. 1997. Phonotactics and syllable stress: Implications for the processing of spoken nonsense words. Language and Speech, 40: 47-62.
Michael S. Vitevitch and Paul A. Luce. 1999. Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory and Language, 40: 374-408.
Michael Vitevitch, Paul A. Luce, David B. Pisoni and Edward T. Auer. 1999. Phonotactics, neighborhood activation and lexical access for spoken words. Brain and Language, 68: 306-311.