=Paper= {{Paper |id=Vol-1347/paper05 |storemode=property |title=Phonotactic probabilities in Italian simplex and complex words: a fragment priming study |pdfUrl=https://ceur-ws.org/Vol-1347/paper05.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/BraccoCC15 }} ==Phonotactic probabilities in Italian simplex and complex words: a fragment priming study== https://ceur-ws.org/Vol-1347/paper05.pdf

Phonotactic probabilities in Italian simplex and complex words: a fragment priming study

Giulia Bracco, Università di Salerno, Via Giovanni Paolo II 132, Fisciano (SA), gcbracco@unisa.it
Basilio Calderone, CNRS & Université de Toulouse II, 5 allées Antonio Machado, Toulouse, basilio.calderone@univ.tlse2.fr
Chiara Celata, Scuola Normale Superiore, P.zza dei Cavalieri 7, Pisa, celata@sns.it

Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

1 Introduction

Phonotactics refers to the sequential organization of phonological units that are legal in a language (Crystal 1992). However, legal sound sequences do not all occur with the same probability in a language. Phonotactic probability is most often measured in terms of the transitional probabilities (TPs) of biphones, and it has been shown to influence a wide range of processes, including infants’ discrimination of native-language sounds, adults’ ratings of the wordlikeness of nonwords (Vitevitch et al. 1997), speech segmentation (Pitt & McQueen 1998, Mattys & Jusczyk 2001), word acquisition (Storkel 2001) and word recognition (Luce & Large 2001). Specifically, in the domain of word recognition, high TPs facilitate word and nonword identification in speeded same-different matching tasks, but slow down identification in lexical decision tasks because of the inhibitory effects of a large neighborhood (e.g. Vitevitch & Luce 1999, Luce & Large 2001). Most studies on the role of TPs in speech production and perception have been conducted on English.

In this paper we focus on the role of phonotactic probabilities in priming morphologically simplex and complex words in Italian. We investigate whether biphone TPs affect the recognition of word targets after exposure to fragment primes that differ in the probability with which the fragment-final consonant predicts the following segment in the target.

We opted for a non-factorial regression design, with lexical and sub-lexical frequency and distributional variables as predictors (see Baayen 2010). In this paper we report the results for simplex words only; we do, however, discuss the implications of the current findings for the processing of complex words.

2 Experiment

2.1 Materials and procedure

Forty-two native Italian speakers took part in a speeded lexical decision task in a fragment priming paradigm. Thirty bi- or trisyllabic Italian nouns containing a word-internal biphonemic consonant cluster (e.g. borsa ‘bag’) served as targets. Each target was primed by a sequence corresponding to an initial fragment of the target (e.g. bor-borsa). The fragment prime consisted of 3 or 4 phonemes and always ended with the first consonant of the cluster. The average prime/target length ratio was 0.49. The clusters differed across words, and each cluster occurred in only one target (although more than one fragment could end in a given consonant). Twelve clusters were heterosyllabic (e.g. bor-sa ‘bag’), 12 tautosyllabic (e.g. deg-rado ‘decay’) and 6 ambisyllabic (e.g. dis-tanza ‘distance’).

A second set of 30 Italian nouns, matched for average length, frequency and prime/target length ratio, also served as targets; in these items the fragment prime ended in a syllable-onset consonant followed by a vowel (e.g. tuc-tucano ‘toucan’). The same proportion of fragment-final consonants was maintained in the two sets of words.

Sixty pseudowords, matched to the words for average length and fragment properties, were added. The pseudowords were obtained by changing one letter of existing words (from the same frequency range as the experimental words), for
1/3 of the items in their initial part, 1/3 in their central part and 1/3 in their final part. The 30 clusters used for pseudowords did not appear in the word list.

In the lexical decision task, participants were asked to press the button corresponding to their dominant hand as soon as they judged the orthographically presented target to be a word, and a different button for targets judged to be nonwords. All stimuli appeared in 18-point Courier New font in the center of the computer screen. To avoid allographic effects, primes were displayed in uppercase and targets in lowercase. The fixation display lasted 200 ms and was followed by a 50 ms pause. Primes then appeared for 150 ms, followed by another 50 ms pause. Targets remained on the screen for a maximum of 1 s; if participants did not respond within that time, the feedback Fuori tempo (‘Out of time’) appeared on the screen. Reaction times (RTs) and the number of errors (Nerr) were the dependent variables. RTs were measured from target onset to the participant’s response, and responses given after the deadline were scored as errors.

The experiment was preceded by a practice session, and the experiment proper started once participants reached 70% valid responses.

2.2 Experimental variables

Several statistical and distributional properties of primes, targets and clusters were derived from the CoLFIS corpus (Bertinetto et al., 2005).

For each prime-target pair, we calculated (i) the token frequency of the target (‘TargetFreq’), (ii) the number of words beginning with the prime fragment (‘PrimeTypeFreq’), (iii) the cumulated frequency of the words in (ii) (‘PrimeTokenFreq’), (iv) the length of the target (in graphemes), (v) the length of the prime (in graphemes) and (vi) the prime/target length ratio (‘LengthRatio’).

For each cluster, we calculated (vii) the TP value, i.e. the probability with which the first consonant of the cluster predicts the occurrence of the following consonant, calculated over corpus word tokens (‘BigramTP’), (viii) the number of words containing the cluster (‘BigramTypeFreq’), (ix) the cumulated frequency of the words in (viii) (‘BigramTokenFreq’), (x) the TP between the fragment prime and the second consonant of the cluster, e.g. P(s|bor) in borsa ‘bag’ (‘SequenceTP’), (xi) the number of words containing the prime followed by the second consonant of the cluster (‘SequenceTypeFreq’) and (xii) the cumulated frequency of the words in (xi) (‘SequenceTokenFreq’).

2.3 Analysis and results

Fixed-effects and mixed-effects models were used, the latter with subject and prime as random factors. For the purposes of the present study, we tested two models, both including frequency variables and phonotactic probability variables (Table 1). The two models differed in two respects: model II included a measure of prime frequency, which model I did not; and model I focused on sequence and bigram token frequencies, whereas model II focused on sequence and bigram type frequencies. Both models were tested separately on CC items (e.g. bor-sa ‘bag’) and CV items (e.g. tuc-ano ‘toucan’).

                  Model I             Model II
Fixed effects     TargetFreq          TargetFreq
                  LengthRatio         PrimeTokenFreq
                  SequenceTokenFreq   LengthRatio
                  BigramTokenFreq     SequenceTypeFreq
                  SequenceTP          BigramTypeFreq
                  BigramTP            SequenceTP
                                      BigramTP
Random effects    Subject             Subject
                  Fragment prime      Fragment prime

Table 1. Fixed and random effects for the CC and CV items.

The results of the fixed-effects analyses for the relevant models are summarized in Table 2 (dependent variable: RTs) and Table 3 (dependent variable: Nerr).

According to model I, with RTs as the dependent variable, the sequence TP (i.e., the TP between the fragment prime and the second consonant of the cluster) turned out to be the most significant predictor for the CC items, outranking even the frequency values (for the target, the sequence and the bigram), which all contributed to the intercept. A different picture emerged for the CV items: no probability variable significantly predicted response times, whereas target frequency, with a secondary contribution from cluster frequency, appeared to play a role for this subset of items.

According to model II, target frequency was very important for the CC items, and the only additional effect came from the sequence TP. The two models thus converged in emphasizing the role of the probability with which a given C follows the prime sequence. For the CV items, model II returned a picture very similar to that of model I, with target frequency and bigram type frequency as the only significant predictors.
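
The variables defined in Section 2.2 can be made concrete with a small computation. The sketch below is an illustration under stated assumptions, not the authors’ procedure: TOY_LEXICON is a hypothetical word-to-token-frequency table with invented counts standing in for CoLFIS, letters stand in for phonemes and graphemes, prime and prime + C counts are taken over word-initial strings only, and BigramTP is computed as the token-weighted conditional probability P(C2 | C1). Under these assumptions SequenceTP reduces to SequenceTokenFreq divided by PrimeTokenFreq. The helper names fragment_variables and count_overlapping are hypothetical.

```python
# Sketch of the Section 2.2 variables for one prime-target pair.
# Assumptions (not from the paper): letters stand in for phonemes/graphemes,
# prime and prime+C counts are word-initial, and TOY_LEXICON is invented data.

# Hypothetical word -> token frequency table standing in for CoLFIS counts.
TOY_LEXICON = {
    "borsa": 50, "borse": 20, "bordo": 30, "borgo": 10,
    "corsa": 40, "verso": 60, "mare": 80, "tucano": 2, "tutto": 100,
}


def count_overlapping(word: str, sub: str) -> int:
    """Count possibly overlapping occurrences of sub in word."""
    return sum(1 for i in range(len(word) - len(sub) + 1) if word.startswith(sub, i))


def fragment_variables(prime: str, target: str, lexicon: dict[str, int]) -> dict[str, float]:
    c1 = prime[-1]              # fragment-final consonant (first C of the cluster)
    c2 = target[len(prime)]     # segment that follows the fragment in the target
    bigram = c1 + c2
    sequence = prime + c2

    # Words beginning with the prime, and with the prime + following segment.
    prime_words = {w: f for w, f in lexicon.items() if w.startswith(prime)}
    seq_words = {w: f for w, f in lexicon.items() if w.startswith(sequence)}
    # Words containing the bigram anywhere.
    bigram_words = {w: f for w, f in lexicon.items() if bigram in w}

    # Token-weighted P(c2 | c1): occurrences of c1c2 over occurrences of c1
    # followed by any segment (i.e. c1 not in word-final position).
    c1c2_tokens = sum(f * count_overlapping(w, bigram) for w, f in lexicon.items())
    c1_tokens = sum(f * w[:-1].count(c1) for w, f in lexicon.items())

    prime_tokens = sum(prime_words.values())
    seq_tokens = sum(seq_words.values())
    return {
        "TargetFreq": lexicon.get(target, 0),
        "PrimeTypeFreq": len(prime_words),
        "PrimeTokenFreq": prime_tokens,
        "LengthRatio": len(prime) / len(target),
        "BigramTP": c1c2_tokens / c1_tokens if c1_tokens else 0.0,
        "BigramTypeFreq": len(bigram_words),
        "BigramTokenFreq": sum(bigram_words.values()),
        "SequenceTP": seq_tokens / prime_tokens if prime_tokens else 0.0,
        "SequenceTypeFreq": len(seq_words),
        "SequenceTokenFreq": seq_tokens,
    }


if __name__ == "__main__":
    v = fragment_variables("bor", "borsa", TOY_LEXICON)
    print(f"SequenceTP = P(s|bor) = {v['SequenceTP']:.3f}")   # 70/110
    print(f"BigramTP   = P(s|r)   = {v['BigramTP']:.3f}")     # 170/290
```

For the CC pair bor-borsa, the toy counts give SequenceTP = 70/110 and BigramTP = 170/290, so in this invented lexicon the prime predicts the cluster's second consonant better than the bare consonant r does. The same function applies unchanged to CV pairs such as tuc-tucano; with real data, the CoLFIS word list would take the place of TOY_LEXICON.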

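Section 2.3 reports likelihood-ratio tests with one degree of freedom, as expected when nested models differ by a single predictor (here, presumably, SequenceTP). As a consistency check on the reported values, the sketch below recomputes the upper-tail p-values from the χ²(1) statistics using only the Python standard library, relying on the fact that a χ² variable with one degree of freedom is the square of a standard normal variable. The function names are hypothetical, and lrt_statistic is included only to make the definition explicit: the paper does not report the model log-likelihoods.

```python
import math


def lrt_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic for two nested models: 2 * (logL_full - logL_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    A chi-square(1) variable is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Chi-square(1) statistics reported for adding SequenceTP (Section 2.3).
    for model, stat in [("model I", 11.184), ("model II", 5.4403)]:
        print(f"{model}: chi2(1) = {stat}, p = {chi2_sf_df1(stat):.4f}")
```

The recomputed p-values closely match the reported p = 0.0008 (model I) and p = 0.019 (model II). With the fitted models at hand, the statistic itself would come from the difference in their log-likelihoods, as in lrt_statistic.
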
Table 2. Fixed effects coefficients for the two models, CC and CV items (RTs = dependent variable).

When subject and prime were included as random factors, pairwise comparison with a likelihood ratio test confirmed that adding the sequence TP significantly improved the prediction of the RT patterns: χ²(1) = 11.184, p = 0.0008 in model I; χ²(1) = 5.4403, p = 0.019 in model II.

The average reaction times and the number of errors were positively and significantly correlated, though with an intermediate correlation coefficient (r = .648, p < .01). We therefore tested the two models with Nerr as the dependent variable, to determine whether the error rate was influenced by frequencies and probabilities to a different extent than response latencies.

With Nerr as the dependent variable, R² values were consistently lower than in the RT analyses (Table 3), indicating that our frequency and probability variables accounted for the error patterns to a more limited extent. In particular, for the CC items both model I and model II identified target frequency as the only significant predictor of errors, while for the CV items bigram frequency played an additional role (by token in model I and by type in model II). Thus, for the CV items, RTs and error rates produced consistent results.

Table 3. Fixed effects coefficients for the two models, CC and CV items (Nerr = dependent variable).

3 Discussion

This work aimed to shed light on the role of TPs in a so far unstudied experimental setting, i.e., a lexical decision task with fragment priming. Since most studies on phonotactic probabilities have focused on English, this work also contributes evidence from a less investigated language, Italian.

Fragment priming is known to be modulated not only by word frequency and by the frequencies of words matching the fragment, but also by top-down information conveyed by the prime: a fragment prime matching a unique morpho-lexical family is as effective as a stem prime, showing that the prime acts as a cue to the properties of the target (see e.g. Laudanna & Bracco, 2006, for Italian).

This study has shown that the priming effect obtained when an initial fragment is available is also influenced by bottom-up variables; in particular, it depends on the probability with which the segments composing the fragment, or the fragment-final consonant, predict the occurrence of the following consonant. To a lesser extent, the frequency with which bigrams and sequences occur (as types or tokens) in the lexicon also predicts the subjects’ behavior. Phonotactic probabilities thus predicted the subjects’ responses to a large degree in many of the phonological environments tested in the current experiment, sometimes outperforming target frequency, and consistently outweighing the contributions of the prime/target length ratio and of prime frequency.

The results, however, suggest that phonotactic probabilities were overall more important for consonant clusters than for consonant-vowel sequences; it must therefore be
concluded that the contribution of TPs to lexical recognition is not the same across phonological environments. Consonant clusters might play a particularly relevant role in lexical access compared to CV sequences, as contemporary theories based on the principles of phonological and morphological naturalness also seem to predict (see e.g. Dressler & Dziubalska-Kołaczyk, 2006; Korecky-Kroell et al. 2014).

Additionally, for CC sequences the token frequencies (of the bigram and of the prime + C sequence) turned out to be relatively more important than the corresponding type frequencies, suggesting that exposure to the number of occurrences of a cluster or of a segment sequence may matter more in lexical access than exposure to the individual items containing them.

An additional issue concerns the role of TPs in morphologically complex words. According to some models, morphological parsing is necessary for lexical access, and the prefix (in the case of prefixed words) has to be stripped away for the word to be recognized (from Taft & Forster, 1975 onwards). In a condition in which the fragment prime coincides with a prefix, TPs would play the additional role of marking the morphological boundary during the priming event. In light of the results of the current study, it appears crucial to verify whether prefixed and pseudo-prefixed words behave in the same way. Models postulating morphological pre-parsing (e.g. Schreuder & Baayen, 1995) would predict that high TPs co-determine latencies for prefixed targets only, whereas if morphology does not affect word recognition, the TPs between the fragment prime and the following segment of the target should modulate latencies in prefixed and pseudo-prefixed words to the same extent.

A follow-up experiment will therefore test the contribution of phonotactic statistical knowledge to native speakers’ access to complex word forms (specifically, prefixed nouns), using prefixed and pseudo-prefixed words. In particular, fragment primes will be selected according to two conditions: in condition a), the targets are prefixed words and the fragment prime coincides with the prefix (e.g. bis-bisnonna ‘great-grandmother’); in condition b), the targets are pseudo-prefixed words and no morphological boundary occurs between the initial fragment and the rest of the word (e.g. per-perdente ‘loser’). Together with the current experiment, the experiment on prefixed and pseudo-prefixed words will determine whether the role of TPs differs when the target is a simplex word, a prefixed word or a pseudo-prefixed word. Different hypotheses may be put forward here, depending on whether morphological boundaries affect the processing of consonant clusters (e.g. Calderone et al. 2014, Celata et al. 2015 in press), and on the likelihood that a given sequence occurs as a morpheme or as a homographic non-morphological pattern (see Laudanna et al., 1994).

By describing phonotactic probability and frequency effects during word recognition, this study offers arguments for models of lexical access based on bottom-up processes, such as cohort models for orthographic stimuli (see e.g. Johnson & Pugh, 1994). The ability of single consonants to predict the following segment, and thereby to speed up recognition of the whole word as an additional, if not independent, way of accessing words and their subparts, might also be discussed with reference to models that associate orthographic input units with semantic and lexical knowledge (from connectionist models such as Harm & Seidenberg, 1999, to amorphous models such as Baayen et al. 2011).

References

Harald R. Baayen. 2010. A real experiment is a factorial experiment? The Mental Lexicon, 5(1): 149-157.

Harald R. Baayen, Petar Milin, Dusica Filipovic Durdevic, Peter Hendrix and Marco Marelli. 2011. An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118: 438-482.

Pier Marco Bertinetto, Cristina Burani, Alessandro Laudanna, Lucia Marconi, Daniela Ratti, C. Rolando and Anna Maria Thornton. 2005. Corpus e Lessico di Frequenza dell’Italiano Scritto (CoLFIS). http://linguistica.sns.it/CoLFIS/Home.net

Basilio Calderone, Chiara Celata, Katharina Korecky-Kroell and Wolfgang U. Dressler. 2014. A computational approach to (mor)phonotactics: Evidence from German. Language Sciences, 46 (part A): 59-70.

Chiara Celata, Katharina Korecky-Kroell, Irene Ricci and Wolfgang U. Dressler. 2015 (in press). Online processing of German (mor)phonotactic clusters by
adults and adolescents. Italian Journal of Linguistics, 27(1).

Wolfgang U. Dressler and Katarzyna Dziubalska-Kołaczyk. 2006. Proposing Morphonotactics. Italian Journal of Linguistics, 18: 249-266.

N.F. Johnson and K.R. Pugh. 1994. A cohort model of visual word recognition. Cognitive Psychology, 26: 240-346.

Katharina Korecky-Kroell, Wolfgang U. Dressler, Eva Maria Freiberger, Eva Reinisch, Karlheinz Moerth and Gary Libben. 2014. Phonotactic and morphonotactic processing in German-speaking adults. Language Sciences, 46 (part A): 48-58.

Alessandro Laudanna, Cristina Burani and Antonella Cermele. 1994. Prefixes as processing units. Language and Cognitive Processes, 9: 295-316.

Alessandro Laudanna and Giulia Bracco. 2006. Stem and fragment priming on verbal forms of Italian. In Proceedings of the 5th International Conference on the Mental Lexicon (Montreal, Canada, 11-13 October, 2006): 26.

Paul A. Luce and Nathan R. Large. 2001. Phonotactics, density, and entropy in spoken word recognition. Language and Cognitive Processes, 16: 565-581.

Sven L. Mattys and Peter W. Jusczyk. 2001. Phonotactic cues for segmentation of fluent speech by infants. Cognition, 78: 91-121.

Mark Pitt and James McQueen. 1998. Is compensation for coarticulation mediated by the lexicon? Journal of Memory and Language, 39: 347-370.

Robert Schreuder and Harald R. Baayen. 1997. How simplex complex words can be. Journal of Memory and Language, 37: 118-139.

Holly L. Storkel. 2001. Learning nonwords: Phonotactic probabilities in language development. Journal of Speech, Language, and Hearing Research, 44: 1321-1337.

Marcus Taft and Kenneth I. Forster. 1975. Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14: 638-647.

Michael Vitevitch, Paul Luce, J. Charles-Luce and D. Kemmerer. 1997. Phonotactics and syllable stress: Implications for the processing of spoken nonsense words. Language and Speech, 40: 47-62.

Michael S. Vitevitch and Paul A. Luce. 1999. Probabilistic phonotactics and neighborhood activation in spoken word recognition. Journal of Memory & Language, 40: 374-408.

Michael Vitevitch, Paul A. Luce, David B. Pisoni and Edward T. Auer. 1999. Phonotactics, neighborhood activation and lexical access for spoken words. Brain and Language, 68: 306-311.