=Paper= {{Paper |id=Vol-2253/paper21 |storemode=property |title=Local Associations and Semantic Ties in Overt and Masked Semantic Priming |pdfUrl=https://ceur-ws.org/Vol-2253/paper21.pdf |volume=Vol-2253 |authors=Andrea Nadalini,Marco Marelli,Roberto Bottini,Davide Crepaldi |dblpUrl=https://dblp.org/rec/conf/clic-it/NadaliniMBC18 }} ==Local Associations and Semantic Ties in Overt and Masked Semantic Priming== https://ceur-ws.org/Vol-2253/paper21.pdf

Local associations and semantic ties in overt and masked semantic
priming

Andrea Nadalini, International School for Advanced Studies, Trieste, Italy, anadalini@sissa.it
Marco Marelli, Bicocca University, Milan, Italy, marco.marelli@unimib.it
Roberto Bottini, Center for mind/brain sciences, Trento, Italy, roberto.bottini@unitn.it
Davide Crepaldi, International School for Advanced Studies, Trieste, Italy, dcrepaldi@sissa.it

Abstract

English. Distributional semantic models (DSM) are widely used in psycholinguistic research
to automatically assess the degree of semantic relatedness between words. Model estimates
strongly correlate with human similarity judgements and offer a tool to successfully predict
a wide range of language-related phenomena. In the present study, we compare the
state-of-the-art model with pointwise mutual information (PMI), a measure of local
association between words based on their surface co-occurrence. In particular, we test how
the two indexes perform on a dataset of semantic priming data, showing how PMI outperforms
DSM in the fit to the behavioral data. According to our results, what has been traditionally
thought of as semantic effects may mostly rely on local associations based on word
co-occurrence.

Italian. Distributional semantic models are widely used in psycholinguistics to quantify the
degree of similarity between words. These estimates are in line with the corresponding human
judgements, and offer a tool for modelling a wide range of language-related phenomena. In
the present study, we compare the model with pointwise mutual information (PMI), a measure
of local association between words based on their co-occurrence. In particular, we tested
the two indexes on a set of semantic priming data, showing that PMI explains the behavioral
data better. In light of these results, what has traditionally been considered a semantic
effect might rest mainly on local associations of lexical co-occurrence.

1 Introduction

Over the past two decades, computational semantics has made considerable progress in the
effort to develop techniques that provide human-like estimates of the semantic relatedness
between lexical items. Distributional Semantic Models (DSM; Baroni and Lenci, 2010) assume
that it is possible to represent lexical meaning based on statistical analyses of the way
words are used in large text corpora. Words are modeled as vectors and populate a
high-dimensional space where similar words tend to cluster together. Meaning relatedness
between two words corresponds to the proximity of their vectors; for example, one can
approximate relatedness as the cosine of the angle formed by two word-vectors:

$$\cos\theta = \frac{\vec{w}_1 \cdot \vec{w}_2}{|\vec{w}_1| \, |\vec{w}_2|}$$

where $\vec{w}_1$ and $\vec{w}_2$ denote the vectors of the two words.

DSMs have been proposed as psychologically plausible models of semantic memory, with
particular emphasis on how meaning representations are achieved and structured (e.g. LSA,
Landauer and Dumais, 1997; HAL, Lund and Burgess, 1996). They can therefore be pitted
against human behavior, in search of psychological validation of this modeling. For example,
the models' estimates have been used to make reliable predictions about the processing time
associated with the stimuli (Baroni et al., 2014; Mandera et al., 2017).
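
To make the cosine measure from the Introduction concrete, here is a minimal sketch in plain Python. The function name and the three-dimensional toy vectors are illustrative assumptions; they are not taken from any trained model or from the study's materials.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors with hypothetical values (assumption, for illustration only).
cat = [1.0, 2.0, 0.0]
dog = [2.0, 4.0, 0.0]      # points in the same direction as `cat`
hammer = [0.0, 0.0, 3.0]   # orthogonal to `cat`

print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, hammer))
```

Vectors pointing in the same direction yield a cosine close to 1 whatever their length, while orthogonal vectors yield 0, which is why the cosine is read as a length-independent proximity score.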
The technique most commonly used to explore semantic processing is the priming paradigm
(McNamara, 2005), according to which the recognition of a given word (the target) is easier
if preceded by a related word (the prime; e.g., cat-dog). Interestingly, facilitation can be
observed both when the prime word is fully visible and when it is kept outside of
participants' awareness through visual masking (Forster and Davis, 1984; de Wit and
Kinoshita, 2015). In this technique, the prime stimulus is displayed briefly, embedded
between a forward and a backward masking string (Figure 1).

Figure 1: Exemplar trial in a masked priming experiment. The prime stimulus is briefly
presented (<= 50 ms) between the two masks, before the onset of the target stimulus.

Besides the distribution of words, one can be interested in the local association strength
between lexical items, starting from the assumption that two words that are often used close
to each other tend to become associated. Yet a given pair may be frequently attested only
because its two components are themselves highly frequent. Therefore, raw frequency counts
are often transformed into some kind of association measure that can determine whether the
pair is attested above chance (Evert, 2008). A common method is to compute pointwise mutual
information (PMI) between two words, according to the formula:

$$\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}$$

where p(w1,w2) corresponds to the probability of the word pair, while p(w1) and p(w2)
correspond to the individual probabilities of the two components (Church and Hanks, 1990).

PMI has been used to model a wide range of psycholinguistic phenomena, from similarity
judgements (Recchia and Jones, 2009) to reading speed (Ellis and Simpson-Vlach, 2009).
Moreover, PMI has also been shown to generalize successfully to non-linguistic fields such
as epistemology and the psychology of reasoning (Tentori et al., 2016). On the other hand,
PMI has the limitation of over-estimating the importance of rare items (Manning and Schütze,
1999).

Although many DSMs use measures of local association between words, such as PMI, to build
contingency matrices, the information conveyed by two similar word-vectors is different from
the information conveyed by two words that frequently co-occur. Cosine similarity is based
on "higher order" co-occurrences: two words are similar to the extent that they are used
together with the same other words in the vocabulary. Local measures such as PMI instead
rely only on the actual co-presence of two given words. Two synonyms like car and automobile
are not likely to appear close to each other often in a given text; still, they denote the
same referent and are therefore expected to be used in similar contexts.

Based on these considerations, PMI and DSMs can be pitted against human behavior, in search
of psychological validation of this modeling. In particular, we tested how PMI and cosine
proximity predict priming in a set of data encompassing different prime visibility
conditions (masked vs unmasked) and prime durations (33, 50, 200, 1200 ms).

2 Our Study

2.1 Material

All the stimuli used in the current study were Italian words. 50 words referring to animals
and 50 words referring to tools were used as target stimuli. Each word in this list was
paired with three words from the same category, resulting in 300 unique prime-target pairs,
which were divided into three rotations. We added to each rotation 100 filler trials, which
were not included in the analysis. More precisely, we used abstract words as target stimuli,
paired with animal and tool primes different from those presented in the experimental
trials. In this way we ensured that the response to the target was not predictable from the
presence of the prime.

Relatedness estimates were obtained by looking at the distribution of the stimuli across the
ItWac corpus, a linguistic database of nearly 2 billion words built through web crawling
(Baroni et al., 2009). We downloaded the lemmatized and part-of-speech annotated corpus,
freely provided by the authors. All characters were set to lowercase, and special characters
were removed together with a list of stop-words.
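
As a minimal sketch of the PMI formula given in the Introduction, the snippet below estimates the probabilities from co-occurrence counts in a toy corpus that is already lowercased and tokenized. The toy sentences, the function names, the window handling and the choice of estimating p(w1,w2) as the share of all windowed co-occurrence events are illustrative assumptions, not the procedure used in the study.

```python
import math
from collections import Counter

def cooccurrence_counts(sentences, window=5):
    """Count words and unordered word pairs co-occurring within `window` tokens."""
    word_counts, pair_counts = Counter(), Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, left in enumerate(tokens):
            # Pair each token with at most the next (window - 1) tokens.
            for right in tokens[i + 1 : i + window]:
                pair_counts[frozenset((left, right))] += 1
    return word_counts, pair_counts

def pmi(w1, w2, word_counts, pair_counts):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) )."""
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    joint = pair_counts[frozenset((w1, w2))]
    if joint == 0:
        return float("-inf")  # the pair was never observed together
    p_joint = joint / n_pairs
    p1 = word_counts[w1] / n_words
    p2 = word_counts[w2] / n_words
    return math.log2(p_joint / (p1 * p2))

# Hypothetical toy corpus (assumption, for illustration only).
toy_corpus = [
    ["dog", "cat", "run"],
    ["dog", "cat", "sleep"],
    ["hammer", "nail", "hit"],
    ["dog", "hammer", "fall"],
]
words, pairs = cooccurrence_counts(toy_corpus, window=5)
print(pmi("dog", "cat", words, pairs))
print(pmi("dog", "hammer", words, pairs))
print(pmi("cat", "nail", words, pairs))
```

In this toy corpus "dog" and "cat" co-occur more often than their individual frequencies would predict, so their PMI is higher than that of "dog" and "hammer"; a pair that never co-occurs gets negative infinity, which is one reason why practical implementations often smooth or floor the counts.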
PMI between the word pairs was computed based on frequency counts obtained by sliding a
5-word window along ItWac. Cosine proximity between word vectors was obtained by training a
word2vec model (Mikolov et al., 2013) on the same corpus. The model's parameters were set
according to the WEISS model (Marelli, 2017). All words attested at least 100 times were
included in the model, which was trained using the continuous-bag-of-words architecture, a
5-word window and 200 dimensions. The parameter k for negative sampling was set to 10, and
the subsampling parameter to 10^-5.

Correlations between semantic and lexical variables are shown in Table 1.

                   Target length   Target frequency   PMI     cosine
Target length      1
Target frequency   -.211           1
PMI                .091            -.205              1
cosine             .147            -.059              .541    1

Table 1: Correlations between lexical and semantic indexes in our stimulus set.

2.2 Methods

Participants: Overall, 246 volunteers were recruited for the current study and were assigned
to the different prime timing conditions. All subjects were native Italian speakers, with
normal or corrected-to-normal vision and no history of neurological or learning disorders.

Apparatus: All stimuli were displayed on a 25'' monitor with a refresh rate of 120 Hz, using
MatLab Psychtoolbox. The words and the masks were presented in Arial font, size 32, in white
against a black background.

Procedure: Participants were engaged in a classic YES/NO task, requiring them to classify
the stimuli as members of either the animal or the tool category, according to the
instructions. YES responses were always provided with the dominant hand.

Each unique prime-target pair was presented only once to each participant. Experimental
sessions included a total of 200 trials, which were divided into two blocks. In one block,
subjects were asked to press the yes-button if the target word referred to an animal, while
in the other block they were asked to press the yes-button if the target word referred to a
tool. The order of the two blocks was counterbalanced across subjects. 10 practice and 2
warm-up trials were presented before each block. Participants could take a short break
halfway through each block.

Each trial began with a 750 ms fixation cross (+). Prime duration was varied across
experiments: 33, 50, 200 and 1200 ms. In the first two conditions, prime visibility was
prevented through forward and backward visual masks. Finally, the target word was left on
the screen until a response was provided.

Prime visibility task. In the experiments with masked primes, participants were not informed
about their presence. This was only revealed after the relevant session, when participants
were invited to take part in a prime visibility task requiring them to spot the presence of
the letter "n" within the masked word. After the first two examples, where prime duration
was increased to 150 ms to ensure visibility, 10 practice and 80 experimental trials were
displayed. Prime visibility was quantified through a d-prime analysis carried out on each
participant (Green and Swets, 1966).

2.3 Results

Response times (RT) were analyzed on accurate, yes-response trials only. RTs were inverse
transformed to approximate a normal distribution and employed as the dependent variable in
linear mixed-effects regression models. This analysis allows us to control for all the
covariates that may have affected performance, such as trial position in the randomized
list, rotation, RT and accuracy on the preceding trial, the response required in the
preceding trial, and the frequency and length of the target. All these variables, together
with the two semantic indexes (PMI and cosine proximity), were entered in the model as fixed
effects, while participants and items were entered as random intercepts. Model selection was
implemented stepwise, progressively removing those variables whose contribution to goodness
of fit was not significant.

In the masked priming data, neither PMI nor cosine proximity was a reliable predictor by
itself (p=.298 and p=.206, respectively). However, both indexes interacted with prime
visibility as tracked by participants' d-prime ($F_{\mathrm{PMI}*d'}$(1, 9750) = 13.74,
p<.001; $F_{\cos*d'}$(1, 9745) = 13.24, p<.001). As illustrated in Figure 1, the more each
participant could see the prime word, the larger the priming effect she displayed.

Figure 1. Interaction between d' and prime-target association. Both PMI (left) and cosine
proximity (right) effects become stronger as prime visibility (d') increases. Error bars
refer to 95% C.I.

In the overt priming data, both PMI and cosine proximity yield a significant main effect
(50 ms presentation time: $F_{\mathrm{PMI}}$(1, 9769) = 10.36, p = .001; $F_{\cos}$(1, 9769)
= 8.602, p = .0058), but only PMI significantly predicts priming when both indexes are
entered into the model ($F_{\mathrm{PMI}}$(1, 9769) = 10.36, p = .001; $F_{\cos}$(1, 9769) =
0.60, p = .489). Results were very consistent across conditions and showed the same pattern
when prime presentation time was 200 ms or 1200 ms (see Figure 2).

Figure 2. Significant effect of PMI (left) and non-significant effect of cosine proximity
(right) across prime presentation times (50 ms, 200 ms, 1200 ms on the first, second and
third row respectively). Error bars refer to 95% C.I.

Conclusion

Thanks to the help of computational methods, we provided new insights into the nature of the
processing that supports semantic priming. Overall, effects seem to be primarily driven by
local word associations as tracked by Pointwise Mutual Information: when semantic priming
emerged, PMI effects were consistently stronger and more robust than those related to DSM
estimates. This would be in line with previous literature suggesting that the behavior of
the human cognitive system may be effectively described by Information Theory principles.
For example, Paperno and colleagues (Paperno et al., 2014) showed that PMI is a significant
predictor of human judgements of word co-occurrence.

The results from masked priming offer another important insight: some degree of prime
visibility may be required for semantic/associative priming to emerge. Other studies have
shown genuine semantic effects with subliminally presented stimuli (Bottini et al., 2016).
However, they typically used words from small/closed classes (e.g., spatial words, planet
names). Conversely, we drew stimuli from across the lexicon and sampled from very large
categories such as animals and tools; this may point to an effect of target predictability.
In general, our data cast some doubt on a wide, across-the-lexicon processing of semantic
information outside of awareness.

References
Baroni, M., Bernardini, S., Ferraresi, A., and Zanchetta, E. (2009). The WaCky Wide Web: A
Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources
and Evaluation, 43(3), 209-226.
Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic
comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of
the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
Papers) (Vol. 1, pp. 238-247).
Bottini, R., Bucur, M., and Crepaldi, D. (2016). The nature of semantic priming by
subliminal spatial words: Embodied or disembodied? Journal of Experimental Psychology:
General, 145(9), 1160.
de Wit, B., and Kinoshita, S. (2015). The masked semantic priming effect is task dependent:
Reconsidering the automatic spreading activation process. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 41(4), 1062.
Ellis, N. C., and Simpson-Vlach, R. (2009). Formulaic language in native speakers:
Triangulating psycholinguistics, corpus linguistics, and education. Corpus Linguistics and
Linguistic Theory, 5(1), 61-78.
Evert, S. (2008). Corpora and collocations. Corpus linguistics. An international handbook,
2, 223-233.
Forster, K. I., and Davis, C. (1984). Repetition priming and frequency attenuation in
lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(4),
680.
Green, D. M., and Swets, J. A. (1966). Signal detection theory and psychophysics. Wiley, New
York.
Landauer, T. K., and Dumais, S. T. (1997). A solution to Plato's problem: The latent
semantic analysis theory of acquisition, induction, and representation of knowledge.
Psychological Review, 104(2), 211.
Lund, K., and Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical
co-occurrence. Behavior Research Methods, Instruments, and Computers, 28(2), 203-208.
Mandera, P., Keuleers, E., and Brysbaert, M. (2017). Explaining human performance in
psycholinguistic tasks with models of semantic similarity based on prediction and counting:
A review and empirical validation. Journal of Memory and Language, 92, 57-78.
Manning, C. D., and Schütze, H. (1999). Foundations of statistical natural language
processing. MIT Press.
Marelli, M. (2017). Word-Embeddings Italian Semantic Spaces: A semantic model for
psycholinguistic research. Psihologija, 50(4), 503-520.
McNamara, T. P. (2005). Semantic priming: Perspectives from memory and word recognition.
Psychology Press.
Mikolov, T., Chen, K., Corrado, G. S., and Dean, J. (2013). Efficient estimation of word
representations in vector space. Available from arXiv:1301.3781.
Paperno, D., Marelli, M., Tentori, K., and Baroni, M. (2014). Corpus-based estimates of word
association predict biases in judgment of word co-occurrence likelihood. Cognitive
Psychology, 74, 66-83.
Recchia, G., and Jones, M. N. (2009). More data trumps smarter algorithms: Comparing
pointwise mutual information with latent semantic analysis. Behavior Research Methods,
41(3), 647-656.
Rohaut, B., and Naccache, L. (2018). What are the boundaries of unconscious semantic
cognition? Eur J Neurosci. doi:10.1111/ejn.13930.
Tentori, K., Chater, N., and Crupi, V. (2016). Judging the Probability of Hypotheses Versus
the Impact of Evidence: Which Form of Inductive Inference Is More Accurate and Time
Consistent? Cognitive Science, 40(3), 758-778.