=Paper= {{Paper |id=Vol-2253/paper21 |storemode=property |title=Local Associations and Semantic Ties in Overt and Masked Semantic Priming |pdfUrl=https://ceur-ws.org/Vol-2253/paper21.pdf |volume=Vol-2253 |authors=Andrea Nadalini,Marco Marelli,Roberto Bottini,Davide Crepaldi |dblpUrl=https://dblp.org/rec/conf/clic-it/NadaliniMBC18 }} ==Local Associations and Semantic Ties in Overt and Masked Semantic Priming== https://ceur-ws.org/Vol-2253/paper21.pdf
Local associations and semantic ties in overt and masked semantic priming


Andrea Nadalini, International School for Advanced Studies, Trieste, Italy, anadalini@sissa.it
Marco Marelli, Bicocca University, Milan, Italy, marco.marelli@unimib.it
Roberto Bottini, Center for Mind/Brain Sciences, Trento, Italy, roberto.bottini@unitn.it
Davide Crepaldi, International School for Advanced Studies, Trieste, Italy, dcrepaldi@sissa.it



Abstract

English. Distributional semantic models (DSMs) are widely used in psycholinguistic research to automatically assess the degree of semantic relatedness between words. Model estimates correlate strongly with human similarity judgements and offer a tool to successfully predict a wide range of language-related phenomena. In the present study, we compare a state-of-the-art model with pointwise mutual information (PMI), a measure of local association between words based on their surface co-occurrence. In particular, we test how the two indexes perform on a semantic priming dataset, showing that PMI outperforms the DSM in fitting the behavioral data. According to our results, what has traditionally been thought of as semantic effects may mostly rely on local associations based on word co-occurrence.

Italian. Distributional semantic models are widely used in psycholinguistics to quantify the degree of similarity between words. These estimates are in line with the corresponding human judgements and offer a tool for modeling a wide range of language-related phenomena. In the present study, we compare the model with pointwise mutual information (PMI), a measure of local association between words based on their co-occurrence. In particular, we tested the two indexes on a semantic priming dataset, showing that PMI explains the behavioral data better. In light of these results, what has traditionally been considered a semantic effect may rely mainly on local associations of lexical co-occurrence.

1 Introduction

Over the past two decades, computational semantics has made considerable progress toward techniques that provide human-like estimates of the semantic relatedness between lexical items. Distributional Semantic Models (DSMs; Baroni and Lenci, 2010) assume that lexical meaning can be represented through statistical analyses of the way words are used in large text corpora. Words are modeled as vectors that populate a high-dimensional space in which similar words tend to cluster together. The relatedness between two words corresponds to the proximity of their vectors; for example, one can approximate relatedness as the cosine of the angle θ formed by two word vectors $\mathbf{a}$ and $\mathbf{b}$:

$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$$

DSMs have been proposed as psychologically plausible models of semantic memory, with particular emphasis on how meaning representations are acquired and structured (e.g., LSA, Landauer and Dumais, 1997; HAL, Lund and Burgess, 1996). They can therefore be pitted against human behavior in search of psychological validation. For example, model estimates have been used to make reliable predictions about the processing time associated with the stimuli (Baroni et al., 2014; Mandera et al., 2017).
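The cosine measure defined in the Introduction can be made concrete with a short sketch. The three-dimensional vectors below are hypothetical toy values chosen only for illustration; real DSM vectors are far longer and are learned from a corpus rather than written by hand.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors: (a . b) / (|a| |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not taken from any trained model.
cat = [0.8, 0.1, 0.3]
dog = [0.7, 0.2, 0.4]
hammer = [0.1, 0.9, 0.0]

print(round(cosine_similarity(cat, dog), 3))     # related pair: high cosine
print(round(cosine_similarity(cat, hammer), 3))  # unrelated pair: lower cosine
```

Cosine depends only on the angle between vectors, not on their length, so two words with proportional context profiles score 1 regardless of how often each occurs.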
The technique most commonly used to explore semantic processing is the priming paradigm (McNamara, 2005), in which the recognition of a given word (the target) is easier if it is preceded by a related word (the prime; e.g., cat–dog). Interestingly, facilitation can be observed both when the prime word is fully visible and when it is kept outside of participants' awareness through visual masking (Forster and Davis, 1984; de Wit and Kinoshita, 2015). In this technique, the prime stimulus is displayed briefly, embedded between a forward and a backward mask string (Figure 1).

Figure 1: Exemplar trial in a masked priming experiment. The prime stimulus is briefly presented (≤ 50 ms) between the two masks, before the onset of the target stimulus.

Besides the distribution of words, one may be interested in the strength of local association between lexical items, starting from the assumption that two words that are often used close to each other tend to become associated. Yet a given pair may be frequently attested only because its two components are themselves highly frequent. Therefore, raw frequency counts are often transformed into some kind of association measure that can determine whether the pair is attested above chance (Evert, 2008). A common method is to compute the pointwise mutual information (PMI) between two words:

$$\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}$$

where $p(w_1, w_2)$ is the probability of the word pair, and $p(w_1)$ and $p(w_2)$ are the individual probabilities of its two components (Church and Hanks, 1990).

PMI has been used to model a wide range of psycholinguistic phenomena, from similarity judgements (Recchia and Jones, 2009) to reading speed (Ellis and Simpson-Vlach, 2009). Moreover, PMI has been shown to generalize successfully to non-linguistic fields such as epistemology and the psychology of reasoning (Tentori et al., 2016). On the other hand, PMI has the limitation of over-estimating the importance of rare items (Manning and Schütze, 1999).

Although many DSMs use measures of local association such as PMI to build their contingency matrices, the information conveyed by two similar word vectors differs from the information conveyed by two words that frequently co-occur. Cosine similarity is based on "higher-order" co-occurrences: two words are similar in the way they are used together with all the other words in the vocabulary. Local measures such as PMI, instead, rely only on the actual co-presence of two given words. Two synonyms such as car and automobile are unlikely to appear close to each other in a given text; still, they denote the same referent and are therefore expected to be used in similar contexts.

Based on these considerations, PMI and DSMs can be pitted against human behavior in search of psychological validation of these models. In particular, we tested how well PMI and cosine proximity predict priming in a dataset encompassing different prime visibility conditions (masked vs. unmasked) and prime durations (33, 50, 200 and 1200 ms).

2 Our Study

2.1 Material

All the stimuli used in the current study were Italian words. Fifty words referring to animals and 50 words referring to tools were used as targets. Each target was paired with three primes from the same category, resulting in 300 unique prime–target pairs, which were divided into three rotations. To each rotation we added 100 filler trials, which were not included in the analysis. These fillers used abstract words as targets, paired with animal and tool primes different from those presented in the experimental trials. In this way, we ensured that the response to the target was not predictable from the prime.

Relatedness estimates were obtained from the distribution of the stimuli in the ItWac corpus, a linguistic database of nearly 2 billion words built through web crawling (Baroni et al., 2009). We downloaded the lemmatized and part-of-speech annotated corpus, freely provided by its authors. All characters were converted to lowercase, and special characters were removed together with a list of stop-words.
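The contrast between local association (PMI) and higher-order similarity (cosine over context vectors) can be illustrated with a toy sketch. It is not the pipeline applied to ItWac: the four-sentence corpus and the stop-word list are hypothetical, a window is taken to be a run of consecutive tokens within a sentence, and p(w1, w2) is estimated from pair counts while p(w) is estimated from token counts, which is only one of several common estimation choices.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for this toy example only.
STOP_WORDS = {"the"}


def preprocess(sentence: str) -> list[str]:
    """Lowercase, keep runs of letters only (dropping punctuation and digits), remove stop-words."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def count_cooccurrences(sentences, window=5):
    """Count tokens, and unordered pairs of distinct words within `window` consecutive tokens."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = preprocess(sentence)
        word_counts.update(tokens)
        for i, w1 in enumerate(tokens):
            # Pair w1 with the following window - 1 tokens of the same sentence.
            for w2 in tokens[i + 1 : i + window]:
                if w1 != w2:
                    pair_counts[tuple(sorted((w1, w2)))] += 1
    return word_counts, pair_counts


def pmi(w1, w2, word_counts, pair_counts):
    """log2(p(w1, w2) / (p(w1) p(w2))); None if the pair never co-occurs (log2(0) is undefined)."""
    joint = pair_counts[tuple(sorted((w1, w2)))]
    if joint == 0:
        return None
    n_tokens = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    p_joint = joint / n_pairs          # assumption: joint probability over all counted pairs
    p1 = word_counts[w1] / n_tokens    # marginal probabilities over all tokens
    p2 = word_counts[w2] / n_tokens
    return math.log2(p_joint / (p1 * p2))


def context_vectors(pair_counts):
    """Higher-order representation: each word's co-occurrence counts with every other word."""
    vectors = defaultdict(Counter)
    for (w1, w2), count in pair_counts.items():
        vectors[w1][w2] += count
        vectors[w2][w1] += count
    return vectors


def cosine(u, v):
    """Cosine between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


corpus = [
    "The car drove down the road.",
    "The automobile drove down the road.",
    "The cat chased the dog.",
    "The dog chased the cat.",
]
words, pairs = count_cooccurrences(corpus, window=5)
vectors = context_vectors(pairs)

print(pmi("car", "automobile", words, pairs))         # None: the two words never co-occur
print(cosine(vectors["car"], vectors["automobile"]))  # close to 1: identical contexts
print(pmi("cat", "dog", words, pairs))
print(pmi("car", "drove", words, pairs))              # equals PMI(cat, dog), from a single co-occurrence
```

In this toy corpus car and automobile never appear together, so their PMI is undefined, yet their context vectors are identical. Conversely, car–drove, seen once, receives the same PMI as cat–dog, seen twice: a small-scale instance of the bias toward rare items mentioned above.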
PMI between the word pairs was computed from frequency counts obtained by sliding a 5-word window along ItWac. Cosine proximity between word vectors was obtained by training a word2vec model (Mikolov et al., 2013) on the same corpus. The model's parameters were set according to the WEISS model (Marelli, 2017). All words attested at least 100 times were included in the model, which was trained using the continuous bag-of-words (CBOW) architecture, a 5-word window and 200 dimensions. The parameter k for negative sampling was set to 10, and the subsampling parameter to $10^{-5}$.

Correlations between semantic and lexical variables are shown in Table 1.

                     Target length   Target frequency   PMI    cosine
  Target length        1
  Target frequency    -.211            1
  PMI                  .091           -.205              1
  cosine               .147           -.059              .541   1

Table 1: Correlations between lexical and semantic indexes in our stimulus set.

2.2 Methods

Participants: Overall, 246 volunteers were recruited for the current study and were assigned to the different prime timing conditions. All subjects were native Italian speakers with normal or corrected-to-normal vision and no history of neurological or learning disorders.

Apparatus: All stimuli were displayed on a 25'' monitor with a refresh rate of 120 Hz, using MATLAB Psychtoolbox. Words and masks were presented in 32-point Arial font, in white against a black background.

Procedure: Participants performed a classic YES/NO task, requiring them to classify the stimuli as members of either the animal or the tool category, according to the instructions. YES responses were always given with the dominant hand.
Each unique prime–target pair was presented only once to each participant. Experimental sessions included a total of 200 trials, divided into two blocks. In one block, subjects were asked to press the YES button if the target word referred to an animal, while in the other block they were asked to press the YES button if the target word referred to a tool. The order of the two blocks was counterbalanced across subjects. Each block was preceded by 10 practice and 2 warm-up trials. Participants could take a short break halfway through each block.
Each trial began with a 750 ms fixation cross (+). Prime duration was varied across experiments (33, 50, 200 and 1200 ms). In the first two conditions, prime visibility was prevented through forward and backward visual masks. Finally, the target word was left on the screen until a response was provided.

Prime visibility task. In the experiments with masked primes, participants were not informed about the presence of the prime. This was only revealed after the relevant session, when participants were invited to take part in a prime visibility task requiring them to detect the letter "n" within the masked word. After the first two examples, in which prime duration was increased to 150 ms to ensure visibility, 10 practice and 80 experimental trials were displayed. Prime visibility was quantified through a d-prime (d') analysis carried out on each participant (Green and Swets, 1966).

2.3 Results

Response times (RTs) were analyzed on accurate YES-response trials only. RTs were inverse-transformed to approximate a normal distribution and used as the dependent variable in linear mixed-effects regression models. This analysis allows us to control for covariates that may have affected performance, such as trial position in the randomized list, rotation, RT and accuracy on the preceding trial, the response required on the preceding trial, and the frequency and length of the target. All these variables, together with the two semantic indexes (PMI and cosine proximity), were entered in the model as fixed effects, while participants and items were entered as random intercepts. Model selection was implemented stepwise, progressively removing the variables whose contribution to goodness of fit was not significant.

In the masked priming data, neither PMI nor cosine proximity was a reliable predictor by itself (p = .298 and p = .206, respectively). However, both indexes interacted with prime visibility as tracked by participants' d' ($F_{\mathrm{PMI} \times d'}(1, 9750) = 13.74$, p < .001; $F_{\cos \times d'}(1, 9745) = 13.24$, p < .001). As illustrated in Figure 1, the more clearly each participant could see the prime word, the larger the priming effect they displayed.

Figure 1. Interaction between d' and prime–target association. Both PMI (left) and cosine proximity (right) effects become stronger as prime visibility (d') increases. Error bars refer to 95% C.I.

In the overt priming data, both PMI and cosine proximity yielded a significant main effect (50 ms presentation time: $F_{\mathrm{PMI}}(1, 9769) = 10.36$, p = .001; $F_{\cos}(1, 9769) = 8.602$, p = .0058), but only PMI significantly predicted priming when both indexes were entered into the model ($F_{\mathrm{PMI}}(1, 9769) = 10.36$, p = .001; $F_{\cos}(1, 9769) = 0.60$, p = .489). Results were very consistent across conditions and showed the same pattern when prime presentation time was 200 ms or 1200 ms (see Figure 2).

3 Conclusion

With the help of computational methods, we provided new insights into the nature of the processing that supports semantic priming. Overall, effects seem to be primarily driven by local word associations as tracked by pointwise mutual information: when semantic priming emerged, PMI effects were consistently stronger and more robust than those related to DSM estimates. This is in line with previous literature suggesting that the behavior of the human cognitive system may be effectively described by principles of information theory. For example, Paperno and colleagues (Paperno et al., 2014) showed that PMI is a significant predictor of human judgements of word co-occurrence.

The results from masked priming offer another important insight: some degree of prime visibility may be required for semantic/associative priming to emerge. Other studies have shown genuine semantic effects with subliminally presented stimuli (Bottini et al., 2016). However, they typically used words from small, closed classes (e.g., spatial words, planet names). Conversely, we drew stimuli from across the lexicon and sampled from very large categories such as animals and tools; this may point to an effect of target predictability. In general, our data cast some doubt on the processing of semantic information outside of awareness across the lexicon at large.
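The per-participant d' analysis used in the prime visibility task can be sketched with standard signal detection theory (Green and Swets, 1966). This sketch assumes that reporting an "n" on a trial whose masked word contains one is a hit, and reporting it on a trial without one is a false alarm. The log-linear correction for extreme rates, the 40/40 split of the 80 experimental trials, and the participant counts are all hypothetical; the study does not report these details.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity d' = z(hit rate) - z(false-alarm rate).

    Rates use a log-linear correction (add 0.5 to each numerator and 1 to each
    denominator) so that perfect or null performance does not give infinite z.
    This correction is an assumption, not a detail reported in the study.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 40 trials with an "n" in the masked word (hits + misses)
# and 40 without (false alarms + correct rejections).
participants = {
    "p01": (21, 19, 20, 20),  # responses close to chance: d' near 0
    "p02": (36, 4, 5, 35),    # prime largely visible: d' well above 0
}
for pid, counts in participants.items():
    print(pid, round(d_prime(*counts), 2))
```

Each participant's d' can then enter the mixed-effects model as a continuous predictor, which is what allows the interaction with PMI and cosine proximity to be tested.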

Figure 2. Significant effect of PMI (left) and non-significant effect of cosine proximity (right) across prime presentation times (50 ms, 200 ms and 1200 ms in the first, second and third row, respectively). Error bars refer to 95% C.I.

References
Baroni, M., Bernardini, S., Ferraresi, A., and Zanchetta, E. (2009). The WaCky Wide Web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3), 209-226.
Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 238-247).
Bottini, R., Bucur, M., and Crepaldi, D. (2016). The nature of semantic priming by subliminal spatial words: Embodied or disembodied? Journal of Experimental Psychology: General, 145(9), 1160.
de Wit, B., and Kinoshita, S. (2015). The masked semantic priming effect is task dependent: Reconsidering the automatic spreading activation process. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(4), 1062.
Ellis, N. C., and Simpson-Vlach, R. (2009). Formulaic language in native speakers: Triangulating psycholinguistics, corpus linguistics, and education. Corpus Linguistics and Linguistic Theory, 5(1), 61-78.
Evert, S. (2008). Corpora and collocations. Corpus Linguistics: An International Handbook, 2, 223-233.
Forster, K. I., and Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10(4), 680.
Green, D. M., and Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Landauer, T. K., and Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.
Lund, K., and Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, and Computers, 28(2), 203-208.
Mandera, P., Keuleers, E., and Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57-78.
Manning, C. D., and Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.
Marelli, M. (2017). Word-Embeddings Italian Semantic Spaces: A semantic model for psycholinguistic research. Psihologija, 50(4), 503-520.
McNamara, T. P. (2005). Semantic priming: Perspectives from memory and word recognition. Psychology Press.
Mikolov, T., Chen, K., Corrado, G. S., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781.
Paperno, D., Marelli, M., Tentori, K., and Baroni, M. (2014). Corpus-based estimates of word association predict biases in judgment of word co-occurrence likelihood. Cognitive Psychology, 74, 66-83.
Recchia, G., and Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods, 41(3), 647-656.
Rohaut, B., and Naccache, L. (2018). What are the boundaries of unconscious semantic cognition? European Journal of Neuroscience. doi:10.1111/ejn.13930.
Tentori, K., Chater, N., and Crupi, V. (2016). Judging the probability of hypotheses versus the impact of evidence: Which form of inductive inference is more accurate and time consistent? Cognitive Science, 40(3), 758-778.