Moving from Human Ratings to Word Vectors to Classify People with
              Focal Dementias: Are We There Yet?
            Chiara Barattieri di San Pietro1,2, Marco Marelli1, Carlo Reverberi1
                  1. Università degli Studi di Milano-Bicocca, Milano, Italy
                      2. Università degli Studi di Verona, Verona, Italy
                  chiara.barattieridisanpietro@unimib.it,
           carlo.reverberi@unimib.it, marco.marelli@unimib.it



                     Abstract

Fine-grained variables based on the semantic proximity of words can provide helpful diagnostic information when applied to the analysis of Verbal Fluency tasks. However, before leaving human-based ratings in favour of measures derived from distributional approaches, it is essential to assess the performance of the latter against that of the former. In this work, we analysed a Verbal Fluency task using measures of semantic proximity derived from Distributional Semantic Models of language, and we show that Machine Learning models based on them are less accurate in classifying patients with focal dementias than the same models built on human-based ratings. We discuss the possible interpretations of these results and the implications for the application of distributional semantics in clinical settings.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1    Introduction

A Verbal Fluency (VF) task (Lezak et al., 2004) is a test routinely used in neuropsychological practice that requires participants to produce as many words as possible belonging to a given semantic category (e.g., "colours", "animals", etc.) within a time limit (typically 60 seconds). It is commonly used to study lexical retrieval, and the subject's performance is standardly rated by the number of correct words produced for a given cue. However, to overcome the opacity of this overall score and to help distinguish the different cognitive functions underpinning VF performance, additional measures of VF performance have been proposed. Among these are the number of consecutive words produced that share similar properties, such as being a citrus fruit (this is called a "semantic cluster", and its size is a clinically useful variable), and the total number of transitions between clusters (called the "number of switches"; Troyer et al., 1997). Indeed, by characterising a semantic VF task (category "fruits") using the number of semantic categories produced, the average semantic proximity between words, and the number of new and out-of-category words, it has been possible to classify people with and without focal dementias, as well as across three different subtypes of dementia (Fronto-Temporal Dementia versus Primary Progressive Aphasia versus Semantic Dementia), with good accuracy (78% accuracy for the patients vs healthy controls classification, and 58.3% accuracy for the classification across the three pathological subcategories; Reverberi et al., 2014). One shortcoming of this model, however, is that those VF indexes are built upon human-based ratings of semantic proximity between pairs of words collected from a sample of healthy controls, making it hard to extend the same approach to words for which human judgments were not previously collected, i.e., to other semantic categories.

Recent advances in Natural Language Processing techniques could help overcome this limitation. Distributional Semantic Models (DSMs) of language start from lexical co-occurrences extracted from large text corpora (Turney & Pantel, 2010) and, applying different computational techniques, end up representing word meanings as numerical vectors in a multidimensional space. Here, terms that are semantically related are located close to each other. Such models can be used to simulate the structure of conceptual knowledge implied in the performance of semantic tasks such
as a VF task. Indeed, DSMs have been successfully applied to different tasks of semantic relationship (Mandera et al., 2017), including the analysis of VF tasks to classify patients with Alzheimer's disease (Linz et al., 2017), reaching remarkable accuracy (F1 = 0.77). However, despite this success, questions have been raised concerning what exactly distributional models can learn (Erk, 2016) and whether such models are sufficiently rich in terms of encoded features (Lucy and Gauthier, 2017) to be applied to all sorts of semantic tasks and problems.

The present study aims to test whether the analysis of a VF task based on DSM-derived measures would reproduce the results of an analysis based on human-derived measures. In particular, we decided to re-analyse the original data of a semantic VF task (category "fruit") that Reverberi et al. collected on a cohort of participants with focal dementias and healthy controls (CTR). Focal dementias are neurodegenerative diseases that cause a deterioration of cognitive function, including language. The original cohort included people with Fronto-Temporal Dementia (FTD), Primary Progressive Aphasia (PPA), and Semantic Dementia (SD). Each diagnostic group presents a peculiar linguistic symptomatology, making these syndromes ideal candidates for a differential approach. The human-based indexes of VF (see Section 2 for details) were adapted to be computed on different DSMs (Landauer & Dumais, 1997; Mikolov et al., 2013). Specifically, we adopted two predict models and one count model. All three semantic spaces were based on the itWaC web-crawled corpus (Baroni et al., 2009). The two predict models (Word-Embeddings Italian Semantic Space 1 and 2, "WEISS1" and "WEISS2") were obtained from Marelli (2017) and were chosen both for their practical accessibility (http://meshugga.ugent.be/snaut-italian) and for their proven good performance in previous studies (Mancuso et al., 2020; Nadalini et al., 2018). WEISS1 is based on a CBOW model with 400 dimensions and a 9-word window; WEISS2 is based on a CBOW model with 200 dimensions and a 5-word window. Both models consider words with a minimum frequency of 100 in the original corpus. The count model, based on Latent Semantic Analysis ("LSA"), was created ad hoc for this study following Günther and colleagues' (2015) procedure. Many psycholinguistic studies applying LSA to English used the TASA corpus (http://lsa.colorado.edu, including 12,190,931 tokens), which is a far smaller corpus than itWaC (about 1.9 billion tokens). To ensure comparability with this previous literature, we extracted a subset of the itWaC corpus matching the TASA size. We selected an untagged set of 91,058 documents randomly extracted from itWaC, comprising the same set of words (N = 180,080) as the WEISS semantic spaces. The matrix of co-occurrences was created using the DISSECT toolkit (Dinu et al., 2013), applying a Positive Pointwise Mutual Information weighting scheme (Niwa & Nitta, 1995) followed by dimensionality reduction through Singular Value Decomposition. We set the number of dimensions to 300 following Landauer and Dumais (1997), who report good performance for dimensionalities ranging from 300 to 1,000.
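To make this step concrete, the following base-R sketch illustrates the same weighting-and-reduction pipeline (Positive PMI followed by SVD) on a small, dense word-by-context co-occurrence matrix M with row names. It is an illustrative approximation under these assumptions, not the DISSECT code actually used, and a corpus-sized matrix would require sparse methods:

    # Positive Pointwise Mutual Information weighting of a word-by-context
    # co-occurrence matrix M, followed by truncated SVD (illustrative only).
    ppmi <- function(M) {
      P   <- M / sum(M)                          # joint probabilities p(word, context)
      pmi <- log2(P / outer(rowSums(P), colSums(P)))
      pmi[!is.finite(pmi) | pmi < 0] <- 0        # keep only positive, defined values
      pmi
    }

    k   <- 300                                   # target dimensionality (must not exceed min(dim(M)))
    dec <- svd(ppmi(M), nu = k, nv = 0)
    word_vectors <- dec$u %*% diag(dec$d[1:k])   # one k-dimensional vector per word
    rownames(word_vectors) <- rownames(M)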
2    Materials and Methods

The verbal production to a semantic VF task (category "fruits") from the original cohort of 371 subjects (Table 1) was analysed. Overall, the data comprised N = 3,642 words, with 133 unique words.

                          PPA         FTD         SD          CTR
    Number                16          33          15          307
    Age (years)           73.6±3.4    67.0±6.1    67.9±6.5    54.9±17
    Education (years)     7±4.6       8.6±4.4     9.3±4.9     9.6±5

    Table 1: Demographic information for all the subject groups.

Data were entered into an R (R Core Team, 2021) pipeline, leveraging two word2vec (Mikolov et al., 2013) semantic spaces ("WEISS1" and "WEISS2") and an LSA space with identical vocabulary size ("LSA"). For each participant, the pipeline outputs three sets of semantic indexes, computed according to five different thresholds (set to identify the occurrence of a semantic switch) corresponding to the 10th, 30th, 50th, 70th, and 90th quantiles of the distribution of semantic relatedness values (Table 2), computed considering the cosine proximity of all adjacent words produced by the whole study cohort.

                 10th     30th     50th     70th     90th
    WEISS1       .185     .226     .247     .268     .287
    WEISS2       .303     .371     .405     .434     .463
    LSA          .336     .431     .479     .519     .582

    Table 2: Cosine values adopted as thresholds for the three semantic spaces.
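As an illustration of how such thresholds can be derived, the following R sketch pools the cosine proximities of adjacent words over the cohort and takes their quantiles. The objects productions (a list with one character vector of produced words per participant) and space (a matrix of word vectors with one named row per word) are assumed for the example and are not part of the original pipeline description:

    # Cosine proximity between the vectors of adjacent words in one production.
    cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

    adjacent_proximities <- function(words, space) {
      words <- words[words %in% rownames(space)]   # drop out-of-vocabulary items
      if (length(words) < 2) return(numeric(0))
      sapply(seq_len(length(words) - 1),
             function(i) cosine(space[words[i], ], space[words[i + 1], ]))
    }

    # Pool adjacent-word proximities over the whole cohort, then take the
    # 10th/30th/50th/70th/90th percentiles as candidate switch thresholds.
    all_prox   <- unlist(lapply(productions, adjacent_proximities, space = space))
    thresholds <- quantile(all_prox, probs = c(.10, .30, .50, .70, .90))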
For each participant, we computed the following 9 indexes of VF:

1) Total number of valid words ("new"): the number of words produced in 1 minute, excluding repetitions. Differently from the original work, words not
included in the vocabulary of the semantic space were obligatorily excluded, but words not belonging to the category "fruit" were kept. Due to limitations of the semantic spaces' vocabulary, 53 words and compound expressions (8 from the patient group and 45 from the control group) out of the 3,642 (1.5%) were removed from the data;

2) Repetitions ("rep"): the total number of repeated words;

3) Total number of switches ("switch"): the computational equivalent of the "number of switches between subcategories" in the original work. Semantic switches were identified based on measures of semantic relatedness obtained from the three semantic spaces and according to the five different thresholds (Table 2; see the sketch following this list);

4) Total number of semantic clusters ("NC"): the computational equivalent of the "number of subcategories" in the original work. Clusters were identified based on the occurrence of a semantic switch, i.e., when the mean cosine similarity of the words within a cluster drops below the identified threshold (Table 2);

5) Mean size of clusters ("SC"): the mean number of words within a semantic cluster; the computational equivalent of the "relative switching" index in the original work;

6) Average semantic proximity ("prox"): the semantic distance between adjacent words. Unlike the original index, which was based on human-derived estimates of semantic proximity (Reverberi et al., 2006), we derived this index from the mean cosine between the vectorial representations of adjacent words in the participants' production.

In addition, to ascertain the replicability of the original results with computational methodologies, the following indexes were adapted from the original work:

7) Mean familiarity ("fam"): as a computational equivalent of the original index, which was calculated according to familiarity scores collected from a sample of healthy controls (Reverberi et al., 2004), we computed the raw word frequency derived from the corpus of reference (itWaC), converted to lower case and excluding metadata;

8) Out-of-category words ("OOC"): the number of words not pertaining to the 15 subcategories of "fruit" identified in previous works by the same authors (Reverberi et al., 2004; 2006). Given that the vectorial representation of words differs according to inflectional morphology, data were not normalised (singular to plural) but kept as originally produced;

9) Order Index ("OI"): computed following the formula proposed in Reverberi et al. (2006). In its simplified notation, the Order Index is equivalent to the difference between the theoretical maximum number of switches (total number of words minus 1) and the actual observed switches, divided by the range of theoretically possible switches (total number of words minus 1, minus total number of clusters minus 1). To avoid non-linearity problems, the participant's production is represented in a three-dimensional space having the number of words, the number of switches, and the number of subcategories as axes: the Order Index is then transformed using the arctangents of the resulting segments.
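The sketch below shows one possible operationalisation of indexes 3), 4) and 9): a new cluster is opened when the mean cosine between the incoming word and the words already in the current cluster falls below the chosen threshold, and the simplified Order Index is then computed from the resulting counts. This is an illustrative reading of the description above, not the authors' code; words (one participant's in-vocabulary production), space and thresholds are assumed as in the earlier sketch:

    cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

    # Segment a production into clusters: a semantic switch is assumed when the
    # mean cosine between the next word and the current cluster drops below the
    # threshold (one plausible reading of the rule described in items 3 and 4).
    segment_clusters <- function(words, space, threshold) {
      clusters <- list(words[1])
      for (w in words[-1]) {
        current  <- clusters[[length(clusters)]]
        mean_sim <- mean(sapply(current, function(cw) cosine(space[cw, ], space[w, ])))
        if (mean_sim < threshold) {
          clusters[[length(clusters) + 1]] <- w          # switch: open a new cluster
        } else {
          clusters[[length(clusters)]] <- c(current, w)  # stay in the current cluster
        }
      }
      clusters
    }

    # Simplified Order Index: (maximum possible switches - observed switches)
    # divided by the range of theoretically possible switches. The further
    # arctangent transformation described in item 9) is not shown here.
    order_index <- function(n_words, n_switches, n_clusters) {
      ((n_words - 1) - n_switches) / ((n_words - 1) - (n_clusters - 1))
    }

    cl         <- segment_clusters(words, space, thresholds["50%"])
    n_switches <- length(cl) - 1                         # switches = cluster transitions
    OI         <- order_index(length(words), n_switches, length(cl))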
   switching" index in the original work;             All variables of interest were pre-processed to
6) Average semantic proximity ("prox"), the        remove variance due to differences in age, level
   semantic distance between adjacent              of education, and the total number of words. We
   words. Unlike the original index, based         ran a linear regression analysis with the relevant
   on human-derived estimated of semantic          variable as the dependent factor and with age, ed-
   proximity (Reverberi et al., 2006), we de-      ucation, and the total number of words as regres-
   rived this index from the mean cosine be-       sors (only considering healthy subjects to avoid
   tween the vectorial representation of adja-     any potential bias in the estimates due to brain
   cent words in the participants' production.     damage). We then used the regression coefficients
                                                   to compute the residuals for each variable and all
    In addition, to ascertain the replicability    subjects. Residuals were then used as predicting
of original results with computational meth-       variables for the classification analysis. The aver-
odologies, the following indexes were adapted      age for each variable and each patient group was
from the original work:                            compared with the respective average in the con-
                                                   trol group through a two-sample t-test, Bonferroni
7) Mean familiarity ("fam"). As a computa-         corrected.
   tional equivalent of the original index,        2.2     Classification Analysis
   calculated according to familiarity scores
   collected from a sample of healthy con-         The R packages caret and e1071 (interfaces to
   trols (Reverberi et al., 2004), we com-         the LIBSVM by Chang & Lin 2011) were used.
   puted the raw word frequency as derived         The aim of the classification analysis was to de-
                                                   termine: i) which variables, alone or in combina-
                                                   tion, would be able to classify a subject as being
either a patient or a control; and ii) which variables, alone or in combination, would best classify a patient as a member of one of the three dementia groups (FTD, PPA, SD).

After removing variance due to differences in age and education, we performed a Leave-One-Out Cross-Validation (LOOCV) analysis. The model kernels were set as linear, and relative weights were added to counterbalance the difference in group numerosity. In LOOCV, one data instance is left out and a model is constructed on all the other data instances in the training set. The model is then tested against the data point left out, and the associated error is recorded. The process is repeated for all data points, and the overall prediction error is calculated by taking the average of the recorded test error estimates. The LOOCV analysis was repeated for each combination of the 9 variables of interest, for each of the 3 semantic spaces, and for each of the 5 thresholds, resulting in 7,665 models.
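As an illustrative sketch of a single cell of this model grid (one variable combination, one semantic space, one threshold), the code below runs a LOOCV loop with a class-weighted linear SVM from e1071. The predictor matrix X (residualised variables, with illustrative column names) and the factor y (patient vs control) are assumed; this is not the authors' pipeline, which also relied on caret:

    library(e1071)                                   # interface to LIBSVM

    # Leave-one-out cross-validation of a linear SVM with class weights that
    # counterbalance the difference in group size.
    loocv_accuracy <- function(X, y) {
      w <- table(y)
      w <- setNames(as.numeric(max(w) / w), names(w))   # heavier weight for the smaller class
      preds <- vapply(seq_along(y), function(i) {
        fit <- svm(x = X[-i, , drop = FALSE], y = y[-i],
                   kernel = "linear", class.weights = w)
        as.character(predict(fit, X[i, , drop = FALSE]))
      }, character(1))
      mean(preds == as.character(y))                    # overall LOOCV accuracy
    }

    # Example: the "new words + Order Index" combination (column names assumed).
    acc <- loocv_accuracy(X[, c("new", "OI"), drop = FALSE], y)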
3    Results

We compared the performance of each group to that of the healthy controls for each of the nine variables considered. All pathological groups significantly differed from the controls on at least one variable (Table 3). In the classification analysis, we investigated which variables (alone, or in all possible combinations with the other variables, i.e., 511 combinations) would best predict the group membership of participants. We carried out two sets of analyses: i) healthy controls versus participants with focal dementias (PPA, FTD, and SD); and ii) participants with PPA versus participants with FTD versus participants with SD. The analysis was performed for each semantic space and for each pre-identified threshold, for a total of 7,665 models (511 variable combinations × 3 semantic spaces × 5 thresholds).

                         FTD      PPA      SD
    Proximity             +
    Familiarity
    New words             +                 +
    Out-Of-Category
    N Switches            +
    N Cluster             +
    Size Cluster          +        +        +
    Order Index           +        +
    Repetitions

    Table 3: Variables that are significantly different between a given pathological group vis-à-vis the healthy controls. Results Bonferroni-corrected for multiple comparisons are reported.

The best classification performance for patients versus healthy controls was found when we considered the variables "total number of new words" and "Order Index", at any threshold and with all semantic spaces. In these cases, the overall accuracy of the models was 61.2%, with a sensitivity of 57.4% and a specificity of 79.7% (Table 4).

    SS             Thres.   Vars                       Acc.    Sens.   Spec.
    Human-based    -        NC + prox + new + OOC      84      86      82
    all            all      New + OI                   61.2    57.4    79.7
    -              -        New                        61.0    57.0    79.7
    all            all      OI                         61.0    57.0    79.7
    all            all      Rep + new + OI             60.7    55.7    84.4
    -              -        OOC                        60.4    56.4    79.7

    Table 4: Top 5 performing classification models (patients vs controls). SS = semantic space; Thres. = threshold; values are percentages.

The best classification performance for patients in their specific pathology group was found when we considered the variables "out-of-category words", "average semantic proximity", and "size of clusters" computed at the 3rd threshold (50th quantile) of the WEISS2 space (Table 5). In this case, the overall maximum accuracy was 43.8%. Sensitivity and specificity for each pathology group were: PPA = 87.5% and 62.5%; FTD = 36.4% and 71%; SD = 13.33% and 81.6%, respectively.

    SS             Thres.   Vars                        Acc.    PPA          FTD         SD
    Human-based    -        Fam + NS + OI + new + rep   58      NA           NA          NA
    W2             50       OOC + prox + SC             43.8    87.5/62.5    36.4/71     13.3/81.6
    W1             10       OOC + SC                    42.2    87.5/56.3    39.4/74.2   0/83.7
    W1             30       NS + NC                     40.6    93.8/50      33.3/77.4   0/85.7
    W1             70       OOC + SC                    40.6    87.5/62.5    36.4/64.5   0/81.6
    W2             90       SC                          39.1    68.8/60.4    42.4/64.5   0/81.6

    Table 5: Top 5 performing classification models (patients in each specific pathology group). Per-group cells report sensitivity/specificity (%).

4    Discussion

In this work, we replaced human-based measures of semantic proximity with DSM-derived measures of semantic proximity to compute a set of VF indexes that had previously been found able to classify, with good accuracy, people with and without focal dementias based on their verbal production to a semantic VF task (category "fruits", which was originally adopted to limit the set of possible
items as compared to broader categories such as "animals"). The objective of the study was to assess the accuracy of Machine Learning (ML) models based on DSM measures of semantic information, in view of their possible extension to words and semantic categories for which a measure of semantic proximity is not available. Despite being above chance in both cases, the ML models based on DSM-derived measures of semantic proximity showed lower accuracy compared to the models built on human-based ratings. This was true both for the classification of patients versus controls (61.2% and 84%, respectively) and for the subclassification of the diagnosis (43.8% and 58%, respectively).

The observed differences might be due to the functional adaptations needed to transpose the original VF indexes to DSM-derived measures. For example, the computational equivalent of the "familiarity" index, which in the original work was calculated according to familiarity scores collected from a sample of healthy controls, was approximated via the raw word frequency derived from the corpus of reference. Moreover, given that the vectorial representation of words differs according to inflectional morphology, data were not normalised (singular to plural) but kept as originally produced, unlike in the original work. Hence, it is possible that these operations introduced some distortions that could explain the differences observed with respect to the original study.

In terms of parameter setting, it is worth noting that our choices might have affected the overall performance of the adopted models, possibly reducing their ability to avoid noise and biases. For example, according to Tripodi (2017), hyperparameter setting for Italian has specific requirements in terms of vector size, negative sampling, and vocabulary threshold cutting in order to maximise performance in an analogy task (although to what extent such recommendations can be extended to VF is an empirical question that remains to be addressed). Also, the choice of a CBOW model, instead of "more predictive" algorithms such as Skip-gram or masked-word prediction, might have reduced the ability of the model to mimic human ratings of word associations.

However, a different explanation might be related to the type of information encoded in the human proximity ratings. Given its evolutionary relevance, the neural substrate underpinning the notion of "fruits" might encode a rich multidimensional semantic characterisation (including sensory information such as taste, smell, sight, and touch). As such, the representation of this semantic category might not be simply derivable from the lexical distribution of its items in a corpus. By contrast, other semantic categories might rely on less perceptual and more encyclopaedic semantic knowledge: for example, the category "animals", another semantic cue widely used for the assessment of VF. Indeed, while people generally have first-hand, real-life experience of "fruits", knowledge about "animals" may be more commonly derived from indirect exposure to encyclopaedic information (i.e., the media). In other words, when we think about a cherry, we may not only recall the meaning of the lemma as compared to, for example, that of an apple, but at the same time we might also recall the sensory information attached to the drupe (round, red, juicy, etc.). Conversely, apart from common pets, it is unlikely that participants have first-hand experience of most of the items commonly included in the "animals" category (e.g., "lion", "whale", etc.).

This means that distributional models might not be the best-suited tool to resolve semantic problems when the semantic task under investigation makes use of a subset of words pertaining to a perceptually rich semantic category (such as that of "fruits").

5    Conclusions and Future Works

The past decades have witnessed an increasing interest in the application of NLP techniques to answer, or to support the resolution of, different clinical problems, from patient classification to disease monitoring, and from differential diagnosis to the prediction of treatment response (see de Boer et al., 2018 for a comprehensive review). All these applications implicitly rely on the assumption that these techniques are agnostic/transparent to the semantic task under investigation and, given the good results obtained, that they are equipped with sufficiently rich semantic information to solve any kind of task based on linguistic data. Our findings challenge this idea and align with previous works pointing to a lack of basic features of perceptual meaning in DSMs (Lucy and Gauthier, 2017).

For the application of DSM-derived measures to clinical work and research, this implies that the choice of the verbal task and of the associated DSM can affect the results. For this reason, we plan to assess the classification accuracy of ML models built both on human ratings and on DSM-derived measures of semantic proximity for
other categorical VF tasks, as well as adopting word vectors derived from lemmatised corpora.

Before moving to more recent language models, such as the latest generation of deep neural language models like BERT (Devlin et al., 2019), consideration should be given to the trade-off between the computational and data resources needed to train them (Bender et al., 2021), on the one hand, and the added value they can offer compared to traditional "static" embeddings (Lenci et al., 2021), on the other. Further research might address the limits of current DSMs by enriching the information they encode, integrating experiential and distributional data to induce reliable semantic representations (Andrews et al., 2009). Additional sources of multimodal information (e.g., Lynott et al., 2020), including visual and auditory information, might help overcome these current limitations (Chen et al., 2021).

References

Baroni Marco, Bernardini Silvia, Ferraresi Adriano and Zanchetta Eros. 2009. The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3): 209–226.

Bender Emily M., Gebru Timnit, McMillan-Major Angelina and Shmitchell Shmargaret. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency: 610–623.

Chang Chih-Chung and Lin Chih-Jen. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3): 1–27.

Chen Wei, Wang Weiping, Liu Li and Lew Michael S. 2021. New ideas and trends in deep multimodal content understanding: A review. Neurocomputing, 426: 195–215.

De Boer Jann N., Voppel Alban E., Begemann Marieke J.H., Schnack Hugo G., Wijnen Frank and Sommer Iris E.C. 2018. Clinical use of semantic space models in psychiatry and neurology: A systematic review and meta-analysis. Neuroscience & Biobehavioral Reviews, 93: 85–92.

Devlin Jacob, Chang Ming-Wei, Lee Kenton and Toutanova Kristina. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019: 4171–4186.

Dinu Georgiana and Baroni Marco. 2013. DISSECT: Distributional Semantics Composition Toolkit. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations: 31–36.

Günther Fritz, Dudschig Caroline and Kaup Barbara. 2015. Latent semantic analysis cosines as a cognitive similarity measure: Evidence from priming studies. Quarterly Journal of Experimental Psychology, 69(4): 626–653.

Landauer Thomas and Dumais Susan. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2): 211.

Lenci Alessandro, Sahlgren Magnus, Jeuniaux Patrick, Gyllensten Amaru Cuba and Miliani Martina. 2021. A comprehensive comparative evaluation and analysis of Distributional Semantic Models. arXiv preprint arXiv:2105.09825.

Lezak Muriel, Howieson Diane, Loring David, Hannay Julia and Fischer Jill. 2004. Neuropsychological Assessment. New York: OUP, USA.

Lucy Li and Gauthier Jon. 2017. Are Distributional Representations Ready for the Real World? Evaluating Word Vectors for Grounded Perceptual Meaning. In Proceedings of the First Workshop on Language Grounding for Robotics.

Lynott Dermot, Connell Louise, Brysbaert Marc, Brand James and Carney James. 2020. The Lancaster Sensorimotor Norms: Multidimensional measures of perceptual and action strength for 40,000 English words. Behavior Research Methods, 52(3): 1271–1291.

Mandera Paul, Keuleers Emmanuel and Brysbaert Marc. 2017. Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92: 57–78.

Marelli Marco. 2017. Word-Embeddings Italian Semantic Spaces: A semantic model for psycholinguistic research. Psihologija, 50(4): 503–520.

Mikolov Tomas, Sutskever Ilya, Chen Kai, Corrado Greg and Dean Jeffrey. 2013. Distributed Representations of Words and Phrases and their Compositionality. Retrieved from http://arxiv.org/abs/1310.4546

Niwa Yoshiki and Nitta Yoshihiko. 1995. Co-occurrence vectors from corpora vs. distance vectors from dictionaries. arXiv preprint cmp-lg/9503025.

R Core Team. 2021. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved from https://www.r-project.org.

Reverberi Carlo, Cherubini Paolo, Baldinelli Sara and Luzzi Simona. 2014. Semantic fluency: Cognitive basis and diagnostic performance in focal dementias and Alzheimer's disease. Cortex, 54: 150–164.

Tripodi Rocco and Pira Stefano Li. 2017. Analysis of Italian word embeddings. arXiv preprint arXiv:1707.08783.

Turney Peter D. and Pantel Patrick. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37: 141–188.