<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Moving from Human Ratings to Word Vectors to Classify People with Focal Dementias: Are We There Yet?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chiara Barattieri di San Pietro</string-name>
          <email>chiara.barattieridisanpietro@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Marelli</string-name>
          <email>marco.marelli@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Reverberi</string-name>
          <email>carlo.reverberi@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Milano-Bicocca</institution>
          ,
          <addr-line>Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Verona</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Fine-grained variables based on semantic proximity of words can provide helpful diagnostic information when applied to the analysis of Verbal Fluency tasks. However, before leaving human-based ratings in favour of measures derived from distributional approaches, it is essential to assess the performance of the latter against that of the former. In this work, we analysed a Verbal Fluency task using measures of semantic proximity derived from Distributional Semantic Models of language, and we show how Machine Learning models based on them are less accurate in classifying patients with focal dementias than the same models built on human-based ratings. We discuss the possible interpretation of these results and the implications for the application of distributional semantics in clinical settings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        A Verbal Fluency (VF) task
        <xref ref-type="bibr" rid="ref11">(Lezak et al., 2004)</xref>
        is a test routinely used in neuropsychological
practice that requires participants to produce as
many words as possible belonging to a given
semantic category (e.g., "colours", "animals", etc.)
within a time limit (typically 60 sec). It is
commonly used to study lexical retrieval, and the
subject's performance is standardly rated by the
number of correct words produced for a given cue.
However, to overcome the opacity of the overall
score and help distinguish the different cognitive
functions underpinning VF performance,
additional measures of VF performance have been
proposed. Among these are the number of
consecutive words produced that share similar properties,
such as being a citrus fruit (this is called a "semantic
cluster", and its size is a clinically useful variable),
and the total number of transitions between
clusters (called "number of switches"; Troyer et al.,
1997). Indeed, by characterising a semantic VF
task (category "fruits") using the number of
semantic categories produced, the average semantic
proximity between words, the number of new
words and out-of-category words, it has been
possible to classify people with and without focal
dementias, as well as across three different subtypes
of dementia (Fronto-Temporal Dementia versus
Primary Progressive Aphasia versus Semantic
Dementia), with good accuracy
        (78% accuracy for
patient vs healthy control classification, and
58.3% accuracy for classification across the three
pathological subcategories;
        <xref ref-type="bibr" rid="ref19">Reverberi et al., 2014</xref>
        ). One shortcoming of this model, however,
is that those VF indexes are built upon
human-based ratings of semantic proximity between pairs
of words collected from a sample of healthy
controls, making it hard to extend the same approach
to words for which human judgments were not
previously collected, e.g., words from other semantic
categories.
      </p>
      <p>
        Recent advances in Natural Language
Processing techniques could help overcome this
limitation. Distributional Semantic Models (DSMs)
of language start from lexical co-occurrences
extracted from large text corpora
        <xref ref-type="bibr" rid="ref21">(Turney &amp; Pantel,
2010)</xref>
        , and, by applying different computational
techniques, represent word meanings as
numerical vectors in a multidimensional space
in which semantically related terms are
located close to each other. Such models can be used
to simulate the structure of conceptual knowledge
implied in the performance of semantic tasks such
as a VF task. Indeed, DSMs have been
successfully applied to different tasks of semantic
relationships
        <xref ref-type="bibr" rid="ref14">(Mandera et al., 2017)</xref>
        , including the
analysis of VF tasks to classify patients with
Alzheimer's disease (Linz et al., 2017), reaching
remarkable accuracy (F1 = 0.77). However,
despite this success, questions have been raised
concerning what exactly distributional models can
learn (Erk, 2016) and whether such models are
sufficiently rich in terms of encoded features
        <xref ref-type="bibr" rid="ref12 ref14 ref20">(Lucy
and Gauthier, 2017)</xref>
        to be applied to all sorts of
semantic tasks/problems.
      </p>
      <p>
        The present study aims to test whether the analysis
of a VF task based on DSM-derived measures
would reproduce the results of an analysis based
on human-derived measures. In particular, we
decided to re-analyse the original data of a semantic
VF task (category “fruit”) that Reverberi et al.
collected on a cohort of participants with focal
dementias and healthy controls (CTR). Focal
dementias are neurodegenerative diseases that cause
deterioration of cognitive function, including
language. The original cohort included people with
Fronto-Temporal Dementia (FTD), Primary
Progressive Aphasia (PPA), and Semantic Dementia
(SD). Each diagnostic group presents peculiar
linguistic symptomatology, making these syndromes
ideal candidates for a differential approach. The
human-based indexes of VF (see Section 2 for
details) were adapted to be computed on different
DSMs
        <xref ref-type="bibr" rid="ref16 ref9">(Landauer &amp; Dumais, 1997; Mikolov et al.,
2013)</xref>
        . Specifically, we adopted two predict models and
one count model. All three semantic spaces were
based on the itWaC web-crawled corpus
        <xref ref-type="bibr" rid="ref1">(Baroni
et al., 2009)</xref>
        . The two predict models
(Word-Embeddings Italian Semantic Space 1 and 2
"WEISS1" and "WEISS2") were obtained from
        <xref ref-type="bibr" rid="ref15">Marelli (2017)</xref>
        and were chosen for both their
practical accessibility
(http://meshugga.ugent.be/snaut-italian) and their proven
good performance in previous studies (Mancuso
et al., 2020; Nadalini et al., 2018). WEISS1 is
based on a CBOW model with 400 dimensions
and a 9-word window; WEISS2 is based on a
CBOW model with 200 dimensions and a 5-word
window. Both models consider words with a
minimum frequency of 100 in the original corpus. The
count model based on Latent Semantic Analysis
("LSA") was created ad hoc for this study
following
        <xref ref-type="bibr" rid="ref8">Günther and colleagues' (2015)</xref>
        procedure.
Many psycholinguistic studies applying LSA in
the English language used the TASA corpus
(http://lsa.colorado.edu, including 12,190,931
tokens), which is a far smaller corpus than itWaC
(about 1.9 billion tokens). To ensure
comparability with this previous literature, we extracted a
subset of the itWaC corpus to match the TASA
size. We selected an untagged set of 91,058
documents randomly extracted from itWaC,
comprising the same set of words (N = 180,080) as the
WEISS semantic spaces. The creation of a matrix
of co-occurrences was carried out using the
DISSECT toolkit
        <xref ref-type="bibr" rid="ref7">(Dinu et al., 2013)</xref>
        , and applying a
Positive Pointwise Mutual Information weighting
scheme
        <xref ref-type="bibr" rid="ref17">(Niwa &amp; Nitta, 1995)</xref>
        , followed by
dimensionality reduction by Singular Value
Decomposition. We set the number of dimensions at 300
following the study of
        <xref ref-type="bibr" rid="ref9">Landauer and Dumais
(1997)</xref>
        , which indicates good performance for
dimensionalities ranging from 300 to 1,000.
      </p>
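      <p>The Positive Pointwise Mutual Information weighting step described above can be sketched as follows. This is a minimal illustration only: the function name <monospace>ppmi</monospace> and the toy counts are assumptions made for the example, it does not reproduce the DISSECT implementation, and the subsequent Singular Value Decomposition step is not shown.</p>
      <preformat>
```python
import math

def ppmi(counts):
    """Return PPMI weights for a dict {(target, context): co-occurrence count}."""
    total = sum(counts.values())
    row_tot, col_tot = {}, {}
    for (w, c), n in counts.items():
        row_tot[w] = row_tot.get(w, 0) + n
        col_tot[c] = col_tot.get(c, 0) + n
    weights = {}
    for (w, c), n in counts.items():
        if n == 0:
            # Unobserved pairs get weight 0 (log of 0 is undefined)
            weights[(w, c)] = 0.0
            continue
        # PMI = log2( P(w, c) / (P(w) * P(c)) ); negative values are clipped to 0
        pmi = math.log2(n * total / (row_tot[w] * col_tot[c]))
        weights[(w, c)] = max(0.0, pmi)
    return weights

# Toy counts invented for illustration (not taken from itWaC)
toy_counts = {
    ("mela", "dolce"): 8, ("mela", "motore"): 1,
    ("auto", "dolce"): 1, ("auto", "motore"): 10,
}
toy_weights = ppmi(toy_counts)
```
      </preformat>
      <p>In this toy example, pairs that co-occur more often than expected by chance keep a positive weight, while pairs that co-occur less often than expected are set to zero.</p>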
    </sec>
    <sec id="sec-2">
      <title>Materials and Methods</title>
      <p>The verbal productions in a semantic VF task (category
"fruits") from the original cohort of 371 subjects
(Table 1) were analysed. Overall, the dataset comprised
N = 3,642 words, of which 133 were unique.</p>
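      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Group</th><th>Number</th><th>Age</th><th>Education</th></tr>
          </thead>
          <tbody>
            <tr><td>PPA</td><td>16</td><td>73.6±3.4</td><td>7±4.6</td></tr>
            <tr><td>FTD</td><td>33</td><td>67.0±6.1</td><td>8.6±4.4</td></tr>
            <tr><td>SD</td><td>15</td><td>67.9±6.5</td><td>9.3±4.9</td></tr>
            <tr><td>CTR</td><td>307</td><td>54.9±17</td><td>9.6±5</td></tr>
          </tbody>
        </table>
      </table-wrap>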
      <p>
        Data were entered into an R pipeline leveraging
two word2vec
        <xref ref-type="bibr" rid="ref16">(Mikolov et al., 2013)</xref>
        semantic
spaces ("WEISS1" and "WEISS2"), and an LSA
space with identical vocabulary size (“LSA”). For
each participant, the pipeline outputs three sets of
semantic indexes computed according to five
different thresholds (set to identify the occurrence of
a semantic switch), corresponding to the 10th, 30th,
50th, 70th, and 90th quantiles of the distribution of
semantic relatedness values (Table 2), computed
considering the cosine proximity of all adjacent
words produced by the whole study cohort.
      </p>
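      <p>The derivation of the switch thresholds can be sketched as follows. This is a hedged illustration rather than the R pipeline used in the study: the helper names, the toy vectors, and the choice of the "inclusive" quantile method (which we assume to correspond to R's default) are all assumptions made for the example.</p>
      <preformat>
```python
import math
import statistics

def cosine(u, v):
    """Cosine proximity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def adjacent_similarities(words, space):
    """Cosine between each word and the next one in a participant's production."""
    return [cosine(space[a], space[b]) for a, b in zip(words, words[1:])]

def switch_thresholds(productions, space):
    """Pool adjacent-word cosines over the whole cohort and take five quantiles."""
    pooled = [s for words in productions
              for s in adjacent_similarities(words, space)]
    deciles = statistics.quantiles(pooled, n=10, method="inclusive")  # 9 cut points
    return {q: deciles[q // 10 - 1] for q in (10, 30, 50, 70, 90)}

# Toy two-dimensional "semantic space" (hypothetical vectors, for illustration only)
toy_space = {
    "mela": [1.0, 0.1], "pera": [0.9, 0.2], "banana": [0.7, 0.6],
    "limone": [0.2, 1.0], "arancia": [0.3, 0.9],
}
toy_productions = [
    ["mela", "pera", "limone", "arancia"],
    ["banana", "mela", "limone"],
]
toy_thresholds = switch_thresholds(toy_productions, toy_space)
```
      </preformat>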
      <p>[Table 2: thresholds of semantic relatedness at the 10th, 30th, 50th, 70th, and 90th quantiles for the WEISS1, WEISS2, and LSA spaces.]</p>
      <p>1) Total number of valid words produced in 1 minute, excluding repetitions. Unlike in the original work, words not included in the vocabulary of the semantic space were necessarily excluded, but words not belonging to the category "fruit" were kept. Due to limitations of the semantic spaces' vocabulary, 53 words and compound expressions (8 from the patient group and 45 from the control group) out of the 3,642 (1.5%) were removed from the data;</p>
      <p>2) Repetitions ("rep"): the total number of repeated words;</p>
      <p>3) Total number of switches ("switch"): computational equivalent of the "number of switches between subcategories" in the original work. Semantic switches were identified based on measures of semantic relatedness obtained from three semantic spaces and according to five different thresholds (Table 2);</p>
      <p>4) Total number of semantic clusters ("NC"): computational equivalent of the "number of subcategories" in the original work. Clusters were identified based on the occurrence of a semantic switch, i.e., when the mean value of cosine similarity of words within a cluster drops below the identified threshold (Table 2);</p>
      <p>5) Mean size of clusters ("SC"): mean number of words within a semantic cluster; computational equivalent of the "relative switching" index in the original work;</p>
      <p>6) Average semantic proximity ("prox"): the semantic proximity between adjacent words. Unlike the original index, which was based on human-derived estimates of semantic proximity (Reverberi et al., 2006), we derived this index from the mean cosine between the vectorial representations of adjacent words in the participants' production.</p>
      <p>In addition, to ascertain the replicability of the original results with computational methodologies, the following indexes were adapted from the original work:</p>
      <p>7) Mean familiarity ("fam"). As a computational equivalent of the original index, which was calculated according to familiarity scores collected from a sample of healthy controls (Reverberi et al., 2004), we computed the raw word frequency as derived from the corpus of reference (itWaC), converted to lower case and excluding metadata;</p>
      <p>8) Out-of-category words ("OOC"): number of words not pertaining to the 15 subcategories of "fruit" as identified in previous works by the same Authors (Reverberi et al., 2004; 2006). Given that the vectorial representation of words differs according to inflectional morphology, data were not normalised (singular to plural) but kept as originally produced;</p>
      <p>9) Order Index ("OI"): computed following the formula proposed in Reverberi et al., 2006. In its simplified notation, the Order Index is equivalent to the difference between the theoretical maximum number of switches (total number of words minus 1) and the actual observed switches, divided by the range of theoretically possible switches (total number of words minus 1, minus total number of clusters minus 1). To avoid non-linearity problems, the participant's production is represented in a three-dimensional space having number of words, number of switches, and number of subcategories as axes: the Order Index is then transformed using the arctangents of the resulting segments.</p>
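      <p>As an illustration of how the switch-based indexes listed above relate to one another, the following sketch segments one participant's production into clusters. It rests on a simplifying assumption that is not the study's exact rule: a switch is placed wherever the cosine between two adjacent words falls below the threshold, rather than being based on the mean within-cluster similarity. The function name and the toy values are hypothetical, and the Order Index is not computed here.</p>
      <preformat>
```python
def fluency_indexes(similarities, threshold):
    """Compute switch-based indexes from adjacent-word cosines.

    similarities[i] is the cosine between word i and word i + 1, so a
    production of n words yields n - 1 values.
    """
    n_words = len(similarities) + 1
    cluster_sizes = [1]  # the first word opens the first cluster
    for s in similarities:
        if threshold > s:
            # Relatedness drops below the threshold: a switch opens a new cluster
            cluster_sizes.append(1)
        else:
            cluster_sizes[-1] += 1
    n_clusters = len(cluster_sizes)
    prox = sum(similarities) / len(similarities) if similarities else None
    return {
        "switch": n_clusters - 1,      # total number of switches
        "NC": n_clusters,              # total number of semantic clusters
        "SC": n_words / n_clusters,    # mean size of clusters
        "prox": prox,                  # average semantic proximity
    }

# Toy adjacent-word cosines for a six-word production (invented values)
toy_indexes = fluency_indexes([0.8, 0.7, 0.2, 0.9, 0.1], threshold=0.5)
```
      </preformat>
      <p>With these toy values the production is split into three clusters of three, two, and one words, separated by two switches.</p>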
      <sec id="sec-2-1">
        <title>Statistical Analyses</title>
        <p>All variables of interest were pre-processed to
remove variance due to differences in age, level
of education, and the total number of words. We
ran a linear regression analysis with the relevant
variable as the dependent variable and with age,
education, and the total number of words as
regressors (considering only healthy subjects, to avoid
any potential bias in the estimates due to brain
damage). We then used the regression coefficients
to compute the residuals for each variable and all
subjects. Residuals were then used as predictors
in the classification analysis. The
average for each variable and each patient group was
compared with the respective average in the
control group through a two-sample t-test with Bonferroni
correction.</p>
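        <p>The residualisation step can be sketched as follows. For brevity, and to stay free of external dependencies, this example regresses on a single covariate (age), whereas the analysis described above used age, education, and total number of words jointly. The function names, field names, and toy values are assumptions made for illustration.</p>
        <preformat>
```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residualise(rows, variable, covariate="age", reference_group="CTR"):
    """Fit the regression on the reference group only, then return the
    residuals of every subject (patients included) under that fit."""
    ref = [r for r in rows if r["group"] == reference_group]
    intercept, slope = fit_line([r[covariate] for r in ref],
                                [r[variable] for r in ref])
    return [r[variable] - (intercept + slope * r[covariate]) for r in rows]

# Toy data (invented): three controls on an exact line, plus one patient
toy_rows = [
    {"group": "CTR", "age": 30, "prox": 0.50},
    {"group": "CTR", "age": 50, "prox": 0.40},
    {"group": "CTR", "age": 70, "prox": 0.30},
    {"group": "FTD", "age": 70, "prox": 0.20},
]
toy_residuals = residualise(toy_rows, "prox")
```
        </preformat>
        <p>Because the coefficients are estimated on controls only, the patient's residual expresses how far the patient's value lies from what would be expected for a healthy subject of the same age.</p>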
      </sec>
      <sec id="sec-2-2">
        <title>Classification Analysis</title>
        <p>
          The R packages caret and e1071
          <xref ref-type="bibr" rid="ref3">(interfaces to
the LIBSVM by Chang &amp; Lin 2011)</xref>
          were used.
The aim of the classification analysis was to
determine: i) which variables, alone or in
combination, would be able to classify a subject as being
either a patient or a control; and ii) which variables,
alone or in combination, would best classify a
patient as being a member of one of the three focal
dementia groups (FTD, PPA, SD).
        </p>
        <p>After removing variance due to differences
in age and education, we performed a
Leave-One-Out Cross-Validation (LOOCV) analysis. The
model kernels were set as linear, and relative
weights were added to counterbalance the
difference in group sizes. In LOOCV, a data
instance is left out, and a model is constructed on all
other data instances in the training set. The model
is tested against the data point left out, and the
associated error is recorded. The process is then
repeated for all data points, and the overall
prediction error is calculated by taking the average of the
recorded test error estimates. The LOOCV
analysis was repeated for each combination of the 9
variables of interest, for each of the 3 semantic
spaces, and each of the 5 thresholds, resulting in
7,665 models (511 variable combinations × 3 spaces × 5 thresholds).</p>
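        <p>The size of the model grid and the LOOCV loop can be sketched as follows. This is an illustration under stated assumptions, not the caret/e1071 analysis itself: a simple nearest-centroid classifier stands in for the linear-kernel SVM, class weights are omitted, and the helper names and toy data are hypothetical. The variable labels are those used in the list of indexes and in the results tables.</p>
        <preformat>
```python
from itertools import combinations

VARIABLES = ["new", "rep", "switch", "NC", "SC", "prox", "fam", "OOC", "OI"]
SPACES = ["WEISS1", "WEISS2", "LSA"]
THRESHOLDS = [10, 30, 50, 70, 90]

def variable_subsets(variables):
    """Yield every non-empty combination of the given variables."""
    for k in range(1, len(variables) + 1):
        yield from combinations(variables, k)

n_subsets = sum(1 for _ in variable_subsets(VARIABLES))   # 2**9 - 1
n_models = n_subsets * len(SPACES) * len(THRESHOLDS)

def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

def loocv_accuracy(X, y, fit, predict):
    """Hold out each instance once, train on the rest, and average the hits."""
    hits = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

# Toy one-feature data set (invented values)
toy_X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
toy_y = ["CTR", "CTR", "CTR", "PAT", "PAT", "PAT"]
toy_accuracy = loocv_accuracy(toy_X, toy_y, fit_centroids, predict_centroid)
```
        </preformat>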
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>We compared the performance of each patient group to
that of healthy controls for each of the nine
variables considered. All pathological groups
significantly differed from the controls on at least one
variable (Table 3). In the classification analysis,
we investigated which variables (alone, or in all
the possible combinations with other variables,
i.e., 511 combinations) would best predict the
group membership of participants. We carried out two
sets of analyses: i) healthy controls versus
participants with focal dementias (PPA, FTD, and SD);
and ii) participants with PPA versus participants
with FTD versus participants with SD. The
analysis was performed for each semantic space and for
each pre-identified threshold, for a total of
7,665 models.</p>
      <p>[Table 3: variables (Proximity, Familiarity, New words, Out-Of-Category, N Switches, N Cluster, Size Cluster, Order Index, Repetitions) on which each patient group (FTD, PPA, SD) differed significantly from controls, marked with "+".]</p>
      <p>The best classification performance for
patients versus healthy controls was found when we
considered the variables "total number of new
words" and "Order Index" at any threshold and
with all semantic spaces. In these cases, the
overall accuracy of the models was 61.2%, with
sensitivity of 57.4% and specificity of 79.7% (Table 4).</p>
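      <table-wrap>
        <label>Table 4</label>
        <table>
          <thead>
            <tr><th>SS</th><th>Thres.</th><th>Vars</th><th>Acc.</th><th>Sens.</th><th>Spec.</th></tr>
          </thead>
          <tbody>
            <tr><td colspan="2">Human-based</td><td>NC + prox + new + OOC</td><td>84</td><td>86</td><td>82</td></tr>
            <tr><td>all</td><td>all</td><td>New + OI</td><td>61.2</td><td>57.4</td><td>79.7</td></tr>
            <tr><td>all</td><td>all</td><td>New</td><td>61.0</td><td>57.0</td><td>79.7</td></tr>
            <tr><td>all</td><td>all</td><td>OI</td><td>61.0</td><td>57.0</td><td>79.7</td></tr>
            <tr><td>all</td><td>all</td><td>OOC</td><td>60.7</td><td>55.7</td><td>84.4</td></tr>
            <tr><td>all</td><td>all</td><td>Rep + new + OI</td><td>60.4</td><td>56.4</td><td>79.7</td></tr>
          </tbody>
        </table>
      </table-wrap>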
      <p>The best classification performance for
assigning patients to their specific pathology group was found
when we considered the variables "out of category
words", "average semantic proximity", and "size
of clusters" computed at the third threshold (50th quantile) of
the WEISS2 space (Table 5). In this case, the
overall maximum accuracy was 43.8%. Sensitivity and
specificity for each pathology group were: PPA =
87.5% and 62.5%; FTD = 36.4% and 71%; SD =
13.33% and 81.6%, respectively.</p>
      <p>In this work, we replaced human-based measures
of semantic proximity with DSM-derived
measures of semantic proximity to compute a set
of indexes of VF that was found to be able to
classify with good accuracy people with and without
focal dementias based on their verbal production
in a semantic VF task (category "fruits", which
was originally adopted to limit the set of possible
items as compared to broader categories such as
"animals"). The objective of the study was to
assess the accuracy of Machine Learning (ML)
models based on DSM measures of semantic
information, in view of their possible extension to
words and semantic categories for which
human-based measures of semantic proximity are not available.
Despite being above chance in both cases, ML
models based on DSM-derived measures of
semantic proximity showed lower accuracy
compared to models built on human-based ratings.
This was true both for the classification of patients
versus controls (61.2% and 84%, respectively) and
for the subclassification of diagnosis
(43.8% and 58%, respectively).</p>
      <p>The observed differences might be due to the
functional adaptations needed to transpose the
original VF indexes to DSM-derived measures.
For example, the computational equivalent of the
"familiarity" index, calculated according to
familiarity scores collected from the sample of healthy
controls, was approximated via the raw word
frequency as derived from the corpus of reference.
Moreover, given that the vectorial representation
of words differs according to inflectional
morphology, data were not normalised (singular to
plural) but kept as originally produced, unlike the
original work. Hence, it is possible that
these operations introduced some distortions that
could explain the differences observed compared
to the original study.</p>
      <p>
        In terms of parameter setting, it is worth noting
that our choices might have affected the overall
performance of the adopted models, possibly
reducing their ability to avoid noise and biases. For
example, according to
        <xref ref-type="bibr" rid="ref20">Tripodi (2017)</xref>
        ,
hyperparameter setting for Italian has specific requirements in
terms of vector size, negative sampling, and
vocabulary threshold cutting to maximize performance
in an analogy task (although to what extent such
recommendations can be extended to VF is an
empirical question that remains to be addressed).
Also, the choice of a CBOW model, instead of
"more predictive" algorithms such as Skipgram
and Mask, might have reduced the ability of the
model to mimic the human ratings of word
associations.
      </p>
      <p>However, a different explanation might be
related to the type of information encoded into the
human proximity ratings. Given its evolutionary
relevance, the neural substrate underpinning the
notion of "fruits" might encode a rich
multidimensional semantic characterisation (including
sensory information such as taste, smell, sight,
touch). As such, the representation of this
semantic category might not be simply derivable from the
lexical distribution of its items in a corpus.
By contrast, other semantic categories might rely on
less perceptual and more encyclopaedic semantic
knowledge, such as, for example, the category
"animals", another semantic cue widely used for
the assessment of VF. Indeed, while people do
generally have first-hand, real-life experience of
"fruits", knowledge about "animals" may be more
commonly derived from indirect exposure to
encyclopaedic information (i.e., the media). In other
words, when we think about a cherry, we may not
only recall the meaning of the lemma as compared
to, for example, an apple, but at the same time, we
might also recall the sensory information attached
to the drupe (round, red, juicy, etc.). Conversely,
apart from common pets, it is unlikely that
participants have first-hand experience of most of
the items commonly included in the "animals" category
(e.g., "lion", "whale", etc.).</p>
      <p>This means that distributional models might
not be the best-suited tool to resolve semantic
problems when the semantic task under investigation
makes use of a subset of words pertaining to a
perceptually rich semantic category (such as that of
"fruits").</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Works</title>
      <p>
        The past decades have witnessed an increasing
interest in the application of NLP
techniques to answer, or support the resolution of,
different clinical problems, from patient
classification to disease monitoring, and from differential
diagnosis to prediction of treatment response
        <xref ref-type="bibr" rid="ref5">(see
de Boer et al., 2018 for a comprehensive review)</xref>
        .
All these applications implicitly rely on the
assumption that these techniques are
agnostic/transparent to the semantic task under investigation
and, given the good results obtained, that they are
equipped with sufficiently rich semantic
information to solve any kind of task based on
linguistic data. Our findings challenge this idea and align
with previous works pointing to a lack of basic
features of perceptual meaning in DSM
        <xref ref-type="bibr" rid="ref12 ref14 ref20">(Lucy and
Gauthier, 2017)</xref>
        .
      </p>
      <p>For the application of
DSM-derived measures to clinical work and research,
our results imply that the choice of the verbal task and the
associated DSM can affect the results. For this
reason, we plan to assess the classification accuracy
of ML models built both on human ratings and
DSM-derived measures of semantic proximity for
other categorical VF tasks, and to adopt
word vectors derived from lemmatised corpora.</p>
      <p>
        Before moving to more recent language models,
such as the latest generation of deep neural language
models like BERT
        <xref ref-type="bibr" rid="ref6">(Devlin et al., 2019)</xref>
        ,
consideration should be given to the trade-off between
computational and data resources needed to train
them
        <xref ref-type="bibr" rid="ref2">(Bender et al., 2021)</xref>
        on one hand, and what
kind of added value they can give compared to
traditional “static” embeddings
        <xref ref-type="bibr" rid="ref10">(Lenci et al., 2021)</xref>
        on the other. Further research might address the
limits of current DSMs by enriching the
information encoded, integrating experiential and
distributional data to induce reliable semantic
representations (Andrews et al., 2009). Additional
sources of multimodal information (e.g., Lynott
et al., 2020), including visual and audio
information, might help overcome these current
limitations
        <xref ref-type="bibr" rid="ref4">(Chen et al., 2021)</xref>
        .
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Baroni</surname> <given-names>Marco</given-names></string-name>,
          <string-name><surname>Bernardini</surname> <given-names>Silvia</given-names></string-name>,
          <string-name><surname>Ferraresi</surname> <given-names>Adriano</given-names></string-name> and
          <string-name><surname>Zanchetta</surname> <given-names>Eros</given-names></string-name>.
          <year>2009</year>.
          <article-title>The WaCky wide web: A collection of very large linguistically processed web-crawled corpora</article-title>.
          <source>Language Resources and Evaluation</source>,
          <volume>43</volume>(<issue>3</issue>):
          <fpage>209</fpage>-<lpage>226</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><surname>Bender</surname> <given-names>Emily M.</given-names></string-name>,
          <string-name><surname>Gebru</surname> <given-names>Timnit</given-names></string-name>,
          <string-name><surname>McMillan-Major</surname> <given-names>Angelina</given-names></string-name> &amp;
          <string-name><surname>Shmitchell</surname> <given-names>Shmargaret</given-names></string-name>.
          <year>2021</year>.
          <article-title>On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?</article-title>
          <source>In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>:
          <fpage>610</fpage>-<lpage>623</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><surname>Chang</surname> <given-names>Chih-Chung</given-names></string-name> and
          <string-name><surname>Lin</surname> <given-names>Chih-Jen</given-names></string-name>.
          <year>2011</year>.
          <article-title>LIBSVM: a library for support vector machines</article-title>.
          <source>ACM Transactions on Intelligent Systems and Technology (TIST)</source>,
          <volume>2</volume>(<issue>3</issue>):
          <fpage>1</fpage>-<lpage>27</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name><surname>Chen</surname> <given-names>Wei</given-names></string-name>,
          <string-name><surname>Wang</surname> <given-names>Weiping</given-names></string-name>,
          <string-name><surname>Liu</surname> <given-names>Li</given-names></string-name> and
          <string-name><surname>Lew</surname> <given-names>Michael S.</given-names></string-name>
          <year>2021</year>.
          <article-title>New ideas and trends in deep multimodal content understanding: A review</article-title>.
          <source>Neurocomputing</source>,
          <volume>426</volume>:
          <fpage>195</fpage>-<lpage>215</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name><surname>De Boer</surname> <given-names>Jann N.</given-names></string-name>,
          <string-name><surname>Voppel</surname> <given-names>Alban E.</given-names></string-name>,
          <string-name><surname>Begemann</surname> <given-names>Marieke J.H.</given-names></string-name>,
          <string-name><surname>Schnack</surname> <given-names>Hugo G.</given-names></string-name>,
          <string-name><surname>Wijnen</surname> <given-names>Frank</given-names></string-name> and
          <string-name><surname>Sommer</surname> <given-names>Iris E.C.</given-names></string-name>
          <year>2018</year>.
          <article-title>Clinical use of semantic space models in psychiatry and neurology: a systematic review and meta-analysis</article-title>.
          <source>Neuroscience &amp; Biobehavioral Reviews</source>,
          <volume>93</volume>:
          <fpage>85</fpage>-<lpage>92</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name><surname>Devlin</surname> <given-names>Jacob</given-names></string-name>,
          <string-name><surname>Chang</surname> <given-names>MW</given-names></string-name>,
          <string-name><surname>Lee</surname> <given-names>K</given-names></string-name>,
          <string-name><surname>Toutanova</surname> <given-names>K</given-names></string-name>.
          <year>2019</year>.
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>.
          <source>In: Proceedings of NAACL-HLT 2019</source>,
          <fpage>4171</fpage>-<lpage>4186</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name><surname>Dinu</surname> <given-names>Georgiana</given-names></string-name> and
          <string-name><surname>Baroni</surname> <given-names>Marco</given-names></string-name>.
          <year>2013</year>.
          <article-title>Dissect-distributional semantics composition toolkit</article-title>.
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations</source>,
          <fpage>31</fpage>-<lpage>36</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name><surname>Günther</surname> <given-names>Fritz</given-names></string-name>,
          <string-name><surname>Dudschig</surname> <given-names>Caroline</given-names></string-name> and
          <string-name><surname>Kaup</surname> <given-names>Barbara</given-names></string-name>.
          <year>2015</year>.
          <article-title>Latent semantic analysis cosines as a cognitive similarity measure: Evidence from priming studies</article-title>.
          <source>Quarterly Journal of Experimental Psychology</source>,
          <volume>69</volume>(<issue>4</issue>):
          <fpage>626</fpage>-<lpage>653</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Landauer</surname>
            <given-names>Thomas</given-names>
          </string-name>
          and
          <string-name>
            <surname>Dumais</surname>
            <given-names>Susan</given-names>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge</article-title>
          .
          <source>Psychological review</source>
          ,
          <volume>104</volume>
          (
          <issue>2</issue>
          ),
          <fpage>211</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Lenci</surname>
            <given-names>Alessandro</given-names>
          </string-name>
          , Sahlgren Magnus, Jeuniaux Patrick, Gyllensten Amaru Cuba and Miliani Martina.
          <year>2021</year>
          .
          <article-title>A comprehensive comparative evaluation and analysis of Distributional Semantic Models</article-title>
          .
          <source>arXiv preprint arXiv:2105.09825</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lezak</surname>
            <given-names>Muriel</given-names>
          </string-name>
          , Howieson Diane, Loring David,
          <string-name>
            <surname>Hannay</surname>
            <given-names>Julia</given-names>
          </string-name>
          and
          <string-name>
            <surname>Fischer</surname>
            <given-names>Jill</given-names>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Neuropsychological assessment</article-title>
          . New York: Oxford University Press.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Lucy</surname>
            <given-names>Li</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gauthier</surname>
            <given-names>Jon</given-names>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Are Distributional Representations Ready for the Real World? Evaluating Word Vectors for Grounded Perceptual Meaning</article-title>
          .
          <source>Proceedings of the First Workshop on Language Grounding for Robotics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Lynott</surname>
            <given-names>Dermot</given-names>
          </string-name>
          , Connell Louise, Brysbaert Marc, Brand James and
          <string-name>
            <surname>Carney</surname>
            <given-names>James</given-names>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>The Lancaster Sensorimotor Norms: multidimensional measures of perceptual and action strength for 40,000 English words</article-title>
          .
          <source>Behavior Research Methods</source>
          ,
          <volume>52</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1271</fpage>
          -
          <lpage>1291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Mandera</surname>
            <given-names>Paul</given-names>
          </string-name>
          , Keuleers Emmanuel and
          <string-name>
            <surname>Brysbaert</surname>
            <given-names>Marc</given-names>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation</article-title>
          .
          <source>Journal of Memory and Language</source>
          ,
          <volume>92</volume>
          ,
          <fpage>57</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Marelli</surname>
            <given-names>Marco</given-names>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Word-embeddings Italian Semantic spaces: A semantic model for psycholinguistic research</article-title>
          .
          <source>Psihologija</source>
          ,
          <volume>50</volume>
          (
          <issue>4</issue>
          ):
          <fpage>503</fpage>
          -
          <lpage>520</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            <given-names>Tomas</given-names>
          </string-name>
          , Sutskever Ilya, Chen Kai, Corrado Greg and Dean Jeffrey
          .
          <year>2013</year>
          .
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          . Retrieved from http://arxiv.org/abs/1310.4546
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Niwa</surname>
            <given-names>Yoshiki</given-names>
          </string-name>
          and
          <string-name>
            <surname>Nitta</surname>
            <given-names>Yoshihiko</given-names>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Co-occurrence vectors from corpora vs. distance vectors from dictionaries</article-title>
          .
          <source>arXiv preprint cmp-lg/9503025</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>R Core Team</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>R: A language and environment for statistical computing</article-title>
          . R Foundation for Statistical Computing. Retrieved from https://www.r-project.org
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Reverberi</surname>
            <given-names>Carlo</given-names>
          </string-name>
          , Cherubini Paolo, Baldinelli Sara and
          <string-name>
            <surname>Luzzi</surname>
            <given-names>Simona</given-names>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Semantic fluency: Cognitive basis and diagnostic performance in focal dementias and Alzheimer's disease</article-title>
          .
          <source>Cortex</source>
          ,
          <volume>54</volume>
          ,
          <fpage>150</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Tripodi</surname>
            <given-names>Rocco</given-names>
          </string-name>
          and
          <string-name>
            <surname>Li Pira</surname>
            <given-names>Stefano</given-names>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Analysis of Italian word embeddings</article-title>
          .
          <source>arXiv preprint arXiv:1707.08783</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Turney</surname>
            <given-names>Peter D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Pantel</surname>
            <given-names>Patrick</given-names>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>From frequency to meaning: Vector space models of semantics</article-title>
          .
          <source>Journal of artificial intelligence research</source>
          ,
          <volume>37</volume>
          ,
          <fpage>141</fpage>
          -
          <lpage>188</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>