=Paper= {{Paper |id=Vol-2765/160 |storemode=property |title=CONcreTEXT @ EVALITA2020: The Concreteness in Context Task |pdfUrl=https://ceur-ws.org/Vol-2765/paper160.pdf |volume=Vol-2765 |authors=Lorenzo Gregori,Maria Montefinese,Daniele P. Radicioni,Andrea Amelio Ravelli,Rossella Varvara |dblpUrl=https://dblp.org/rec/conf/evalita/GregoriMRRV20 }} ==CONcreTEXT @ EVALITA2020: The Concreteness in Context Task== https://ceur-ws.org/Vol-2765/paper160.pdf

CONcreTEXT @ EVALITA2020:
The Concreteness in Context Task

Lorenzo Gregori, University of Florence, lorenzo.gregori@unifi.it
Maria Montefinese, University of Padua, maria.montefinese@unipd.it
Daniele P. Radicioni, University of Turin, daniele.radicioni@unito.it
Andrea Amelio Ravelli, Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC–CNR) - ItaliaNLP Lab, andreaamelio.ravelli@ilc.cnr.it
Rossella Varvara, University of Florence, rossella.varvara@unifi.it

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

The focus of the CONcreTEXT task is conceptual concreteness: systems were asked to compute a value expressing to what extent target concepts are concrete (i.e., more or less perceptually salient) within a given context of occurrence. To this end, we developed a new dataset, which was annotated with concreteness ratings and used as the gold standard in the evaluation of systems. Four teams participated in this first edition of the task, with a total of 15 runs submitted. Interestingly, these works extend information on conceptual concreteness available in existing (non-contextual) norms derived from human judgments with new knowledge from recently developed neural architectures, in much the same multidisciplinary spirit in which the CONcreTEXT task was organized.

1 Introduction

Concept concreteness, that is, how directly a concept is related to sensorial experience (Brysbaert et al., 2014a), is a fundamental dimension of conceptual semantic representation that has attracted growing interest in psycholinguistics over the last decade. This dimension is usually assessed through participants' ratings on a Likert scale: concrete concepts lie on one side of the scale and refer to something that exists in reality and can be experienced immediately through the senses; abstract concepts lie on the opposite side of the scale and are grounded in internal sensory experience and linguistic information. While concrete concepts have direct sensory referents (Crutch and Warrington, 2005) and greater availability of contextual information (Connell et al., 2018; Kousta et al., 2011; Montefinese et al., 2020), abstract concepts tend to be more emotionally valenced (Kousta et al., 2011) and less imageable (Montefinese et al., 2020; Garbarini et al., 2020).

The CONcreTEXT task challenges participants to build NLP systems that automatically assign a concreteness value to words in context. It is aimed at investigating how concreteness information affects sense selection: unlike past research (Brysbaert et al., 2014b; Montefinese et al., 2014), we are interested in assessing the concreteness of concepts within the context of real sentences rather than in isolation. Additionally, the concreteness score is assumed to be a property of meanings rather than a property of word forms; thus, scoring the concreteness of a concept in context implicitly requires identifying its underlying sense, by handling lexical phenomena such as polysemy and homonymy.

Ordinary experience suggests that the concrete/abstract status of concepts can affect their semantic representation, as well as lexical access and processing: concrete meanings are acknowledged to be more quickly and easily delivered in human communication than abstract meanings (Bambini et al., 2014). Historically, it has been observed that concrete concepts are responded to more quickly than abstract concepts in lexical decision tasks (Bleasdale, 1987; Kroll and Merves, 1986), although more recent experiments have shown that abstract
concepts might have an advantage when other variables have been accounted for (Kousta et al., 2011). Concrete concepts are also easier to encode and retrieve than abstract concepts (Romani et al., 2008; Miller and Roodenrys, 2009), are easier to make associations with (de Groot, 1989), and are more thoroughly described in definition tasks (Sadoski et al., 1997). Moreover, it generally takes less time to comprehend a concrete sentence than an abstract one (Haberlandt and Graesser, 1985; Schwanenflugel and Shoben, 1983). Thus, it has been proposed that different organizational principles govern semantic representations of concrete and abstract concepts: concrete concepts are predominantly organized by featural similarity measures, and abstract concepts by associative relations, co-occurrence patterns and syntactic information (Vigliocco et al., 2009).

All the features surveyed above make the distinction between concreteness and abstractness a stimulating and challenging field for computational linguistics as well. Among the earliest attempts at grasping concreteness, we find works that investigated concreteness/abstractness information in its interplay with metaphor identification and figurative language more generally (Turney et al., 2011; and, more recently, Mensa et al., 2018b). Although concreteness information is acknowledged to be central to, e.g., word-sense induction and compositionality modeling (Hill et al., 2013), the contribution of concreteness/abstractness to semantic representations is not fully grasped and exploited in existing approaches and resources, with the notable exception of works aimed i) at learning multimodal embeddings, and at understanding how abstract and concrete representations can be acquired by multi-modal models (Hill and Korhonen, 2014); and ii) at exploring to what extent concreteness information is represented in the distributional patterns in corpora (Hill et al., 2013). Moreover, some approaches have attempted to create lexical resources by also employing common-sense information (Mensa et al., 2018a; Colla et al., 2018).

Characterizing tokens within sentences by their concreteness requires integrating both word-specific and contextual information. In our view, the CONcreTEXT task entails dealing with a relaxed form of word sense disambiguation; our participants addressed these aspects by devising methods relying on both traditional knowledge-based approaches and more recent language models and sequence-to-sequence models. Finally, as in many real-world cases, the provided trial data is rather scarce, on the order of a hundred sentences for Italian and as many for English. This aspect forced our participants to face something similar to a ‘cold start’ problem.

We hope that this edition of the CONcreTEXT task will be the first in a series for those who are interested in the issues that contextual conceptual concreteness poses to research on natural language semantics.

2 Task Definition

The CONcreTEXT task (so dubbed after CONcreteness in conTEXT) focuses on automatic concreteness (and conversely, abstractness) recognition. Given a sentence along with a target word, we asked participants to propose a system able to assess the concreteness of the concept expressed by the target word within the sentence, on a 7-point Likert-like scale where 1 stands for completely abstract (e.g., ‘freedom’) and 7 for completely concrete (e.g., ‘car’). For example, in the sentence “In summer, wheat fields are coloured in yellow” the noun field refers to an entity that can be smelled, touched, and pointed to. In this case, on a scale ranging from 1 to 7 its concreteness may be evaluated as 7, because it refers to an extremely concrete concept. In contrast, the same noun field in the sentence “Physics is Alice’s research field” refers to a scientific subject, i.e., something that cannot be perceived through the five senses, but that can be explained through a linguistic description. In this sentence, the noun field may be evaluated as 1, because it refers to an extremely abstract concept. Moreover, task targets can be halfway between completely abstract and completely concrete, as in the case of “Magnetic field attracts iron”, where the noun field refers to something more abstract than “wheat fields” but more concrete than “research field”. As anticipated, the concreteness score assigned to the word should be evaluated in context: the word should not be considered in isolation, but as part of a given sentence.

Participants were invited to exploit all possible strategies to solve the task, including (but not limited to) knowledge bases, external training data, word embeddings, etc.
Table 1: Basic statistics on the CONcreTEXT dataset used as gold standard.

                                Italian   English
Unique Verb targets             52        44
Unique Noun targets             96        73
Num. Sentences                  550       534
Num. Sentences Verb target      189       210
Num. Sentences Noun target      361       324
Avg. sent. length               14.43     14.33
Avg. sent. length (no punct)    13.03     12.87
Avg. full words per sent.       7.14      7.15
Num. Annotators                 333       310
Human ratings (HR)              18,726    16,522
Min HR per sentence             30        30

[Figure 1: Distribution of human ratings for the English and Italian datasets. Panels: (a) English dataset (GOLD-EN); (b) Italian dataset (GOLD-IT). Horizontal axis: ratings from 1 to 7; vertical axis: values from 0.00 to 0.30.]

3 Dataset

The dataset used for this task has been taken from the English-Italian parallel section of The Human Instruction Dataset (Chocron and Pareti, 2018), derived from WikiHow instructions (see footnote 1). All such documents had been anonymized beforehand, so that the downloaded data present neither privacy nor data sensitivity issues.

Footnote 1: The whole Human Instruction Dataset is freely available on Kaggle, https://www.kaggle.com/paolop/human-instructions-multilingual-wikihow

The dataset is composed of 1,096 sentences overall, arranged as follows: 562 Italian sentences plus 534 English sentences. Each sentence contains a target term (either verb or noun) with its associated concreteness score (1–7 scale). This score is derived from the average of at least 30 human judgments from native Italian and English speakers about the concreteness of a target word in a given sentence (see Table 1 for the dataset numbers).

The reliability of the collected data within each language (Italian, English) was evaluated separately for the trial and test phases by applying split-half correlations corrected with the Spearman-Brown formula, after randomly dividing the participants into two subgroups of equal size. All the reliability indexes were calculated on 10,000 different randomizations of the participants. The mean correlations between the two groups are very high for both the trial and test phases, ranging from a minimum of r = 0.87 for English (at the test phase) to a maximum of r = 0.98 for Italian (at the trial phase), showing that the resulting ratings are highly reliable and can be used across the entire Italian- and English-speaking populations.

The dataset has been split into trial and test data, with a 20–80 ratio. Trial data has been released with the concreteness scores, while the test data has been provided at the beginning of the evaluation window without any score (see footnote 2).

Footnote 2: The dataset employed in the CONcreTEXT task is available at the URL https://lablita.github.io/CONcreTEXT/.

4 Evaluation Measures and Baselines

We chose the Spearman correlation index as our main evaluation measure; for the sake of completeness, we also report Pearson indices (which are substantially in accord with the Spearman ones). We chose the former measure because the collected ratings are not normally distributed, which makes the Spearman correlation better suited to the data. In fact, by running the Shapiro–Wilk test we obtained a p-value < 0.001. The non-normal distribution of the data is also confirmed by the plot of the gold standard ratings, as illustrated in Figure 1.

Two baselines have been designed for this task.

Baseline One. The first baseline for the Italian language is derived as follows. The fastText word embeddings have been acquired beforehand by training the model on the Italian dump of the WikiHow instructions. We chose fastText for its support for handling OOV terms (Bojanowski et al., 2017), which is a crucial feature in the present setting. The cited norms by Montefinese et al. (2014) (referred to as ‘the norms’ hereafter) have been used herein. The average score of terms in each input sentence $S = \{t_1, t_2, \ldots, t_K\}$ has been
computed by scrolling through the content words of the sentence. Each term $t$ is searched in the norms: if the term is found, the associated concreteness score $c(t)$ is returned; otherwise, if the term is not present in the norms, the ranking of the $l$ ($l = 20{,}000$) elements most similar to $t$ is generated through fastText. In this case, we scan the whole list of norms and employ the concreteness score of the element in the norms closest to those in the fastText ranking. In either case we obtain a score for each and every term in the input sentence, so that the concreteness score of the target token $\hat{t}$ is computed as the averaged score of the terms in the input sentence:

$$c(\hat{t}) = \frac{1}{K} \sum_{i=1}^{K} c(t_i).$$

The first baseline for the English language is analogous to the Italian one, except that the English tokens from the norms are accessed in this case. The same strategy governs the handling of the fastText resource, which in this case has been trained on the English dump of the Human Instruction Dataset.

Baseline Two. The second baseline for the Italian language implements a simple lookup function. More specifically, input sentences have been translated into English through the Google Translate ajax API implementation, and then the concreteness scores associated with the terms in the norms by Brysbaert et al. (2014b) are retrieved (in the unlikely case that a term is not found, it is dropped, thus not contributing to the final score). The concreteness score of the target term is thus set to the average concreteness of the terms in the given input sentence. Baseline Two for the English language employs the concreteness scores, again taken from the norms by Brysbaert et al. (2014b), associated with all terms in the input sentence, finally assigning to the target token the average concreteness score for the whole sentence.

5 Systems Descriptions

In this Section we briefly describe the systems that participated in the competition. As a first edition, the CONcreTEXT task received good feedback from the community, with 4 teams, 7 participants overall and 15 submitted system runs. In the next Section we report the results obtained by all such systems, while anonymizing a withdrawn participant.

5.1 ANDI

The ANDI team (Rotaru, 2020) proposed a system based on multiple classes of concreteness score predictors. The first class of predictors has been derived from large datasets of behavioral norms, collected for a wide variety of psycholinguistic factors. Besides well-known concreteness norms, ANDI also takes into account semantic diversity, age of acquisition, emotional and sensori-motor dimensions, as well as frequency and contextual diversity counts. The vocabulary resulting from the merging of these word collections comprises more than 70K words, and it is the base vocabulary used to extract all the predictors. The second class of predictors has been derived from context-independent distributional models, namely Skip-gram, GloVe, and NumberBatch embeddings, as well as from the concatenation of the three. The third class of predictors has been derived from features obtained through recent transformer models, i.e., context-dependent representations. The models exploited are BERT, GPT-2, Bart, and ALBERT. The final rating has been computed through a ridge regression over the three classes.

5.2 CAPISCO

The CAPISCO team (Bondielli et al., 2020) submitted 3 systems for both Italian and English.

NON-CAPISCO. The first system computes a variation of Baseline Two; that is, the target concreteness is obtained by combining the concreteness value of the target term (taken in isolation) and the average concreteness of the whole sentence. The improvement over the baseline comes from weighting the concreteness of the target term and that of the context differently.

CAPISCO-CENTROIDS. This system is based on the assumption that close semantic spaces are characterized by similar concreteness scores. In this case the authors first build two centroids, one for concrete and one for abstract concepts, based on the norms by Brysbaert et al. (2014b) and Della Rosa et al. (2010), by employing fastText pre-trained embeddings. The concreteness score of a term is then computed by averaging the distance of the first 50 lexical substitutes of the target (identified through BERT) from the two polarized centroids. Introducing a list of target substitutes in a given context is thus the gist of this approach.
CAPISCO-TRANSFORMERS. In this variant, the CAPISCO team fine-tuned a pre-trained BERT model on the concreteness rating task, complementing the CONcreTEXT training data with newly generated training data. The new data generation is twofold: first, for each original sentence, new sentences are generated by replacing the target term with the first lexical substitutes derived with a BERT target-masking approach; then, more sentences are borrowed from Italian and English reference corpora.

5.3 KonKretiKa

The KonKretiKa team (Badryzlova, 2020) presented a system that first assigns a concreteness and an abstractness score to the target lemma, and then adjusts these values based on the surrounding context. In the first step, the system computes semantic similarity between the target vectors and a “seed list” consisting of abstract and concrete words (extracted from the MRC Psycholinguistic Database). In the second step, the values were adjusted to the sentential context by considering the mean concreteness index of the entire sentence. The team submitted 4 runs based on a heuristically selected coefficient.

6 Results

Four teams participated in the CONcreTEXT competition: ANDI, CAPISCO, KonKretiKa, and a withdrawn team. ANDI and CAPISCO developed a system for both languages (English and Italian), while KonKretiKa participated in the English track only, as did the withdrawn participant. Each team was allowed to submit the output of up to 4 system runs; the final ranking has been compiled based on the results of the best run.

In Tables 2 and 3 we present the score of each run for the English and Italian language, respectively. Although, as mentioned, the Spearman indices were adopted as our main evaluation metric, we also report Pearson correlation indices and Euclidean distance, which may be useful to complete the assessment of the results. The final ranking is provided in Tables 4 and 5.

Table 2: Results for each run on English test set.

System run        Spear    Pears    Eucl.D
ANDI              0.833    0.834    15.409
NON-CAPISCO       0.785    0.787    35.663
KonKretiKa 3      0.663    0.668    28.613
KonKretiKa 1      0.651    0.667    29.933
Baseline 2        0.554    0.567    38.451
KonKretiKa 4      0.542    0.545    29.836
CAPISCO CENTR     0.542    0.538    48.864
KonKretiKa 2      0.541    0.545    30.322
CAPISCO TRANS     0.504    0.501    29.927
Baseline 1        0.382    0.377    31.738
withdrawn run3    -0.013   0.067    41.109
withdrawn run1    -0.124   -0.123   44.068
withdrawn run2    -0.127   -0.129   43.890

Table 3: Results for each run on Italian test set.

System run        Spear    Pears    Eucl.D
ANDI              0.749    0.749    19.950
CAPISCO TRANS     0.625    0.617    24.367
CAPISCO CENTR     0.615    0.609    28.608
NON-CAPISCO       0.557    0.557    31.588
Baseline 2        0.534    0.522    40.114
Baseline 1        0.346    0.368    31.046

Table 4: Final ranking on English test set.

Team              Spear    Pears    Eucl.D
ANDI              0.833    0.834    15.409
CAPISCO           0.785    0.787    35.663
KonKretiKa        0.663    0.668    28.613
withdrawn         -0.013   0.067    41.109

Table 5: Final ranking on Italian test set.

Team              Spear    Pears    Eucl.D
ANDI              0.749    0.749    19.950
CAPISCO           0.625    0.617    24.367

We can observe a substantial agreement between Spearman and Pearson indices: the averaged delta between such figures amounts to 0.012 and to 0.008 on the English and Italian dataset, respectively. The Euclidean distance also seems to substantially confirm the results: for the results on English (Table 2) it is minimal for the output of the ANDI system, and it tends to increase as Spearman correlation values decrease. The same trend is also confirmed on the Italian results (Table 3).

Tables 6 and 7 report disaggregated Spearman correlations for verbs and nouns. This makes it possible to highlight whether, and to what extent, the participating systems obtained better results on either POS. ANDI obtained the best results on both verbs and nouns in both languages. This system (and NON-CAPISCO as well) obtained analogous results on verbs and nouns. On the whole, the rest of the systems obtained clearly better results on English verbs and slightly better results on Italian nouns. In particular, KonKretiKa (English only) is strongly biased towards verbs: its performance on verbs is higher in all 4 runs. The CAPISCO systems exhibit the most varied behavior.

Table 6: Spearman rank differences between nouns and verbs on English test set.

                  Spear.N  Spear.V  Diff
CAPISCO TRANS     0.443    0.654    0.211
KonKretiKa 4      0.502    0.701    0.199
KonKretiKa 2      0.502    0.683    0.181
CAPISCO CENTR     0.478    0.659    0.181
KonKretiKa 3      0.629    0.762    0.133
KonKretiKa 1      0.611    0.741    0.130
ANDI              0.836    0.857    0.021
NON-CAPISCO       0.779    0.782    0.003

Table 7: Spearman rank differences between nouns and verbs on Italian test set.

                  Spear.N  Spear.V  Diff
NON-CAPISCO       0.579    0.507    0.072
CAPISCO TRANS     0.607    0.667    0.060
CAPISCO CENTR     0.625    0.591    0.034
ANDI              0.762    0.749    0.013

7 Discussion

The obtained results confirm transformers as a good device to compute concreteness scores for words in context. The virtues of transformers in grasping contextual information are largely known, but in the present setting we observe that their output can be further improved by integrating behavioral information (this seems to be one major difference between the systems ANDI and CAPISCO-TRANSFORMERS).

The most important outcome of this challenge is definitely the strong performance of the ANDI system, which proves to be robust and reliable for the considered task: the system obtains the best ranking in both languages, a low deviation from the gold standard and substantial stability in processing both verbs and nouns. Moreover, the proposed system is ready to be applied in a multi-language environment, given that non-English sentences are automatically translated into English. The ANDI system exploits different kinds of available resources and works with local and contextual information. This shows that deriving the concreteness score of a word in context is a complex task, involving different semantic, cognitive and experiential levels.

The high correlation obtained by NON-CAPISCO in the English task is somewhat surprising, since this system makes use only of the mean concreteness of the sentence (computed from existing norms) as contextual information. This result is thus related to the availability of existing norms, but it shows that there is a link between the concreteness score of a target word in context and the concreteness scores of the words it occurs with. Further analyses are needed, but it suggests that concrete interpretations of a target word are associated with concrete context words. Of course, systems based exclusively on behavioral norms are strongly dependent on the coverage of the considered vocabulary. In fact, the NON-CAPISCO Italian performance (obtained by exploiting a vocabulary of about 1.2K words) is lower than that of all the other systems, while on the English track it ranks second (using a vocabulary of about 70K words).

8 Conclusions

We presented the results of the CONcreTEXT task at EVALITA 2020 (Basile et al., 2020). The task challenges participants to build NLP systems that automatically assign a concreteness score to words in context, evaluating to what extent target concepts are concrete (i.e., more or less perceptually salient) within a given context of occurrence. A novel dataset was developed for this task as a multilingual comparable corpus composed of 550 Italian sentences and 534 English sentences, annotated with the concreteness/abstractness rating of target nouns and verbs.

Three teams completed their participation in the task, obtaining the following ranking: ANDI (Rotaru, 2020), CAPISCO (Bondielli et al., 2020), and KonKretiKa (Badryzlova, 2020).

Future work will address the following steps. First of all, we will improve our dataset by including further languages, also from different language families and under-resourced languages. The set of considered targets should also be expanded, to ensure a broader coverage of the dataset and more significant results (thanks to the larger experimental base) for its future users.
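
For readers who wish to score their own system output in the same way as Tables 2 to 5, the three reported measures can be sketched in plain Python. This is a hedged illustration, not the organizers' evaluation script: it assumes that the Euclidean distance is the plain, unnormalized distance between the vector of predicted scores and the vector of gold scores, and the `gold` and `pred` values below are made up for the example.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Unnormalized Euclidean distance between two score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical gold ratings and system predictions on the 1-7 scale.
gold = [6.8, 1.4, 4.1, 5.5, 2.0]
pred = [6.0, 2.0, 3.5, 6.5, 1.0]

print("Spearman :", round(spearman(gold, pred), 3))
print("Pearson  :", round(pearson(gold, pred), 3))
print("Euclidean:", round(euclidean_distance(gold, pred), 3))
```

Since the Spearman index depends only on rank order, it is unaffected by the non-normal distribution of the ratings, which is the reason given in Section 4 for preferring it as the main measure.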
References

Yulia Badryzlova. 2020. KonKretiKa @ CONcreTEXT: Computing concreteness indexes with sigmoid transformation and adjustment for context. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.

Valentina Bambini, Donatella Resta, and Mirko Grimaldi. 2014. A dataset of metaphors from the italian literature: Exploring psycholinguistic variables and the role of context. PloS one, 9(9):e105634.

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Fraser A Bleasdale. 1987. Concreteness-dependent associative priming: Separate lexical organization for concrete and abstract words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(4):582.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information.

Alessandro Bondielli, Gianluca E. Lebani, Lucia C. Passaro, and Alessandro Lenci. 2020. CAPISCO @ CONcreTEXT: (Un)supervised Systems to Contextualize Concreteness with Norming Data. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.

Marc Brysbaert, Michaël Stevens, Simon De Deyne, Wouter Voorspoels, and Gert Storms. 2014a. Norms of age of acquisition and concreteness for 30,000 dutch words. Acta psychologica, 150:80–84.

Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014b. Concreteness ratings for 40 thousand generally known english word lemmas. Behavior research methods, 46(3):904–911.

Paula Chocron and Paolo Pareti. 2018. Vocabulary alignment for collaborative agents: a study with real-world multilingual how-to instructions. In IJCAI, pages 159–165.

D. Colla, E. Mensa, A. Porporato, and D.P. Radicioni. 2018. Conceptual Abstractness: From Nouns to Verbs. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), volume 2253. CEUR.

Louise Connell, Dermot Lynott, and Briony Banks. 2018. Interoception: the forgotten modality in perceptual grounding of abstract and concrete concepts. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1752):20170143.

Sebastian J Crutch and Elizabeth K Warrington. 2005. Abstract and concrete concepts have structurally different representational frameworks. Brain, 128(3):615–627.

Annette M de Groot. 1989. Representational aspects of word imageability and word frequency as assessed through word association. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(5):824.

Pasquale A Della Rosa, Eleonora Catricalà, Gabriella Vigliocco, and Stefano F Cappa. 2010. Beyond the abstract-concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 italian words. Behavior research methods, 42(4):1042–1048.

Francesca Garbarini, Fabrizio Calzavarini, Matteo Diano, Monica Biggio, Carola Barbero, Daniele P Radicioni, Giuliano Geminiani, Katiuscia Sacco, and Diego Marconi. 2020. Imageability effect on the functional brain activity during a naming to definition task. Neuropsychologia, 137:107275.

Karl F Haberlandt and Arthur C Graesser. 1985. Component processes in text comprehension and some of their interactions. Journal of Experimental Psychology: General, 114(3):357.

Felix Hill and Anna Korhonen. 2014. Learning abstract concept embeddings from multi-modal data: Since you probably can’t see what i mean. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 255–265.

Felix Hill, Douwe Kiela, and Anna Korhonen. 2013. Concreteness and corpora: A theoretical and practical study. In Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pages 75–83.

Stavroula-Thaleia Kousta, Gabriella Vigliocco, David P Vinson, Mark Andrews, and Elena Del Campo. 2011. The representation of abstract words: why emotion matters. Journal of Experimental Psychology: General, 140(1):14.

Judith F Kroll and Jill S Merves. 1986. Lexical access for concrete and abstract words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12(1):92.

Enrico Mensa, Aureliano Porporato, and Daniele P. Radicioni. 2018a. Annotating concept abstractness by common-sense knowledge. In Chiara Ghidini, Bernardo Magnini, Andrea Passerini, and Paolo Traverso, editors, AI*IA 2018 – Advances in Artificial Intelligence, pages 415–428, Cham. Springer International Publishing.

Enrico Mensa, Aureliano Porporato, and Daniele P. Radicioni. 2018b. Grasping metaphors: Lexical semantics in metaphor analysis. In Aldo Gangemi, Anna Lisa Gentile, Andrea Giovanni Nuzzolese, Sebastian Rudolph, Maria Maleshkova, Heiko Paulheim, Jeff Z Pan, and Mehwish Alam, editors, The Semantic Web: ESWC 2018 Satellite Events, pages 192–195, Cham. Springer International Publishing.

Leonie M Miller and Steven Roodenrys. 2009. The interaction of word frequency and concreteness in immediate serial recall. Memory & Cognition, 37(6):850–865.

Maria Montefinese, Ettore Ambrosini, Beth Fairfield, and Nicola Mammarella. 2014. The adaptation of the affective norms for english words (anew) for italian. Behavior research methods, 46(3):887–903.

Maria Montefinese, Ettore Ambrosini, Antonino Visalli, and David Vinson. 2020. Catching the intangible: a role for emotion? Behavioral and Brain Sciences, 43.

Cristina Romani, Sheila Mcalpine, and Randi C Martin. 2008. Concreteness effects in different tasks: Implications for models of short-term memory. Quarterly Journal of Experimental Psychology, 61(2):292–323.

Armand Rotaru. 2020. ANDI @ CONcreTEXT: Predicting concreteness in context for English and Italian using distributional models and behavioural norms. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.

Mark Sadoski, William A Kealy, Ernest T Goetz, and Allan Paivio. 1997. Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology, 89(3):518.

Paula J Schwanenflugel and Edward J Shoben. 1983. Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(1):82.

Peter Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 680–690.

Gabriella Vigliocco, Lotte Meteyard, Mark Andrews, and Stavroula Kousta. 2009. Toward a theory of semantic representation. Language and Cognition, 1(2):219–247.