=Paper=
{{Paper
|id=Vol-2624/paper8
|storemode=property
|title=Predicting the Concreteness of German Words
|pdfUrl=https://ceur-ws.org/Vol-2624/paper8.pdf
|volume=Vol-2624
|authors=Jean Charbonnier,Christian Wartena
|dblpUrl=https://dblp.org/rec/conf/swisstext/CharbonnierW20
}}
==Predicting the Concreteness of German Words==
Jean Charbonnier and Christian Wartena
Hochschule Hannover
Expo Plaza 12, 30539 Hannover, Germany
{jean.charbonnier, christian.wartena}@hs-hannover.de
Abstract

Concreteness of words has been measured and used in psycholinguistics for decades. Recently, it has also been used in retrieval and NLP tasks. For English, a number of well-known datasets with average values for perceived concreteness have been established. We give an overview of available datasets for German and their correlations, and evaluate prediction algorithms for the concreteness of German words. We show that these algorithms achieve results similar to those for English datasets. Moreover, we show that for all datasets there are no significant differences between a prediction model based on a regression model using word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Motivation

A number of properties of words, mainly of a semantic nature, have been studied and used in psycholinguistic research for decades. These properties, often referred to as (affective) word norms, include concreteness, imagery¹, age of acquisition, valence, and arousal. In the present work we focus on concreteness. Friendly et al. (1982) define concrete words as words that "refer to tangible objects, materials or persons which can be easily perceived with the senses". Similarly, Brysbaert et al. (2014) define concreteness as the degree to which the concept denoted by a word refers to a perceptible entity, but found that subjects largely rated the haptic and visual experiences, even if they were explicitly asked to take into account experiences involving any senses.

¹ Most authors seem to use the term imagery, while others also use imageability and visualness. In German the term Bildhaftigkeit is the most common one, while Vorstellbarkeit is also found. We will use imagery throughout this paper.

Concreteness seems to play an important role in human language processing (Borghi et al., 2017). Concreteness has also been used for various computational linguistic tasks like the detection of metaphors and non-literal language (Turney et al., 2011; Hill and Korhonen, 2014; Frassinelli and Schulte im Walde, 2019), lexical simplification (Jauhar and Specia, 2012), multimodal retrieval (Hessel et al., 2018), and estimating the stability of word embeddings (Pierrejean and Tanguy, 2019).

Traditionally, word norms are obtained by asking subjects to estimate the value of each property on a Likert scale. Recently, various approaches have also been proposed to predict the concreteness of words. On three different datasets we test two algorithms that have given very good results for English data and compare the results in section 4, after we have discussed the most common approaches to predicting word concreteness (section 2) and presented the concreteness data available for German (section 3).

2 Related work

We find basically three approaches to predicting the concreteness of a word: (1) adopting the concreteness value from similar, related or neighboring words; (2) identifying a dimension in word embeddings that corresponds to concreteness; (3) training a regression model on features of words.

2.1 Adopting concreteness of related words

Liu et al. (2014) predict values for imagery, a word norm that strongly correlates with concreteness, by using the values from synonyms and hypernyms found in WordNet.
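The neighbor-adoption idea of Liu et al. (2014) can be illustrated with a minimal sketch. The tiny lexicon and all ratings below are invented for illustration and merely stand in for WordNet synonym and hypernym links; this is not their actual implementation.

```python
# Toy sketch of adopting concreteness from related words (cf. Liu et al., 2014).
# The mini-lexicon below is invented; in the original approach the related
# words come from WordNet synonyms and hypernyms.

# Known concreteness ratings (higher = more concrete), hypothetical values.
ratings = {"dog": 4.9, "animal": 4.6, "idea": 1.6, "thought": 1.8}

# Related words (stand-ins for WordNet synonym/hypernym links).
related = {
    "puppy": ["dog", "animal"],      # hypernyms
    "notion": ["idea", "thought"],   # synonyms
}

def predict_concreteness(word):
    """Average the ratings of related words that have a known rating."""
    known = [ratings[w] for w in related.get(word, []) if w in ratings]
    if not known:
        return None  # no rated neighbors: no prediction possible
    return sum(known) / len(known)

print(predict_concreteness("puppy"))   # average of 4.9 and 4.6
print(predict_concreteness("notion"))  # average of 1.6 and 1.8
```

An unrated word thus simply inherits the mean rating of its rated neighbors; words without rated neighbors receive no prediction.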
Rabinovich et al. (2018) predict the concreteness of words indirectly by assigning a concreteness value to sentences in which a word occurs. The concreteness value of a sentence is based on the presence of seed words. The set of seed words is constructed by selecting words with derivational suffixes that are typical for highly abstract nouns. The correlation between the predicted values and the manually assigned values from various subsets of the dataset from Brysbaert et al. (2014) and the 4,295 concreteness values² from the MRC (Medical Research Council) Psycholinguistic Database (Coltheart, 1981) ranges from 0.66 to 0.74.

² The current version of MRC has concreteness values aggregated from different sources for 8,288 words. We assume that a previous version provided concreteness values for 4,295 words.

Turney et al. (2011) compute the degree of concreteness of a word as the sum of the similarities between the word and n abstract paradigm words minus the sum of the similarities between the word and n concrete paradigm words. The paradigm words are found as follows: first, one concrete and one abstract paradigm word are selected such that the correlation between the concreteness values for all words in the training data and the values predicted by using the similarity with these two words is maximized. Then a second concrete and a second abstract word are added that again maximize the correlation. This process is repeated until n abstract and n concrete words are found. Turney et al. (2011) limit the selection to 20 abstract and 20 concrete paradigm words. Using half of the MRC data for training and half for testing, they found a Spearman correlation coefficient of 0.81 between predicted and observed concreteness values. To compute the similarity between words they use count-based word embeddings with 1000 dimensions trained on a 5·10¹⁰-word web corpus. The same approach is followed by Köper and Schulte im Walde (2016) to predict concreteness values for German words, using word vectors trained with word2vec (Mikolov et al., 2013) on the DE-COW14AX German Web corpus. For training and testing they merged concreteness values from Kanske and Kotz (2010) (called Leipzig Word Norms below) and Lahl et al. (2009) (called WWN below), and in addition added translations from sets of English word norms for training. 90% of the data were used for training, 10% for testing. The Pearson correlation between the test data and the predicted values for concreteness/abstractness was 0.825.

2.2 Concreteness in word embeddings

Rothe et al. (2016) try to find low-dimensional feature representations of words in which at least some dimensions correspond to interpretable properties of words. One of these dimensions is concreteness. For training and testing they use Google News embeddings and two subsets of frequent words from the norms of Brysbaert et al. (2014). For their test set of 8,694 frequent words they found a moderate correlation with the human judgments (Kendall's τ = 0.623). Similarly, Hollis and Westbury (2016) looked at which dimensions of word embeddings correlate with one of the classical word norms. They found no direct correlations, but after reducing the number of dimensions for a set of words by applying Singular Value Decomposition, they found a strong correlation between one of the dimensions and concreteness.

2.3 Regression models for concreteness

Tanaka et al. (2013) train a regression model to predict concreteness values. As features they use a small number of manually constructed co-occurrence features, like co-occurrence with sense verbs. For training and evaluation they use a subset of 3,455 nouns from the MRC Database. Pearson's correlation and Kendall's τ between the values from the database and their predictions are 0.688 and 0.508, respectively.

Paetzold and Specia (2016) train a regression model to predict four word norms, among which concreteness. Like many other studies, they use the data from the MRC database. As features they use word embeddings trained on a set of various large corpora and a number of word features extracted from WordNet. For each word norm they use half of the words to train the model and half of the words for evaluation. For concreteness they find a Pearson correlation coefficient of 0.869.

Ehara (2017) trains regression models to predict four word norms for Japanese and English words. As features they use word embeddings trained with word2vec and a probability distribution of words over topics found using Latent Dirichlet Allocation. They use a subset of 1,842 words from the MRC data, of which 1,342 words are used for training and 500 for testing. When both feature sets are trained on the British National Corpus (BNC) and used in combination, the best regression model gives a Pearson correlation of 0.87 and a Spearman correlation coefficient of 0.876 on the test data.
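The paradigm-word scoring described in section 2.1 can be sketched as follows. The tiny random vectors stand in for real pre-trained embeddings, and the paradigm sets are invented; which paradigm set is subtracted only flips the sign of the scale.

```python
import numpy as np

# Sketch of paradigm-word scoring (cf. Turney et al., 2011, section 2.1).
# Toy 4-dimensional "embeddings"; real implementations use large
# pre-trained vectors (e.g. fastText).
rng = np.random.default_rng(0)
vocab = ["stone", "table", "freedom", "justice", "hammer", "honour"]
emb = {w: rng.normal(size=4) for w in vocab}

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def paradigm_score(word, paradigms_a, paradigms_b):
    """Sum of similarities to one paradigm set minus the other.

    Higher scores mean the word is more similar to the first set;
    swapping the two sets only negates the score.
    """
    v = emb[word]
    return (sum(cos(v, emb[p]) for p in paradigms_a)
            - sum(cos(v, emb[p]) for p in paradigms_b))

# With real data, the paradigm sets are selected greedily so that the
# correlation with the training ratings is maximized (up to 20 per class).
score = paradigm_score("hammer", ["stone", "table"], ["freedom", "justice"])
print(round(score, 3))
```

The greedy selection of the paradigm sets (not shown) is the expensive part: each candidate addition requires re-scoring the whole training set.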
Ljubešić et al. (2018) used a regression model as well, with pre-trained fastText word embeddings (Mikolov et al., 2018). They found a Spearman correlation coefficient of 0.887 between the predicted concreteness values and the values from Brysbaert et al. (2014), and a Spearman correlation of 0.872 on the MRC data, in both cases using 3-fold cross validation. A similar result was found by Charbonnier and Wartena (2019), who reach a Pearson correlation coefficient of 0.91 on the data from Brysbaert et al. (2014), using the same vectors and 10-fold cross validation. Here a minor improvement could be realized using part of speech and frequent suffixes as additional features.

Though all studies use different data and different versions of the MRC Psycholinguistic Database, use different splits and different numbers of folds for cross validation, and finally use different correlation coefficients, all studies report very similar results. The correlations that are found are all in the range of correlations found between various sets of concreteness values (see Charbonnier and Wartena, 2019, Table 2).

3 Data

Both for English and for German, various word norms with concreteness values have been created, though some are quite small and only available as printed supplements to older publications.

The dataset created by Baschek et al. (1977) and Wippich and Bredenkamp (1979), consisting of 1,698 words (800 nouns, 400 adjectives, 498 verbs), is one of the oldest and still one of the largest word norms for German. We will refer to this dataset as the Göttingen Word Norms. We removed 40 verbs containing an underscore, especially all reflexive verbs (e.g. sich_wünschen; to wish), from the dataset. For a number of words the experiment was repeated and two values are given. We only use the first value in these cases.

Lahl et al. (2009) collected values for 2,654 words using crowdsourcing to build a dataset called the Web Word Norms (WWN). For the WWN, 3,907 subjects committed 190,212 ratings, each for at most 50 words. On average each word has 24 ratings. They used an 11-point scale where 0 stands for the most concrete and 10 for the least concrete judgment.

Kanske and Kotz (2010) collected ratings for valence, arousal and concreteness for 1,000 nouns. This dataset is known as the Leipzig Affective Word Norms. Only nouns were used to reduce the variance other word classes would introduce. The experiment was done in 2006 with 32 native speakers. On two separate days the participants rated the words 3 times on a 9-point scale, each time for one of the three ratings. This was repeated 2 years later with two groups, one with 22 repeating participants from 2006 and a second with 32 fresh participants. The words were collected from the Duden dictionary and a previous word list by the same authors. Only 1- and 2-syllable words and no compound nouns were allowed.

The Berlin Word Norms (Vo et al., 2009) and the word norms determined by Schmidtke et al. (2014) contain values for valence, arousal and imagery, but no values for concreteness. Some more word norms for German, including concreteness, are published by Hager and Hasselhorn (1994).

3.1 Merged Dataset

In order to have a larger dataset for German, providing more training data for supervised prediction algorithms, we created a merged dataset.

The overlap of the datasets is quite small (see Table 1); the correlation between the values for the overlapping parts, however, is high (around 0.9). Since the Leipzig Word Norms use low values for concrete and high values for abstract words, the correlation between this and the other datasets is negative.

For the merged dataset we use the 7-point scale where 1 means abstract and 7 means concrete. We do not simply rescale the values but use linear regression on the overlapping parts such that the values for the words in the overlapping parts are as close as possible. We take the values from the Göttingen Word Norms as an anchor and transform the other values using the slope and the intercept. The transformed concreteness thus is defined as

C′ = α + βC   (1)

where C is the original value. For WWN, α = 0.776 and β = 0.608, and for the Leipzig Word Norms α = 7.39 and β = −0.540. Finally, we take the average from all datasets if a word is present in more than one source. The dataset thus offers empirical concreteness values for 4,182 German words. In Table 1 we see the high correlation of the values in the merged data with those in the original datasets. The merged dataset can be downloaded from http://textmining.wp.hs-hannover.de/datasets.html.
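The rescaling in equation (1) can be sketched as follows. The word lists and all ratings below are invented stand-ins for the overlapping parts of the real datasets; only the fit-then-transform-then-average procedure mirrors the text.

```python
import numpy as np

# Sketch of the linear rescaling from equation (1): C' = alpha + beta * C.
# The ratings below are invented; in the paper, alpha and beta are fitted on
# the words shared between the Göttingen Word Norms (the anchor) and each
# other dataset.
anchor = {"Hammer": 6.8, "Stein": 6.5, "Freiheit": 1.4, "Ehre": 1.9}
other  = {"Hammer": 9.9, "Stein": 9.4, "Freiheit": 1.2, "Ehre": 2.1}

words = sorted(set(anchor) & set(other))          # the overlapping part
x = np.array([other[w] for w in words])           # values to be transformed
y = np.array([anchor[w] for w in words])          # anchor (target) scale

beta, alpha = np.polyfit(x, y, 1)                 # least-squares slope/intercept

def transform(c):
    """Apply equation (1) to a rating from the other dataset."""
    return alpha + beta * c

# A word present in both sources gets the average of the anchor value and
# the transformed value from the other source.
merged = {w: (anchor[w] + transform(other[w])) / 2 for w in words}
```

The least-squares fit makes the transformed values as close as possible to the anchor values on the overlap, which is exactly the criterion stated above.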
Table 1: Size of the intersections and the Pearson correlation between the concreteness values in the datasets. As
the Merged set is a composition of the other datasets, the intersection is always equal to the size of the other dataset.

              Merged WN          Göttingen WN       WWN
              Inters.  Correl.   Inters.  Correl.   Inters.  Correl.
Göttingen WN  1698      0.997
WWN           2654      0.969    680       0.900
Leipzig WN    1000     -0.985    127      -0.928    488      -0.875
Table 2: Results of 5-fold cross validation using different methods for all datasets. All results are averaged Pearson
correlation coefficients. For Turney we used 20 words per class.

               Merged           Göttingen WN     WWN              Leipzig WN
SVR            0.861 (±0.026)   0.862 (±0.040)   0.851 (±0.023)   0.890 (±0.027)
Turney et al.  0.849 (±0.012)   0.842 (±0.033)   0.851 (±0.020)   0.901 (±0.017)
4 Methods

For each dataset we use two methods to predict the concreteness values in a five-fold cross validation scheme. The first is the method of Turney et al. (2011) described above in section 2. Following Turney et al. (2011) and Köper and Schulte im Walde (2016), we use 20 abstract and 20 concrete prototype words. As a second method we use Support Vector Regression (Drucker et al., 1997) and grid search to find optimal hyperparameters (γ = 1, C = 10 with an RBF kernel). As features we use the pre-trained word embeddings from fastText for German (Grave et al., 2018).

All tests were done using 5-fold cross validation. We use stratified sampling for the Göttingen WN and the Merged dataset to ensure that each fold has the same number of nouns, verbs and adjectives. For the other datasets we use random splits.

5 Results and Discussion

The results for all datasets and both methods are given in Table 2. We see in general very high correlation values for all datasets and both methods. All correlation values are in a similar range as the correlations between the datasets.

We can make some interesting observations. The first remarkable fact is that for all datasets there is no significant difference between the results from the method of Turney et al. (2011) and the regression model. As far as we know, these methods have not been compared directly before. This result is quite surprising, since there are many aspects of the meaning of a word that determine word similarity. All of these aspects are used to find the similar words on which the concreteness prediction is based in the method of Turney et al. It has to be noted that the search for the prototype words in Turney's method is extremely slow and not feasible for large datasets.

Furthermore, we see that our implementation of the method of Turney et al. gives slightly better results for WWN and the Leipzig Word Norms than the result found by Köper and Schulte im Walde (2016), who used a random split of the union of those two datasets (0.844 and 0.891 vs. 0.825). Besides the possibility that they have chosen a disadvantageous split, we see two differences: in the first place, we used different word embeddings to compute the word similarity. Secondly, they added concreteness values from English datasets with German translations to the training data. This is only helpful if concreteness is invariant under translation, which might not be the case.

6 Conclusions and Future Work

Datasets with concreteness values for German are smaller and less easily accessible than those for English. One contribution of the present work is that we aggregated a consistent dataset with over 4,000 concreteness ratings from three different sources. A possibility to obtain more concreteness ratings is to train a model on available ratings and predict ratings for other words. We show that prediction methods that have previously been tested only for English yield similar results for German. Moreover, we show that two of the best available methods, which have not been compared on the same data before, yield similar results with no significant differences on 4 different datasets.

In the near future we will extend the merged dataset with values from some smaller and older studies.
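The regression setup described in section 4 can be sketched with scikit-learn. The random vectors below stand in for the fastText embeddings and the synthetic target stands in for human concreteness ratings; only the SVR-with-RBF-kernel, grid-search and 5-fold-cross-validation structure mirrors the setup above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVR

# Sketch of the regression setup from section 4: Support Vector Regression
# with an RBF kernel, grid search over the hyperparameters, and 5-fold cross
# validation. Random vectors stand in for fastText embeddings; the synthetic
# target stands in for real concreteness ratings.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))    # one "embedding" per word
y = X @ rng.normal(size=20)       # synthetic "concreteness" values

# Grid search for the hyperparameters (the paper selected gamma=1, C=10).
param_grid = {"gamma": [0.01, 0.1, 1], "C": [1, 10]}
grid = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=3)
grid.fit(X, y)

# 5-fold cross validation with the selected hyperparameters.
scores = cross_val_score(grid.best_estimator_, X, y, cv=5)
print(grid.best_params_, scores.mean())
```

For the Göttingen WN and the Merged dataset the folds would additionally be stratified by part of speech, which plain `cross_val_score` does not do.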
References

Ilse-Lore Baschek, Jürgen Bredenkamp, Brigitte Oehrle, and Werner Wippich. 1977. Determination of imagery, concreteness and meaningfulness of 800 nouns. Zeitschrift für experimentelle und angewandte Psychologie, 24(3):353–396.

Anna M. Borghi, Ferdinand Binkofski, Cristiano Castelfranchi, Felice Cimatti, Claudia Scorolli, and Luca Tummolini. 2017. The challenge of abstract concepts. Psychological Bulletin, 143(3):263.

Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3):904–911.

Jean Charbonnier and Christian Wartena. 2019. Predicting word concreteness and imagery. In Proceedings of the 13th International Conference on Computational Semantics – Long Papers, pages 176–187.

Max Coltheart. 1981. The MRC Psycholinguistic Database. The Quarterly Journal of Experimental Psychology Section A, 33(4):497–505.

Harris Drucker, Christopher J. C. Burges, Linda Kaufman, Alex J. Smola, and Vladimir Vapnik. 1997. Support vector regression machines. In Advances in Neural Information Processing Systems, pages 155–161.

Yo Ehara. 2017. Language-independent prediction of psycholinguistic properties of words. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 330–336.

Diego Frassinelli and Sabine Schulte im Walde. 2019. Distributional interaction of concreteness and abstractness in verb–noun subcategorisation. In Proceedings of the 13th International Conference on Computational Semantics – Short Papers, pages 38–43, Gothenburg, Sweden. Association for Computational Linguistics.

Michael Friendly, Patricia E. Franklin, David Hoffman, and David C. Rubin. 1982. The Toronto Word Pool: Norms for imagery, concreteness, orthographic variables, and grammatical usage for 1,080 words. Behavior Research Methods & Instrumentation, 14(4):375–399.

Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

Willi Hager and Marcus Hasselhorn, editors. 1994. Handbuch deutschsprachiger Wortnormen. Hogrefe Verlag für Psychologie, Göttingen.

Jack Hessel, David Mimno, and Lillian Lee. 2018. Quantifying the visual concreteness of words and topics in multimodal datasets. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2194–2205, New Orleans, Louisiana. Association for Computational Linguistics.

Felix Hill and Anna Korhonen. 2014. Concreteness and subjectivity as dimensions of lexical meaning. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 725–731.

Geoff Hollis and Chris Westbury. 2016. The principals of meaning: Extracting semantic dimensions from co-occurrence models of semantics. Psychonomic Bulletin & Review, 23(6):1744–1756.

Sujay Kumar Jauhar and Lucia Specia. 2012. UOW-SHEF: SimpLex – lexical simplicity ranking based on contextual and psycholinguistic features. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 477–481.

Philipp Kanske and Sonja A. Kotz. 2010. Leipzig affective norms for German: A reliability study. Behavior Research Methods, 42(4):987–991.

Maximilian Köper and Sabine Schulte im Walde. 2016. Automatically generated affective norms of abstractness, arousal, imageability and valence for 350,000 German lemmas. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2595–2598.

Olaf Lahl, Anja S. Göritz, Reinhard Pietrowsky, and Jessica Rosenberg. 2009. Using the World-Wide Web to obtain large-scale word norms: 190,212 ratings on a set of 2,654 German nouns. Behavior Research Methods, 41(1):13–19.

Ting Liu, Kit Cho, G. Aaron Broadwell, Samira Shaikh, Tomek Strzalkowski, John Lien, Sarah Taylor, Laurie Feldman, Boris Yamrom, Nick Webb, Umit Boz, Ignacio Cases, and Ching-sheng Lin. 2014. Automatic expansion of the MRC psycholinguistic database imageability ratings. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2800–2805, Reykjavik, Iceland. European Language Resources Association (ELRA).

Nikola Ljubešić, Darja Fišer, and Anita Peti-Stantić. 2018. Predicting concreteness and imageability of words within and across languages via word embeddings. In Proceedings of The Third Workshop on Representation Learning for NLP, pages 217–222, Melbourne, Australia. Association for Computational Linguistics.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space.

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2018. Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).

Gustavo Paetzold and Lucia Specia. 2016. Inferring psycholinguistic properties of words. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 435–440, San Diego, California. Association for Computational Linguistics.

Bénédicte Pierrejean and Ludovic Tanguy. 2019. Investigating the stability of concrete nouns in word embeddings. In Proceedings of the 13th International Conference on Computational Semantics – Short Papers, pages 65–70, Gothenburg, Sweden. Association for Computational Linguistics.

E. Rabinovich, B. Sznajder, A. Spector, I. Shnayderman, R. Aharonov, D. Konopnicki, and N. Slonim. 2018. Learning concept abstractness using weak supervision. ArXiv e-prints.

Sascha Rothe, Sebastian Ebert, and Hinrich Schütze. 2016. Ultradense word embeddings by orthogonal transformation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 767–777. Association for Computational Linguistics.

David S. Schmidtke, Tobias Schröder, Arthur M. Jacobs, and Markus Conrad. 2014. ANGST: Affective norms for German sentiment terms, derived from the affective norms for English words. Behavior Research Methods, 46(4):1108–1118.

Shinya Tanaka, Adam Jatowt, Makoto P. Kato, and Katsumi Tanaka. 2013. Estimating content concreteness for finding comprehensible documents. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM '13, pages 475–484, New York, NY, USA. ACM.

Peter D. Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 680–690, Stroudsburg, PA, USA. Association for Computational Linguistics.

Melissa L. H. Vo, Markus Conrad, Lars Kuchinke, Karolina Urton, Markus J. Hofmann, and Arthur M. Jacobs. 2009. The Berlin affective word list reloaded (BAWL-R). Behavior Research Methods, 41(2):534–538.

Werner Wippich and Jürgen Bredenkamp. 1979. Bildhaftigkeit und Lernen, volume 78 of Wissenschaftliche Forschungsberichte. Steinkopff-Verlag, Darmstadt.