=Paper=
{{Paper
|id=Vol-2813/rpaper06
|storemode=property
|title=Text Complexity and Abstractness: Tools for the Russian Language
|pdfUrl=https://ceur-ws.org/Vol-2813/rpaper06.pdf
|volume=Vol-2813
|authors=Valery Solovyev,Marina Solnyshkina,Mariia Andreeva,Andrey Danilov,Radif Zamaletdinov
|dblpUrl=https://dblp.org/rec/conf/ims2/SolovyevSADZ20
}}
==Text Complexity and Abstractness: Tools for the Russian Language==
International Conference "Internet and Modern Society" (IMS-2020). CEUR Proceedings 75
Valery Solovyev1[0000-0003-4692-2564], Marina Solnyshkina1[0000-0003-1885-3039],
Mariia Andreeva2,1[0000-0002-5760-0934], Andrey Danilov1 [0000-0002-2358-1157],
Radif Zamaletdinov1[0000-0002-2692-1698]
1 Kazan State Federal University. Kremlyovskaya, 18, 420008, Kazan, Russia
2 Kazan State Medical University. Butlerova, 49, 420012, Kazan, Russia
maki.solovyev@mail.ru
Abstract. The article reports two parallel studies aimed at validating an
original automatic tool (RusAC) designed to measure the level of abstractness of
Russian texts. The studies were conducted on (a) the Russian Academic
Corpus (RAC), compiled from textbooks used in middle and high schools of
the Russian Federation, and (b) students’ recalls of academic texts. RusAC
is based on the Russian Dictionary of abstractness / concreteness
compiled by the authors in previous studies, which lists abstractness ratings
of over 88,000 tokens. The pilot studies conducted on the Russian Academic
Corpus (circa 3 mln tokens) showed that the ratio of abstract words grows in
textbooks of all disciplines across grades from 5 to 11. We also confirmed that
the share of abstract words in Science textbooks is lower than that in the
Humanities textbooks and that the abstractness of readers’ recalls is typically
lower than that of the original text, as the respondents tend to omit more abstract
words than concrete ones. The findings of the research may be applied in a wide
range of spheres, including education, business, PR and medicine, as RusAC
facilitates leveling texts for different categories of readers.
Keywords: Text Complexity, Abstractness, Concreteness, Textbooks.
Introduction
In modern education, leveling and profiling texts is viewed as profoundly significant, as
graded reading levels of textbooks build students' confidence and increase
comprehension. This can be achieved only with the help of automated tools able
to discriminate texts for readers of various reading literacy levels. “A computational
approach to distinguishing texts offers researchers and educators a number of exciting
avenues of interest” [1]. This is especially true of distinguishing
abstractness/concreteness ratings of different texts, which may serve as good
predictors of text complexity [see 2, 3]. However, an automated tool able to compute
text abstractness and correlate it with text complexity has until recently remained a
research niche. In this article we present a study aimed at validating an innovative automated
tool, RusAC, designed and developed to assess a number of linguistic metrics
of Russian texts. The study was organized into three major parts: (1) design and
development of an automated tool (tagging program) that identifies abstract words
in the texts; (2) validation of the tool through a computerized abstractness analysis
based on the Russian Dictionary of abstractness/concreteness and the tagging
program.
Copyright ©2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
76 Computational Linguistics
Until now, in the absence of a dictionary of abstract / concrete words, quantitative
studies of Russian text complexity that include an assessment of word abstractness
have been either limited or impossible. In our previous work we presented the first
version of a computer-generated Russian dictionary of concrete/abstract words
(RDCA) [33]. The present study is the first in which the authors apply the
Dictionary to assess the complexity of texts. We also view a battery of school
textbooks on a particular subject as a good corpus, since the complexity of textbooks
is expected to grow from grade to grade. The hypothesis of the current study is the
following: if the number of abstract words grows from grade to grade, then the number
of abstract words can be used as a metric in assessing the complexity of other texts,
thus extending the sphere of application of RDCA.
1 Literature review
1.1 Psycholinguistic approach to concrete / abstract words
The notion of abstractness / concreteness (hereinafter A / C) has been the focus of
numerous studies [see 4], as the problem of discriminating between concrete and abstract
words is considered relevant in linguistics, psychology, education, etc. In the modern
paradigm, the A / C distinction rests on the idea that
concrete words denote referents experienced primarily through the senses, whereas
abstract words refer to ideas or concepts [5].
Psycholinguistic studies suggest a number of differences in the processing of concrete and
abstract words [6, 7, 8, 9, 10]. Perception and acquisition of abstract words are
hindered by the lack of ‘word to world’ mapping, i.e. when comprehending an abstract
concept a person may fail to establish correspondences to real-world phenomena (cf.
learning the words ‘a car’ and ‘good’) [11, 12]. This argument was also supported in
studies of the acquisition and processing of abstract / concrete words by
school children [9, 13]. The research shows that children take longer to acquire
abstract words than concrete ones, even when it comes to high-frequency
words [14]. From this, P. Schwanenflugel infers that abstract words are harder
for children to understand [14]. Moreover, when tested in a variety of lexical tasks,
abstract words are found to elicit slower reaction times and less accurate responses
[15, 16, 17]. Similar conclusions are drawn by V. Marian (2009), who claims
that concreteness facilitates word acquisition, as concrete
words are recognised and processed more rapidly [18]. Psycholinguistic experiments
also indicate ‘that 75% of the words most frequently produced by school-aged
children (6–12 years of age) are concrete and it is not until adolescence that children
master the majority of abstract words used by adults’ [13].
1.2 The rating of abstract/concrete words as a text complexity parameter
Abstractness as a text complexity feature has been confirmed by a number of
researchers, who view it as a text-related variable contributing to the difficulty of
reading comprehension [19]. D. Fisher suggests that the fewer concrete words there
are in a text, the higher the text complexity [20]. While including abstractness in the
list of features influencing text complexity, Petrie (1992) argues that ‘the degree of
abstraction (abstractness) is difficult to determine’ [see 21]. Sadoski et al. (2000)
studied concreteness as a text feature that engages readers' comprehension, interest,
and learning in four text types: persuasion, exposition (Science and Maths), literary
stories, and narratives (History and Social Studies). In the experimental study, 80
undergraduates read either three concrete or three abstract texts, then wrote an
exposition and rated the texts for familiarity, concreteness, interestingness, and
comprehensibility using 7-point bipolar scales. As a result, the authors claim that
concreteness was ‘overwhelmingly the best predictor of overall comprehensibility,
interest, and recall’ [22].
In applied linguistics, the number of concrete / abstract words in a text has been shown
to correlate strongly with text complexity, as texts about abstract notions are more
difficult to comprehend than texts about concrete notions. The correlation between
abstractness and text complexity has also been demonstrated by
Russian scholars who studied individual academic texts [2, 3].
Presenting the results of his study of the abstractness of over 20 Russian textbooks on
biology, geography, physics and chemistry, R. Mayer ranks them by their
complexity [23].
1.3 Methods and tools measuring the degree of word abstractness / concreteness
Many studies worldwide aimed at rating words as concrete or abstract involve native
speakers who are asked to use a numerical scale as an instrument for measuring
A / C [24, 22, 5, 25]. A well-known dictionary registering the A / C
ratings of 4000 English words, used in the MRC Psycholinguistic Database, was
compiled based on a 7-point bipolar scale [24]. The respondents participating in the
study tagged each word with an A / C rating from seven (the highest) to one (the
lowest). In this way every word received a rating from 100 to 700. This dictionary
is still used in much research on the English language and in cross-linguistic studies
[26, 5, 27, 28, 29].
In another study, aimed at defining the A / C ratings of 60,099 English words and
2,940 two-word expressions (such as “zebra crossing” and “zoom in”), Brysbaert et al.
(2014) asked respondents to assess how abstract or concrete the meaning of each
word is using a 5-point rating scale running from abstract to concrete [5]. Using
an A / C numerical scale, Wang et al. (2018) computed the degree of abstractness of
Chinese words with a context-sensitive word embedding model that draws on rich
contextual information. Word vectors for the word distribution study were trained on
the Reader Corpus (a Chinese corpus). The authors ‘built paradigms of A/C words’ in two
steps: (1) respondents evaluated 200 Chinese words as concrete or abstract using
a ‘-1 / 0 / 1’ scale, with ‘-1’ being the most concrete and ‘1’ the most abstract; (2)
the obtained results were extended by a corpus-based classification algorithm [25].
A similar online study was conducted for the Russian language, in which respondents
were asked to evaluate the C / A ratings of the 500 most frequent Russian nouns on a
5-point scale. The C / A rating of each Russian word was computed as the average of
all the assessments received, in the range from 1 to 5.
As the Dictionary data [24] and our estimates were computed on different
scales, we also converted our estimates with the following formula:
$f(x) = 100 \cdot (1.5 \cdot ((6 - x) - 1) + 1)$, where $x$ is the value obtained in our
survey. After this conversion, the index values range from 100 (the most abstract
words) to 700 (the most concrete).
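The conversion can be illustrated with a short Python sketch. It assumes the parenthesization f(x) = 100 * (1.5 * ((6 - x) - 1) + 1), which is the reading that yields the stated range of 100 to 700; the function name `to_mrc_scale` is hypothetical.

```python
def to_mrc_scale(x: float) -> float:
    """Map a mean survey rating x on the 1-5 scale to the 100-700 scale."""
    if not 1 <= x <= 5:
        raise ValueError("survey ratings are expected in the range 1..5")
    # (6 - x) inverts the scale; multiplying (.. - 1) by 1.5 and adding 1
    # stretches the 1..5 interval onto 1..7 before scaling by 100.
    return 100 * (1.5 * ((6 - x) - 1) + 1)


if __name__ == "__main__":
    for rating in (1, 3, 5):
        print(rating, to_mrc_scale(rating))
```

Under this reading a survey value of 1 maps to 700 and a survey value of 5 maps to 100, so the end points of the two scales coincide after inversion.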
The findings, i.e. lists of words tagged with ratings of abstractness/concreteness, are
available at https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html,
and a fragment of the intra-language comparative analysis of the ratings (based on the
abovementioned scale) is presented in Figure 1 below [30].
Fig. 1. A / C ratings of Russian nouns
Researchers have designed and developed a number of text complexity tools able to
match texts against lists of abstract words [31, 32]. For example, Coh-Metrix provides the
average A / C ratings for content words in a text. However, replicating
large-scale studies aimed at assessing the level of A / C for the Russian language was
until recently a challenge, as there was no automated tool defining the rank of abstractness
of Russian words. In our latest study we addressed this gap and designed and compiled the
Russian Dictionary of abstractness / concreteness [see 33].
1.4 Russian Dictionary of Abstractness/concreteness
Creating a large dictionary of abstractness by computing interviewees’ assessments is
time- and energy-consuming. Therefore, the dictionary was compiled automatically
from a large corpus of texts, i.e. the Google Books Ngram package
(https://books.google.com/ngrams). The fundamental ideas of the dictionary are as
follows: (A) abstract words are more often found alongside abstract words, while
concrete words are used more frequently with concrete words [37]; (B) we define a
core comprising a set of words that are obviously abstract and another set of words
that are obviously concrete, and then expand it to the size of the dictionary, selecting
the entries based on (A). A detailed description of the method is provided in [33].
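Ideas (A) and (B) can be illustrated with a toy Python sketch. It is not the procedure of [33]: the scoring rule (a smoothed log-ratio of co-occurrence counts with the two seed sets), the function name `seed_score`, and the English toy counts are all assumptions made for illustration. In this sketch positive scores lean concrete and negative scores lean abstract.

```python
import math


def seed_score(word, cooc, concrete_seeds, abstract_seeds):
    """Smoothed log-ratio of co-occurrence with concrete vs. abstract seeds.

    cooc maps an unordered word pair (a frozenset) to a co-occurrence count.
    """
    def total(seeds):
        return sum(cooc.get(frozenset((word, s)), 0) for s in seeds)

    # Add-one smoothing keeps the ratio defined when a count is zero.
    return math.log((total(concrete_seeds) + 1) / (total(abstract_seeds) + 1))


# Toy data: hypothetical counts, English words used for readability.
concrete_seeds = {"table", "stone"}
abstract_seeds = {"freedom", "idea"}
cooc = {
    frozenset(("chair", "table")): 9,
    frozenset(("chair", "idea")): 1,
    frozenset(("justice", "freedom")): 7,
    frozenset(("justice", "stone")): 1,
}

if __name__ == "__main__":
    for w in ("chair", "justice"):
        print(w, round(seed_score(w, cooc, concrete_seeds, abstract_seeds), 2))
```

A word scored this way could then be added to the core and the procedure repeated, which is one simple way to read the expansion step in (B).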
As a result we compiled a dictionary of 88,000 words available at
https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html. The values
of the concreteness / abstractness index range from -4.91 to 4.56 for nouns and
from -4.01 to 5.33 for adjectives. The A / C index for verbs was not calculated, in
accordance with the tradition in Russian linguistics of not applying this semantic
category to verbs. Fig. 2 below shows a fragment of the dictionary.
Fig. 2. Russian Dictionary of abstractness/ concreteness (fragment)
The Dictionary provides researchers and testers with an instrument facilitating not
only the assessment of text complexity but also the leveling and profiling of texts for
different categories of readers.
2 Analysis
The current study was pursued to answer three main research questions:
RQ1: How does the rating of abstractness change across the grades from
elementary to high schools?
RQ2: How different or similar are the ratings of abstractness of textbooks on
Humanities and textbooks on Science?
RQ3: How does the rating of abstractness of recalls differ from the ratings of
abstractness of the original texts?
To answer the research questions we used the Russian Academic Corpus and the
Corpus of Recalls, and designed an automatic tool defining the abstractness of Russian
texts.
2.1 Materials and methods
In this study we used the Russian Academic Corpus (RAC), a corpus of textbooks
used in elementary, middle and high schools of the Russian Federation [33]. As the
corpus builders aim at collecting the most representative corpus possible and the list of
school textbooks is non-exhaustive, RAC has been a work in progress for over four
years and has by now reached the size of nearly 3 mln. tokens1 (see Table 1 below). The
books included were published between 2006 and 2020, and the body of the Corpus is
divided into two sub-corpora: the Science Sub-corpus (628920 tokens) and the Humanities
Sub-corpus (2105058 tokens). Both sub-corpora comprise textbooks specified in the
“Federal List of Textbooks Recommended by the Ministry of Education and Science
of the Russian Federation to Use in Secondary and High Schools”. These particular
textbooks were chosen for two reasons: (a) the texts under study use a minimum of
non-alphabetical symbols, graphs, figures etc., and (b) the textbooks are available on
the Internet (School textbooks and manuals, 2017). The detailed information on the
size of the corpus is presented in Table 1 (below).
1 A token is viewed in this work as an instance of a sequence of characters in some particular
document that are grouped together as a useful semantic unit for processing. In this article it
refers to the total number of words in a text, corpus etc., regardless of how often they are
repeated. A type is the class of all tokens containing the same character sequence.
Table 1. The Size of Russian Academic Corpus (tokens)
Grade   Science   Humanities   Total
1st     21304     4757         26061
2nd     29284     28235        57519
3rd     53565     -            53565
4th     51489     24621        76110
5th     102467    19527        121994
6th     -         159664       159664
7th     75205     111788       186993
8th     -         273251       273251
9th     88335     390821       479156
10th    207271    656072       863343
11th    -         436322       436322
Total   628920    2105058      2733978
RAC contains 74 documents (textbooks) of all grades and disciplines and as such is
considered a representative sample of the population of Russian school textbooks.
2.2 Corpus of Readers’ Recalls
The Corpus of Students’ Recalls was compiled as a by-product of a study aimed at
evaluating the impact of cohesion on readers’ comprehension [35].
Of the 289 respondents participating in the study we selected 65 with a General
Knowledge index2 ranging between 13 and 16. Those were 11-12 year old native
Russian speakers. The subjects were individually asked to read one of two
informational texts, MT53 (modified text for the 5th Grade #3) or OT53 (original
text for the 5th Grade #3), both of about 200 words and previously unseen by the
subjects. The texts were fragments of a chapter from a textbook on Social Science
5 by Bogolyubov N.F. [36]. The recalls of the respondents were recorded by experts
and assessed holistically for their relevance to the task and statistically: we computed the
number of tokens and propositions in each recall.
2 The Wechsler “General knowledge” Subtest for Children (WISC GK) was chosen as it is
widely used to assess IQ in order to predict or explain school performance.
The total size of the corpus is 6473 tokens. As the Corpus comprises 65
separate texts, with the average recall length being 106.4 (MT53) and 92
(OT53) tokens, we view the Corpus as representative enough. The statistics on the
Corpus of Readers’ Recalls and selected samples of recalls are available at the site
Technologies of electronic dictionaries’ compilation,
https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html, last accessed 2020/05/17.
2.3 RusAC as the automatic tool defining abstractness of Russian texts
Text preprocessing (tokenization, etc.) is carried out with Russian TreeTagger
(http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/). RusAC processes texts
for abstractness/concreteness and readability, which together allow the tool to
estimate which of the texts processed is more difficult for comprehension.
Fig. 3. The RusAC text input box
RusAC provides the following functions:
1) automatic assessment of text complexity based on two descriptive parameters,
i.e. length of words and length of sentences; text complexity is calculated using
the formula proposed in [38];
2) tagging words in a text with an A/C rating from the dictionary;
3) saving the results of the analysis. RusAC analyzes texts
saved as doc, txt, or rtf files.
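The two descriptive parameters named in function 1 can be sketched in Python as follows. The regular-expression tokenization, the choice of characters as the unit of word length, and the function name `descriptive_parameters` are assumptions for illustration; the paper states that preprocessing is carried out with TreeTagger, and the formula of [38] is not reproduced here.

```python
import re


def descriptive_parameters(text: str) -> tuple:
    """Return (mean sentence length in words, mean word length in characters)."""
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # In Python 3, \w matches Unicode letters (Cyrillic included), digits and "_".
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    mean_sentence_len = len(words) / len(sentences)
    mean_word_len = sum(len(w) for w in words) / len(words)
    return mean_sentence_len, mean_word_len


if __name__ == "__main__":
    sample = "Мама мыла раму. Кот спит."
    print(descriptive_parameters(sample))
```

A complexity formula of the kind referred to in [38] would take these two values as input; its coefficients are deliberately left out because they are not given in this paper.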
Fig. 4. The RusAC text output data
3 Results
3.1 Abstractness of school textbooks
In this study we performed a systematic analysis of the abstractness of all the textbooks in
RAC, grouped into the following sets: Primary school textbooks (30, grades 1-4),
Middle school textbooks (19, grades 5-8), High school textbooks (25, grades 9-11).
The complete set of textbooks for secondary and high schools comprises 21 books on
Humanities and 11 books on Science (Biology). The procedure for computing the
mean index of concreteness is as follows: (1) we search the texts for the tokens
registered in the Russian Dictionary of Abstractness; (2) we tag each token with the
corresponding index from the Dictionary; (3) we compute the average either for a book
or for a set of textbooks: (a) for a single book, the sum of the indices is divided by the
number of tagged tokens in the book; (b) for a set of books, the sum of the
indices is divided by the total number of tagged tokens in those books.
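The averaging procedure can be sketched in Python as follows. The dictionary entries and index values are invented toy data, tokens are assumed to be already lemmatized, and the function names are hypothetical; this is an illustration of steps (1) to (3), not the RusAC source code.

```python
def tagged_indices(tokens, dictionary):
    """Steps 1-2: keep only tokens found in the dictionary and return their indices."""
    return [dictionary[t] for t in tokens if t in dictionary]


def mean_index_book(tokens, dictionary):
    """Step 3a: sum of indices divided by the number of tagged tokens in one book."""
    values = tagged_indices(tokens, dictionary)
    if not values:
        raise ValueError("no token of the book is registered in the dictionary")
    return sum(values) / len(values)


def mean_index_set(books, dictionary):
    """Step 3b: pooled mean over all tagged tokens of several books."""
    values = [v for tokens in books for v in tagged_indices(tokens, dictionary)]
    if not values:
        raise ValueError("no token of the set is registered in the dictionary")
    return sum(values) / len(values)


# Toy dictionary with hypothetical index values.
toy_dictionary = {"стол": 2.0, "камень": 3.0, "свобода": -3.0, "идея": -2.0}
book_a = ["стол", "и", "камень", "свобода"]  # "и" is not in the dictionary
book_b = ["идея"]

if __name__ == "__main__":
    print(mean_index_book(book_a, toy_dictionary))
    print(mean_index_set([book_a, book_b], toy_dictionary))
```

Note that the set-level value in (b) is a pooled mean, so longer books weigh more than shorter ones; it generally differs from the plain average of per-book means.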
Table 2. The A / C ratings in textbooks
Subject                Number of textbooks   Grades   Mean abstractness index
All (primary school)   30                    1-4      +0.34
Biology                7                     5-7      +0.49
Biology                5                     9-10     +0.15
History                7                     10-11    0
Social Studies         7                     5-8      -0.11
Social Studies         7                     9-11     -0.15
Literature             5                     6-8      +0.08
Literature             6                     9-11     -0.14
The mean abstractness index (see column 4, Table 2), in which higher values correspond
to greater concreteness, indicates the following: (a) the
highest concreteness is demonstrated by Biology texts and primary school
books: the index of Biology textbooks for middle school is the highest at
+0.49, which is even higher than that of primary school texts (+0.34);
(b) the index of Social Studies textbooks marks the highest level of
abstractness; (c) History books are located in the middle of the scale with
a score of 0, probably due to an equal incidence of concrete and abstract
words. This can be explained by the fact that History texts typically contain
descriptions of artefacts and narration of events, which bear a high degree of
concreteness.
In general, there is a statistically significant (p-value < 0.001) dependence of the
abstractness index on the grade level both for the entire collection of textbooks and
separately for the subcollections of Biology and Literature textbooks. In Social Studies
and History textbooks the trend is not statistically significant.
3.2 Abstractness/Concreteness of Readers’ recalls
The texts offered to the participants of the study for recalls, OT53 and MT53, have
similar average A / C indices (see Table 3). The index was computed in the same way as
the index of the textbooks: all the words in the texts registered in the Russian Dictionary of
Abstractness were tagged with the corresponding rating, and the sum of the
ratings was divided by the number of tagged tokens in the text.
Table 3. OT53 and MT53 Data
Code        Word Count   Abst_index
MT53 Text   222          0.12
OT53 Text   210          0.17
Table 4. OT53 and MT53 Recall data analysis (fragment)
No.   Recall Code   Word Count   Abst_index
      К5Р09         31           0.72
      К5Р10         38           0.44
      К5Р13         127          0.26
      К5Р14         109          0.92
      К5Р21         172          0.02
...
61.   КС503         81           0.06
62.   КС506         46           0.29
63.   КС507         91           0.39
64.   КС508         129          0.07
65.   КС510         63           0.28
MEAN                             0.28
The same procedure was implemented for every recall. The results are presented in
Table 4; the complete data are available on the website Technologies of electronic
dictionaries’ compilation [34].
As the tables above demonstrate, the average A / C index of the recalls (0.28) is higher than
that of the source texts (0.12 and 0.17), i.e. the recalls are more concrete, which confirms
that respondents tend to omit more abstract words and keep the concrete ones in their
recalls. As expected, 5th Grade students’ recalls are simpler in terms of both
traditional metrics and the A / C index. A comparison
of the A / C indices of the recalls and the source texts based on Student's t-test
confirms the hypothesis that the difference is statistically significant, as the p-value
equals 0.0003.
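The comparison can be sketched in Python under stated assumptions. The paper does not say which variant of Student's test was used, so the sketch assumes a one-sample t statistic of recall indices against the index of a source text. It uses only the ten recall values shown in the Table 4 fragment and the OT53 value from Table 3, so it does not reproduce the reported p-value. The function name `one_sample_t` is hypothetical.

```python
import math
import statistics


def one_sample_t(sample, reference):
    """Return (t statistic, degrees of freedom) for H0: mean(sample) == reference."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("the sample has zero variance")
    t = (mean - reference) / (sd / math.sqrt(n))
    return t, n - 1


# The ten recall indices visible in the Table 4 fragment.
recall_fragment = [0.72, 0.44, 0.26, 0.92, 0.02, 0.06, 0.29, 0.39, 0.07, 0.28]
source_index = 0.17  # OT53, Table 3

if __name__ == "__main__":
    t, df = one_sample_t(recall_fragment, source_index)
    print(f"t = {t:.2f}, df = {df}")
```

The p-value would then be read from the t distribution with the returned degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch within the standard library.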
Conclusion
Abstract words, as carriers of the notion of abstractness, are of special interest for
linguistics, psychology and pedagogy. In Natural Language Processing studies the
problem is narrowed to designing and developing tools able to tag words in a text
with the corresponding ratings of abstractness/concreteness. Until recently, a tool
evaluating the level of abstractness of Russian texts remained both a research and an
engineering niche. The authors of the article created an automated tool, RusAC, which
computes the index of concreteness/abstractness. RusAC relies on the
Russian Dictionary of Concrete and Abstract words, with a total size of 88,000 tokens,
compiled in our previous study. Application of RusAC to two representative
corpora, i.e. the Russian Academic Corpus and the Corpus of Readers’ Recalls, verified the
hypothesis that the incidence of abstract words in a text impacts its complexity, as
abstract words take longer to be processed by readers.
School textbooks were selected to test the proposed approach, since they are
graded by levels of complexity from elementary to advanced. Collections of school
textbooks have been used to study various techniques for assessing text complexity in
different languages [39–43]. One of the most important issues is
to select a battery of classroom books by the same author. This eliminates the
influence of the author’s style, concept or pedagogical attitudes on the texts of
textbooks for different grades and allows the analysis of textbooks by the same author
for different grades to focus only on complexity.
The study also confirmed that Science books and primary school books have the highest
index of concreteness. The Humanities textbooks demonstrate the highest level of
abstractness. The index of abstractness grows across grades 1 through 11. The
findings are consistent with the earlier published hypothesis on the impact of abstract
terms on text complexity and validate the designed tool. RusAC is freely available for
all categories of users.
Currently, the index of abstractness is typically interpreted as a separate parameter
calculated for texts but not included in the existing formulas of text complexity. As
such, the index allows texts to be compared in this respect without
indicating their level of text complexity.
Future work includes extending the number of entries in
RDCA and improving its quality. In the next stage of research, we plan to conduct a
survey and text recall experiments with students of Grades 9-10 (15-17 years old),
thus expanding the database and providing a foundation for comparing the level of
abstractness of texts generated by schoolchildren of different grades.
Acknowledgements. The research on the Russian Dictionary of Concrete and Abstract words
was supported by the Russian Foundation for Basic Research, Grant 19-07-00807. The survey
and analysis of the present study were supported by the Russian Science Foundation,
Grant 18-18-00436. The authors also express sincere gratitude to Dr. Artem Zaikin for his
assistance in processing the statistical data.
References
1. McCarthy, P. M., Lewis, G. A., Dufty, D. F., McNamara, D. S.: Analyzing writing styles
with Coh-Metrix. In Proceedings of the Florida Artificial Intelligence Research Society
International Conference. Menlo Park, CA: AAAI Press, 764 – 769 (2006).
2. Mikk, Ya. A.: Optimization of educational text complexity: for authors and editors
[Optimizatsiya slozhnosti uchebnogo teksta: v pomosch avtoram i redaktoram].
Prosveschenie (1981).
3. Krioni N.K., Nikin A. D., Fillipova A.V.: Automated system of academic texts complexity
analysis [Avtomatizirovannaya sistema analiza slozhnosti ucebnyh tekstov]. Bulletin of
Ufa State Aviation Technical University, Volume 11, 1 (28), 101 – 107 (2008).
4. Reuter K., Werning, M., Kuchinke, L., Cosentino, E.: Reading words hurts: the impact of
pain sensitivity on people’s ratings of pain-related words. Language and Cognition, 9 (3),
553 – 567 (2017).
5. Brysbaert, M., Warriner, A. B., Kuperman, V.: Concreteness ratings for 40 thousand
generally known English word lemmas. Behavior research methods, 46(3), 904 – 911
(2014).
6. Crutch, S. J., & Ridgway, G. R.: On the semantic elements of abstract words. Cortex,
48(10), 1376– 1378 (2012).
7. Kousta, S.-T., Vigliocco, G., Vinson, D. P., Andrews, M., & Del Campo, E.: The
representation of abstract words: Why emotion matters. Journal of Experimental
Psychology: General, 140(1), 14–34 (2011).
8. Paivio, A.: Mind and its evolution: A dual coding theoretical approach. Mahwah, NJ:
Erlbaum (2007).
9. Schwanenflugel, P. J.: Why are abstract concepts hard to understand? The psychology of
word meanings, 223–250 (1991).
10. Borghi, A., Binkofski, F., Castelfranchi, C., Cimatti, F., Scorolli, C., & Tummolini, L.:
The challenge of abstract concepts. Psychological Bulletin, 143, 263–292 (2017).
11. Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., Trueswell, J. C.: Hard words.
Language Learning and Development, 1(1), 23–64 (2005).
12. Yu, C., Smith, L.: Rapid word learning under uncertainty via cross-situational statistics.
Psychological Science, 18(5), 414–420 (2007).
13. Vigliocco, G., Ponari, M., Norbury, C.: Learning and processing abstract words and
concepts: Insights from typical and atypical development. Topics in cognitive science
10(3), 533 – 549 (2018).
14. Schwanenflugel, P. J. (Ed.). The psychology of word meanings. Psychology Press (2013).
15. Paivio, A.: Dual coding theory: Retrospect and current status. Canadian Journal of
Psychology, 45(3), 255 – 287 (1991).
16. Nickels, L., Howard, D.: Aphasic naming: What matters? Neuropsychologia, 33(10), 1281
– 1303 (1995).
17. Barry, C., Gerhand, S.: Both concreteness and age-of-acquisition affect reading accuracy
but only concreteness affects comprehension in a deep dyslexic patient. Brain and
Language, 84, 84 - 104 (2003).
18. Marian, V.: Language interaction as a window into bilingual cognitive architecture.
Multidisciplinary approaches to code switching, 161 – 185 (2009).
19. Taylor, L., Weir, C. J.: IELTS collected papers 2: Research in reading and listening
assessment (Vol. 2). Cambridge University Press (2012).
20. Fisher, D., Frey, N., Lapp, D.: Text complexity: Stretching readers with texts and tasks.
Corwin Press. (2016).
21. Holleman, B. The forbid/allow asymmetry: On the cognitive mechanisms underlying
wording effects in surveys (Vol. 16). Rodopi. (2000).
22. Sadoski, M., Goetz, E. T., Rodriguez M.: Engaging texts: Effects of concreteness on
comprehensibility, interest, and recall in four text types. Journal of Educational
Psychology 92.1, 85 (2000).
23. Mayer, R. V.: Assessment of the Level of Abstractness of Material Statement of in Natural
Sciences School Textbooks. Standards and Monitoring in Education. 1, 58 – 63 (2017).
24. Coltheart, M.: The MRC Psycholinguistic Database, Quarterly Journal of Experimental
Psychology, 33A, 497 – 505 (1981)
25. Wang, X., Su, C., Chen, Y.: A Method of Abstractness Ratings for Chinese Concepts. In
UK Workshop on Computational Intelligence, 217 – 226 (2018).
26. Crossley, S., Salsbury, T., McNamara, D. S.: Validating lexical measures using human
scores of lexical proficiency. Vocabulary knowledge: Human ratings and automated
measures, Amsterdam: John Benjamins, 105 – 134 (2013).
27. Dellantonio, S., Mulatti, C., Pastore, L., Job, R.: Measuring inconsistencies can lead you
forward. the case of imageability and concreteness ratings. Language Sciences, 5, 708.
(2014).
28. Troche, J., Crutch, S., Reilly, J.: Clustering, hierarchical organization, and the topography
of abstract and concrete nouns. Frontiers in psychology, 5, 360 (2014).
29. Pastore, L., Dellantonio, S., Mulatti, C., Job, R.: On the nature and composition of abstract
(theoretical) concepts: the X-ception theory and methods for its assessment. In Philosophy
and Cognitive Science II, 35 – 58 (2015).
30. Solovyev, V., Andreeva, M., Solnyshkina, M., Zamaletdinov, R., Danilov A., and
Gaynutdinova, D.: Computing Concreteness Ratings of Russian and English Most
Frequent Words: Contrastive Approach. 2019 12th International Conference on
Developments in eSystems Engineering (DeSE), Kazan, Russia, 403-408 (2019).
31. Pitler, E., Nenkova, A. Revisiting readability: a unified framework for predicting text
quality. In Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan (eds). Conference on
empirical methods in natural language processing (EMNLP ’08). Association for
Computational Linguistics, Stroudsburg, PA, USA, 186–195 (2008).
32. Laposhina, A.: Relevant features selection for the automatic text complexity measurement
for Russian as a foreign language. [Analiz relevantnyh priznakov dlya avtomaticheskogo
opredeleniya slozhnosti russkogo teksta kak inostrannogo] In V.P. Selegey (eds).
Computational linguistics and intellectual technologies: papers from the annual
international conference ‘Dialogue’, Issue 17, 1–7 (2017).
33. Solovyev V. D., Ivanov V. V., Akhtiamov R. B.: Dictionary of Abstract and Concrete
Words of the Russian Language: A Methodology for Creation and Application. Journal of
Research in Applied Linguistics. vol. 10, 215 -227 (2019).
34. Technologies of electronic dictionaries’ compilation,
https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html, last accessed 2020/02/19.
35. McCarthy, K.S., McNamara, D.S., Solnyshkina, M.I., Tarasova, F.Kh., Kupriyanov, R.V.:
The Russian language test: towards assessing text comprehension. Vestnik
Volgogradskogo Gosudarstvennogo Universiteta. Serii︠a︡ 2, Iazykoznanie; Volgograd, 18
(4), 231 – 247 (2019).
36. Bogolyubov N.F.: Social Studies Grade 5 [Obschestvoznanie 5 klass]. A textbook for
secondary schools. 3rd Edition. Prosveschenie, 127 (2013).
37. Frassinelli, D., Schulte im Walde, S.: Distributional interaction of concreteness and
abstractness in verb-noun subcategorisation. Proceedings of the 13th International
Conference on Computational Semantics - Short Papers. Association for Computational
Linguistics, 38-43 (2019).
38. Solovyev, V., Ivanov, V., Solnyshkina, M.: Assessment of reading difficulty levels in
Russian academic texts: Approaches and metrics, Journal of Intelligent & Fuzzy Systems.
34(5), 3049–3058 (2018).
39. Al-Tamimi, A.K., et al.: AARI: Automatic Arabic readability index. International Arab
Journal of Information Technology. 11(4), 370-378 (2014).
40. Chen, Y.-T., Chen, Y.-H., and Cheng, Y.-C.: Assessing Chinese Readability using Term
Frequency and Lexical Chain. IJCLCLP. 18(2), 1-18 (2013).
41. Chen, Y.-H. and Daowadung, P.: Assessing readability of Thai text using support vector
machines. Maejo International Journal of Science and Technology 09(3), 355-369 (2015).
42. Si, L. and Callan, J.: A statistical model for scientific readability. In CIKM, 574–576
(2001).
43. Tanaka-Ishii, K., Tezuka, S. and Terada, H.: Sorting Texts by Readability. Comput.
Linguist. 36(2), 203-227 (2010).