International Conference "Internet and Modern Society" (IMS-2020). CEUR Proceedings

Text Complexity and Abstractness: Tools for the Russian Language

Valery Solovyev¹ [0000-0003-4692-2564], Marina Solnyshkina¹ [0000-0003-1885-3039], Mariia Andreeva²,¹ [0000-0002-5760-0934], Andrey Danilov¹ [0000-0002-2358-1157], Radif Zamaletdinov¹ [0000-0002-2692-1698]

¹ Kazan State Federal University. Kremlyovskaya, 18, 420008, Kazan, Russia
² Kazan State Medical University. Butlerova, 49, 420012, Kazan, Russia
maki.solovyev@mail.ru

Abstract. The article focuses on two parallel studies aimed at validating an original automatic tool (RusAC) designed to determine the level of abstractness of Russian texts. The studies were conducted on: (a) the Russian Academic Corpus (RAC), compiled from textbooks used in middle and high schools of the Russian Federation, and (b) students’ recalls of academic texts. The design of RusAC is based on the Russian Dictionary of abstractness / concreteness compiled by the authors in previous studies, which lists abstractness ratings of over 88,000 tokens. The pilot studies conducted on the Russian Academic Corpus (circa 3 mln tokens) showed that the ratio of abstract words grows in textbooks of all disciplines across grades 5 to 11. We also confirmed that the share of abstract words in Science textbooks is lower than in Humanities textbooks and that the abstractness of readers’ recalls is typically lower than that of the original text, as respondents tend to omit more abstract words than concrete ones. The findings may be applied in a wide range of spheres including education, business, PR, medicine, etc., as RusAC facilitates leveling texts for different categories of readers.

Keywords: Text Complexity, Abstractness, Concreteness, Textbooks.

Introduction

In modern education, leveling and profiling texts is viewed as profoundly significant, as graduated reading levels of textbooks build students' confidence and increase comprehension.
This can be achieved only with the help of automated tools able to discriminate texts for readers of various reading literacy levels. “A computational approach to distinguishing texts offers researchers and educators a number of exciting avenues of interest” [1]. This is especially true of distinguishing the abstractness / concreteness ratings of different texts, which may serve as good predictors of text complexity [see 2, 3]. However, an automated tool able to compute text abstractness and correlate it with text complexity has until recently remained an unfilled research niche. In this article we present a study aimed at validating an innovative automated tool, RusAC, designed and developed to assess a number of linguistic metrics of Russian texts. The study was organized into the following major parts: (1) design and development of an automated tool (tagging program) that identifies abstract words in texts; (2) validation of the tool through a computerized abstractness analysis based on the Russian Dictionary of abstractness / concreteness and the tagging program. Until now, in the absence of a dictionary of abstract / concrete words, quantitative studies of Russian text complexity that include an assessment of word abstractness have been either limited or impossible. In our previous work we presented the first version of a computer-generated Russian dictionary of concrete / abstract words (RDCA) [33]. The present study is the first in which the authors apply the Dictionary to assess the complexity of texts. We also view a battery of school textbooks on a particular subject as a suitable corpus, since the complexity of textbooks is expected to grow from grade to grade.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
The hypothesis of the current study is the following: if the number of abstract words grows from grade to grade, then the number of abstract words can be used as a metric in assessing the complexity of other texts, thus extending the sphere of application of RDCA.

1 Literature review

1.1 Psycholinguistic approach to concrete / abstract words

The notion of abstractness / concreteness (hereinafter A / C) has been the focus of numerous studies [see 4], as the problem of discriminating concrete and abstract words is considered relevant in linguistics, psychology, education, etc. In the modern paradigm, the discrimination of abstractness / concreteness rests on the idea that concrete words denote referents experienced primarily through the senses, whereas abstract words refer to ideas or concepts [5]. Psycholinguistic studies suggest a number of differences in processing concrete and abstract words [6, 7, 8, 9, 10]. Perception and acquisition of abstract words is hindered by the lack of ‘word to world’ mapping, i.e. when comprehending an abstract concept a person may fail to find correspondences to real-world phenomena (cf. learning the words ‘a car’ and ‘good’) [11, 12]. This argument is also supported by studies of how school children acquire and process abstract and concrete words [9, 13]. The research shows that children take longer to acquire abstract words than concrete ones, even when it comes to high-frequency words [14]. From this, P. Schwanenflugel infers that abstract words are harder for children to understand [14]. Moreover, when tested in a variety of lexical tasks, abstract words are found to elicit slower reaction times and less accurate responses [15, 16, 17]. Similar conclusions are drawn by V. Marian (2009), who claims that concreteness facilitates word acquisition, as concrete words are recognised and processed more rapidly [18].
Psycholinguistic experiments also indicate ‘that 75% of the words most frequently produced by school-aged children (6–12 years of age) are concrete and it is not until adolescence that children master the majority of abstract words used by adults’ [13].

1.2 The rating of abstract/concrete words as a text complexity parameter

Abstractness as a text complexity feature has been confirmed by a number of researchers, who view it as a text-related variable contributing to the difficulty of reading comprehension [19]. D. Fisher suggests that the fewer concrete words there are in a text, the higher the text complexity [20]. While including abstractness in a list of features influencing text complexity, Petrie (1992) argues that ‘the degree of abstraction (abstractness) is difficult to determine’ [see 21]. Sadoski et al. (2000) studied concreteness as a text feature that engaged readers' comprehension, interest, and learning in four text types: persuasion, exposition (Science and Maths), literary stories, and narratives (History and Social Studies). In the experimental study, 80 undergraduates read either three concrete or three abstract texts, then wrote an exposition and rated the texts for familiarity, concreteness, interestingness, and comprehensibility using 7-point bipolar scales. As a result, the authors claim that concreteness was ‘overwhelmingly the best predictor of overall comprehensibility, interest, and recall’ [22]. In applied linguistics, the number of concrete / abstract words in a text has been shown to correlate strongly with text complexity, as texts about abstract notions are more difficult to comprehend than texts about concrete notions. The correlation between abstractness and text complexity has also been demonstrated by Russian scholars in studies of individual academic texts [2, 3].
Presenting the results of his study of the abstractness of over 20 Russian textbooks on biology, geography, physics and chemistry, R. Mayer ranks them by complexity [23].

1.3 Methods and tools measuring the degree of word abstractness / concreteness

Many studies worldwide aimed at rating words as concrete or abstract involve native speakers who are asked to use a numerical scale as an instrument for measuring A / C [24, 22, 5, 25]. A well-known dictionary registering the A / C ratings of 4000 English words, used in the MRC Psycholinguistic Database, was compiled on the basis of a 7-point bipolar scale [24]. The respondents participating in the study tagged each word with an A / C rating from seven (the highest) to one (the lowest). In this way every word received a rating from 100 to 700. This dictionary is still used in much research on the English language and in cross-linguistic studies [26, 5, 27, 28, 29]. In another study, aimed at defining the A / C ratings of 60,099 English words and 2,940 two-word expressions (such as “zebra crossing” and “zoom in”), Brysbaert et al. (2014) asked respondents to assess how abstract or concrete the meaning of each word is, using a 5-point rating scale running from abstract to concrete [5]. Using an A / C numerical scale, Wang et al. (2018) computed the degree of abstractness of Chinese words from a context-sensitive word embedding model that draws on rich contextual information. Word vectors for the word distribution study were trained on the Reader Corpus (a Chinese corpus). The authors ‘built paradigms of A/C words’ in two steps: (1) respondents evaluated 200 Chinese words as concrete or abstract using a ‘–1 / 0 / 1’ scale, with ‘–1’ being the most concrete and ‘1’ the most abstract; (2) the results obtained were extended by a classification algorithm based on the corpus [25].
A similar online study was conducted for the Russian language, in which respondents were asked to evaluate the C / A ratings of the 500 most frequent Russian nouns on a 5-point scale. The C / A rating of each Russian word was computed as the average of all the assessments received, in the range from 1 to 5. As the Dictionary data [24] and our estimates were computed on different scales, we also processed our estimates with the following formula: $f(x) = 100 \cdot (1.5 \cdot ((6 - x) - 1) + 1)$, where $x$ is the value obtained in our survey. For example, $x = 1$ yields 700 and $x = 5$ yields 100. After this conversion, the index values range from 100 (the most abstract words) to 700 (the most concrete). The findings, i.e. lists of words tagged with ratings of abstractness / concreteness, are available at https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html, and a fragment of the intra-language comparative analysis of the ratings (based on the abovementioned scale) is presented in Figure 1 below [30].

Fig. 1. A / C ratings of Russian nouns

Researchers have designed and developed a number of text complexity tools able to match texts with lists of abstract words [31, 32]. E.g., Coh-Metrix provides the average A / C ratings for content words in a text. However, replicating large-scale studies aimed at assessing the level of A / C for the Russian language was until recently a challenge, as there was no automated tool defining the rank of abstractness of Russian words. In our latest study we addressed this gap by designing and compiling the Russian Dictionary of abstractness / concreteness [see 33].

1.4 Russian Dictionary of Abstractness/concreteness

Creating a large dictionary of abstractness by computing interviewees’ assessments is time- and energy-consuming. Therefore, the dictionary was compiled automatically on the basis of a large corpus of texts, i.e. the Google Books Ngram package (https://books.google.com/ngrams).
The fundamental ideas of the dictionary are as follows:
(A) Abstract words are more often found alongside abstract words, while concrete words are used more frequently with concrete words [37].
(B) We define a core comprising a set of words that are obviously abstract and another set of words that are obviously concrete, and then expand it to the size of the dictionary, selecting the entries based on (A).
A detailed description of the method is provided in [33]. As a result, we compiled a dictionary of 88,000 words, available at https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html. The values of the concreteness / abstractness index range from -4.91 to 4.56 for nouns and from -4.01 to 5.33 for adjectives. The A / C index for verbs was not calculated, in accordance with the tradition in Russian linguistics not to consider this semantic category for verbs. Fig. 2 below shows a fragment of the dictionary.

Fig. 2. Russian Dictionary of abstractness/ concreteness (fragment)

The Dictionary provides researchers and testers with an instrument facilitating not only the assessment of text complexity but also the leveling and profiling of texts for different categories of readers.

2 Analysis

The current study was pursued to answer three main research questions:
RQ1: How does the rating of abstractness change across the grades from elementary to high school?
RQ2: How different or similar are the ratings of abstractness of textbooks on Humanities and textbooks on Science?
RQ3: How does the rating of abstractness of recalls differ from the ratings of abstractness of the original texts?
To answer the research questions we used the Russian Academic Corpus and the Corpus of Recalls, and designed an automatic tool defining the abstractness of Russian texts.
2.1 Materials and methods

In this study we used the Russian Academic Corpus (RAC), a corpus of textbooks used in elementary, middle and high schools of the Russian Federation [33]. As the corpus builders aim at collecting the most representative corpus possible and the list of school textbooks is non-exhaustive, RAC has been a work in progress for over four years and has by now reached the size of nearly 3 mln tokens¹ (see Table 1 below). The books included were published between 2006 and 2020, and the body of the Corpus is divided into two sub-corpora: the Science Sub-corpus (628920 tokens) and the Humanities Sub-corpus (2105058 tokens). Both sub-corpora comprise textbooks specified in the “Federal List of Textbooks Recommended by the Ministry of Education and Science of the Russian Federation to Use in Secondary and High Schools”. These particular textbooks were chosen for a number of reasons: (a) the texts under study use a minimum of non-alphabetical symbols, graphs, figures, etc.; (b) the textbooks are available on the Internet (School textbooks and manuals, 2017). Detailed information on the size of the corpus is presented in Table 1 (below).

¹ A token is viewed in this work as an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. In this article it refers to the total number of words in a text, corpus, etc., regardless of how often they are repeated. A type is the class of all tokens containing the same character sequence.

Table 1. The Size of Russian Academic Corpus (tokens)

Grade   Science   Humanities   TOTAL
1st     21304     4757         26061
2nd     29284     28235        57519
3rd     53565     -            53565
4th     51489     24621        76110
5th     102467    19527        121994
6th     -         159664       159664
7th     75205     111788       186993
8th     -         273251       273251
9th     88335     390821       479156
10th    207271    656072       863343
11th    -         436322       436322
Total   628920    2105058      2733978

RAC contains 74 documents (textbooks) of all grades and disciplines and as such is considered a representative sample of the population of Russian school textbooks.

2.2 Corpus of Readers’ Recalls

The Corpus of Students’ Recalls was compiled as a side result of a study aimed at evaluating the impact of cohesion on readers’ comprehension [35]. Of the 289 respondents participating in the study we selected 65 with a General Knowledge index² ranging between 13 and 16. These were 11-12 year old native Russian speakers. The subjects were individually asked to read one of two informational texts, MT53 (modified text for the 5th Grade #3) or OT53 (original text for the 5th Grade #3), both of about 200 words, with which they had no previous experience. The texts were fragments of a chapter from a textbook on Social Science 5 by Bogolyubov N.F. [36]. The recalls of the respondents were recorded by experts and assessed holistically for their relevance to the task and statistically: we computed the number of tokens and propositions in each recall. The total size of the corpus is 6473 tokens. As the Corpus comprises 65 separate texts, with the average recall length being 106.4 (MT53) and 92 (OT53) tokens, we view the Corpus as sufficiently representative. The statistics on the Corpus of Readers’ Recalls and selected samples of recalls are available on the site Technologies of electronic dictionaries’ compilation, at https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html, last accessed 2020/17/05.

² Wechsler “General Knowledge” Subtest for Children (WISC GK), chosen as it is widely used to assess IQ in order to predict or explain school performance.

2.3 RusAC as the automatic tool defining abstractness of Russian texts

Text preprocessing (tokenization, etc.) is carried out with Russian TreeTagger (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/).
RusAC processes texts for abstractness / concreteness and readability, which together allow the tool to estimate which of the texts processed is more difficult to comprehend.

Fig. 3. The RusAC text input box

RusAC provides the following functions:
1) automatic assessment of text complexity based on two descriptive parameters, i.e. length of words and length of sentences; text complexity is calculated with the formula proposed in [38];
2) assigning words in a text an A / C rating from the dictionary;
3) saving the results of the analysis.
RusAC analyzes texts saved as doc, txt or rtf files.

Fig. 4. The RusAC text output data

3 Results

3.1 Abstractness of school textbooks

In this study we performed a systematic analysis of the abstractness of all the textbooks in RAC, grouped into the following sets: Primary school textbooks (30, grades 1-4), Middle school textbooks (19, grades 5-8), High school textbooks (25, grades 9-11). The complete set of textbooks for secondary and high schools comprises 21 books on Humanities and 11 books on Science (Biology). The procedure for computing the mean index of concreteness is as follows:
(1) we search the texts for the tokens registered in the Russian Dictionary of Abstractness;
(2) we tag each token with the corresponding index from the Dictionary;
(3) we compute the average either for a book or for a set of textbooks: (a) in the first case the sum of the indices is divided by the number of tagged tokens in the book, and (b) for a set of books the sum of the indices is divided by the total number of tagged tokens in those books.

Table 2. The A / C ratings in textbooks

Subject              Number of textbooks   Grade   Mean abstractness index
All Primary school   30                    1-4     +0.34
Biology              7                     5-7     +0.49
Biology              5                     9-10    +0.15
History              7                     10-11   0
Social Studies       7                     5-8     -0.11
Social Studies       7                     9-11    -0.15
Literature           5                     6-8     +0.08
Literature           6                     9-11    -0.14
The mean abstractness index (see column 4, Table 2) indicates the following:
a) the highest index of concreteness is demonstrated by texts in Biology and primary school books: the concreteness of Biology textbooks for middle school is the highest at +0.49, which is even higher than that of primary school texts (+0.34);
b) the index of Social Studies textbooks marks the highest level of abstractness among the texts;
c) History books are located in the middle of the scale with a score of “0”, probably due to an equal incidence of concrete and abstract words. This can be explained by the fact that History texts typically contain descriptions of artefacts and narration of events, which bear a high degree of concreteness.
In general, there is a statistically significant (p-value < 0.001) dependence of the abstractness index on the Grade level both for the entire collection of textbooks and separately for the subcollections of Biology and Literature textbooks. For Social Studies and History textbooks the dependence is not statistically significant.

3.2 Abstractness/Concreteness of Readers’ recalls

The texts offered to the participants of the study for recalls, OT53 and MT53, have similar average A / C indices (see Table 3). The index was computed in the same way as for the textbooks: every word in the texts that is registered in the Russian Dictionary of Abstractness was tagged with its rating, and the total sum of the ratings was divided by the number of tagged tokens in the text.

Table 3. OT53 and MT53 Data

Code        Word Count   Abst_index
MT53 Text   222          0.12
OT53 Text   210          0.17

Table 4. OT53 and MT53 Recall data analysis (fragment)

No.   Recall Code   Word Count   Abst_index
      К5Р09         31           0.72
      К5Р10         38           0.44
      К5Р13         127          0.26
      К5Р14         109          0.92
      К5Р21         172          0.02
...
61.   КС503         81           0.06
62.   КС506         46           0.29
63.   КС507         91           0.39
64.   КС508         129          0.07
65.   КС510         63           0.28
MEAN                             0.28

The same procedure was implemented for every recall.
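The averaging procedure used for textbooks, source texts and recalls can be illustrated with a minimal Python sketch. It is not the RusAC implementation: the ratings in `TOY_RATINGS` are invented for illustration, the function names are hypothetical, and tokens are matched as lowercased surface forms, whereas the actual tool preprocesses texts with TreeTagger. Positive values are read here as more concrete and negative values as more abstract, which is how the indices in this section appear to be interpreted.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy ratings for illustration only; the real Dictionary
# holds about 88,000 entries. Positive = more concrete (assumed reading).
TOY_RATINGS: Dict[str, float] = {
    "стол": 2.0,      # "table"
    "дом": 1.5,       # "house"
    "свобода": -2.5,  # "freedom"
    "идея": -1.0,     # "idea"
}


def tag_tokens(text: str, ratings: Dict[str, float]) -> Tuple[float, int]:
    """Return (sum of ratings, number of tagged tokens) for one text."""
    # Assumption: lowercased word forms are looked up directly,
    # without lemmatization.
    tokens = re.findall(r"\w+", text.lower())
    tagged = [ratings[t] for t in tokens if t in ratings]
    return sum(tagged), len(tagged)


def mean_index(text: str, ratings: Dict[str, float]) -> Optional[float]:
    """Mean index of one text: sum of ratings / number of tagged tokens."""
    total, count = tag_tokens(text, ratings)
    return total / count if count else None


def mean_index_for_set(texts: Iterable[str],
                       ratings: Dict[str, float]) -> Optional[float]:
    """Mean index of a set: pooled sum / pooled number of tagged tokens."""
    pairs = [tag_tokens(t, ratings) for t in texts]
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count if count else None


source = "Свобода и идея"    # two tagged tokens, both abstract
recall = "Дом и стол, идея"  # three tagged tokens, mostly concrete
print(mean_index(source, TOY_RATINGS))  # -1.75
print(mean_index(recall, TOY_RATINGS))  # roughly 0.83
```

Note that the index of a set pools the tagged tokens of all its texts, so it generally differs from the average of the per-text indices; words absent from the dictionary do not enter the denominator.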
The results are presented in Table 4; the complete data are available on the website of Technologies of electronic dictionaries’ compilation [34].

As the table above demonstrates, the average A / C index for recalls is higher than that of the source texts, which confirms that respondents tend to omit more abstract words and keep the concrete ones in their recalls. As expected, 5th Grade students’ recalls are simpler in terms of both traditional metrics and the A / C index. The comparison of the A / C indexes of the recalls and the source texts based on Student's t-test confirms the hypothesis that the difference is statistically significant, as the p-value equals 0.0003.

Conclusion

Abstract words as carriers of the notion of abstractness are of special interest for linguistics, psychology and pedagogy. In Natural Language Processing studies the problem is narrowed to designing and developing tools able to tag words in a text with the corresponding ratings of abstractness / concreteness. A tool evaluating the level of abstractness of Russian texts has been an open research and engineering niche. The authors of the article created an automated tool, RusAC, which computes the index of concreteness / abstractness. The functions of RusAC are supported by the Russian Dictionary of Concrete and Abstract words, with a total size of 88,000 tokens, compiled in our previous study. Application of RusAC to two representative corpora, i.e. the Russian Academic Corpus and the Corpus of Readers’ Recalls, verified the hypothesis that the incidence of abstract words in a text impacts its complexity, as they take longer to be processed by readers. School textbooks were selected to test the proposed approach, since they are graded by levels of complexity from elementary to advanced. Collections of school textbooks are used in studies of various techniques for assessing text complexity in different languages in a number of works [39–43].
One of the most important issues is to select a battery of classroom books by the same author. This eliminates the influence of the author’s style, concept or pedagogical attitudes on the texts of textbooks for different grades and allows textbooks by the same author to be analyzed across grades with a focus on complexity alone. The study also confirmed the highest index of concreteness for Science books and primary school books. The Humanities textbooks demonstrate the highest level of abstractness. The index of abstractness grows across grades one through 11. The findings are consistent with the earlier published hypothesis on the impact of abstract terms on text complexity and validate the designed tool. RusAC is freely available for all categories of users. Currently, the index of abstractness is typically interpreted as a separate parameter calculated for texts but not included in the existing formulas of text complexity. Thus the index of abstractness allows texts to be compared in this respect without indicating their overall level of complexity. Future work includes extending the number of entries in RDCA and improving its quality. In the next stage of research, we plan to conduct a survey and text recall experiments with students of Grades 9-10 (15-17 years old), thus expanding the database and providing a foundation for comparing the level of abstractness of texts generated by schoolchildren of different grades.

Acknowledgements. The research on the Russian Dictionary of Concrete and Abstract words was supported by the Russian Fund of Basic Research, Grant 19-07-00807. The Survey and Analysis of the present study was supported by the Russian Science Foundation, Grant 18-18-00436. The authors also express sincere gratitude to Dr. Artem Zaikin for his assistance in processing the statistical data.

References

1. McCarthy, P. M., Lewis, G. A., Dufty, D.
F., McNamara, D. S.: Analyzing writing styles with Coh-Metrix. In Proceedings of the Florida Artificial Intelligence Research Society International Conference. Menlo Park, CA: AAAI Press, 764–769 (2006).
2. Mikk, Ya. A.: Optimization of educational text complexity: for authors and editors [Optimizatsiya slozhnosti uchebnogo teksta: v pomosch avtoram i redaktoram]. Prosveschenie (1981).
3. Krioni N.K., Nikin A. D., Fillipova A.V.: Automated system of academic texts complexity analysis [Avtomatizirovannaya sistema analiza slozhnosti ucebnyh tekstov]. Bulletin of Ufa State Aviation Technical University, Volume 11, 1 (28), 101–107 (2008).
4. Reuter K., Werning, M., Kuchinke, L., Cosentino, E.: Reading words hurts: the impact of pain sensitivity on people’s ratings of pain-related words. Language and Cognition, 9 (3), 553–567 (2017).
5. Brysbaert, M., Warriner, A. B., Kuperman, V.: Concreteness ratings for 40 thousand generally known English word lemmas. Behavior research methods, 46(3), 904–911 (2014).
6. Crutch, S. J., & Ridgway, G. R.: On the semantic elements of abstract words. Cortex, 48(10), 1376–1378 (2012).
7. Kousta, S.-T., Vigliocco, G., Vinson, D. P., Andrews, M., & Del Campo, E.: The representation of abstract words: Why emotion matters. Journal of Experimental Psychology: General, 140(1), 14–34 (2011).
8. Paivio, A.: Mind and its evolution: A dual coding theoretical approach. Mahwah, NJ: Erlbaum (2007).
9. Schwanenflugel, P. J.: Why are abstract concepts hard to understand? The psychology of word meanings, 223–250 (1991).
10. Borghi, A., Binkofski, F., Castelfranchi, C., Cimatti, F., Scorolli, C., & Tummolini, L.: The challenge of abstract concepts. Psychological Bulletin, 143, 263–292 (2017).
11. Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., Trueswell, J. C.: Hard words. Language Learning and Development, 1(1), 23–64 (2005).
12. Yu, C., Smith, L.: Rapid word learning under uncertainty via cross-situational statistics.
Psychological Science, 18(5), 414–420 (2007).
13. Vigliocco, G., Ponari, M., Norbury, C.: Learning and processing abstract words and concepts: Insights from typical and atypical development. Topics in cognitive science 10(3), 533–549 (2018).
14. Schwanenflugel, P. J. (Ed.). The psychology of word meanings. Psychology Press (2013).
15. Paivio, A.: Dual coding theory: Retrospect and current status. Canadian Journal of Psychology, 45(3), 255–287 (1991).
16. Nickels, L., Howard, D.: Aphasic naming: What matters? Neuropsychologia, 33(10), 1281–1303 (1995).
17. Barry, C., Gerhand, S.: Both concreteness and age-of-acquisition affect reading accuracy but only concreteness affects comprehension in a deep dyslexic patient. Brain and Language, 84, 84–104 (2003).
18. Marian, V.: Language interaction as a window into bilingual cognitive architecture. Multidisciplinary approaches to code switching, 161–185 (2009).
19. Taylor, L., Weir, C. J.: IELTS collected papers 2: Research in reading and listening assessment (Vol. 2). Cambridge University Press (2012).
20. Fisher, D., Frey, N., Lapp, D.: Text complexity: Stretching readers with texts and tasks. Corwin Press (2016).
21. Holleman, B.: The forbid/allow asymmetry: On the cognitive mechanisms underlying wording effects in surveys (Vol. 16). Rodopi (2000).
22. Sadoski, M., Goetz, E. T., Rodriguez M.: Engaging texts: Effects of concreteness on comprehensibility, interest, and recall in four text types. Journal of Educational Psychology 92.1, 85 (2000).
23. Mayer, R. V.: Assessment of the Level of Abstractness of Material Statement of in Natural Sciences School Textbooks. Standards and Monitoring in Education. 1, 58–63 (2017).
24. Coltheart, M.: The MRC Psycholinguistic Database, Quarterly Journal of Experimental Psychology, 33A, 497–505 (1981).
25. Wang, X., Su, C., Chen, Y.: A Method of Abstractness Ratings for Chinese Concepts.
In UK Workshop on Computational Intelligence, 217–226 (2018).
26. Crossley, S., Salsbury, T., McNamara, D. S.: Validating lexical measures using human scores of lexical proficiency. Vocabulary knowledge: Human ratings and automated measures, Amsterdam: John Benjamins, 105–134 (2013).
27. Dellantonio, S., Mulatti, C., Pastore, L., Job, R.: Measuring inconsistencies can lead you forward. the case of imageability and concreteness ratings. Language Sciences, 5, 708 (2014).
28. Troche, J., Crutch, S., Reilly, J.: Clustering, hierarchical organization, and the topography of abstract and concrete nouns. Frontiers in psychology, 5, 360 (2014).
29. Pastore, L., Dellantonio, S., Mulatti, C., Job, R.: On the nature and composition of abstract (theoretical) concepts: the X-ception theory and methods for its assessment. In Philosophy and Cognitive Science II, 35–58 (2015).
30. Solovyev, V., Andreeva, M., Solnyshkina, M., Zamaletdinov, R., Danilov A., and Gaynutdinova, D.: Computing Concreteness Ratings of Russian and English Most Frequent Words: Contrastive Approach. 2019 12th International Conference on Developments in eSystems Engineering (DeSE), Kazan, Russia, 403–408 (2019).
31. Pitler, E., Nenkova, A.: Revisiting readability: a unified framework for predicting text quality. In Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan (eds). Conference on empirical methods in natural language processing (EMNLP ’08). Association for Computational Linguistics, Stroudsburg, PA, USA, 186–195 (2008).
32. Laposhina, A.: Relevant features selection for the automatic text complexity measurement for Russian as a foreign language [Analiz relevantnyh priznakov dlya avtomaticheskogo opredeleniya slozhnosti russkogo teksta kak inostrannogo]. In V.P. Selegey (eds). Computational linguistics and intellectual technologies: papers from the annual international conference ‘Dialogue’, Issue 17, 1–7 (2017).
33. Solovyev V. D., Ivanov V. V., Akhtiamov R.
B.: Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application. Journal of Research in Applied Linguistics. vol. 10, 215–227 (2019).
34. Technologies of electronic dictionaries’ compilation, https://kpfu.ru/tehnologiya-sozdaniya-semanticheskih-elektronnyh.html, last accessed 2020/19/02.
35. McCarthy, K.S., McNamara, D.S., Solnyshkina, M.I., Tarasova, F.Kh., Kupriyanov, R.V.: The Russian language test: towards assessing text comprehension. Vestnik Volgogradskogo Gosudarstvennogo Universiteta. Serii︠a︡ 2, Iazykoznanie; Volgograd, 18 (4), 231–247 (2019).
36. Bogolyubov N.F.: Social Studies Grade 5 [Obschestvoznanie 5 klass]. A textbook for secondary schools. 3rd Edition. Prosveschenie, 127 (2013).
37. Frassinelli, D., Schulte im Walde, S.: Distributional interaction of concreteness and abstractness in verb-noun subcategorisation. Proceedings of the 13th International Conference on Computational Semantics - Short Papers. Association for Computational Linguistics, 38–43 (2019).
38. Solovyev, V., Ivanov, V., Solnyshkina, M.: Assessment of reading difficulty levels in Russian academic texts: Approaches and metrics, Journal of Intelligent & Fuzzy Systems. 34(5), 3049–3058 (2018).
39. Al-Tamimi, A.K., et al.: AARI: Automatic Arabic readability index. International Arab Journal of Information Technology. 11(4), 370–378 (2014).
40. Chen, Y.-T., Chen, Y.-H., and Cheng, Y.-C.: Assessing Chinese Readability using Term Frequency and Lexical Chain. IJCLCLP. 18(2), 1–18 (2013).
41. Chen, Y.-H. and Daowadung, P.: Assessing readability of Thai text using support vector machines. Maejo International Journal of Science and Technology 09(3), 355–369 (2015).
42. Si, I. and Callan, J.: A statistical model for scientific readability. In CIKM, 574–576 (2001).
43. Tanaka-Ishii, K., Tezuka, S. and Terada, H.: Sorting Texts by Readability. Comput.
Linguist. 36(2), 203-227 (2010).