Content Analysis for Psychological Purposes: Requirements for Software Support Tools

N. Almayev
Institute of Psychology, Russian Academy of Sciences, Moscow, Russia
e-mail: almaev@mail.ru

Abstract. This paper deals with the adequacy of software tools for content analysis in psychology. It compares how humans understand text with how software processes it. Existing practices are criticized, and the conditions under which software tools can support human analysis are outlined. Two types of psychological content analysis tasks are described, and the requirements for each are summarized.

Keywords: words, judgments, parser, dictionary, human expert, computer learning, knowledge base

1 Introduction

Attempts to analyze texts with regard to the author's personality are older than questionnaires and even task-based psychological tests. According to W. Stern, before developing the first IQ test, Alfred Binet studied how a writer's personality is expressed in the texts he produces. As early as 1916, Andrey Beliy adopted content analysis tools to investigate the projective attitudes of Pushkin, Baratynsky and Tyutchev in their descriptions of nature. In the 1930s-1950s, first following the works of Murray & Morgan [8] and then those of McClelland et al. [6], content analysis of projective stories was the leading paradigm in psychology, and it still holds strong positions in psychotherapy.

Content analysis of projective texts, interviews and discussions has a number of advantages over psychological questionnaires. The first is the freedom to choose topics and the words that express them: subjects tell us what is relevant to them, instead of choosing between options that interest us but may have no relation to their own needs.
The second advantage is that projective stories have permanent value: they can be reanalyzed any number of times with new and corrected content analysis scales, whereas questionnaire answers are bound to the respective questions. The third advantage is the amount of personal information that can be acquired in addition to what is gathered through the generalized content analysis scales. Generalized scales provide a means for comparing groups [1], [2], while the additional information can support a deeper analysis of an individual subject's personality. Finally, the concurrent use of questionnaires and content analysis of projective stories provides complementary results that reveal both the actual needs of the subjects and specific features of their ego-concept.

___________________________
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: P. Sosnin, V. Maklaev, E. Sosnina (eds.): Proceedings of the IS-2019 Conference, Ulyanovsk, Russia, 24-27 September 2019, published at http://ceur-ws.org

Nevertheless, content analysis has one major disadvantage compared to questionnaire-based tests: the amount of labor it requires. To work properly, a human coder has to deal with one or two scales at a time [11], which means that each text has to be read several times, depending on the number of scales. Even if all scales are coded simultaneously (which inevitably increases the number of mistakes), human processing of large text corpora takes much effort and time. The need to overcome this main shortcoming has resulted in numerous attempts to computerize the routines of content analysis, which can be traced back to the 1970s [6] and even to the 1960s (see [4]).
2 Contemporary state of computer-based content analysis

Surprisingly, from the first attempts until now (see [5], [9]), the main paradigm of computerized content analysis has not changed significantly. It is still based on counting single words in a text that are believed to relate to certain domains of meaning (categories). Because the target words are collected in dictionaries, this paradigm is called "dictionary based" [4]. Contemporary tools [9] can find collocations of words and perform grammatical parsing, in the sense that target words are matched in their different grammatical forms; nevertheless, this does not change the essence of the paradigm.

Despite some positive reports on the use of dictionary-based software tools in psychological studies [10], psychologists are generally not confident in such tools. Sometimes they work, but they miss too much and produce false alarms too often. Humans can express the same meaning using different words, and the meaning of the whole changes depending on which word is connected to a target word. For example, words such as "try" and "persistent" should refer to the Achievement motive. But what if they occur in "try to avoid" or "persistently tried not to meet with somebody"? These expressions correspond to avoidant behavior, yet relying on the target words alone will count them toward the Achievement motive. The psychological meaning of the same expression may even change depending on whether the teller is in the agent or the patient role. For example, "I have beaten him" is aggression, but "I was beaten by him" is suffering from aggression; the two must not be confused.

Such considerations may raise skepticism about the whole endeavor of computer-based content analysis for psychological purposes. What kind of knowledge base must underlie it for it to be relevant and adequate? How should this knowledge base be extracted from experts?
What is the appropriate form for storing this knowledge? And, most importantly, when can such software become practically useful?

To turn this skepticism into a valuable source for making software tools more adequate and useful, the basics of the phenomenon of meaning have to be reconsidered, and for this we have to return to the origins of logic, rhetoric and text analysis.

3 The decisive role of judgment

"No one of these terms [categories, single words], in and by itself, involves an affirmation; it is by the combination of such terms that positive or negative statements arise. For every assertion must, as is admitted, be either true or false, whereas expressions which are not in any way composite such as 'man', 'white', 'runs', 'wins', cannot be either true or false." (Aristotle, "Categories", Part 4. Translated by E. M. Edghill)

Thus, at the very beginning of his "Organon", Aristotle teaches us (to put it plainly) that isolated words possess no actual meaning. They acquire it only within a judgment, i.e. a proper junction of words determined by the rules of grammar. Judgments, not "collocations", should be the subject of content analysis. Moreover, target judgments need not be of the simple "S is P" type; they may be rather complicated predicative structures, possibly with several levels and with some obligatory and some optional members. In practice, this means that the collection of words in a text should first be transformed into a collection of judgments, and only then should the search for target expressions be performed within that collection. Given that automatic parsers for different natural languages have existed for decades, this crucial step does not lie beyond the scope of contemporary technologies.
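The difference between the two paradigms can be illustrated with a minimal Python sketch built on the examples from Section 2. The word list, the `Judgment` structure, the `teller` parameter and the rules in `categorize` are hypothetical simplifications for illustration, not the author's scales; in a real system the judgments would be produced by a grammatical parser instead of being written by hand as they are here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-dictionary for the Achievement motive (illustration only).
ACHIEVEMENT_WORDS = {"try", "tried", "persistent", "persistently"}
# Hypothetical verbs that turn "try to X" into avoidance.
AVOIDANCE_VERBS = {"avoid", "escape"}


def dictionary_score(text: str) -> int:
    """Dictionary-based paradigm: count target words, ignoring their junctions."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return sum(t in ACHIEVEMENT_WORDS for t in tokens)


@dataclass(frozen=True)
class Judgment:
    """A simplified predicative structure; normally produced by a parser."""
    agent: str                        # who performs the action
    predicate: str                    # lemma of the main verb
    patient: str = ""                 # who undergoes the action
    complement: str = ""              # verb governed by the predicate ("try to X")
    complement_negated: bool = False  # "tried NOT to meet"


def categorize(j: Judgment, teller: str = "I") -> Optional[str]:
    """Judgment-based paradigm: the category depends on the whole structure."""
    if j.predicate == "try":
        if j.complement in AVOIDANCE_VERBS or j.complement_negated:
            return "avoidance"
        return "achievement"
    if j.predicate == "beat":
        if j.agent == teller:
            return "aggression"
        if j.patient == teller:
            return "suffering from aggression"
    return None


examples = [
    ("I tried hard to win the race.", Judgment("I", "try", complement="win")),
    ("I try to avoid him.", Judgment("I", "try", complement="avoid")),
    ("I persistently tried not to meet with somebody.",
     Judgment("I", "try", complement="meet", complement_negated=True)),
    ("I have beaten him.", Judgment("I", "beat", patient="he")),
    ("I was beaten by him.", Judgment("he", "beat", patient="I")),
]

for text, judgment in examples:
    print(f"{text:<50} words={dictionary_score(text)} judgment={categorize(judgment)}")
```

The word count credits "try to avoid" and "persistently tried not to meet" to the Achievement motive (the latter even twice), and a word list containing "beaten" would score both "beat" sentences identically; rules over whole judgments separate achievement from avoidance and the agent role from the patient role.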
4 The problem of context and the necessity of constant knowledge acquisition for content analysis software

Grammatical parsers can solve the first part of the meaning problem, finding the proper grammatical junctions, but because of polysemy they cannot solve its second part: denotation, i.e. the relation of a judgment to the objects of some consistent reality. For example, is a "naked conductor" a person or a wire? In solving such problems humans rely on very complex ontological knowledge, while the same words, and even more or less similar expressions, can fit different domains. In content analysis for psychological purposes it is impossible to build such a complicated ontology in advance. It can be formed only gradually, through computer learning of the software in constant interaction with human experts. The need for such constant interaction is also determined by the following obstacles: 1) not all expressions can be resolved unambiguously on the basis of grammar rules alone, and 2) not all expressions are entered according to grammar rules, whether by humans or by speech recognition software. The third and most decisive reason for constant learning of such a system is that humans keep producing new ways of speaking about the same matters, so the content categories must always be updated.

It must also be stated that such systems must never acquire full autonomy. In particular, the possible use of content analysis in forensic procedures requires special responsibility. Based on previous experience, software support tools should find and propose the most probable categorizations of given expressions, but the final decision has to be left to the human expert, or at least humans must have full access to every step of analysis and categorization.
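The human-in-the-loop arrangement described above can be sketched minimally in Python. Under our own assumptions, each judgment is reduced to a hypothetical (predicate, complement) key; the tool proposes categories ranked by the relative frequency of earlier expert decisions for that key, can list the concrete judgments behind each proposal, and changes its counts only when an expert confirms a categorization. The class and method names, like the example categories, are illustrative and do not describe an existing tool.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical judgment key: (predicate lemma, complement lemma).
Key = Tuple[str, str]


class ExpertLoopCategorizer:
    """Proposes categories from expert-confirmed precedents; never decides alone."""

    def __init__(self) -> None:
        # How often experts assigned each category to judgments with a given key.
        self.counts: Dict[Key, Counter] = defaultdict(Counter)
        # Concordance: every confirmed judgment text with its category, per key.
        self.concordance: Dict[Key, List[Tuple[str, str]]] = defaultdict(list)

    def propose(self, key: Key) -> List[Tuple[str, float]]:
        """Rank categories by the relative frequency of past expert decisions."""
        counts = self.counts.get(key)
        if not counts:
            return []  # no precedent: the expert categorizes from scratch
        total = sum(counts.values())
        return [(category, n / total) for category, n in counts.most_common()]

    def evidence(self, key: Key, category: str) -> List[str]:
        """List the earlier judgments on which a proposed probability is based."""
        return [text for text, cat in self.concordance.get(key, []) if cat == category]

    def confirm(self, key: Key, text: str, category: str) -> None:
        """Record the expert's final decision; only this call updates the counts."""
        self.counts[key][category] += 1
        self.concordance[key].append((text, category))


# Usage with invented example judgments.
tool = ExpertLoopCategorizer()
tool.confirm(("try", "win"), "I tried to win the contest", "achievement")
tool.confirm(("try", "win"), "She tried to win his friendship", "affiliation")
tool.confirm(("try", "win"), "We tried to win the cup", "achievement")

for category, p in tool.propose(("try", "win")):
    print(f"{category}: {p:.2f}", tool.evidence(("try", "win"), category))
```

Plain relative frequencies are used here instead of trained weights so that every proposed probability can be traced back to concrete, expert-confirmed judgments.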
Therefore, the software under discussion cannot be built simply on "neural" networks that somehow "learn" by changing the weights of their elements through positive or negative feedback, and from whose elements the underlying probabilities are very hard to extract. The probabilities relating judgments to categories must be clear to human experts, and it must be clear to them which previous judgments these relations were calculated from. This means that the system must be able to form a concordance for each expression, or perhaps even for each member of a judgment, whether it plays the role of a subject, a predicate, a predicate to a predicate, etc.

5 The two main tasks of psychological content analysis and further specification of software requirements for each

Historically, the first task of content analysis for psychological purposes was the study of motivation, psychological states and affects via projective stories [8], [6], [7], [3]. Although this field saw vigorous efforts by the dictionary-based approach in the past (see [6]), today, for the reasons described above, it relies almost entirely on human coding.

A successful transition from the single-word (dictionary-based) paradigm to the judgment-based paradigm in projective text analysis requires consideration of the following matters:

1. Scales for software-based content analysis should be psychometrically proven: their validity and sensitivity must be established, and their coding instructions and coding practice must be elaborated. It is hardly possible to develop the psychological part and the automation part of a project simultaneously; if this is attempted, both are at risk of failure.

2. Content analysis scales must form a system, i.e.
balance each other rather than stand in isolation; otherwise there will always be a threat that an isolated scale expands into neighboring domains.

In general, the psychological part must be developed before development of the software part starts.

The other direction of content analysis application is the study of discussions, or "discourse" analysis. The French word "discours" means nothing other than "reasoning". On the one hand, this means that there is hardly much new in this field since the eight books of Aristotle's "Topics"; on the other hand, the development of the Internet has externalized numerous discussions of various important social issues. Processing all of them by human experts alone is evidently impossible. At first glance, the transition to judgment-based content analysis may be considered a radical step toward an adequate understanding of the matter: indeed, reasoning consists of rhetorical syllogisms ("enthymemes"), and these consist of judgments (propositions). Nevertheless, this field has many features of its own, both in comparison to content analysis of projective stories and in comparison to other fields of reasoning, e.g. scientific reasoning.

Comparing the rhetorical syllogism to the "apodictic" one, Aristotle says that the major premise of an enthymeme is not something necessary and universal, but something that the majority of a concrete audience, or its most prominent members (judges, rulers), will believe. This "majority" implies studying how many people share the respective views and, in the case of Internet discussions, how active users are in propagating them. Therefore, unlike the previous task, content analysis of Internet discussions demands registration of multiuser activity.
At a minimum, the following variables should be registered:
1) unique names (nicknames) of the participants;
2) the total number of participants;
3) the number of posts by each participant;
4) the amount of text from each participant, 4a) in absolute terms and 4b) relative to the total;
5) the nature of each message: 5a) topical content ("reasoning"), 5b) a link to some content, 5c) pictorial content (including moving pictures of various forms).

Unlike the case of projective stories, the expert's work in discourse analysis does not presuppose fixed content categories. Experts are expected to concentrate on resolving polysemy: do the participants in a discussion speak about the same objects or about different ones? Are the characteristics attributed to the objects generally the same, or do they differ? The expert's work becomes even more complicated if one takes into account that the major premises of enthymemes are often truncated (and perhaps not even reflected on by the participants). Instead of being assigned to content categories, the judgments in Internet discussions may be aggregated into a few major propositions (more or less equivalent statements, possibly with some gradual differences), and the popularity of these propositions should be studied alongside the activity of different users in propagating them. It should also be noted that grammatical subjects and predicates must be distinguished from ontological ones. For example, in the old and widespread expression "capitalist pig", the grammatical S is "pig", while "capitalist" is the P. Ontologically, of course, the relation is reversed: "pig" is an affectively loaded characteristic of a certain class of objects in the social world. Because of the high risk of subjectivity and bias in expert evaluation, the whole process of aggregating concrete judgments into major propositions should be clear and transparent to the human experts.
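The registration of these variables and the aggregation of judgments into major propositions can be sketched minimally in Python. The assumptions are ours: each post is a hypothetical `Post` record, the amount of text is measured in words, and the link from a post to a major proposition is entered by a human expert (the sketch does not attempt that step automatically). The toy posts are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    nickname: str                      # 1) unique participant name
    text: str
    kind: str                          # 5) "reasoning", "link" or "picture"
    proposition: Optional[str] = None  # major proposition assigned by an expert


def register(posts: List[Post]) -> Dict[str, object]:
    """Compute the minimal set of discussion variables (1-5) listed above."""
    n_posts = Counter(p.nickname for p in posts)                 # 3) posts per user
    words: Counter = Counter()
    for p in posts:
        words[p.nickname] += len(p.text.split())                 # 4a) absolute amount
    total = sum(words.values())
    return {
        "participants": sorted(n_posts),                         # 1) nicknames
        "n_participants": len(n_posts),                          # 2) total number
        "posts": dict(n_posts),
        "words": dict(words),
        "share": {u: (w / total if total else 0.0) for u, w in words.items()},  # 4b)
        "kinds": dict(Counter(p.kind for p in posts)),           # 5a-5c) message nature
    }


def proposition_support(posts: List[Post]) -> Dict[str, Dict[str, object]]:
    """Popularity of each major proposition and who propagates it."""
    by_prop: Dict[str, Counter] = defaultdict(Counter)
    for p in posts:
        if p.proposition is not None:
            by_prop[p.proposition][p.nickname] += 1
    return {
        prop: {"n_participants": len(users), "n_posts": sum(users.values()),
               "by_user": dict(users)}
        for prop, users in by_prop.items()
    }


# Invented toy discussion, for illustration only.
posts = [
    Post("anna", "Taxes should be lower for small firms", "reasoning", "lower taxes"),
    Post("boris", "https://example.org/report", "link"),
    Post("anna", "Lower taxes help small firms survive", "reasoning", "lower taxes"),
    Post("vera", "High taxes pay for schools", "reasoning", "keep taxes"),
]
print(register(posts))
print(proposition_support(posts))
```

Here a proposition's popularity is read as the number of distinct participants backing it, while `by_user` shows how actively each of them propagates it; keeping the expert-assigned label on every post keeps the aggregation traceable.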
6 Conclusion

Despite the differences between the two main tasks of psychological content analysis, they share a principal common feature: the necessity of a transition from the dictionary-based paradigm to the judgment-based one. Technically, this transition, while not simple, is possible.

7 Funding

The study was supported by the Russian Foundation for Basic Research, project № 18-00-00605 00606 (18-00-00605).

8 References

1. Almayev N.A., Murasheva O.V., Bessonova Yu.V., Kiselyova N.I. Content-analyses scales of the social motivation test. Results of correlation and factor analyses. Part 2. Eksperimental'naâ psihologiâ [Experimental Psychology (Russia)], 2018, vol. 11, no. 3, pp. 108-119. doi:10.17759/exppsy.2018110308. (In Russ., abstr. in Engl.)
2. Almayev N.A., Murasheva O.V., Bessonova Yu.V., Kiselyova N.I. Generalized scales of content analysis of projective narratives in test of social motivation (TSM). Their validity and specifics. Part 1. Eksperimental'naâ psihologiâ [Experimental Psychology (Russia)], 2016, vol. 9, no. 4, pp. 90-104. doi:10.17759/exppsy.2016090409. (In Russ., abstr. in Engl.)
3. Gottschalk L.A., Gleser G.C. The Measurement of Psychological States through the Content Analysis of Verbal Behavior. Berkeley: University of California Press, 1969.
4. Hogenraad R., McKenzie D.P., Peladeau N. Force and Influence in Content Analysis: The Production of New Social Knowledge. Quality and Quantity, 2003, vol. 37, no. 1.
5. Litvinova T., Litvinova O., Panicheva P., Biryukova E. Using Corpus Linguistics Tools to Analyze a Russian-Language Islamic Extremist Forum. In: Bodrunova S. (ed.) Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol. 11193. Cham: Springer, 2018.
6. McClelland D.C. Power: The Inner Experience. New York: Irvington Publishers, 1975.
7. McClelland D.C., Atkinson J.W., Clark R.A., Lowell E.L. The Achievement Motive. Princeton, NJ: Van Nostrand, 1953.
8. Murray H.A. (ed.) Explorations in Personality. New York: Oxford University Press, 1992.
9. Scott M. Oxford WordSmith Tools Version 4.0. 2007. Available at: http://www.lexically.net/downloads/version4/wordsmith.pdfl
10. Vanheule S., Desmet M., Meganck R. What the heart thinks the tongue speaks: a study on depression and lexical choice. Psychological Reports, 2009, vol. 104, pp. 473-481.
11. Winter D. The Power Motive. New York: Free Press, 1973.