
Capturing Human Perspectives in NLP: Questionnaires, Annotations, and Biases⋆

Wiktoria Mieleszczenko-Kowszewicz

Kamil Kanclerz

Julita Bielaniewicz

Marcin Oleksy

Marcin Gruza

Stanisław Woźniak

Ewa Dzięcioł

Przemysław Kazienko

Jan Kocoń

Department of Artificial Intelligence, Wrocław University of Science and Technology

This article compiles research on the extraction of human characteristics using three different methods: questionnaires, annotations, and biases. We analyze how the personalized perception of texts is affected by an individual's profile and bias. To acquire comprehensive knowledge about individual user preferences, we gathered 40 users who annotated 1000 texts in 26 subjective tasks grouped into three categories: positive affect, negative affect, and rational (no affect). The results revealed that the annotation categories were correlated with psychological dimensions; e.g., agreeableness and conscientiousness are traits related to biases in the positive affect dimensions. We observed two clearly defined groups of annotators with respect to humor: those who confidently share their perspectives on what they find funny and those who tend to rate humor levels within a narrow range. Moreover, we analyzed intra-annotator agreement to show that people tend to change their ratings over time. Both the ranking correlation between annotations and the agreement calculated on binarized annotations are higher than the absolute agreement calculated on full annotations, which implies that the 10-point annotation scale might be a significant factor in annotator disagreement.

Keywords: natural language processing, personalization, subjectivity, annotator bias, annotator representation, data acquisition

1. Introduction

Resolving natural language processing (NLP) tasks, such as offensiveness detection, humor recognition, or emotion recognition, requires the work of annotators who label the large datasets used to train machine learning models. Although people differ from one another on a daily basis, the final label of an annotated instance is the decision of the majority of annotators, called the gold standard. The assumption underlying this process is that most people will perceive texts similarly [1]. Annotations not aligned with the majority vote are not included in the final model. As a result, much information about humans is not used. Moreover, annotators' personalities are flattened and generalized, affecting the model's accuracy. Despite existing research [2, 3], there is still little work on measuring how individual characteristics of a text's audience influence its perception.

This article aims to answer the following research questions:

1. What is the impact of annotators' individual characteristics on their text perception?
2. How does the evaluation of texts change over time and what are the crucial factors of such an intra-annotator change of the user?
3. What are the main differences between methods for capturing human perspectives?
4. What is the impact of annotator sense of humor on the funny content perceived by themselves and other people?
5. What are the ranking dependencies of annotations and absolute agreement between annotators?

2nd Workshop on Perspectivist Approaches to NLP
* Corresponding author.
† These authors contributed equally.
wiktoria.mieleszczenko-kowszewicz@pwr.edu.pl (W. Mieleszczenko-Kowszewicz)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

2. Related Work

Research from recent years has shown that people vary strongly in their perception of text depending on the characteristics they possess. These include features such as cognitive skills [4], personality traits [5], or even the emotions they have experienced [6]. This noticeable diversity between people is reflected in the multiple perspectives presented in the annotations. The work of Basile et al. [7] states that the perspectivist approach should be taken into account when determining the gold standard. This implies the need to tailor the standard to each person individually, understanding that the said ground truth is subjective. As the differences in user reception of the same text inevitably become apparent, it is crucial to examine them using appropriate measures [8]. The stability of a user's annotations is an interesting angle; however, we decided to focus on deviation from the majority. For this reason, we utilized measures such as Personal Emotional Bias [9] and Human Bias [10]. The first metric calculates the degree to which a user differs from the average emotional perception of a given text, while the second compares the bias of an annotator and its similarity to the majority of users. Over the years, applying these measures in experiments on natural language processing tasks [11, 12, 13, 14] has confirmed their effectiveness and brought a strong improvement in understanding the individuality of a user. Furthermore, it has been shown that, compared to standard methods derived from psychology, NLP models are even better at identifying the Big Five personality traits [15]. With that in mind, we decided to assess the results from a collection of different questionnaires, as well as to investigate the annotations of users.

Table 1: Annotation dimensions categorized depending on the affect and rational nature.
Positive affect: (1) calm, (2) compassion, (3) delight, (4) inspiration, (5) joy, (6) positive, (7) surprise
Negative affect: (8) anger, (9) disgust, (10) fear, (11) negative, (12) sadness
Rational (no affect): (13) agreement, (14) embarrassing, (15) funny to me, (16) funny to someone, (17) incomprehensible, (18) interesting, (19) ironic, (20) offensive to me, (21) offensive to someone, (22) political, (23) sympathy, (24) trust, (25) understandable, (26) vulgar

3. Capturing Human Perspectives

3.1. Text Selection Procedure

To acquire comprehensive knowledge about individual user preferences, our annotation process consisted of three major steps: (1) annotation of a large collection of texts by a small group of annotators (6 people), (2) measuring the controversy of the annotated texts with three methods, and (3) selection of texts for annotation by a large group of users (40 people). In the first step, a small group of experienced annotators annotated a large collection of comments in Polish, acquired from various Internet forums on news, sport, and lifestyle topics. Then, we measured the controversy [12] of texts in 3 variants: (1) average controversy over all dimensions, (2) average controversy of the top five most controversial dimensions for the specific text, and (3) the highest controversy value among all dimensions for a given text. Finally, we separately selected 1/3 of the texts for annotation with each variant of the controversy. Furthermore, the texts selected by a specific variant consisted of 2/3 of texts with the highest controversy and 1/3 of texts with the lowest controversy as measured by that variant. In this way, the final dataset obtained in step (3) comprised texts with diverse controversy, which enabled the extraction of various user perspectives.
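The three controversy variants and the split into highly and weakly controversial texts can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: per-dimension controversy scores are assumed to be already computed with the measure from [12] (not reproduced here), and the names `aggregate` and `select_texts`, the rounding of the 2/3 share, and the toy scores are all hypothetical.

```python
from statistics import mean


def aggregate(dim_scores, variant):
    """Collapse one text's per-dimension controversy scores into a single value."""
    if variant == "mean_all":    # variant (1): average over all dimensions
        return mean(dim_scores)
    if variant == "mean_top5":   # variant (2): average of the five most controversial dimensions
        return mean(sorted(dim_scores, reverse=True)[:5])
    if variant == "max":         # variant (3): highest value among all dimensions
        return max(dim_scores)
    raise ValueError(f"unknown variant: {variant}")


def select_texts(controversy, variant, n_select):
    """Pick n_select texts: about 2/3 most and 1/3 least controversial (assumed rounding)."""
    if n_select > len(controversy):
        raise ValueError("cannot select more texts than available")
    ranked = sorted(controversy,
                    key=lambda t: aggregate(controversy[t], variant),
                    reverse=True)
    n_high = round(2 * n_select / 3)
    n_low = n_select - n_high
    high = ranked[:n_high]
    low = ranked[len(ranked) - n_low:] if n_low else []
    return high, low


# Toy input: text id -> hypothetical controversy score per dimension.
controversy = {
    "a": [0.9] * 6,
    "b": [0.1] * 6,
    "c": [0.5] * 6,
    "d": [0.2, 0.2, 0.2, 0.2, 0.2, 0.8],
    "e": [0.0] * 6,
    "f": [0.3] * 6,
}
high_max, low_max = select_texts(controversy, "max", 3)
high_mean, low_mean = select_texts(controversy, "mean_all", 3)
```

In the toy data, text "d" is controversial in a single dimension only, so it is selected by the maximum-based variant but not by the average over all dimensions, which is the kind of difference the three variants are meant to capture.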

3.2. Dataset

Forty annotators participated in the study; 77.5% of them were women and 22.5% were men. Their age ranged from 19 to 56 years (M = 39.9, SD = 10.1).

The dataset we used is one of the iterations of the Doccano 1.0 project, which aims to capture subjective impressions elicited by textual content. The number of annotated texts was 1000. Each of them is no longer than 132 words (M = 24.5, SD = 16.2). On average, each person annotated around 790 texts and each text was annotated by around 32 annotators. In total, this amounts to just under 31,700 annotations. Each annotation consists of 26 independent dimensions (see Tab. 1). For each dimension, the annotator chose a value from 0 to 10, where 0 means that the annotator did not react and 10 means that the reaction was strong. Giving no decision is also acceptable; it indicates that the person does not know what value to give. On average, labels with a value of zero account for 62% of the labels in a dimension (standard deviation 22%), while empty labels account for 4% (standard deviation 8%). The distributions of the remaining values, which provide us with information about the actual reactions of the annotators, are shown in Fig. 1.

The dimensions are divided into three groups: positive affect, negative affect, and rational (no affect). This approach is inspired by multiple works [16, 17, 18].

3.3. Measuring Annotator Profile: Questionnaires

Big Five personality traits (Mini-IPIP) [19] is a 20-item questionnaire that measures the factors of the Big Five personality model: extraversion, agreeableness, conscientiousness, neuroticism, and intellect/imagination. Each dimension is measured by four questions, where answers are given on a 5-point scale from 1 = very inaccurate to 5 = very accurate. Agreeableness is considered a social trait that aims to maintain positive relationships with others. People who score high on this trait tend to interpret a situation as less controversial and choose the more constructive form of conflict resolution [20]. Extraversion is a trait that describes people who are active and social; it is also widely known for its association with positive affect. Conscientiousness is a personality characteristic that describes the tendency to be organized, prepared, and hard working, and to maintain a high quality of work [21, 22]. Neuroticism refers to the tendency of people to experience negative emotions such as anxiety, worry, fear, and sadness [23]. Intellect is a trait that describes the willingness to seek new experiences, investigate new ideas, experience new tastes, and visit new places [24].

[Figure 1: panels (2) compassion, (3) delight, (15) funny to me, (16) funny to someone, (17) incomprehensible, (18) interesting]

Humor Styles Questionnaire (HSQ) [25] is a 32-item questionnaire that evaluates four styles of humor applied by a person: (1) self-enhancing, (2) affiliative, (3) aggressive, and (4) self-defeating. The two positive styles indicate (1) the empowerment of self through the use of humor and (2) the willingness to bond with others (mostly the recipients of the texts). The two negative styles refer to inflicting a verbal attack on (3) other people or (4) oneself through the use of deprecating humor. The value of each style is calculated from the answers to 32 questions regarding the sense of humor of an individual, with 8 questions per style. The answer scale consists of 7 possible answers from 1 = totally disagree to 7 = totally agree.

The regulating emotion systems in everyday life (RESS-EMA) scale [26] evaluates how people regulate their emotions in daily life. The questionnaire consists of 12 items measuring 6 emotion regulation strategies (2 items per subscale). Each item was rated on a scale from 0 = totally disagree to 100 = totally agree, and the respondents ticked off which emotion management strategies they had used in the past month. The subscales are: relaxation (dampening of autonomic arousal), engagement (active expression of emotions), rumination (sustained attention), reappraisal (cognitive reframing), distraction (diverting attention), and suppression (inhibition of emotional expression).

The Physical Health Questionnaire (PHQ) [27] is a 14-item questionnaire that evaluates four dimensions of somatic health (sleep disturbances, headaches, gastrointestinal problems, and respiratory infections). Items were rated on a 7-point frequency scale.

Patient Health Questionnaire-9 (PHQ-9) [28] is a questionnaire consisting of 9 questions about the symptoms of depression, which the user rates on a scale of 0 to 3. Depression is one of the most common mental disorders. The core questions of the PHQ-9 address the symptoms of depression included in the DSM-IV diagnostic criteria: the higher the score, the more severe the depression. In the PHQ and PHQ-9 questionnaires, the lowest scores correspond to the absence of symptoms, while higher scores proportionally represent their more frequent occurrence.

Alexithymia was measured with the PAQ questionnaire [29], which contains 7 questions rated on a 7-point Likert scale [30] ranging from 1 = strongly disagree to 7 = strongly agree. Alexithymia is a trait that impedes identifying and describing one's own feelings and limits thinking to an externally oriented style, which manifests in unintentionally ignoring others' emotions.

Perceived Stress Scale [31] measures stress with 10 items on a 5-point scale with answers from 0 = never to 4

3.4. Human Bias

We used the Human Bias measure $\mathrm{HB}(u, d)$ [14] to capture the diversity between the preferences of a user and those of the others. Its value for a user $u$ within dimension $d$ is a Z-score-based measure that describes the degree of diversity of user $u$'s annotations $a_{u,t,d}$ of all texts $t \in T_u$ relative to the mean $\mu_{t,d}$ and standard deviation $\sigma_{t,d}$ of annotations provided by all users in dimension $d$, as follows:

$$\mathrm{HB}(u, d) = \frac{\sum_{t \in T_u} \frac{a_{u,t,d} - \mu_{t,d}}{\sigma_{t,d}}}{|T_u|} \quad (1)$$

3.5. Back Saturation

For the purposes of the study, we introduced a measure called Back Saturation (BS). It can be calculated for each text ($T_n$) within a particular dimension as follows:

$$\mathrm{BS}_n = r_{n-1} \cdot 3 + r_{n-2} \cdot 2 + r_{n-3} + r_{n-4} + r_{n-5} \quad (2)$$

where $r$ is the rating for the negative dimension; $r_{n-1}$ refers to the text one position back, $r_{n-2}$ to the text two positions back, etc. For example, if subsequent texts received the following negativity ratings: T1: 3, T2: 5, T3: 3, T4: 2, T5: 7, then the BS for text T6 is:

$$\mathrm{BS}_6 = 7 \cdot 3 + 2 \cdot 2 + 3 + 5 + 3 = 36 \quad (3)$$

3.6. Intra-Annotator Agreement

Inspired by recent work [8], we randomly selected 3 annotators for a very detailed analysis. Its purpose was not only to examine the consistency of the annotations, but also to try to determine the influence of various factors on changes in their decisions. The annotation process was planned in such a way that some texts appeared at least twice (hereinafter 'duplicates'). It was then possible to calculate the consistency of the annotations of these texts made by a single annotator (hereinafter 'intra-annotator agreement' or IntraAA). For some purposes we also introduced soft IntraAA, where annotations that differ by only one point (on the 0-10 scale) are also considered consistent.

4. Analytical Results

4.2. Bias and Human Characteristics

Personality, described in Appendix B.1, shows the correlations between the Big Five and the annotations. The results reveal that agreeableness and conscientiousness are traits that are strongly related to positive affect biases. A slightly weaker tendency is observed for negative affect biases. Moreover, these two traits are also moderately correlated with each other, which strengthens the above observation.

Styles of humor and subjectivity. Although every human understands the concept of humor, each person has their own, distinct sense of it. We can analyze the similarity between people, aggregate the annotation scores into groups, and eventually find the humor scores of the majority of annotators, but there is a very low chance of encountering people with an identical set of humor-related scores. Even then, identical scores in this particular research would not imply that the annotators possess exactly the same sense of humor. This indicates that humor is a hugely subjective task, and with this in mind, we need to take into account the perspective of the individual user when assessing their results. As humor in natural language processing is itself a vastly personalized task, identifying and categorizing texts with different types of humor may shed some light on the details of a person's sense of humor. The categorization derived from the Humor Styles Questionnaire in Sec. 3.3 provides a set of humor types that are widely used in the field of humor research, not only in the scope of natural language processing, but also in psychology [34, 35]. When acquired, the four available measures of different types of humor indicate the intensity of experiencing humor, but what is interesting is that we can see in Fig. 2 that these values actually focus on the external perspective of funniness of an individual. It is clearly visible when evaluating the correlation values between the humor style parameters and the dimensions funny to me and funny to someone, as presented in Fig. 3. Other than the mentioned results, the HSQ metrics seem to be separated from the standard funniness values, as the correlation is much lower than when analyzed between other HSQ values. As for the individualism of a user, the subjective matter of experiencing humor is based on the emotionality of a user. We have noticed that there are two distinct groups of annotators with regard to the humor dimension: people who feel free to express their views of funniness, and individuals who hardly exceed small values in both funniness and unfunniness. As shown in Fig. 3, people from the expressive group, such as User 38, have a relatively high correlation when talking about the content being funny to others or themselves, whereas more reserved people, similar to User 39, tend to be mild in their expression of emotions and feelings. This observation extends the area of subjectivity in humor in NLP and emphasizes that not only should the experience be analyzed through personalization, but the expression must also be noticed and thoroughly examined. Detailed correlation between humor and annotation dimensions is presented in Appendix B.2.

[Figure: axis labels Positive (bias), Funny to me (bias), Funny to someone (bias), Negative (bias), Mean (bias); panel title: User 39 Biases]

Emotion regulation and subjectivity. The relationships between regulation of emotions and subjectivity are described in Appendix B.3. The use of distraction as a strategy exhibits the most positive relationship with the positive affect dimension and the selected rational biases. On the contrary, the relaxation strategy shows an inverse relationship with negative rational biases.

Health and subjectivity is described in Appendix B.4. Depression and gastrointestinal problems are the health characteristics that are more correlated with the negative affect dimension. A similar relationship exists between vulgar and embarrassing bias. Also, compassion is positively related to health problems. There is a general tendency for people who report health problems to perceive text as less understandable.

Bias and Stress with Emotions are presented in Appendix B.5. Stress is related to positive and, slightly more weakly, to negative affect biases. On the other hand, there is a negative relationship between experiencing positive affect and rational biases. Negative affect is related to negative affect and rational dimensions. Satisfaction with life is weakly negatively related to rational dimension biases.

4.3. Intra-Annotator Agreement over Time

The sample results (calculated for one annotator; for the complete results for all 3 annotators, see Appendix C) are shown in Fig. 4. Interestingly, IntraAA reaches a level that could be considered very good, or even satisfactory, in only a few cases. The situation is even worse if we exclude the cases of agreement for null marks, especially when they account for a large percentage of decisions (e.g., for the presented user the score ranges from 0.08 to 0.54, and the average is 0.21). However, the soft IntraAA, which also treats answers that differ by only one point as congruent, shows that the differences between the annotations are most often not large: the IntraAA increases significantly (55% on average for the analyzed users). This shows that the analyzed annotators were characterized by relatively high stability. Smaller differences between strict and soft IntraAA would show the dimensions for which annotators are particularly stable. Such dimensions include joy, inspiration, embarrassing, vulgar or offensive to me.

We believe that a number of factors can affect the change in rating. The basis for the more detailed analysis was the labels within the negative dimension, primarily because this is the dimension for which relatively most labels other than "zero" appear and because it has relatively low concordance scores. Among other things, the analysis looked at the impact of time. It turned out that the tendency to change the decision increased when the text to be annotated was repeated on a different day (some duplicates appeared on the same day). Interestingly, for decisions made on two different days, the proportion of changes from a more to a less negative label increased (at the expense of cases of maintaining the assessment; see Fig. 5).

We also investigated this phenomenon by trying to determine the impact of the negativity of previously annotated texts. For the purposes of the study, we used a measure called Back Saturation (BS; see Section 3.5).

After assigning the appropriate BS value to each text, we compared the BS for each text as it appears for the first time and for the second time. The results were combined with changes in the annotator's decision (see Fig. 6). As it turns out, the analyzed annotators changed their decisions without a clear effect of back saturation. However, we observe an imbalance in the proportion in the case where the evaluation of a text becomes more negative by one point. Indeed, we note relatively more cases in which such a decision change is associated with the occurrence of a duplicate after more negative texts.

Figure 6: The correlation between BS and changes in the decision.

4.4. Relation between Annotations, Questionnaires and Biases

To gather holistic knowledge about the user, we decided to include text annotations and questionnaires in the data collection process. Then, we used the acquired annotations to calculate the biases that describe the peculiarity of user preferences relative to others. Each of the human data acquisition methods is described in Tab. 2. To measure the similarity of knowledge obtained by each of these methods, we used the Pearson correlation coefficient [36]. The results are presented in Fig. 7. The highest correlation values were observed between text annotations and user biases. On the other hand, lower correlation values appeared between questionnaire answers and user biases. The relation between questionnaires and text annotations is described by the least significant correlation values.
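The comparison of the three sources can be illustrated with a short sketch. It rests on an assumption that is not stated in the text: every source is stored as a per-user table of numeric features, and features from two sources are correlated across the same users. The names `pearson` and `cross_correlation` and the toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant feature
    return cov / sqrt(var_x * var_y)


def cross_correlation(source_a, source_b):
    """Correlate every feature of one source with every feature of another.

    Each source maps a feature name to a list of per-user values; both
    sources must list the users in the same order.
    """
    return {(fa, fb): pearson(va, vb)
            for fa, va in source_a.items()
            for fb, vb in source_b.items()}


# Toy data for four users (hypothetical values).
quest = {"agreeableness": [3.0, 4.5, 2.0, 5.0]}
bias = {"positive": [0.1, 0.8, -0.6, 1.1], "negative": [0.4, -0.2, 0.9, -0.7]}
corr = cross_correlation(quest, bias)
```

Applying the same function to each pair of sources (questionnaires, annotations, biases) would give one block of correlation values per pair.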

[Figure 7: panels quest, quest-anno, quest-bias, anno-quest, anno, anno-bias, bias-quest, bias-anno, bias]

The 10-point scale for each dimension makes it difficult to achieve exact agreement between annotators. Therefore, to better understand the phenomenon, we used three different agreement metrics:

1. Cohen's kappa on raw annotations.
2. Cohen's kappa on binarized annotations, where all nonzero annotations (1-10) were converted to ones (1).
3. Kendall Tau rank correlation coefficient that measures the ranking agreement between annotators.

For the Kendall rank correlation, all empty annotations were removed from the calculations, as they cannot be ordered. As expected, the average Kappa agreement scores in most dimensions are very low, with a minimum for surprise (0.025) and a maximum for political (0.267). In the case of binarized annotations, the Kappa agreement increases significantly (between 0.052 for surprise and 0.513 for political). The Tau coefficient ranges between 0.081 for surprise and 0.589 for political. The results also reveal a positive correlation between the percentage of zero annotations and annotator agreement for a given dimension (Pearson correlation coefficient of 0.485 for the mean kappa and 0.396 for the mean binarized kappa).
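The three agreement metrics can be sketched for a single pair of annotators in one dimension. This is a minimal standard-library illustration, not the code used in the study. It assumes that empty annotations are represented as `None` and kept as a separate category for kappa, and that the tau-b variant of Kendall's coefficient is used (the text does not say which variant was applied). The function names and toy ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import nan, sqrt


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected) if p_expected < 1 else nan


def binarize(labels):
    """Map every nonzero rating (1-10) to 1; keep zeros and empty labels."""
    return [None if v is None else int(v > 0) for v in labels]


def kendall_tau_b(a, b):
    """Kendall's tau-b; pairs with an empty annotation are removed first."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    concordant = discordant = ties_a = ties_b = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_a += 1
        if dy == 0:
            ties_b += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(pairs) * (len(pairs) - 1) // 2
    denominator = sqrt((n_pairs - ties_a) * (n_pairs - ties_b))
    return (concordant - discordant) / denominator if denominator else nan


# Toy ratings of six texts by two annotators (hypothetical values).
ann_a = [0, 3, 0, 7, 0, 2]
ann_b = [0, 5, 0, 4, 0, 0]
kappa_raw = cohen_kappa(ann_a, ann_b)
kappa_bin = cohen_kappa(binarize(ann_a), binarize(ann_b))
tau = kendall_tau_b(ann_a, ann_b)
```

In this toy pair the annotators agree on which texts trigger a reaction but rarely on its exact intensity, so the raw kappa is lower than the binarized kappa and the rank correlation.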

We also checked the correlation between the mean Tau coefficient and the absolute differences in the biases of the annotators. Annotators with high bias are more likely to rate texts above average, and annotators with low bias are more likely to rate texts below average. Therefore, the difference of biases on a given dimension can be interpreted as the distance between the annotators' sensitivity on this dimension. As Tab. 3 shows, these correlations are mostly negative but very weak. This means that there is no clear relationship between the annotator ranking agreement and the difference in their sensitivity.
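The bias values compared here follow the Human Bias definition of Eq. (1): the mean Z-score of a user's annotations relative to all users' annotations of the same texts. The sketch below is a hedged illustration for a single dimension. The use of the population standard deviation, the skipping of texts on which all users agree (zero deviation), and the name `human_bias` are assumptions, since the text does not specify these details.

```python
from collections import defaultdict
from statistics import mean, pstdev


def human_bias(annotations):
    """Map each user to their Human Bias in one dimension.

    annotations: dict mapping (user, text) -> numeric rating.
    """
    ratings_by_text = defaultdict(list)
    for (_, text), value in annotations.items():
        ratings_by_text[text].append(value)
    # Per-text mean and (assumed) population standard deviation over all users.
    stats = {t: (mean(v), pstdev(v)) for t, v in ratings_by_text.items()}

    z_scores = defaultdict(list)
    for (user, text), value in annotations.items():
        mu, sigma = stats[text]
        if sigma > 0:  # assumption: texts with identical ratings are skipped
            z_scores[user].append((value - mu) / sigma)
    return {user: mean(z) for user, z in z_scores.items()}


# Toy ratings of two texts by three users (hypothetical values).
annotations = {
    ("u1", "t1"): 2, ("u2", "t1"): 4, ("u3", "t1"): 6,
    ("u1", "t2"): 0, ("u2", "t2"): 0, ("u3", "t2"): 3,
}
hb = human_bias(annotations)
# Distance between two annotators' sensitivity on this dimension.
sensitivity_gap = abs(hb["u1"] - hb["u3"])
```

A positive value marks a user who tends to rate above the per-text average and a negative value one who rates below it; the absolute difference of two users' values is the sensitivity distance discussed above.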

5. Discussion

who are in a positive emotional state are more likely to suggests that lower life satisfaction may contribute to perceive and interpret stimuli in a positive light. Individ- perceiving and evaluating stimuli in a negative light, inuals who have higher levels of life satisfaction may have lfuencing negative dimension biases in the interpretation a generally positive outlook, influencing their perception of the text. People with health problems are more prone and interpretation of stimuli as more positive. Gener- to negative dimension biases. Surprisingly, afective bially, according to questionnaire data, there is a tendency ases are less noticeable when people experience stress. that positive afect dimensions are afected by the level Vulgar and embarrassing bias co-occur with each other. of health (both mental and physical). Interestingly, peo- People who score higher in neuroticism, experiencing ple with health problems evaluate text as more arousing stress, feeling negative emotions, and less satisfied with compassion. life are more prone to perceive texts as more controver

Individuals who do not use relaxation strategies as a sial in those two biases. Depression and general health coping mechanism for stress tend to exhibit a negative problems can reinforce these biases, as well as ruminaafect dimension bias . This suggests that the absence of tion and distraction as emotion regulation strategies. An relaxation techniques may contribute to a tendency to inverse relationship with positive emotions confirms the perceive and interpret stimuli in a negative light when tendency to perceive text as passing less controversial experiencing stress. Individuals who employ afiliative while experiencing similar emotions. A similar tendency humor are less likely to present biases toward negative af- is noticed for people who score higher in intellect and fect dimensions. There is a positive relationship between use relaxation and engagement as strategies of emotion agreeableness and diferentiation in negative dimension regulation. Individuals who are more likely to view a biases, slightly weaker compared to positive dimension text as ofensive or funny tend to experience higher levels biases. This implies that individuals with higher levels of stress, negative emotions, and dificulty in identifying of agreeableness may display more nuanced biases when and understanding their own emotions. Additionally, the it comes to perceiving negative afect. Higher scores in presence of positive afect appears to have a mitigating alexithymia are associated with a greater propensity to efect on this tendency, indicating that higher levels of negative bias. This suggests that individuals who strug- positive emotions are associated with a reduced likeligle with identifying and expressing their own emotions hood of perceiving the text as ofensive or funny. There may be more inclined toward negative biases in their per- is a diference between personality traits that have an ception and interpretation of stimuli. 
When individuals impact on the ofensive to me and ofensive to someone experience negative emotions, they are more suscepti- bias. Individuals who score higher in agreeableness and ble to perceiving text through a negative dimension bias. conscientiousness have a tendency to perceive the text This implies that the emotional state of negativity can as more ofensive to them, surprisingly the tendency is influence how individuals interpret and evaluate stim- inverse for an ofensive to someone (only for agreeableuli, leading to a bias towards negative afect dimensions. ness). In other words, individuals high in agreeableness Individuals who report lower levels of life satisfaction and ofensiveness may be more sensitive to personal crittend to mark text as more negatively biased. This finding icism or ofensive remarks directed toward them, but they may be less sensitive or more understanding when ods implies the necessity to include all of them in the data it comes to ofensive language or content directed toward acquisition process in order to capture the most relevant others. There is also a positive relationship between per- representations of various human perspectives. ceiving text as ofensive to someone and funny (to me or The analysis of the stability of the ratings showed sevsomeone) with the rumination and suppression strategy. eral important issues. The diference in the evaluation of Interestingly, no significant relationship was observed duplicates made on a diferent day than the annotation between the rumination strategy and the perception of of the first occurrence of the text may indicate a gradual text as ofensive to oneself, suggesting that this particular resilience to the content presented since users rather lowstrategy may not significantly influence one’s sensitivity ered the score for negativity than upheld their judgment. to personal ofense. 
The use of distraction as a coping mechanism has an impact on perceiving content as offensive and finding humor in it. The inverse relationship is observed for conscientiousness. Individuals with higher levels of conscientiousness may be more sensitive to potential threats or negative implications in communication, leading them to perceive text as offensive to them more frequently. Individuals with higher levels of intellect are less likely to interpret text as personally offensive. In other words, intellectual individuals tend to be more objective and less sensitive to potentially offensive content directed at themselves. Political bias is higher for people who score higher in intellect. The same tendency holds for neuroticism. Individuals who are more agreeable are likely to take a more open-minded and tolerant approach when it comes to political beliefs, leading to lower levels of political bias. Also, health problems can influence the perception of text as understandable. However, to generalize such conclusions, we should conduct more complex studies that consider the use of more specialized equipment.

The fact that the ranking agreement (Tau coefficient) and the agreement calculated on binarized annotations (Kappa binarized) are significantly higher than the agreement calculated on raw annotations suggests that the 10-point annotation scale may be problematic for annotators. They generally agreed on the presence of a given dimension in the text, but differed in determining its exact intensity. Nevertheless, the values of the Tau coefficient are high for most of the tasks, which means that the annotators generally agreed on the ranking of the dimension intensity of texts.

Higher correlation values between text annotations and user biases compared to their relationship with questionnaires may be related to the text dependency of those methods. On the other hand, more significant positive and negative correlations between questionnaires and biases compared to correlations between questionnaires and annotations may be caused by the aggregative nature of biases. They aim to distill user annotations to emphasize the main differences between user preferences compared to others. Furthermore, the highest number of negative correlation values was observed between questionnaires and biases. This outlines the different types of text-agnostic knowledge about the user that can be obtained with this method in comparison to annotations and biases. Therefore, the distinct nature of those meth-

The introduction of a new measure to determine the negativity of the context in the form of preceding texts ( ) revealed that there is an impact of the negativity of texts previously rated by the annotator: if the context for the duplicate is more negative than for the first occurrence of the text ( is higher), the annotators tend to assign a more negative rating to the duplicate than they did for the first appearance.

6. Conclusions and Future Work

Our results demonstrated that people vary between themselves in terms of psychological characteristics, which was also reflected in the diversified annotation results. Relationships between questionnaire results and biases lead to several conclusions. First, there is a common tendency that specific psychological characteristics are related to similar dimensions inside the group, e.g., agreeableness with positive affect. It is a question of future research to investigate why certain dimensions (e.g., calm with agreeableness) did not correspond to the group tendencies. Second, it is possible to evaluate the intensity of psychological characteristics based on the annotation of texts. Future studies could further explore this issue by selecting the type of text to annotate and developing population norms. The main conclusion that can be drawn is that psychological characteristics influence multiple perspectives on text perception. Our research also shows that it may be worth including information about annotator characteristics in machine learning solutions.

We have shown that people tend to change their ratings over time, and in many cases, the differences in annotations (and therefore intra-annotator agreement) are very high. Undoubtedly, this depends on many factors. One of them may be the influence of previously annotated texts. We presented a study conducted on a selected sample of annotators. Our future work in this regard would involve increasing the scope of this work to more dimensions and a larger number of annotators. The limitations of the present study naturally include the unbalanced gender and age groups. Another limitation concerns the insufficient sample size to generalize our findings. The source code used during research is publicly available at https://github.com/CLARIN-PL/capturing-human-perspectives/tree/main.
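As a minimal illustration of why agreement on raw 10-point annotations can be much lower than agreement on binarized annotations or ranking agreement, the sketch below compares Cohen's kappa on raw and binarized ratings with Kendall's tau-b for one annotator rating the same texts twice. The ratings, the binarization threshold of 5, and the helper names are hypothetical and are not taken from the study; the exact agreement measures and the binarization used in the paper may differ.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long sequences of labels."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: probability that both passes pick the same label.
    expected = sum(counts_first[k] * counts_second[k] for k in counts_first) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: one identical label everywhere
    return (observed - expected) / (1.0 - expected)


def kendall_tau_b(first, second):
    """Kendall's tau-b (tie-corrected) computed over all pairs of items."""
    concordant = discordant = ties_first = ties_second = 0
    for i, j in combinations(range(len(first)), 2):
        d1 = first[i] - first[j]
        d2 = second[i] - second[j]
        if d1 == 0 and d2 == 0:
            continue  # pair tied in both passes carries no ranking information
        if d1 == 0:
            ties_first += 1
        elif d2 == 0:
            ties_second += 1
        elif d1 * d2 > 0:
            concordant += 1
        else:
            discordant += 1
    denominator = sqrt(
        (concordant + discordant + ties_first)
        * (concordant + discordant + ties_second)
    )
    return (concordant - discordant) / denominator if denominator else 0.0


# Hypothetical ratings of six texts by one annotator in two passes (1-10 scale).
first_pass = [1, 2, 5, 7, 9, 3]
second_pass = [2, 3, 6, 8, 10, 4]

THRESHOLD = 5  # assumed cut-off: a rating >= 5 means the dimension is present

kappa_raw = cohen_kappa(first_pass, second_pass)
kappa_binarized = cohen_kappa(
    [r >= THRESHOLD for r in first_pass],
    [r >= THRESHOLD for r in second_pass],
)
tau = kendall_tau_b(first_pass, second_pass)

print(f"kappa (raw): {kappa_raw:.3f}")
print(f"kappa (binarized): {kappa_binarized:.3f}")
print(f"tau-b: {tau:.3f}")
```

In this toy case the annotator never repeats an exact rating, so agreement on raw values is at chance level, while the presence of the dimension and the ordering of the texts are fully preserved.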

This work was financed by (1) the National Science Centre, Poland, project no. 2021/41/B/ST6/04471; (2) Contribution to the European Research Infrastructure 'CLARIN ERIC - European Research Infrastructure Consortium: Common Language Resources and Technology Infrastructure', 2022-23 (CLARIN Q); (3) the Polish Ministry of Education and Science, CLARIN-PL; (4) the European Regional Development Fund as a part of the 2014-2020 Smart Growth Operational Programme, projects no. POIR.04.02.00-00C002/19, POIR.01.01.01-00-0288/22 and POIR.01.01.01-00-0923/20; (5) the statutory funds of the Department of Artificial Intelligence, Wroclaw University of Science and Technology; (6) the Polish Ministry of Education and Science within the programme "International Projects Co-Funded"; (7) the European Union under the Horizon Europe, grant no. 101086321 (OMINO). However, the views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor European Research Executive Agency can be held responsible for them.

References

[1] D. Hovy, S. Prabhumoye, Five sources of bias in natural language processing, Language and Linguistics Compass 15 (2021) e12432.
[2] K. Kenyon-Dean, E. Ahmed, S. Fujimoto, J. Georges-Filteau, C. Glasz, B. Kaur, A. Lalande, S. Bhanderi, R. Belfer, N. Kanagasabai, et al., Sentiment analysis: It's complicated!, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 1886–1895.
[3] A. M. Davani, M. Díaz, V. Prabhakaran, Dealing with disagreements: Looking beyond the majority vote in subjective annotations, Transactions of the Association for Computational Linguistics 10 (2022) 92–110.
[4] A. Tourimpampa, A. Drigas, A. Economou, P. Roussos, Perception and text comprehension. it's a matter of perception!, International Journal of Emerging Technologies in Learning (Online) 13 (2018) 228.
[5] M. M. Nitzschner, U. K. Nagler, J. F. Rauthmann, A. Steger, M. R. Furtner, The role of personality in advertising perception: An eye tracking study, Psychologie des Alltagshandelns 8 (2015) 10–17.
[6] X. Sun, X. Zhou, Q. Wang, S. Sharples, Investigating the impact of emotions on perceiving serendipitous information encountering, Journal of the Association for Information Science and Technology 73 (2022) 3–18.
[7] V. Basile, F. Cabitza, A. Campagner, M. Fell, Toward a perspectivist turn in ground truthing for predictive computing, arXiv preprint arXiv:2109.04270 (2021).
[8] G. Abercrombie, V. Rieser, D. Hovy, Consistency is key: Disentangling label variation in natural language processing with intra-annotator agreement, arXiv preprint arXiv:2301.10684 (2023).
[9] P. Milkowski, M. Gruza, K. Kanclerz, P. Kazienko, D. Grimling, J. Kocon, Personal bias in prediction of emotions elicited by textual opinions, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, Association for Computational Linguistics, Online, 2021, pp. 248–259. URL: https://aclanthology.org/2021.acl-srw.26. doi:10.18653/v1/2021.acl-srw.26.
[10] P. Kazienko, J. Bielaniewicz, M. Gruza, K. Kanclerz, K. Karanowski, P. Miłkowski, J. Kocoń, Human-centred neural reasoning for subjective content processing: Hate speech, emotions, and humor, Information Fusion (2023).
[11] J. Bielaniewicz, K. Kanclerz, P. Miłkowski, M. Gruza, K. Karanowski, P. Kazienko, J. Kocoń, Deep-sheep: Sense of humor extraction from embeddings in the personalized context, in: 2022 IEEE International Conference on Data Mining Workshops (ICDMW), 2022, pp. 967–974. doi:10.1109/ICDMW58026.2022.00125.
[12] K. Kanclerz, A. Figas, M. Gruza, T. Kajdanowicz, J. Kocon, D. Puchalska, P. Kazienko, Controversy and conformity: from generalized to personalized aggressiveness detection, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp. 5915–5926. URL: https://aclanthology.org/2021.acl-long.460. doi:10.18653/v1/2021.acl-long.460.
[13] K. Kanclerz, M. Gruza, K. Karanowski, J. Bielaniewicz, P. Miłkowski, J. Kocoń, P. Kazienko, What if ground truth is subjective? personalized deep neural hate speech detection, in: Proceedings of the 1st Workshop on Perspectivist Approaches to NLP@ LREC2022, 2022, pp. 37–45.
[14] J. Kocoń, M. Gruza, J. Bielaniewicz, D. Grimling, K. Kanclerz, P. Miłkowski, P. Kazienko, Learning personal human biases and representations for subjective tasks in natural language processing, in: 2021 IEEE International Conference on Data Mining (ICDM), IEEE, 2021, pp. 1168–1173.
[15] A. Cutler, D. M. Condon, Deep lexical hypothesis: Identifying personality structure in natural language, Journal of Personality and Social Psychology (2022).
[16] D. Demszky, D. Movshovitz-Attias, J. Ko, A. Cowen, G. Nemade, S. Ravi, Goemotions: A dataset of fine-grained emotions, arXiv preprint arXiv:2005.00547 (2020).
[17] L. Feldman Barrett, J. A. Russell, Independence and bipolarity in the structure of current affect, Journal of personality and social psychology 74 (1998) 967.
[18] J. B. Nezlek, P. Kuppens, Regulating positive and negative emotions in daily life, Journal of personality 76 (2008) 561–580.
[19] M. B. Donnellan, F. L. Oswald, B. M. Baird, R. E. Lucas, The mini-ipip scales: tiny-yet-effective measures of the big five factors of personality, Psychological assessment 18 (2006) 192.
[20] L. A. Jensen-Campbell, W. G. Graziano, Agreeableness as a moderator of interpersonal conflict, Journal of personality 69 (2001) 323–362.
[21] B. W. Roberts, C. Lejuez, R. F. Krueger, J. M. Richards, P. L. Hill, What is conscientiousness and how can it be assessed?, Developmental psychology 50 (2014) 1315.
[22] L. D. Smillie, C. G. DeYoung, P. J. Hall, Clarifying the relation between extraversion and positive affect, Journal of personality 83 (2015) 564–574.
[23] S. Balta, E. Emirtekin, K. Kircaburun, M. D. Griffiths, Neuroticism, trait fear of missing out, and phubbing: The mediating role of state fear of missing out and problematic instagram use, International Journal of Mental Health and Addiction 18 (2020) 628–639.
[24] R. R. McCrae, D. M. Greenberg, Openness to experience, The Wiley handbook of genius (2014) 222–243.
[25] R. A. Martin, P. Puhlik-Doris, G. Larsen, J. Gray, K. Weir, Individual differences in uses of humor and their relation to psychological well-being: Development of the humor styles questionnaire, Journal of research in personality 37 (2003) 48–75.
[26] H. Medland, K. De France, T. Hollenstein, D. Mussoff, P. Koval, Regulating emotion systems in everyday life, European Journal of Psychological Assessment (2020).
[27] A. C. Schat, E. K. Kelloway, S. Desmarais, The physical health questionnaire (phq): construct validation of a self-report scale of somatic symptoms, Journal of occupational health psychology 10 (2005) 363.
[28] A. Kokoszka, A. Jastrzębski, M. Obrębski, Ocena psychometrycznych właściwości polskiej wersji kwestionariusza zdrowia pacjenta-9 dla osób dorosłych, Psychiatria 13 (2016) 187–193.
[29] D. Preece, R. Becerra, K. Robinson, J. Dandy, A. Allan, The psychometric assessment of alexithymia: Development and validation of the perth alexithymia questionnaire, Personality and Individual Differences 132 (2018) 32–44.
[30] R. Likert, A technique for the measurement of attitudes, Archives of psychology (1932).
[31] S. Cohen, R. C. Kessler, L. U. Gordon, Measuring stress: A guide for health and social scientists, Oxford University Press on Demand, 1997.
[32] E. Diener, D. Wirtz, W. Tov, C. Kim-Prieto, D.-w. Choi, S. Oishi, R. Biswas-Diener, New well-being measures: Short scales to assess flourishing and positive and negative feelings, Social indicators research 97 (2010) 143–156.
[33] E. Diener, R. A. Emmons, R. J. Larsen, S. Griffin, The satisfaction with life scale, Journal of personality assessment 49 (1985) 71–75.
[34] K. Förster, P. Kanske, Upregulating positive affect through compassion: Psychological and physiological evidence, International Journal of Psychophysiology 176 (2022) 100–107.
[35] G. Haydon, J. Reis, L. Bowen, The use of humour in nursing education: An integrative review of research literature, Nurse Education Today (2023) 105827.
[36] K. Pearson, Vii. note on regression and inheritance in the case of two parents, proceedings of the royal society of London 58 (1895) 240–242.

A. Annotator Profiles

Annotator profiles, composed of the results of the questionnaires mentioned in 3.3, are presented in Fig. 8.

Personality: Agreeableness: The average score of 15.6 suggests that people tend to be moderately cooperative and compassionate towards others (with a standard deviation of 2.2). Extraversion: The average score of 12.1 indicates that, on average, individuals tend to have a moderate level of sociability and assertiveness (with a standard deviation of 4.3). Conscientiousness: With an average score of 15.1, individuals, on average, exhibit a moderate level of organization and responsibility (with a standard deviation of 2.7). Neuroticism: The average score of 12.8 implies that, on average, individuals tend to have a moderate level of emotional stability and experience negative emotions (with a standard deviation of 3.6). Intellect: The average score of 15.1 suggests that, on average, individuals tend to exhibit a moderate level of intellectual curiosity and openness to new ideas (with a standard deviation of 2.5).

Humor Style: Affiliative humor: The average score of 29.4 indicates that on average people tend to use humor extensively to strengthen social bonds and improve relationships (with a standard deviation of 5.5). Self-enhancing humor: The average score of 25.2 suggests that, on average, individuals tend to use humor extensively as a coping mechanism to maintain a positive outlook during stressful situations (with a standard deviation of 6.7). Aggressive humor: With an average score of 19.5, individuals, on average, exhibit a moderate tendency to use humor as a means of teasing or mocking others (with a standard deviation of 4.8). Self-defeating humor: The average score of 19.1 implies that, on average, individuals tend to moderately engage in self-disparaging humor and put themselves down (with a standard deviation of 5.4).

Stress and Emotions: Alexithymia: The average score of 14.8 indicates that, on average, individuals tend to have a low level of difficulty in identifying and expressing emotions (with a standard deviation of 6.4). Stress: With an average score of 14.5, individuals, on average, perceive a low level of stress in their lives (with a standard deviation of 6.8). Positive affect: The average score of 22.4 suggests that, on average, individuals experience a moderate level of positive emotions (with a standard deviation of 4.7). Negative affect: The average score of 17.5 implies that, on average, individuals experience a moderate level of negative emotions (with a standard deviation of 5.6). Satisfaction with life: The average score of 21.6 indicates that, on average, individuals have a low level of satisfaction and happiness with their lives (with a standard deviation of 5.7).

Emotion Regulation: Relaxation: The average score of 94.4 suggests that, on average, individuals engage in relaxation techniques to manage their emotions to a moderate extent (with a standard deviation of 61.7). Engagement: With an average score of 125.5, individuals, on average, exhibit a moderate level of involvement and immersion in activities as a means of emotion regulation (with a standard deviation of 62). Rumination: The mean score is 93.7, reflecting a moderate tendency to ruminate or dwell on negative thoughts or emotions (with a standard deviation of 65.3). Reappraisal: The mean score is 115.5, indicating a moderate tendency to reinterpret situations to regulate emotions (with a standard deviation of 58.1). Distraction: The mean score is 89.6, reflecting a moderate preference for using distractions as an emotion regulation strategy (with a standard deviation of 63.4). Suppression: The mean score is 46.5, indicating a relatively lower tendency to suppress or hide emotions (with a standard deviation of 50.4).

Health: Depression: The average score of 6.1 indicates that, on average, individuals report a relatively low level of depression (with a standard deviation of 5.6). Sleep disturbance: The average score of 10.7 suggests that, on average, individuals experience a low level of sleep disturbance (with a standard deviation of 6.0). Headaches: The average score of 6.7 indicates that, on average, individuals report a low level of headaches (with a standard deviation of 4.3). Gastrointestinal Problems: The average score of 7.4 suggests that, on average, individuals report a low level of gastrointestinal problems (with a standard deviation of 4.3). Respiratory Infections: The average score of 3.5 indicates that, on average, individuals report a relatively low level of respiratory infections (with a standard deviation of 2.7).

B. Heatmaps

Heatmaps may vary in the number of dimensions displayed in the scope of biases and the results of the questionnaire. Only dimensions that exhibit a correlation value of 0.1 or higher are displayed.

B.1. Personality Traits

In Fig. 9, agreeableness is moderately correlated with the positive affect dimensions (positive, delight, inspiration, surprise and compassion), whereas it is weakly correlated with joy. The relationship with negative affect dimensions is slightly weaker. Data analysis revealed a weak positive relationship between extraversion and selected positive emotion biases. There is a weak (positive, surprise and compassion) and moderate (delight, inspiration, joy) relationship between the conscientiousness trait and a few positive affect biases, and a weak negative relationship with negative emotion biases (negative, sadness, and anger). Two rational biases (offensive to me and funny to me) are related to conscientiousness. There is a weak negative correlation between neuroticism and positive affect biases (joy, delight, inspiration and compassion). A similar relationship is observed for some negative affect biases (negative, fear) and the opposite for anger. This trait is weakly positively correlated with rational biases (embarrassing, vulgar, political, understandable, offensive to someone). The inverse relationship is observed for anger and offensive to me biases. Intellect is negatively related to three rational biases (offensive to me, vulgar and embarrassing) and positively to two (political and understandable). There is a weak negative association between intellect and positive affect biases (compassion, surprise, calm, inspiration and delight).

B.2. Humor

In Fig. 10, affiliative humor has a weak positive correlation with rational dimension biases (understandable) and a negative correlation with embarrassing and interesting. For the negative affect dimension biases, a weak positive correlation can be seen for negative, fear, sadness, disgust and anger bias. Self-enhancing humor is negatively correlated with rational dimension biases (funny to me, offensive to someone, understandable, interesting, political, embarrassing). Aggressive humor is positively correlated with rational dimensions (funny to someone, offensive to someone, understandable, interesting and political), positive dimension biases (positive, joy, delight, surprise) and a negative dimension bias (disgust). Self-defeating humor is correlated with positive affect dimensions (positive, joy, delight, inspiration, surprise, compassion), negative affect dimensions (negative, fear, sadness, disgust and anger) and rational dimensions (embarrassing, interesting), and negatively with political and understandable.

B.3. Emotion Regulation

In Fig. 11, engagement has only weak negative correlations with the rational dimensions (ironic, embarrassing, vulgar, understandable, offensive to someone, funny to someone). For the positive affect dimension, a weak positive correlation can be seen for positive and calm bias. There is also a weak positive correlation for negative bias, and a weak negative correlation for disgust bias.

There is a moderate positive relationship between offensive to someone bias and rumination. For the other rational dimensions (funny to someone, funny to me) a weak positive correlation occurs. The correlations for the positive affect dimensions are similarly distributed: moderate for surprise bias and weak positive for compassion, positive, calm and inspiration bias.

Reappraisal is weakly correlated with positive dimensions (surprise, positive, compassion) and negative dimensions (negative, sadness). The same absolute value occurs for ironic and funny to me bias, except that the former shows a weak positive correlation and the latter a weak negative correlation.

There is a moderate relationship between distraction and four positive dimensions (positive, inspiration, joy, compassion). A similar correlation occurs for the two rational dimensions (offensive to someone, funny to someone). Other rational (offensive to me, funny to me, ironic, vulgar) and positive (delight, surprise) dimensions show a weak positive correlation.

Suppression is moderately correlated with two rational dimensions (funny to someone, funny to me). Other rational dimensions (vulgar, embarrassing, offensive to someone, ironic) have a weak positive relationship. A similar correlation is found with compassion bias.

B.4. Health

In Fig. 12, there is a weak relationship between depression and disgust bias. At the same time, there is a negative correlation with positive affects (joy, positive). Of the positive affects, only compassion is related to depression, with a weak positive correlation. There are also relationships with rational dimensions: weak positive (ironic, offensive to someone), moderate positive (vulgar, embarrassing), and moderate negative correlation (understandable).

A positive affect (compassion) is weakly positively related to sleep disturbance. There is a similar but negative correlation with joy bias and rational dimensions (understandable, interesting, incomprehensible). There are also weak positive correlations with rational dimensions (embarrassing, vulgar). The ironic bias has a weak positive correlation. There is no relationship between sleep disturbance and positive bias.

The item most highly correlated is understandable bias, with a moderate positive correlation with headaches. Other rational dimensions show weak positive correlations (ironic, interesting, embarrassing, vulgar). There is a weak positive association between headache and negative affect (anger). There is a weak negative correlation with incomprehensible bias.

Gastrointestinal problems are mostly correlated with rational dimensions: with moderate positive correlation (ironic, vulgar, offensive to me), weak positive correlation (embarrassing, offensive to someone) and negative correlations: moderate (understandable) and weak (incomprehensible). There are as many weak positive relationships with positive affects (compassion, positive) as with negative ones (disgust, fear).

For respiratory infections, the strongest correlations are with rational dimensions, both weakly positive (ironic, interesting) and weakly negative (incomprehensible, offensive to me). Negative affect (anger) has a weak negative correlation. For positive bias, there is a weak positive correlation.

A moderate or weak negative correlation can be observed between perceiving a text as understandable and headaches, gastrointestinal problems, depression, and sleep disturbance. The same somatic health dimensions have a positive correlation with interpreting a text as vulgar or embarrassing. There is a positive correlation with the ironic bias in all physical health dimensions studied.

B.5. Stress and Emotions

In Fig. 13, experiencing stress is moderately (vulgar, embarrassing) and weakly (offensive to someone, funny to someone) positively related to rational biases. A negative relationship is noticed only with the understandable rational bias. Positive affect (positive, inspiration) and negative affect (negative, fear, sadness) biases are weakly negatively related to stress.

Experienced positive affect is moderately negatively related to rational biases (funny to someone, offensive to someone, political, vulgar and embarrassing). Among the positive affect dimensions, only positive bias is positively correlated. Among the negative affect dimensions, only disgust bias is weakly negatively correlated.

Both the negative affect dimensions (negative, fear, sadness, disgust) and rational dimensions (embarrassing, political, offensive to someone and funny to someone) are weakly related to negative affect. Only the vulgar bias is moderately related to negative affect.

There is a weak positive relationship between positive dimension biases (positive, inspiration and surprise) and satisfaction with life. There is a weak inverse relationship with negative dimension biases (fear, sadness). Rational dimension biases (embarrassing, vulgar, political, offensive to someone) are weakly negatively related to satisfaction with life. A positive correlation is only observed for the understandable bias.
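The way the heatmaps in this appendix select their cells can be sketched as follows: compute a correlation between each questionnaire score and each bias across annotators, and keep only the pairs whose correlation reaches 0.1. This is an illustrative sketch only. It assumes Pearson correlation and a threshold applied to the absolute value, and the annotator scores, bias values, and function names are hypothetical rather than taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = sqrt(var_x * var_y)
    return covariance / denominator if denominator else 0.0


def heatmap_cells(questionnaires, biases, threshold=0.1):
    """Return {(questionnaire, bias): r} for pairs with |r| >= threshold."""
    cells = {}
    for q_name, q_scores in questionnaires.items():
        for b_name, b_values in biases.items():
            r = pearson(q_scores, b_values)
            if abs(r) >= threshold:  # assumption: threshold on absolute value
                cells[(q_name, b_name)] = round(r, 2)
    return cells


# Hypothetical data for five annotators (one value per annotator).
questionnaires = {"agreeableness": [11, 13, 15, 17, 19]}
biases = {
    "positive": [-0.5, -0.1, 0.0, 0.2, 0.4],
    "vulgar": [0.1, -0.2, 0.2, -0.2, 0.1],
}

cells = heatmap_cells(questionnaires, biases)
for (q_name, b_name), r in cells.items():
    print(f"{q_name} x {b_name}: r = {r}")
```

With these toy values only the agreeableness and positive pair survives the threshold, which mirrors why the heatmaps differ in the number of dimensions they display.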

C. Intra-Annotator Agreement

C.1. Intra-Annotator Agreement for Affective Dimensions

[Figure: intra-annotator agreement for affective dimensions.]