Investigation of Temperament Characteristics Influencing the Academic Achievement of First-year University Students*

Elena V. Shadrina [0000-0001-5663-529X], Olga E. Oshmarina [0000-0002-9403-9285], Marianna M. Korenkova [0000-0002-1611-6708], Galina M. Zalesskaya

National Research University Higher School of Economics, HSE branch in Nizhny Novgorod, Nizhny Novgorod, Russia

* Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The article examines how human temperament influences academic performance and how it can be used to predict "risky" students. The analysis was carried out using statistical and data mining methods. The baseline data for the study are information about students collected through LMS (Learning Management System), the online support system for the educational process at HSE. The study found a relationship between temperament and academic success that makes it possible to predict "risky" students. As a result, the education office received recommendations to draw attention to "risky" students and to take preventive measures: organizing electives, assigning a curator, and checking students' readiness for classes.

Keywords: Psychology of higher education, academic achievements, human temperament, data mining, decision tree, online questionnaire.

1. Introduction

Students' performance is the main parameter by which one can evaluate how clearly knowledge has been acquired and how well it is understood. If examination results can be predicted early, preventive measures can be taken: electives and additional course consultations can be arranged to reduce the number of students with unsatisfactory results, academic failures, and dropouts. In small groups, a teacher can identify students' inclinations during classes through personal contact. But as the number of students rises, it becomes more difficult to monitor those who are likely to fail an examination.

The Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod) is growing and admits more and more first-year students with different levels of motivation, natural aptitude, perseverance, and other personal features that can either give them an advantage or create obstacles on the way to academic success. Although the students who enter the faculty have high Unified State Exam scores (school-leaving tests), the percentage of those expelled after the first year is very high, about 30%.

Since competition among universities for students is high, students have become each university's essential resource. Higher education institutions must take students' needs into account more than ever and do their best to meet them. Consequently, some major goals for a university today are: finding efficient ways of attracting students; holding their interest; striking a balance between academic requirements and catering to students' needs and interests; and keeping students without reducing the quality of their studies. It is equally important to understand who the successful students are, what they are interested in, and what their intentions and ambitions are. Therefore, the problem of predicting students' performance on the basis of their personal features is becoming increasingly pressing.

Over the last 20 years, various authors have studied which factors affect academic performance most. In particular, the works of Aruselvan, Campbell, and Pal describe comprehensive studies on this subject [1, 2, 3]. Superby and co-authors conducted a large-scale study at a Belgian university to detect the most influential factors [4].
A newer study was conducted by Lust and co-authors at Belgian and French universities using advanced data mining and machine learning methods [5]. Finally, a similar study was conducted at the Higher School of Economics in 2014 [6]. All these studies show that factors related to academic behaviour (attendance, note-taking, confidence in the choice of university and profession, doing homework, attending electives) and personal history factors (parents' education, presence of both parents in the family, school grades) are the most important. They also show that character traits matter, but less than the factors listed above. In turn, Poldin and co-authors explain that sociable students develop personally through communication and therefore become more successful [7]. Valeeva detects a relationship showing that, over time, students with academic failures become socially isolated, which creates additional risks of their being expelled from higher education institutions [8]. Since the works mentioned above note the effect of friendship ties on student performance, the idea emerged to look more closely at the psychological aspects of students' personality.

Jung divided all individuals, according to whether the sources of psychic activity are external or internal, into extroverts and introverts [9]. This division has become the basis of the modern understanding of temperament. In this work we rely on the typology of four temperaments, which follows logically from it:
• the choleric temperament is characterized by intense and powerful emotional processes. Choleric people are quick-tempered, passionate, and energetic.
• a sanguine individual is distinguished by comparatively weak intensity of psychic processes, which quickly replace one another. Sanguine people are cheerful and hard-working, and they easily cope with various tasks.
• a phlegmatic person is distinguished by slowness, sluggish movements, and lack of energy. The feelings of a phlegmatic person are even and quiet. Phlegmatic people are devoted, and it is difficult for them to switch to new types of activity.
• the melancholic temperament is characterized by deep emotional expression but slow psychic processes. The feelings and emotions of a melancholic person are usually uniform; such people are sensitive to external circumstances and often prove to be passive and sluggish [10].

Temperament reflects an individual's equanimity, emotional disposition, and perseverance, and therefore shapes such personal processes as work performance, the ability to fulfil an assigned task, and the ability to deal with stress and problems. For this reason, determining students' temperament may help predict and prevent a number of difficulties that can arise during study. Moreover, it can help develop working methods that avoid these difficulties by taking into account the strengths of each student's temperament as part of an individualized approach.

This study determines the personality type, level of activeness, rationality, extroversion, and ability to set study priorities of first- and second-year students of the Faculty of Informatics, Mathematics, and Computer Science in order to see whether each student is prone to academic failure. The work aims to study the influence of personal features (i.e., temperament as the basis of personality) on academic performance and to detect students who are highly likely to fail their examinations. It is not rational to wait until April to guide students who really need support. We want to build a statistical model that predicts academic success or failure as early as possible so that the necessary support can be provided. Our main hypothesis is that there is a relationship between temperament and students' academic performance.
The study material was information about students' personal features collected through a questionnaire in a Google online form. The study subjects were first- and second-year students of the Faculty of Informatics, Mathematics and Computer Science in the 2017-2018 academic year. The study object was their average grade and its dependence on temperament, extroversion level, stability, and other personal features.

2. Data

Our main objective is to classify students into three groups:
• Good: 'low-risk' students with a high probability of succeeding;
• Medium: 'medium-risk' students who may succeed thanks to measures taken by the university;
• Bad: 'high-risk' students with a high probability of failing (or dropping out).

Thus, we need to create a data pool in which every student is described by a range of personal characteristics. To obtain data on the temperament and some other specific characteristics of students, we conducted an online survey using Google questionnaire forms. It was distributed in the middle of the 2017-2018 academic year. Altogether, 90 first-year and 50 second-year students of the Faculty of Informatics, Mathematics, and Computer Science took part. The questionnaire was designed to describe each student by a certain number of criteria. It included questions on activeness at school, perseverance, and the ability to set priorities. This information describes a student's personality traits and provides additional grounds for a more accurate study. The temperament block includes 12 questions, each describing one of the temperaments: choleric, sanguine, phlegmatic, or melancholic. From each student's answers we then calculated the extroversion and rationality parameters (Figure 1) and determined the temperament from these two traits. It should be noted that the term "rationality" is used hereinafter to mean "stability", i.e.
the smooth flow of mental processes and of changes in feelings and mood. In addition, we collected information on whom students live with (alone, with parents, with flatmates, or in a dormitory), as this may affect academic performance to a different extent depending on a student's personality traits.

Fig 1. Temperament classification on the basis of extroversion and rationality

The final questionnaire comprised 17 closed questions. We also collected data on intermediate academic success after the first half of the academic year from the internal University database. After merging this information, we extracted 14 binary variables and one percentage variable for each student. The decision variable used to validate our model is the final-year rating, built a posteriori by grouping students according to their academic performance.

The distribution of temperaments in the sample is shown in Figure 2. Melancholics (27 persons) and sanguines (26) each make up roughly a quarter of the 100 students shown. Cholerics (34) outnumber phlegmatics (13) by a factor of about 2.6. Hereinafter, for illustration purposes, red is used for cholerics, yellow for sanguines, green for phlegmatics, and blue for melancholics.

Fig 2. Distribution of temperament types

To assess academic performance, we used the grade average and information on retaken examinations from the general rating of the Faculty of Informatics, Mathematics and Computer Science. As the number of first-year respondents was almost twice the number of second-year respondents (90 vs 50), we formed the training sample from all 50 second-year students and 50 of the first-year students, 100 persons in total. The test set used to check the model's performance consists of the remaining 40 first-year students.
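The text does not spell out how the 12 answers become the two traits of Figure 1, or how the traits map to a temperament. The sketch below is a minimal illustration under loudly stated assumptions. It assumes three questions per temperament in a fixed order, and it uses a hypothetical scoring in which agreement with choleric or sanguine statements raises extroversion while agreement with sanguine or phlegmatic statements raises rationality (stability). It also assumes the usual quadrant layout implied by the Introduction: cholerics are extroverted and unstable, sanguines extroverted and stable, phlegmatics introverted and stable, and melancholics introverted and unstable. All names are illustrative, not the authors' questionnaire.

```python
from collections import Counter

# HYPOTHETICAL order of the 12 temperament questions: three per temperament.
# The real questionnaire layout is not given in the paper.
QUESTION_TEMPERAMENT = (
    ["choleric"] * 3 + ["sanguine"] * 3 + ["phlegmatic"] * 3 + ["melancholic"] * 3
)


def trait_scores(answers):
    """Return (extroversion, rationality) from 12 Boolean answers (1 = agree).

    Assumed scoring: choleric/sanguine agreements add to extroversion and
    phlegmatic/melancholic ones subtract; sanguine/phlegmatic agreements add
    to rationality (stability) and choleric/melancholic ones subtract.
    """
    if len(answers) != len(QUESTION_TEMPERAMENT):
        raise ValueError("expected 12 answers")
    votes = Counter(t for t, a in zip(QUESTION_TEMPERAMENT, answers) if a)
    extroversion = (votes["choleric"] + votes["sanguine"]) - (
        votes["phlegmatic"] + votes["melancholic"]
    )
    rationality = (votes["sanguine"] + votes["phlegmatic"]) - (
        votes["choleric"] + votes["melancholic"]
    )
    return extroversion, rationality


def temperament(extroversion, rationality):
    """Quadrant rule assumed for Figure 1; a score of 0 counts as extroverted/stable."""
    if extroversion >= 0:
        return "sanguine" if rationality >= 0 else "choleric"
    return "phlegmatic" if rationality >= 0 else "melancholic"


print(temperament(*trait_scores([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])))  # choleric
```

The tie-breaking at zero is an arbitrary choice made here. A real scoring key would need its own rule for students who sit exactly on an axis.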
The training sample was divided into the Good, Medium, and Bad categories based on grade average and the occurrence of retaken examinations, the categories representing a low, medium, and high probability of failing any exam (see Figure 3). The HSE academic year consists of 4 modules, each ending with exams graded on a 10-point scale. After the winter and summer sessions, the education office ranks the students of the faculty by average grade, from the highest score (the most successful students) to the lowest (unsuccessful students). In our study, the Good category includes students in the top 33% of the rating without retaken exams, the Medium category includes the next 33% without retaken exams, and the Bad category includes all the remaining students (see Figure 3a, Figure 3b).

Fig 3. Breakdown of students by categories
Fig 3a. Categorization by temperament (number of students)
Fig 3b. Categorization by temperament (%)
Fig 3c. Categorization of temperaments by exam success and grade average

The large share of category Bad is explained by the fact that almost every second respondent retook an examination during the term. This is typical for the Faculty of Informatics, Mathematics and Computer Science, because several disciplines studied at the beginning of the first and second years are rather difficult to pass. Figure 3b shows that both general tendencies and differences between temperaments can already be traced at the stage of initial analysis. The categorization pattern of choleric and phlegmatic students is similar (mostly Bad, then Medium, then Good), while half of the sanguine students belong to category Medium.
Melancholic students are also most often assigned to category Bad (Figure 3b). It should be noted that allocation to category Bad does not mean that a student fails three or more exams and is on the verge of expulsion: many of the students assigned to it have a high grade average and a single failed exam. Since the aim of this study is to identify all students who are expected to have failures, even students with good progress and a single academic failure may be categorized as Bad.

3. Methods

Dividing students into categories with a high, medium, or low probability of academic failure is a classification task based on supervised learning [11]. At the first stage of work, we select the characteristics that significantly influence the grade average and retaken examinations. At the second stage, we develop the best algorithm for classifying the students of the training sample (the one that correctly identifies the greatest number of Bad students). In other words, the method should assign each new student to one of the predetermined classes based on their formalized characteristics. For this purpose, we used two of the simplest machine learning methods: the k-nearest neighbors algorithm [12] and a decision tree [13, 4]. As we did not know beforehand which method would give the most accurate result, we tested both and tuned their parameters to maximize the share of correct predictions. At the third stage, the best model is selected and used to make predictions for the test group.

3.1. Correlation Calculation

We represented the data in a form convenient for subsequent analysis and interpretation. Students' answers were translated into Boolean variables: 1 if a student agreed with a statement and 0 otherwise. First of all, we used the correlation coefficient to select, from all the characteristics, those with the greatest impact on the grade average and retaken examinations.
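Selecting characteristics by correlation can be sketched in a few lines. The authors used R's `cor()` with the "pearson" method. The Python sketch below swaps in a dependency-free computation of the same Pearson coefficient. The helper `select_features` (a hypothetical name) keeps a Boolean characteristic if its absolute correlation with at least one outcome exceeds a threshold. The default of 0.2 is the significance level adopted in this section. The toy data are invented for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns NaN when either sequence is constant (zero variance);
    R's cor() returns NA in that case.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return math.nan
    return cov / math.sqrt(ss_x * ss_y)


def select_features(features, outcomes, threshold=0.2):
    """Names of features whose |r| with at least one outcome exceeds threshold."""
    # Comparisons with NaN are False, so constant features are never selected.
    return [
        name
        for name, values in features.items()
        if any(abs(pearson(values, y)) > threshold for y in outcomes.values())
    ]


# Invented toy data: 0/1 answers and two outcomes for four students.
features = {"perseverance": [1, 0, 1, 0], "living_alone": [1, 1, 0, 0]}
outcomes = {"grade_average": [8.0, 6.0, 8.0, 6.0], "retakes": [0, 1, 0, 1]}
print(select_features(features, outcomes))  # ['perseverance']
```

Passing `threshold=0.1` instead reproduces the looser cut-off the section applies when highlighting temperament-related characteristics.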
The correlation coefficient measures the dependence between two variables and is successfully used in educational sociology and data mining [10]. To calculate it, we used the cor() function with the "pearson" method in RStudio. The results are shown in Table 1. For the purposes of our investigation, we consider a correlation coefficient significant if its absolute value exceeds 0.2 and insignificant if it lies between 0 and 0.2.

Table 1. Dependence between academic performance and temperament (Pearson correlation coefficients)

Characteristic             Grade average   Retaken examinations
Activeness at school        0.06            0.09
Perseverance                0.19           -0.24
Setting of priorities       0.29           -0.18
Living in a dormitory       0.06           -0.05
Living with mates          -0.01            0.01
Living with parents        -0.01            0.02
Living alone               -0.04            0
Extroversion                0.07           -0.18
Rationality                -0.004          -0.08
Choleric                   -0.1             0.044
Sanguine                    0.05           -0.16
Phlegmatic                  0.13            0.017
Melancholic                -0.04            0.2

As we can see, "perseverance" and "setting of priorities" are significantly related to academic performance (|r| > 0.2 for at least one indicator), which seems obvious, while temperament and the psychological characteristics derived from it are less correlated with study success. But since the research focuses on these characteristics, they were highlighted in bold if the absolute value of their correlation with at least one of the two indicators is greater than 0.1. So the dependence between academic performance and temperament does exist, and in some cases it approaches our significance threshold. For example, cholerics have a lower grade average (r = -0.1), while phlegmatics, presumably thanks to their perseverance, get higher marks (r = 0.13). Sanguines retake examinations less often (r = -0.16), and for melancholics the probability of retaking exams is considerable (r = 0.2). These results show that there is a dependence between academic performance and temperament.

3.2. Comparison of kNN and Decision Tree Algorithms

The kNN evaluation gave the following results. Three series of calculations were carried out:
• using all highly correlated characteristics together with the grade average: the best result, 84% accuracy, was achieved with 8 nearest neighbors;
• using the same characteristics without the grade average: the best result, 60%, was achieved with 5 neighbors. This is a good indication, as the probability of assigning a student to the correct category is almost twice as high as with random guessing (33%);
• using only the grade average: the best result, 77%, was obtained with 4 neighbors. This is reasonable, because a student with a low grade average is more likely to belong to category Bad, and accordingly more of the neighbors of such a student will also belong to category Bad.

Thus, for kNN classification the psychological characteristics on their own are less informative than the grade average alone (60% vs 77%).

The decision tree gave the following results:
• with all characteristics, the accuracy was 74%;
• without the grade average, the tree was based on the Extroversion and Rationality characteristics, and the accuracy was 62%;
• the tree based only on the grade average achieved 76%;
• the best result, 84%, was obtained with a tree built on Perseverance, Setting of Priorities, Extroversion, Rationality, and temperament type.

Fig 4. Decision tree that predicts the categories of the students

Across the configurations tried, the accuracy of both models was about 75%, but the decision tree gave the better result: its best configuration reached 84% without the grade average, relying only on questionnaire characteristics, whereas kNN needed the grade average to reach the same level. Therefore, we used the decision tree with these parameters to analyse the test set. Figure 4 shows the decision tree that predicts the categories of the students.

4. Results

The students of the first-year test groups were classified using the decision tree built at the previous step, which divided them into three groups: Bad, Medium, and Good. To protect personal information, each student's last name was replaced with a label: Student 1, Student 2, ..., Student N. The students were distributed across the categories as follows: 22 students in category Bad (the category of risky students), 12 in category Medium, and 6 in category Good.

After the final session of the 4th module and all exam retakes, it became possible to check the results of the research. Table 2 summarizes them; percentages are shares of each category's total.

Table 2. Summarized data

Predicted   Total   Retook exams    Dropout or ILP²   High average score (> 7.5)
category            people   %      people   %        people   %
Bad         22      14       64%    6        27%      0        0%
Medium      12      3        25%    1        8%       4        33%
Good        6       0        0%     0        0%       6        100%

² ILP (individual learning plan) is a plan for repeating a failed discipline or several disciplines for a certain fee.

Category Bad has 22 students: 14 (64%) retook exams, and 6 (27%) either left the University or continued on an ILP: 2 were expelled, and 4 stayed on an ILP because of academic debts in one or two subjects. Category Medium has 12 students: 3 (25%) retook exams and were transferred to the second year without any academic debts, and one student left the University after deciding to change her career path. There were no exam retakes among the 6 students of category Good, and all of them continued to study at the University. Thus, we can conclude that the distribution of students into categories agrees with the real situation.
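The percentages in Table 2 are shares of each predicted category's total. The short sketch below recomputes them from the counts in the table, rounded to whole percent. It also checks that the three categories add up to the 40 students of the test set. The variable and function names are illustrative.

```python
# Counts from Table 2:
# (total, retook exams, dropout or ILP, average score > 7.5)
TABLE_2 = {
    "Bad": (22, 14, 6, 0),
    "Medium": (12, 3, 1, 4),
    "Good": (6, 0, 0, 6),
}


def shares(counts):
    """Percentage of each outcome relative to the category total, rounded."""
    total, *outcomes = counts
    return [round(100 * k / total) for k in outcomes]


# The three predicted categories cover the whole test set.
assert sum(counts[0] for counts in TABLE_2.values()) == 40

for category, counts in TABLE_2.items():
    print(category, shares(counts))
# Bad [64, 27, 0]
# Medium [25, 8, 33]
# Good [0, 0, 100]
```

The first value in the Bad row (64%) is the share of flagged "risky" students who actually retook exams, the figure quoted in the Conclusion.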
The average scores also deserve attention: students with a high average score are most numerous in category Good, fewer in category Medium, and absent from category Bad.

Determining temperament also opens new opportunities for individual work with students. For example, to get the best performance from a phlegmatic student, it is better not to switch them between tasks too often: a phlegmatic can work on one task effectively and for a long time, while switching between tasks is difficult for such a person. At the same time, a sanguine student should always be given new, interesting tasks. With a melancholic student, severe criticism, rudeness, or irony are inappropriate, and a negative assessment should be softened or given very carefully. A choleric student should always compete and face challenges; however, it is also necessary to encourage them sometimes and help them overcome difficulties, as cholerics are prone to abandon a task without seeing it through.

As a result of the study, the following recommendations were given to the education office of the Faculty of Informatics, Mathematics and Computer Science, HSE Nizhny Novgorod:
• pay close attention to students from category Bad (e.g., talk to the students or, as a last resort, to their legal representatives);
• provide additional classes in the form of electives for students of categories Bad and Medium;
• engage teaching assistants to help students understand and complete home assignments;
• pay additional attention to students from category Medium with a low average score after the first half of the academic year.

5. Conclusion

In this paper, we studied the influence of temperament on the academic success of students at HSE Nizhny Novgorod. In the 2017-2018 academic year, we conducted a survey among first- and second-year students.
The study identified the most important psychological factors affecting students' academic performance: "Perseverance" and "Ability to set priorities". Their presence is associated with a higher average grade (r = 0.19 and 0.29) and a lower probability of retaking exams (r = -0.24 and -0.18). Hot-tempered cholerics tend to have a lower average grade, while the calm and steadiness of phlegmatics help them study better. Sanguines retake exams less often, while melancholics face a considerable risk of retakes. More extroverted students have a slightly higher average grade (r = 0.07) and retake exams less often (r = -0.18).

Using the decision tree method, we built a statistical model that assigns students to three notional categories reflecting a low, medium, and high risk of retaking exams: Good, Medium, and Bad. The model's predictions were generally good: 64% of the students it placed in category Bad did retake exams, while none of the students in category Good did. Thus, we have found a connection between temperament and academic success that makes it possible to predict "risky" students.

We believe that our research may be useful to other universities for:
1. identifying academically unsuccessful students and focusing on "risky" students. One of the most important goals of the Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod) is to train as many of the admitted first-year students as possible and transfer them to the second year without academic debt. Otherwise, the resources of the state (in the case of state-funded education) or students' personal funds (in the case of paid education) spent on education are not used rationally.
2. forming an individual educational trajectory. At HSE Nizhny Novgorod, there are now flexible opportunities for switching from one educational program to another using an ILP. We assume that forming study groups that take students' personal characteristics into account will increase each student's performance.
In the 2018-2019 academic year, the Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod) formed several Programming training groups according to students' level of knowledge, based on test results: students with higher results were put together in one group, those with medium results studied together, and there was also a group of weaker students. Teaching a discipline by level makes training more effective: strong students are loaded with complex tasks (without losing their motivation to study), while weak and medium students are brought up to the level of the strong ones. Our colleagues from the Linguistic Department (HSE Nizhny Novgorod) have been teaching foreign languages by level for several years: the entire pool of first-year students, regardless of educational program, is divided into levels A (Upper Intermediate), B (Intermediate), and C (Pre-Intermediate), which improves the quality of foreign language teaching for each student.

We understand that our study has certain limitations. The baseline data pool is small (140 student responses), and the Faculty of Informatics, Mathematics, and Computer Science (HSE Nizhny Novgorod) is not large enough to speak of "real" Big Data or to use all the opportunities of machine learning methods. In the future, we plan to use longitudinal data to verify the results over several years. It would also be interesting to try other data mining methods for predicting academic failures and dropout.