The Visualization of Educational Measurement Results in Adaptive eLearning Systems for the Analysis of the Assessment Materials Quality

Yulia Lavdina 1, Oleg Gustun 1 and Evgeniy Antonov 1,2

1 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), 31 Kashirskoe Shosse, Moscow, 115409, Russian Federation
2 Plekhanov Russian University of Economics, Stremyanny lane 36, Moscow, 117997, Russian Federation

Abstract
In an adaptive eLearning system, educational measurements are used to control the learning process through feedback. Based on the results of current measurements of students’ achievements, a decision is made about the educational content they receive at the next stage of learning. To obtain reliable and accurate assessments of students’ achievements, the assessment materials must satisfy requirements for reliability, homogeneity, discriminatory power, validity, and other characteristics. Evaluating these quality indicators of assessment tools requires a sufficient statistical analysis of educational measurement results. However, researchers often find that the measurement data are insufficient to reach the required significance level of their conclusions. In this case, analyzing a visual presentation of the measurement results can support control decisions. We propose a visual presentation of the characteristics of assessment materials that helps to evaluate their quality indicators while new measurement data are being accumulated.
Combined with a visual analysis of students’ achievements, this information makes it possible to identify trends in the development of students’ competencies and to form control actions that correspond to their academic performance levels.

Keywords
Educational measurement, adaptive control, visual analysis, quality indicators, assessment materials.

GraphiCon 2021: 31st International Conference on Computer Graphics and Vision, September 27-30, 2021, Nizhny Novgorod, Russia
EMAIL: julia_lavdina@mail.ru (Y. Lavdina); gustun@gmail.com (O. Gustun); eantonov@kaf65.ru (E. Antonov)
ORCID: 0000-0002-1188-8119 (Y. Lavdina); 0000-0001-7197-0459 (O. Gustun); 0000-0003-1498-9131 (E. Antonov)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction
Interaction between the participants of the educational process produces information flows in both directions. A student receives educational materials, tasks, and instructions on how to complete them from a tutor; the student returns the solutions to the tutor, who uses them to grade the student. These flows differ in volume, intensity, interactivity, and other characteristics, and changing these characteristics purposefully makes it possible to control the learning process. Any change in the educational influence on a student must be based on reliable information about the student’s state. Therefore, controlling the learning process requires appropriate tools for measuring a student’s current academic performance level. According to the measurement goal, educational measurement tools can be divided into two types:
• final measurement, which gives overall characteristics of the student’s academic performance after the learning process is completed;
• current measurement, which provides feedback while the learning process is being controlled.
The two types of tools must meet different requirements for quality and accuracy. Final measurement must give a valid assessment with high reliability and maximal accuracy. The cost of an error is high for the student here, because the final mark cannot be changed once the learning process has been completed [1]. Accuracy is a less strict requirement for current measurement. The cost of an error is lower, but it is essential to shorten the measurement and, at the same time, to divide students effectively into groups, which requires the test to have sufficient discriminatory power. The values of the quality indicators of assessment tools are evaluated by statistical analysis of educational measurement results [2]. The calculation methods depend on the type of measurement instrument and the way it is applied, on the measurement conditions, and on the measurement scales [3]. Researchers often have difficulty interpreting results even under typical measurement conditions and with simplified calculation methods, for example, when the initial data are noisy or insufficient to reach the required reliability of conclusions [4]. In such cases data visualization can help both in choosing data processing and calculation methods and in supporting control decisions [5].

2. Adaptive control of the learning process
As a process that unfolds over time, learning can be divided into separate stages. At each stage a student masters a part of the course material. The method of mastering each part can repeat from stage to stage; only the educational content changes.
For instance, at each stage a student studies the theoretical material of the current course topic (acquires knowledge), does practical assignments (forms competences), and gains activity experience (develops skills). At the next stage these steps repeat for a new topic. Thus, learning develops in a spiral: the actions of the student and the tutor repeat cyclically at each new stage. The effectiveness of such a learning process depends largely on how rationally it is organized. At each stage, what matters is the depth and durability of the knowledge the student acquires, their success in completing assignments, and the amount of experience gained, and these characteristics are interconnected. The learning process as a whole can therefore be influenced by acting on the individual steps of mastering a topic at each stage. The depth of learning of the theoretical material depends on the amount of educational information and on the time the student spends on it. These parameters determine the intensity of learning, which can differ from student to student. Theoretical knowledge becomes more durable when the student applies it in practice by completing educational assignments. A student benefits most from assignments that correspond to their academic performance level [6], so learning effectiveness is influenced by having the student successfully complete practical assignments whose difficulty matches that level. Skill development is assessed by the experience the student has gained and depends on the time they spend working with the educational content and on the number of attempts they make. The intensity of learning, the difficulty of educational assignments, and the time spent working with educational content are thus parameters that can be adjusted for each student individually.
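Choosing an assignment whose difficulty corresponds to the student’s academic performance level (or, following the control rule of this section, slightly exceeds it) can be sketched as a simple selection rule. This is a minimal illustration under stated assumptions, not the control algorithm of [6; 7]: the `Assignment` record, the `select_assignment` function, the `margin` parameter, and the assumption that assignment difficulty and the student’s level lie on one common scale where larger values mean harder tasks are all hypothetical. Note that on this assumed scale larger means harder, the opposite of the classical item difficulty p_j.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    # Hypothetical record: a name and a difficulty on the same scale as the
    # student's estimated performance level (larger value = harder task).
    name: str
    difficulty: float


def select_assignment(bank, level, margin=0.1):
    """Return the assignment whose difficulty matches or slightly exceeds `level`.

    Assignments inside the window [level, level + margin] are preferred, and the
    one closest to `level` wins. If the window is empty, fall back to the
    assignment whose difficulty is nearest to `level` in either direction.
    """
    if not bank:
        raise ValueError("the assignment bank is empty")
    window = [a for a in bank if level <= a.difficulty <= level + margin]
    if window:
        return min(window, key=lambda a: a.difficulty - level)
    return min(bank, key=lambda a: abs(a.difficulty - level))


# Hypothetical assignment bank.
bank = [
    Assignment("A", 0.20),
    Assignment("B", 0.45),
    Assignment("C", 0.55),
    Assignment("D", 0.90),
]
print(select_assignment(bank, 0.50).name)  # C: inside the window [0.50, 0.60]
print(select_assignment(bank, 0.95).name)  # D: window empty, nearest overall
```

The hypothetical `margin` encodes "slightly exceeds"; in a real course its value would have to be chosen together with the scale on which the student’s level is estimated.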
To choose the values of these control parameters, one needs a student model and information about its current state [7]. The student model must include characteristics of the student’s knowledge, competences, and skills, which are latent and cannot be measured directly; they can be judged only from features that educational measurement methods can assess. We consider a student model whose parameter is the student’s academic performance level in solving practical problems [6]. Students reach this level after studying the theoretical material. The parameter is estimated by a test held at each stage of the learning process, which makes it possible to organize adaptive control of the learning process with feedback [8]. If the difficulty of practical problems is used as the control parameter, control consists in selecting the assignment whose difficulty corresponds to (or slightly exceeds) the student’s academic performance level for the given course topic (Figure 1).

Figure 1: Scheme of adaptive eLearning control system

Feedback is implemented by a measurement device that assesses the student’s academic performance level in solving problems while interacting with the student during computer testing.

3. The quality of educational measurement materials
The main advantages of computer testing for the current measurement of a student’s academic performance level are the speed of conducting an individual test and the rapid processing of its results. Such measurements are easy to organize on a mass scale and can be conducted regularly with minimal use of study time [9, 10]. In this article we analyze the case in which a computer test contains several items with a choice among alternative answers, presented to the student one after another.
Each item of the test is scored dichotomously: a student gets 1 for a correct answer and 0 for an incorrect one. In classical test theory, item difficulty and item variance are calculated as item indicators. Item difficulty is defined as the proportion of examinees who answer an item correctly [11]; for a dichotomous item with difficulty p_j, the item variance equals p_j(1 − p_j). The term is traditionally somewhat counterintuitive: an easier item has a higher difficulty value. For a test consisting of several items, the mean and the variance of the total score are calculated [12]. Moreover, as a measurement tool, the composite test under analysis must have the necessary quality indices, such as reliability, homogeneity, discriminatory power, and validity [13]. These indices are calculated from information about the score distribution of each item and its relationship to the other items in the test, that is, by statistical analysis of educational measurement results.

The use of visualization in analyzing test data with the methods of classical test theory is limited by the fact that researchers mainly build graphs and diagrams of calculated indicators and characteristics that reflect aggregate or composite properties of the objects under study. A large amount of source and intermediate numerical data is left out of the analysis, because researchers find such data difficult to illustrate, perceive, and interpret. As a result, information that may be important and interesting to the researcher escapes the analysis. Below we give examples of source data tables for which the summary indicators are equal but the quality of the composite tests differs significantly. Unfortunately, most researchers’ tools lack well-developed and well-thought-out visualization functionality; at least, we have found no description of such tools in the literature.
Therefore, we propose an approach to data analysis in which visualization methods are applied both when presenting descriptive statistics and when performing statistical inference. When educational measurement results are analyzed during the adaptive control of the learning process proposed in [6; 7], visualization methods make it possible to speed up decisions on control actions when time is limited.

4. The visualization of descriptive statistics of a composite test

4.1. Score matrix
To calculate item difficulties and their variances, a researcher builds a person-item score matrix whose entry in row i and column j is the score of student i on test item j. The example matrix shown in Figure 2 follows [11]. The rows below the matrix show the item difficulties p_j and the item variances σ_j², and the column to the right of the matrix shows the students’ total test scores r_i.

Figure 2: Person-item score matrix

These additional rows and the column help a researcher analyze test results, both to evaluate test quality and to assess students’ academic performance levels. However, the entries of the person-item score matrix by themselves do not present the information contained in the matrix clearly. A current test is usually aimed at assessing a student’s academic performance level on a single topic of a course. Therefore, the number of latent factors that characterize a student and influence the test result is quite limited, and the analysis of current test results often aims at finding the value of only one latent parameter of the student model. In that case, the matrix can be displayed using the approach applied to constructing Guttman scales [14], which exhibit a cumulative effect.
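The bordered score matrix described above (difficulties p_j and variances p_j(1 − p_j) under the columns, total scores r_i beside the rows) and its Guttman-style reordering can be sketched in plain Python. The sketch sorts columns by descending p_j and rows by descending r_i, then merges identical rows into blocks that keep a count and the students’ identifiers. It is an illustration, not the authors’ tool: the function names, the tie-breaking rule (rows with equal totals are ordered by their response pattern so that identical rows become adjacent), the error count, and the example data are assumptions. The error count compares each row with the ideal Guttman pattern that has the same total score, which is only one of several error definitions used in practice.

```python
from itertools import groupby


def item_summaries(scores):
    """Border rows/column of a 0/1 person-item matrix (rows = students)."""
    n_students, n_items = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n_students for j in range(n_items)]
    item_var = [pj * (1 - pj) for pj in p]  # variance of a 0/1 item
    totals = [sum(row) for row in scores]   # total score r_i of each student
    return p, item_var, totals


def guttman_order(scores, student_ids):
    """Sort columns by descending p_j and rows by descending r_i,
    then collapse identical rows into (pattern, count, ids) blocks."""
    p, _, _ = item_summaries(scores)
    col_order = sorted(range(len(p)), key=lambda j: -p[j])
    reordered = [tuple(row[j] for j in col_order) for row in scores]

    # Ties in r_i are broken by the pattern itself (1s first), so identical
    # rows end up adjacent; groupby only merges consecutive equal keys.
    row_order = sorted(
        range(len(scores)),
        key=lambda i: (-sum(reordered[i]), tuple(-x for x in reordered[i])),
    )
    blocks = []
    for pattern, group in groupby(row_order, key=lambda i: reordered[i]):
        ids = [student_ids[i] for i in group]
        blocks.append((pattern, len(ids), ids))
    return col_order, blocks


def guttman_errors(pattern):
    """Cells that differ from the ideal pattern (r ones, then zeros)."""
    r = sum(pattern)
    ideal = (1,) * r + (0,) * (len(pattern) - r)
    return sum(a != b for a, b in zip(pattern, ideal))


# Hypothetical scores of five students on three items.
student_ids = ["s1", "s2", "s3", "s4", "s5"]
scores = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]
col_order, blocks = guttman_order(scores, student_ids)
print(col_order)  # [1, 2, 0]
for pattern, count, ids in blocks:
    print(pattern, count, ids, guttman_errors(pattern))
# (1, 1, 1) 1 ['s2'] 0
# (1, 1, 0) 2 ['s3', 's5'] 0
# (1, 0, 1) 1 ['s4'] 2
# (1, 0, 0) 1 ['s1'] 0
```

In this example student s4 answered the hardest item correctly but missed an easier one, the only departure from the cumulative pattern; the block counts are the values a vertical bar chart beside the matrix would display.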
First, the matrix columns are sorted by item difficulty (for example, in descending order of p_j); second, the matrix rows are sorted by the students’ total scores (for instance, also in descending order of r_i). The sorted matrix then tends toward an upper triangular form with respect to the secondary diagonal. The sorted matrix presented in Figure 3 makes the test results easier to interpret than the matrix in Figure 2. Visual perception of the matrix improves if its entries are colored according to their values (Figure 3, left matrix). The coloring of the right matrix in Figure 3 corresponds to the ideal distribution of matrix elements in the Guttman scale: areas of 1 are shown in blue tones, and areas of 0 in red tones.

Usually the number of students who have taken a test far exceeds the number of test items, so the person-item score matrix is, as a rule, rectangular and elongated vertically. It can also contain many identical rows. After the rows are sorted, identical rows form adjacent blocks. Each block can be replaced by one row together with the number of times it occurs in the matrix; the identifiers of the students whose answers coincide must also be stored. The number of identical rows in each block is conveniently shown by a vertical bar chart placed to the right and left of the matrix.

Figure 3: Colored person-item score matrices

4.2. Variance-covariance matrix
To analyze the total score variance of a test, a variance-covariance matrix is built. It is a symmetric matrix with the variances of the item scores on its main diagonal and the covariances of each pair of test items in its other entries [15].
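The variance-covariance matrix, and the property that the sum of all its entries equals the total score variance, can be checked with a short plain-Python sketch. It assumes population (divide-by-N) variances and covariances and uses a hypothetical four-student, three-item score matrix; `covariance_matrix` and `negative_pairs` are illustrative names. The pairs with negative covariance are the cells that a two-color heatmap would show in the "negative" color; for k items there are k(k − 1)/2 such pairs to inspect.

```python
def covariance_matrix(scores):
    """Item variance-covariance matrix (divisor N) of a person-item matrix."""
    n_students, n_items = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n_students for j in range(n_items)]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in scores) / n_students
            for b in range(n_items)
        ]
        for a in range(n_items)
    ]


def negative_pairs(cov):
    """Item pairs (a, b), a < b, whose covariance is negative."""
    k = len(cov)
    return [(a, b) for a in range(k) for b in range(a + 1, k) if cov[a][b] < 0]


# Hypothetical scores of four students on three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]
cov = covariance_matrix(scores)

totals = [sum(row) for row in scores]
mean_total = sum(totals) / len(totals)
total_var = sum((r - mean_total) ** 2 for r in totals) / len(totals)

# The sum of all entries (item variances plus both copies of every
# covariance) reproduces the total score variance.
print(sum(map(sum, cov)), total_var)  # 0.6875 0.6875
print(negative_pairs(cov))            # [(0, 1)]
```

Items 0 and 1 act in opposite directions here: the third student answered only item 0 correctly and the fourth only item 1, which pulls their covariance below zero and lowers the total score variance.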
The sum of all entries of the variance-covariance matrix equals the total score variance of the test, so the researcher has to take into account the covariances of all distinct pairs of items. A test of n items has n(n − 1)/2 distinct covariances (10 in the example, where n = 5), and this number grows quadratically with n. The task is further complicated by the fact that covariance values cannot be interpreted directly, so analyzing the covariances of a test with many items is not easy [16]. Nevertheless, the analysis is necessary, because the covariance of a pair of items can be negative, which means that the two items act in opposite directions (a correct answer to the first item makes a wrong answer to the second more likely). When summed with the positive covariances, negative covariances lower the total score variance. To support the analysis visually, a heatmap is used in which the matrix entries are colored according to their covariance values [17]. It is convenient to use two colors, one for positive and one for negative values, each with its own gradations. The heatmap of the variance-covariance matrix for the example is shown in Figure 4. It is instructive to consider the person-item matrices in Figures 5 and 6, whose dichotomous scores are distributed so that all item difficulties and variances equal the corresponding values for the matrices in Figures 2 and 3. Figures 5 and 6 show the results of measurements made with two composite tests containing the same number of items.
For these tests, the total score means and the total score variances are equal, as are the difficulty and variance of each pair of corresponding items. Consequently, their reliabilities calculated with Cronbach’s coefficient alpha are also the same. Only the covariances in the variance-covariance matrices differ, as the heatmaps of the matrices show.

Figure 4: Colored variance-covariance matrices
Figure 5: Example 1, person-item score and variance-covariance matrices
Figure 6: Example 2, person-item score and variance-covariance matrices

Without analyzing the variance-covariance matrix, the difference in quality between these tests cannot be detected. For a test with many items the order of the matrix grows considerably, and the advantages of analyzing it with a heatmap become apparent. The heatmap also helps when a new test is assembled from items with known difficulty and variance that have never appeared together in one test: to forecast the total score variance of the new test, approximate covariances can be calculated from the sorted person-item score matrix.

5. The visualization of results of statistical inference
Indicators obtained by statistical inference can be analyzed visually both with the data presentations described above and with other methods. The reliability index expresses the degree of relationship between true and observed scores and is defined as the correlation coefficient between them. Reliability can be calculated from the variance of the composite scores and the covariances of the composite test items. Reliability coefficients are calculated by different methods depending on the conditions and purposes of testing.
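Since coefficient alpha depends only on the number of items, the item variances, and the total score variance, it can be computed in a few lines. The sketch below uses plain Python, population variances, and a hypothetical four-student, three-item matrix; `cronbach_alpha` is an illustrative name, not a reference to any particular package. Because only these quantities enter the computation, two score matrices with the same number of items that agree on them receive the same alpha however their covariances are arranged, which is why the heatmap remains necessary.

```python
def population_variance(values):
    """Variance with divisor N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def cronbach_alpha(scores):
    """Coefficient alpha for a person-item matrix (rows = students)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [population_variance([row[j] for row in scores]) for j in range(k)]
    total_var = population_variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores of four students on three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]
print(round(cronbach_alpha(scores), 4))  # 0.2727, i.e. 3/11
```

Here the item variances sum to 0.5625 and the total score variance is 0.6875, so alpha = 1.5 × (1 − 9/11) = 3/11.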
For example, Cronbach’s coefficient alpha makes it possible to estimate the lower bound of the reliability coefficient of a test from the results of a single administration of the test. It is calculated from the variances of all items and the total score variance of the test:

$$\rho_{XX'} \geq \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} \sigma_j^2}{\sigma^2}\right), \qquad (1)$$

where k is the number of items on the test, σ_j² is the variance of item j, σ² is the total test variance, and ρ_XX′ is the reliability coefficient of the test.

For the two tests whose matrices are shown in Figures 5 and 6, the item variances and the total score variances coincide pairwise, so their coefficients alpha are equal. However, visual analysis of the variance-covariance matrices shows that the tests differ and that their measurements may contain different errors. For this reason, reliability is usually assessed with several methods at once, complemented by an analysis of descriptive statistics.

When the homogeneity of parallel test forms is analyzed, visualization consists in drawing diagrams that reflect the internal characteristics of a test item and the relations among its variants. Figure 7 shows an example of the researcher’s web interface displaying data for analyzing the homogeneity of two variants of a test item. The values and diagrams allow the researcher to conclude that the characteristics of the two variants are similar and that their differences are not statistically significant.

6. The technical implementation of researchers’ tools
The technical implementation of the visualization of statistical results, whose scheme is shown in Figure 8, assumes that an analyst works with a web application that has access to the database of the adaptive eLearning system, which collects and stores measurement data. PostgreSQL is used as the database for storing students’ answers and the main information about the educational process. The algorithm for assessing the quality of educational measurement materials is implemented in Python and integrated into the web application. We used the Apache Superset platform as the visualization tool; it allows us to create dashboards such as the one in Figure 7. This approach has been implemented and tested. In summary, this technology stack enables an analyst to evaluate educational materials, and the Apache Superset visualization platform is the key tool for finding out which materials should be modified.

Figure 7: Researcher’s interface for the analysis of homogeneity
Figure 8: Scheme of technical implementation

7. Conclusions
The choice of a data visualization method depends on the goals of the analysis, the nature of the data under study, and the content of the current stage of analysis. Visualization of calculation results should be used not only to interpret and evaluate the final results but also between the stages of analysis. It is useful to order the rows and columns of the person-item score matrix and compare it with the Guttman scale already at the initial stage of research. If the sorted person-item score matrix is treated as a black-and-white image, image processing methods may provide additional information of interest to the researcher. When descriptive statistics are calculated, the variance-covariance matrix should be computed and displayed as a heatmap; without its analysis, the quality and reliability of a composite test cannot be guaranteed.
Visual data presentation helps a researcher put forward hypotheses, check the correctness of models, control the quality of conclusions, and determine further research strategies. The proposed methods for analyzing educational measurement results are suitable for the adaptive control of a learning process, and the developed visual analysis tools make them more accessible to researchers.

8. References
[1] C. V. Gipps, Beyond Testing: Towards a Theory of Educational Assessment, Falmer Press, London, 1994.
[2] W. Finch, B. French, Educational and Psychological Measurement, Routledge, 2019.
[3] C. Secolsky, D. B. Denison, Handbook on Measurement, Assessment, and Evaluation in Higher Education, 2nd ed., Routledge, 2018.
[4] D. Cramer, Advanced Quantitative Data Analysis, Open University Press, Maidenhead-Philadelphia, 2003.
[5] R. Mazza, Introduction to Information Visualization, Springer-Verlag, London, 2009. doi:10.1007/978-1-84800-219-7.
[6] N. M. Leonova, Sintez algoritmov adaptivnogo strukturno-parametricheskogo upravleniya obrazovatelnoy deyatelnostyu, ed. by A. D. Modyaev, MEPhI, Moskva, 2006.
[7] N. M. Leonova, M. V. Markovskiy, Imitacionnye matematicheskie modeli processov adaptivnogo upravleniya obrazovatelnoy deyatelnostyu, ed. by A. D. Modyaev, MEPhI, Moskva, 2006.
[8] S. Sastry, M. Bodson, Adaptive Control: Stability, Convergence, and Robustness, Prentice Hall, New Jersey, 1989.
[9] W. J. van der Linden, C. A. W. Glas, Computerized Adaptive Testing: Theory and Practice, Springer, 2000. doi:10.1007/0-306-47531-6.
[10] D. Yan, A. A. von Davier, C. Lewis, Computerized Multistage Testing: Theory and Applications, Chapman and Hall/CRC, 2016. doi:10.1201/b16858.
[11] L. Crocker, J. Algina, Introduction to Classical and Modern Test Theory, Harcourt Brace Jovanovich, New York, 2006.
[12] F. M. Lord, M. R. Novick, Statistical Theories of Mental Test Scores, IAP, 2008.
[13] R. P. McDonald, Test Theory: A Unified Treatment, L. Erlbaum Associates, 1999.
[14] P. E. Lester, D. Inman, L. K. Bishop, Handbook of Tests and Measurement in Education and the Social Sciences, 3rd ed., Rowman & Littlefield, 2014.
[15] D. N. M. de Gruijter, L. J. Th. van der Kamp, Statistical Test Theory for the Behavioral Sciences, Chapman and Hall/CRC, 2008.
[16] C. R. Reynolds, R. A. Altmann, D. N. Allen, Mastering Modern Psychological Testing, 2nd ed., Springer, 2021. doi:10.1007/978-3-030-59455-8.
[17] P. C. Bruce, A. Bruce, Practical Statistics for Data Scientists: 50 Essential Concepts, 1st ed., O'Reilly Media, 2017.