=Paper=
{{Paper
|id=Vol-2920/paper_4
|storemode=property
|title=Educational Data Mining Methods for the Analysis of Student's Digital Footprint
|pdfUrl=https://ceur-ws.org/Vol-2920/paper_4.pdf
|volume=Vol-2920
|authors=Evgenia Baranova,German Shvetcov,Tatiana Noskova
}}
==Educational Data Mining Methods for the Analysis of Student's Digital Footprint==
Evgenia Baranova (ev_baranova@mail.ru), German Shvetsov (shvetzoff.german@yandex.ru), Tatiana Noskova (noskovatn@gmail.com)

Herzen State Pedagogical University of Russia, St. Petersburg, Russian Federation

Abstract

The collection and analysis of a student’s digital footprint is an integral part of the development of education in the Russian Federation in the context of the digitalization of the economy. Among other things, this practice is associated with assessing the formation of students’ competencies in blended learning. The study aims to develop approaches, methods and tools for analyzing data generated in the digital education environment (DEE) in order to identify correlations between the structure and content of educational programs and students’ performance. The study relies on educational data mining (EDM) methods as well as on methods of designing and developing information systems and databases. Two data processing methods were used: correlation analysis and distillation of data for human judgment. The experimental base of the study is the DEE of the Herzen State Pedagogical University of Russia, which generates data on educational programs, students’ individual learning paths, and how successfully students master them. A large amount of detailed data on various aspects of the educational process, collected over more than 10 years, was used for the analysis. As a result, the authors have developed a tool for collecting and analyzing students’ digital footprint in blended learning using EDM methods.
In accordance with these methods, calculations were made that allow preliminary conclusions to be drawn about the nature of the relationships between the selected indicators (the number of activities in the course, the participation of students in the course chat(s), the final assessment for the course, the time spent on the course), obtained from the students’ work in the Moodle distance learning system (Moodle DLS), and the results of the midterm assessment. For example, the correlation between the final assessment for online education courses (OEC) and the results of the midterm assessment is 0.7108, which indicates a strong correlation. The correlation between the number of activities in the course and the grade for the midterm assessment (0.314) indicates a weak correlation. On the basis of such calculations, it is possible to formulate assumptions about how particular activities may help students master various disciplines and about the effectiveness of OEC, as well as to prepare recommendations for updating educational resources in order to improve their quality. The experience, approaches, and results presented by the authors demonstrate the feasibility of using EDM methods to analyze the educational process.

Keywords: student’s digital footprint, data mining, educational data mining, big data, online learning, digital education environment

∗ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1 Introduction===

Digitalization of the economy presents a significant challenge for the entire education system. The digital economy requires the education system not just to “digitize” individual processes, but to create an integrated approach that would set new goals and change the structure and content of the educational process.
Effective education involves not only the transfer of information from professor to student; it also requires complex social interactions and adaptation to the needs of each student and to their cultural and social context. Digital transformation changes the content, forms and methods of education. “Digitization” primarily affects school and university education systems. AI-driven mass online courses, chatbots and lesson plans are just a few examples of digital transformation in higher education [Kurbatsky, 2019]. The rapid introduction of information technologies into various spheres of life, including education, leads to a partial or complete transfer of human activity to the virtual environment. Information that users post on their pages in the digital environment can be considered a product of activity that makes it possible to identify the individuality of a particular person and to make assumptions about their psychological characteristics and cognitive abilities [Tulupyeva et al., 2015].

The continuous digital transformation of society is changing traditional learning models as well as online education models [Shamsutdinova, 2020]. Building student digital profiles based on the analysis of the so-called digital footprint is one of the recent trends. Recent studies interpret the concept of a student’s digital footprint in different ways, but they share a common direction: a transition from a narrow, technical understanding of the footprint as a set of IP addresses and reports on students’ network activity to a broader one. According to O. P. Zhigalova: “Digital footprints considered as the result of educational and professional activities in a digital format represent data that allow for determining the level of competencies, forming a learning path, evaluating opportunities and strategies for further improvement and professional development in a particular field” [Zhigalova, 2019]. According to V. N. Kurbatsky: “The digital footprint is an array of data on the results of a student’s educational and project activities, including all the materials that the student creates: presentations, prototypes, audio and video recordings, road maps, etc.” [Kurbatsky, 2019].

The prevalence and influence of social networks on the digital image of students are growing. The works of S. A. Vartanov [Vartanov, 2018], G. A. Nikolaenko [Nikolaenko, 2019], T. V. Tulupyeva [Tulupyeva et al., 2016] and others show that such information can be very diverse and include hundreds of footprints: from photos and videos to comments, likes, reposts and other virtual activities.

A student’s digital footprint starts from the moment of entering the university and contains personal data, information about admission, the study department, the educational program, and academic performance during the entire period of study, as well as educational analytics data collected automatically when students use distance learning systems and open education platforms for online courses. The digital footprint in the educational space is an alienated result of a person’s educational activity. The entire educational space that produces a digital footprint must be created using pedagogical design.

The methods of processing information about the educational process, in particular the digital footprint of its participants, include educational data mining (EDM) methods, which provide extensive opportunities for analyzing and interpreting the results obtained. However, despite the obvious advantages of EDM methods, research on their use for modeling higher education processes is clearly insufficient.
There is no established methodology, and there are no tools for data analysis and interpretation based on these methods, designed for:
• identifying critical points in the process of mastering educational programs, disciplines and modules that cause the greatest difficulties for students;
• determining the relationships and mutual influence of traditional and distance learning components on students’ success in mastering educational programs;
• early identification of students at risk of expulsion, etc.

The study aims to develop methods and tools for analyzing data generated in the digital education environment (DEE) in order to identify links between the structure and content of educational programs and the results of students’ learning activities, and thereby to optimize the educational process.

===2 Literature analysis===

In the Russian and foreign literature, data mining (DM) is interpreted as a collective name for a set of methods, also referred to in other papers and books as “information extraction”, “data excavation”, “knowledge extraction”, “template analysis”, or “knowledge discovery in databases” [Ovsyanitskaya, 2013]. DM approaches are based on classification, clustering, simulation and forecasting, decision tree construction, evolutionary programming, and fuzzy logic. Initially, DM methods were based on research in applied statistics. Today, their development is closely associated with research in artificial intelligence (pattern recognition, search for knowledge representation models, neural networks, genetic algorithms), information visualization, machine learning, etc. DM technology relies on the concept of patterns, i.e., regularities expressed in a human-readable form.
Modern research that applies data mining, machine learning and statistical methods to various types of information about the educational process is called educational data mining (EDM) [Belonozhko et al., 2017], [Zorić et al., 2020]. Within this field, models and methods of educational data processing aimed at improving the educational process are developed. EDM treats the educational process as a source of large data sets (Big Data) and aims to find patterns (samples, templates, schemes, regularities) that are characteristic of this subject area. A detailed overview of the key EDM components is presented in [Peña-Ayala, 2014]. A. Peña-Ayala identified three categories of main research modules related to EDM: problems, methods, and algorithms. EDM is also used in studies of how students interact with educational technologies [Angeli et al., 2017]. The paper [Manyika et al., 2011] states that EDM methods make it possible to build models describing various aspects of the educational process in order to identify the factors that affect students’ success in mastering learning paths, including teaching, administrative and organizational activities. The paper [Castro, 2007] suggests that clustering and visualization methods can be applied to educational data to improve the e-learning process. In [Zamora-Musa & Velez, 2017], the EDM association rule method was applied to survey data to identify connections between the categorical variables of the survey (environmental efficiency, environmental influence, environmental utility, control of the environment, ease of learning in the environment) in order to further improve an immersive environment (augmented reality technology that provides the effect of full or partial presence in an alternative space).
In [Razaque & Alajlan, 2020], EDM methods are used to determine the influence of external factors (relationships, alcohol consumption, parents’ education, frequency of meetings with friends, place of residence) on students’ academic performance. According to V. A. Larionova and A. A. Karasik, data mining makes it possible to study students’ behavioral patterns and to take into account the individual characteristics of each student when forming educational paths [Larionova & Karasik, 2019]. The study by A. Abdulmohsen [Abdulmohsen, 2016] suggests that the main data sources used in EDM today fall into the following categories:
1. Traditional learning, in which knowledge is passed on to students through personal contact. Data can be collected by traditional methods, such as observation and questionnaires, which make it possible to study students’ cognitive skills and determine how successful their learning is. Statistical and psychometric methods can be applied to such data.
2. Online learning and education management systems (EMS) provide students with materials, instructions, communication, and reporting tools that enable them to learn independently. Data mining techniques can be applied to the data stored in their databases.
3. Intelligent learning systems and adaptive educational hypermedia provide content to students based on their profiles, so data mining methods are needed to analyze the user profiles they create.

In particular, the studies [Ezekiel & Mogorosi, 2020] and [Rawad & Rémi, 2020] considered the use of data mining methods to identify relationships between students’ interaction with DLS elements (files, tests, forums) and their final academic performance, as well as relationships between the data of massive open online course (MOOC) students (age, gender, geographical region, education, socio-professional status) and their academic performance.
Our analysis of the literature [Abdulmohsen, 2016], [Baker & Yacef, 2009], [Bowers et al., 2012], [Bowers, 2010], [Shrestha & Pokharel, 2021] identifies the following areas of EDM use in educational practice that are the most important for higher education:
• modeling students’ behavior in the learning process in order to predict the development of their cognitive abilities and to identify students at risk, i.e., those with a high probability of being expelled;
• developing new models and ways of presenting knowledge in the subject area that correspond to the diverse learning styles and cognitive capabilities identified in students using EDM methods;
• studying how students interact with the digital environment, identifying the components of the environment, and studying the effects that the educational environment has on learning outcomes.

===3 Research Methods===

This study focuses on the components of the digital footprint associated with students’ progress along their learning paths. The object of the study is the components of the digital footprint represented by students’ midterm academic performance (grades obtained in disciplines, practical studies and modules) and by the results of students’ educational activities in Moodle DLS while mastering the educational program, as well as the correlation between them. The research aims to identify stages that are critical for students and to optimize the structure, content and implementation methods of educational programs in higher education. Research tools include information systems and databases for generating and collecting data, and EDM methods for analyzing, processing, and measuring the data.
The method of analyzing the student digital footprint developed by the authors is based on the stages of applying EDM methods to data analysis formulated in [Lunkov & Kharlamov, 2014]. The content of each stage is specified for the subject area in accordance with the aim of the study, together with the characteristics of the means and tools for obtaining, processing, and analyzing data and presenting the results.

The first stage involves setting an objective and identifying the entities of the subject area under consideration. Today, it is necessary to integrate various distance learning technologies with traditional forms of the educational process. In this environment, identifying and analyzing the relationships between student performance indicators across different forms of education becomes relevant. The main information sources are:
• the educational program, which is a set of key features of education (volume, content, targets), organizational and pedagogical conditions and forms of assessment, presented as a curriculum, a calendar curriculum, work programs of academic disciplines, courses, disciplines (modules) and other components, including evaluation and methodological materials;
• the activity of the faculty, including the development of teaching materials (lectures, practical classes, tests), including in digital format, and the assessment of students’ current and midterm academic performance, etc.;
• students’ learning activity, which is evaluated during current, midterm and final assessment.

Specifying the relationships between these information sources made it possible to develop the information flow chart shown in Figure 1.
Figure 1: Information Flow Chart

The second stage involves preparing data for EDM and is based on the methods of designing and developing information systems [Pedersen, 2019] and databases [Campbell & Majors, 2017]. At this stage, data modeling is carried out, i.e., the data requirements necessary for implementing EDM are defined and analyzed, and the structures of the operational, reference and archival databases (DB) that represent data sources generated by internal and external information systems are developed.

To implement the information flow chart presented in Figure 1, the Herzen State Pedagogical University has developed and introduced into the educational process an integrated educational management information system (EMIS), http://oio.herzen.edu.ru. The EMIS is integrated with various DEE components, including Moodle DLS. It includes a set of information systems (IS) and web resources based on the distributed database HERZEN, which stores complete information about various aspects of the educational process, including:
• data on the main professional educational programs being implemented;
• students’ academic performance;
• information about the academic load of the faculty;
• schedules of classes and exams;
• data on the e-learning courses that teachers use in implementing educational programs (disciplines, modules, practical classes), etc.

The data for subsequent analysis are generated in the database by dedicated tools (information systems): the EMIS components developed with the participation of the authors. The IS “Training and Work Plans” generates data on the main professional educational programs being implemented.
It creates electronic curricula of educational programs in accordance with the requirements of the Federal State Educational Standard of Higher Professional Education and the Federal State Educational Standard of Higher Education, and automatically generates work plans and the volume of training assignments of departments for all levels of education and forms of training. The IS “Dean’s Office” is used to collect data on students’ midterm academic performance. It stores students’ personal data, information about the characteristics, structure and composition of the student’s educational program, and data on individual learning paths and how successfully students master them [Baranova et al., 2020]. The academic load is distributed among teachers using the IS “Department Load”. The IS “Schedule” creates an electronic schedule of classes and exams for students in line with the work plans and the distributed teaching load. The generated schedule is checked for correctness: the limits on the number of training hours per day for a group and for a teacher must be respected; a teacher cannot be assigned classes with different groups at the same time; a classroom cannot be assigned to different classes at the same time; lectures may be given only by teachers with an academic degree, etc. Together, these tools generate data in the databases about the core vocational education programs (CVEP), the students, and the faculty workload for analysis using EDM methods.

Nowadays, e-learning and distance learning technologies are widely used. To obtain complete and diverse information about the educational process for EDM, it is also necessary to generate data on the content and results of e-learning in DLS.
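The correctness rules that the IS “Schedule” applies lend themselves to a simple validator. The following is a minimal Python sketch, not the university’s implementation: the `ScheduledClass` record, the `check_schedule` function, the 1.5-hour slot length and the daily hour limit are hypothetical, and the rules are encoded literally as listed above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledClass:
    group: str
    teacher: str
    room: str
    day: str
    slot: int            # index of the time slot within the day
    kind: str            # e.g. "lecture" or "seminar"
    hours: float = 1.5   # academic hours per slot (assumed value)


def check_schedule(classes, teachers_with_degree, max_hours_per_day):
    """Return (rule, detail) pairs for every violated rule; an empty list means valid."""
    violations = []
    groups_by_teacher_slot = defaultdict(set)
    classes_by_room_slot = defaultdict(set)
    hours_by_group_day = defaultdict(float)
    hours_by_teacher_day = defaultdict(float)

    for c in classes:
        groups_by_teacher_slot[(c.teacher, c.day, c.slot)].add(c.group)
        # A class held in a room is identified by who teaches it and its kind.
        classes_by_room_slot[(c.room, c.day, c.slot)].add((c.teacher, c.kind))
        hours_by_group_day[(c.group, c.day)] += c.hours
        hours_by_teacher_day[(c.teacher, c.day)] += c.hours
        # Lectures may only be given by teachers with an academic degree.
        if c.kind == "lecture" and c.teacher not in teachers_with_degree:
            violations.append(("no_degree", c))

    for key, groups in groups_by_teacher_slot.items():
        if len(groups) > 1:  # one teacher with several groups at the same time
            violations.append(("teacher_clash", key))
    for key, held in classes_by_room_slot.items():
        if len(held) > 1:  # one room hosting different classes at the same time
            violations.append(("room_clash", key))
    for key, total in hours_by_group_day.items():
        if total > max_hours_per_day:
            violations.append(("group_hours", key))
    for key, total in hours_by_teacher_day.items():
        if total > max_hours_per_day:
            violations.append(("teacher_hours", key))
    return violations
```

Returning every violation as a (rule, detail) pair, rather than stopping at the first problem, lets a scheduler show the operator all conflicts in one pass.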
Today, Moodle DLS is one of the most popular systems for supporting the educational process in distance education. The platform’s most important advantages that make it popular are openness, mobility, portability, extensibility, widespread use, etc. Moodle DLS contains a wide set of tools for creating and maintaining e-learning courses, presenting educational materials in various ways, monitoring progress, and ensuring feedback between teachers and students, which allows students to post their work in the DLS and receive reviews and advice from the teacher [Zaitseva, 2012]. Courses with a simple structure contain educational and methodological materials for the discipline in the form of presentations, lecture texts, reading lists, exam questions, etc. Most of the courses published in the university’s Moodle DLS make wider use of the DLS tools and, as a rule, include the working program of the discipline; educational and methodological materials for mastering the discipline; methodological recommendations for students on studying the e-learning course and preparing for various types of classes and assessment; a forum for communication and discussion of course questions between students and the teacher; and sets of assessment tools for monitoring students’ knowledge.

Moodle DLS, based on the MariaDB DBMS (a fork of MySQL), stores various data on students’ interaction with the digital environment during the learning process, including current academic performance. Queries to the Moodle database make it possible to obtain various data sets, including:
• general information about the course, teachers, and students;
• the amount of time a student spends in the online course;
• the number of changes students make in the various modules of the course;
• the results of the tasks students complete in the course;
• information about student activity on course forums, etc.
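As an illustration of the kind of query involved, the sketch below counts, per student, the logged actions and the chat messages in one course. It assumes Moodle’s standard log store table `mdl_logstore_standard_log` with the default `mdl_` prefix, recreating only a few of its columns, and runs against an in-memory SQLite database so that it is self-contained. Against the production MariaDB database the same SELECT would be sent through a MariaDB/MySQL driver, whose parameter placeholder style may differ from `?`. The chat event name is an assumption to verify against the installed Moodle version, and the demo rows are invented.

```python
import sqlite3

# Event name logged when a chat message is sent (assumed; check your Moodle version).
CHAT_MESSAGE_SENT = r"\mod_chat\event\message_sent"

ACTIVITY_SQL = """
SELECT userid,
       COUNT(*) AS actions,
       SUM(CASE WHEN eventname = ? THEN 1 ELSE 0 END) AS chat_messages
FROM mdl_logstore_standard_log
WHERE courseid = ?
GROUP BY userid
ORDER BY userid
"""


def course_activity(conn, courseid):
    """Return {userid: (actions, chat_messages)} for one course."""
    rows = conn.execute(ACTIVITY_SQL, (CHAT_MESSAGE_SENT, courseid)).fetchall()
    return {userid: (actions, chat) for userid, actions, chat in rows}


def make_demo_db():
    """In-memory stand-in with a few columns of the Moodle log table and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mdl_logstore_standard_log ("
        "id INTEGER PRIMARY KEY, eventname TEXT, userid INTEGER, "
        "courseid INTEGER, timecreated INTEGER)"
    )
    rows = [
        (r"\core\event\course_viewed", 7, 42, 1000),
        (r"\mod_resource\event\course_module_viewed", 7, 42, 1060),
        (CHAT_MESSAGE_SENT, 7, 42, 1120),
        (CHAT_MESSAGE_SENT, 7, 42, 1180),
        (r"\core\event\course_viewed", 8, 42, 2000),
        (r"\core\event\course_viewed", 7, 99, 3000),  # a different course
    ]
    conn.executemany(
        "INSERT INTO mdl_logstore_standard_log (eventname, userid, courseid, timecreated) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


print(course_activity(make_demo_db(), 42))
```

Counting every log row as an “action” is a coarse proxy. A real query would probably filter by event type (for example, to keep only student-initiated events).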
The tools that link the data sources in EMIS and Moodle DLS include the IS “Electronic Monitoring” and the “Electronic Atlas” web resource, components for monitoring faculty activities related to online learning. These systems allow a teacher to submit an application for creating an e-learning course in Moodle DLS and allow such applications to be processed. They thus link the data on disciplines, the courses in Moodle DLS, and the teachers who deliver them. The teacher’s list of disciplines and e-learning courses is displayed on their personal page in the “Electronic Atlas” (Fig. 2). This connection establishes a correspondence between participants (students, faculty) and their activities in Moodle DLS and makes it possible to obtain more detailed information about the structure of specific courses.

The third stage includes the collection and analysis of the final data, the choice of EDM methods for processing them, and the visualization of the results. Correlation analysis and the method of distillation of data for human judgment were used as EDM methods to identify the relationships between the entities of the subject area.

Figure 2: Web resource “Electronic Atlas”. The section “Teacher’s Disciplines”

The term “distillation of data” was introduced by G. Siemens and R. Baker and involves data clustering, i.e., the automatic division of elements into groups based on the “similarity” of a set of characteristics in order to reduce the dimension of the original data set, and data visualization, i.e., representation in a form that supports human perception [Siemens & Baker, 2012].
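A minimal sketch of the clustering half of data distillation is shown below. It assumes each student is described by a few numeric activity indicators (here: actions, chat messages, final course grade, hours in the course). The feature values are invented, and the choices of min-max scaling and k-means with deterministic initialisation are ours for illustration, not necessarily what the authors used.

```python
import math


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] so that no indicator dominates the distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def kmeans(points, k, iterations=100):
    """Plain k-means; the first k points serve as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical students: (actions, chat messages, final course grade, hours in course)
students = [
    (120, 10, 90, 30), (20, 0, 40, 5),
    (110, 8, 85, 28), (25, 1, 45, 6),
    (130, 12, 95, 33), (15, 0, 35, 4),
]
labels, centroids = kmeans(min_max_scale(students), k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

The cluster labels (here, an “active” and a “passive” group) are what would then be visualized for human judgment. The grouping itself does not explain why students fall into each group.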
At this stage, tools are planned for obtaining summary data to analyze the quality of the educational program as a whole according to various criteria, such as students’ success in mastering the program, the qualification characteristics of the faculty, the components of the program that are most difficult to master, the demand for e-learning courses in disciplines, etc.

The web resource “Digital Passport CVEP” developed by the authors provides quantitative values of the indicators for these criteria in order to analyze the structure, composition, and various aspects of the implementation of educational programs. For example, for each educational program it generates:
• data on the structure of the teaching staff implementing the program: academic degrees and positions of teachers, participation of teachers in implementing the CVEP (disciplines/modules, practical classes, supervision of theses and SFE), departments’ participation in implementing the CVEP, teacher activity in Moodle DLS (number of hours, structure of developed courses), etc.;
• data on the structure of the student body: the percentage of foreign citizens, the share of students studying on a contractual basis, the movement of the student body (admission, expulsion, graduation), and employment of graduates [Shrestha & Pokharel, 2021] (availability of work, work in one’s degree field, employment by field of activity);
• summary data on student attendance and academic performance, student ratings within the group, learning activities in Moodle DLS, etc.

For visual representation, the generated data can be aggregated by selected attributes: form of study, level of education, field of study/major, orientation (scope of education), budgetary/contractual basis, etc.
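The aggregation step can be illustrated with a small group-by. This sketch assumes hypothetical student records with `form`, `level`, `foreign` and `basis` fields and computes two of the shares mentioned above (foreign citizens, contractual basis) for every combination of the chosen attributes. The real passport draws these fields from the HERZEN database, whose schema is not described here.

```python
from collections import defaultdict


def summarize(students, keys):
    """Group student records by the attributes in `keys` and compute simple shares."""
    groups = defaultdict(list)
    for s in students:
        groups[tuple(s[k] for k in keys)].append(s)
    summary = {}
    for key in sorted(groups):
        members = groups[key]
        n = len(members)
        summary[key] = {
            "students": n,
            "foreign_share": sum(m["foreign"] for m in members) / n,
            "contract_share": sum(m["basis"] == "contract" for m in members) / n,
        }
    return summary


# Hypothetical records; field names are illustrative only.
records = [
    {"form": "full-time", "level": "bachelor", "foreign": True, "basis": "contract"},
    {"form": "full-time", "level": "bachelor", "foreign": False, "basis": "budget"},
    {"form": "full-time", "level": "bachelor", "foreign": False, "basis": "budget"},
    {"form": "part-time", "level": "master", "foreign": False, "basis": "contract"},
]
for key, row in summarize(records, ["form", "level"]).items():
    print(key, row)
```

Passing a different `keys` list (for example, `["level"]` or `["form", "level", "basis"]`) changes the aggregation level without changing the code.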
The method proposed by the authors makes it possible to determine the nature of the relationships between the selected entities of the subject area and to analyze the student’s digital footprint in order to change the content of educational programs and predict students’ success in mastering their learning paths.

===4 Results===

The authors developed a tool named “Digital Passport of Core Vocational Education Program” (DP CVEP). This tool provides access to analytical data on the CVEP and its implementation, makes it possible to track the dynamics of student expulsions, and identifies the categories of students at risk of expulsion, as well as the educational programs and training areas that are difficult to master at early stages. The tool automatically clusters data on the CVEP (faculty, student body) by certain characteristics (faculty staff, academic degrees, workload by type of contract, summary student performance, current students) and visualizes them. The resource is a component of EMIS built with modern web development technologies [Stauffer, 2019]; it currently includes a set of reports for university management that generate and visualize analytical data on the implementation of CVEP, the teaching staff, the student body, and class attendance.

To assess the intensity of students’ work in Moodle DLS, several indicators were identified: the number of actions performed with the course elements, including clicks on course pages, loading resources, clicking on links, and sending messages in forums and chats; the activity of students in the course chat(s) (the number of messages sent in the chat(s)); the final grade for the course (based on the points received for completing tasks or tests); and the time students spent on the course.
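The four indicators can be derived from one student’s event stream in a course. The sketch below is hypothetical in several respects. The `Event` record and its `kind` categories are illustrative. The final grade is simply passed in, since Moodle stores it separately from the log. Moodle logs do not record time spent directly, so time in the course is estimated with a common heuristic: summing gaps between consecutive events and ignoring gaps longer than an assumed 30-minute idle limit. The paper does not state how this indicator was actually computed.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int  # Unix time in seconds, like Moodle's timecreated column
    kind: str       # hypothetical category, e.g. "view", "chat_message", "forum_post"


def student_indicators(events, final_grade, idle_limit=1800):
    """Compute the four course indicators for one student from their event stream.

    Time in the course is estimated by summing gaps between consecutive events and
    ignoring gaps longer than `idle_limit` seconds (treated as the end of a session).
    """
    events = sorted(events, key=lambda e: e.timestamp)
    seconds = sum(
        gap
        for gap in (b.timestamp - a.timestamp for a, b in zip(events, events[1:]))
        if gap <= idle_limit
    )
    return {
        "actions": len(events),
        "chat_messages": sum(e.kind == "chat_message" for e in events),
        "final_grade": final_grade,
        "hours": seconds / 3600,
    }
```

The idle limit is the main tuning knob. A shorter limit underestimates reading time for long materials, while a longer one counts breaks as study time.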
Next, several disciplines were selected that play a key role in training for various educational programs and whose teaching relies extensively on e-learning courses in Moodle DLS: geometry, organization theory and management of organizational policy in education, plant systematics, ancient world history, protection and preservation of cultural heritage, etc. Information about students’ success in mastering the educational program is stored in the HERZEN database as grades for the midterm assessment in modules, disciplines and practical classes. To connect data from the two databases, an integration model was developed (Figure 3) and implemented in the DP CVEP.

Figure 3: Model for integration of HERZEN DB and Moodle DB

The tool identifies the correlation between the selected indicators of students’ performance in Moodle DLS and the results of their training. The indicator values are calculated automatically through queries to MariaDB. Using the DP CVEP tool, we obtained data on the training of five groups of about 20 students each. Correlation coefficients were calculated for each discipline, student group, and selected indicator, reflecting the dependence between the indicator values and students’ midterm grades for the disciplines (Table 1). The number of student actions and student activity in the course chat(s) characterize the additional activity of students on the way to the final result. The final grade for the course reflects the quality of the student’s task performance throughout the online course. The calculations presented in Table 1 are part of the results of the study. Although these results are based on a relatively small amount of data, they make it possible to formulate assumptions that will be tested on large amounts of data with EDM methods in future research.
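The calculation behind a table like Table 1 can be sketched as follows. The sketch first joins HERZEN midterm grades with Moodle indicators, then computes Pearson’s r for each (discipline, group) cell and averages the cells for one indicator. The dictionaries, the student-to-Moodle-user and discipline-to-course mappings (standing in for the links kept by the “Electronic Atlas”), the minimum of three matched students per cell, and the use of a plain arithmetic mean for the average are all assumptions. The numbers are toy values, not the study’s data.

```python
import math
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson r of two equal-length samples; None if either sample is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_table(grades, moodle, user_map, course_map, indicator):
    """r between one Moodle indicator and midterm grades for every (discipline, group).

    grades:     {(student_id, group, discipline): midterm grade}        (HERZEN side)
    moodle:     {(moodle_user_id, course_id): {indicator_name: value}}  (Moodle side)
    user_map:   {student_id: moodle_user_id}
    course_map: {discipline: course_id}
    """
    cells = defaultdict(lambda: ([], []))
    for (student, group, discipline), grade in grades.items():
        key = (user_map.get(student), course_map.get(discipline))
        if key not in moodle:
            continue  # student or course is missing from one of the two databases
        xs, ys = cells[(discipline, group)]
        xs.append(moodle[key][indicator])
        ys.append(grade)
    table = {}
    for cell, (xs, ys) in cells.items():
        if len(xs) >= 3:  # too few matched students give an uninformative r
            r = pearson(xs, ys)
            if r is not None:
                table[cell] = r
    return table


# Toy data: s4 has no Moodle account and is therefore left out of the join.
grades = {("s1", "G1", "geometry"): 3, ("s2", "G1", "geometry"): 4,
          ("s3", "G1", "geometry"): 5, ("s4", "G1", "geometry"): 5,
          ("s5", "G2", "geometry"): 3, ("s6", "G2", "geometry"): 5,
          ("s7", "G2", "geometry"): 4}
user_map = {"s1": 101, "s2": 102, "s3": 103, "s5": 105, "s6": 106, "s7": 107}
course_map = {"geometry": 7}
moodle = {(101, 7): {"final_grade": 60}, (102, 7): {"final_grade": 75},
          (103, 7): {"final_grade": 90}, (105, 7): {"final_grade": 50},
          (106, 7): {"final_grade": 80}, (107, 7): {"final_grade": 70}}

table = correlation_table(grades, moodle, user_map, course_map, "final_grade")
for (discipline, group), r in sorted(table.items()):
    print(discipline, group, round(r, 3))
print("average r:", round(mean(list(table.values())), 3))
```

With groups of about 20 students, individual coefficients carry wide uncertainty, which is consistent with treating the averages as a basis for assumptions rather than firm conclusions.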
For instance, the data obtained suggest that the third indicator (the final grade for the course) shows the most consistently high correlation, with an average value of 0.7108. Thus, tools that evaluate students’ current performance in the OEC (assignments, tests) make it possible to predict with substantial certainty how successfully a student will master the discipline. At the same time, the first indicator (the number of actions in the course) shows the weakest correlation, with an average value of 0.314. A preliminary conclusion is that this indicator is not a significant factor in characterizing the quality of a student’s learning process in the course.

Table 1: Correlation r-coefficients for selected indicators

The analysis of the data by discipline showed significant differences in the indicator values, which, according to the authors, are related to the specifics of the subject area and the structure of the disciplines’ OEC. By analyzing the results of users’ interaction with the educational and methodological materials of an OEC, it is possible to formulate assumptions about the effectiveness of the course and recommendations for its further improvement.

===5 Discussion===

Thus, in analyzing educational data, the authors identified the stages of applying EDM methods: studying the nature of the subject area; designing and developing tools for data collection and data generation; selecting approaches to identify the relationships between these data; and implementing the developed data processing tools and tools for visual presentation of the results. The experimental data were obtained in the DEE of the Herzen State Pedagogical University of Russia.
Calculating the dependencies between the selected indicators of the student’s digital footprint confirmed the feasibility and importance of implementing EDM methods, as it allowed conclusions to be drawn about the impact of these indicators on students’ academic results in current and midterm assessment. It is fair to conclude that the use of EDM methods depends directly on the preparedness of the educational organization: the availability of appropriate equipment and skills, as well as the digital and IT competence of the students. According to the authors, further research aimed at identifying the factors that influence students’ success in mastering courses will make it possible to develop recommendations for upgrading the structure and content of educational programs in order to improve the quality of training.

===6 Conclusion===

The results obtained allow the authors to move to the next stage of the study, which will use an expanded database to analyze students’ performance in all the educational programs available at the university.

Based on such data, models (patterns) describing students’ learning activities in blended learning will be developed. According to the authors, these models will make it possible to predict students’ success in mastering their learning paths, to identify students at risk and the “critical points” of educational programs that cause students difficulties, and to develop recommendations for improving online education courses, etc.

Thus, the use of EDM methods within the proposed methodology, together with the developed tools for analyzing and interpreting detailed data on various aspects of the learning process, will improve the efficiency and effectiveness of education.
The method for analyzing students’ digital footprints proposed by the authors can support informed managerial decision-making when designing, revising, and implementing university educational programs with a view to improving the quality of training.

References

[Abdulmohsen, 2016] Abdulmohsen A. (2016) Data Mining in Education // (IJACSA) International Journal of Advanced Computer Science and Applications, 2016, Vol. 7, No. 6.
[Angeli et al., 2017] Angeli C., Howard S., Ma J., Yang J., Kirschner P. A. (2017) Data mining in educational technology classroom research: can it make a contribution? // Computers & Education, 2017, Vol. 113, pp. 226–242.
[Baranova et al., 2020] Baranova E. V., Vereshchagina N. O., Shvetsov V. (2020) Digital Tools for Learning Activities Analysis // Izvestia: Herzen University Journal of Humanities & Sciences, Saint Petersburg, 2020, No. 198, pp. 56–65.
[Baker & Yacef, 2009] Baker R., Yacef K. (2009) The state of educational data mining in 2009: A review and future visions // Journal of Educational Data Mining, 2009, Vol. 1, No. 1, pp. 3–17, doi: 10.5281/ZENODO.3554657.
[Belonozhko et al., 2017] Belonozhko P. P., Karpenko A. P., Khramov D. A. (2017) Analysis of Educational Data: Directions and Perspectives of Application // Internet-journal NAUKOVEDENIE, 2017, Vol. 9, No. 4. Available online: https://cyberleninka.ru/article/n/analiz-obrazovatelnyh-dannyh-napravleniya-i-perspektivy-primeneniya/viewer (accessed on 5 July 2020).
[Bowers et al., 2012] Bowers A. J., Sprott R., Taff S. A. (2012) Do we know who will drop out? A review of the predictors of dropping out of high school: Precision, sensitivity and specificity // The High School Journal, 2012, Vol. 96, No. 2, pp. 77–100, doi: 10.1353/hsj.2013.0000.
[Bowers, 2010] Bowers A. J. (2010) Analyzing the longitudinal K-12 grading histories of entire cohorts of students: Grades, data-driven decision making, dropping out and hierarchical cluster analysis // Practical Assessment, Research and Evaluation, 2010, Vol. 15, Article 7, doi: 10.7275/r4zq-9c31.
[Campbell & Majors, 2017] Campbell L., Majors C. (2017) Database Reliability Engineering: Designing and Operating Resilient Database Systems. O’Reilly Media, 2017.
[Castro, 2007] Castro F., Vellido A., Nebot A., Mugica F. (2007) Applying Data Mining Techniques to e-Learning Problems // Studies in Computational Intelligence, 2007, Vol. 62, pp. 183–221.
[Ezekiel & Mogorosi, 2020] Ezekiel U. O., Mogorosi M. (2020) Educational Data Mining for Monitoring and Improving Academic Performance at University Levels // (IJACSA) International Journal of Advanced Computer Science and Applications, 2020, Vol. 11, No. 11.
[Manyika et al., 2011] Manyika J., Chui M., Brown B., Bughin J., Dobbs R., Roxburgh C., Byers A. (2011) Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute, 2011.
[Kurbatsky, 2019] Kurbatsky V. N. (2019) Digital Footprint in the Educational Space as the Basis of Transformation of a Modern University // HSE: scientific-methodical and journalistic journal, 2019, No. 5, pp. 40–45.
[Larionova & Karasik, 2019] Larionova V. A., Karasik A. A. (2019) Digital Transformation of Universities: Notes on the Global Conference EdCrunch Ural on Technologies in Education // University Management: Practice and Analysis, 2019, No. 23, pp. 130–135.
[Lunkov & Kharlamov, 2014] Lunkov A. D., Kharlamov A. V. (2014) Intelligent Data Analysis // Textbook. National Research Saratov State University, 2014. Available online: http://elibrary.sgu.ru/uch_lit/1141.pdf (accessed on 3 October 2020).
[Nikolaenko, 2019] Nikolaenko G. A. (2019) The Perspectives of Using Digital Traces of Researchers for Analyzing their Communication Strategies (by the Example of the Social Network ResearchGate) // Sociology of Science and Technology, 2019,
[Ovsyanitskaya, 2013] Ovsyanitskaya L. Y. (2013) Intelligent Data Analysis as a Component of Pedagogical Management // Education and Science, 2013, No. 10. Available online: https://cyberleninka.ru/article/n/intellektualnyy-analiz-dannyh-kak-sostavlyayuschaya-pedagogicheskogo-upravleniya (accessed on 14 October 2020).
[Pedersen, 2019] Pedersen K. (2019) Modern Science Proves Intelligent Design: The Information System Worldview. Archway Publishing, 2019.
[Peña-Ayala, 2014] Peña-Ayala A. (2014) Educational data mining: A survey and a data mining-based analysis of recent works // Expert Systems with Applications, 2014, Vol. 41, No. 4, pp. 1432–1462.
[Razaque & Alajlan, 2020] Razaque A., Alajlan A. (2020) Supervised Machine Learning Model-Based Approach for Performance Prediction of Students // Journal of Computer Science, 2020, Vol. 16, pp. 1150–1162.
[Rawad & Rémi, 2020] Rawad C., Rémi B. (2020) Internationalizing Professional Development: Using Educational Data Mining to Analyze Learners’ Performance and Dropouts in a French MOOC // International Review of Research in Open and Distributed Learning, 2020, Vol. 21, No. 4.
[Shrestha & Pokharel, 2021] Shrestha S., Pokharel M. (2021) Educational data mining in moodle data // International Journal of Informatics and Communication Technology (IJ-ICT), 2021, Vol. 10, No. 1, pp. 9–18, doi: 10.11591/ijict.v10i1.pp9-18.
[Siemens & Baker, 2012] Siemens G., Baker R. S. (2012) Learning analytics and educational data mining: towards communication and collaboration // LAK’12: Proceedings of the 2nd International Conference on Learning Analytics and Knowledge. New York: Association for Computing Machinery, 2012, pp. 252–254, doi: 10.1145/2330601.2330661.
[Shamsutdinova, 2020] Shamsutdinova T. M. (2020) Cognitive Model of Electronic Learning Trajectories Based on Digital Footprint // Otkrytoye Obrazovaniye, 2020, No. 2. Available online: https://cyberleninka.ru/article/n/kognitivnaya-model-traektorii-elektronnogo-obucheniya-na-osnove-tsifrovogo-sleda (accessed on 21 September 2020).
[Stauffer, 2019] Stauffer M. (2019) Laravel: Up & Running: A Framework for Building Modern PHP Apps, 2nd Edition. O’Reilly, 2019.
[Tulupyeva et al., 2015] Tulupyeva T. V., Suvorova A. V., Azarov A. A., Tulupyev A. L., Bordovskaya N. V. (2015) Computer Tools in the Analysis of Students’ Digital Footprints in Social Network: Possibilities and Primary Results // Computer Tools in Education, 2015, No. 5, pp. 3–13.
[Tulupyeva et al., 2016] Tulupyeva T. V., Tafinyseva A. S., Tulupyev A. L. (2016) Approach to the Analysis of Personal Traits Reflection in Digital Footprint // Vestnik psihoterapii, 2016, No. 60 (65), pp. 124–137.
[Vartanov, 2018] Vartanov S. A. (2018) Digital Media and Big Data: a Mathematical Approach to Media Environment Analysis // Vek Informatsii, 2018, Vol. 1, No. 2, pp. 211–213.
[Zaitseva, 2012] Zaitseva O. (2012) Use of LMS Moodle in Education // Education and Mentoring: Methods and Practice, 2012, No. 2, pp. 59–64.
[Zamora-Musa & Velez, 2017] Zamora-Musa R., Velez J. (2017) Use of Data Mining to Identify Trends between Variables to Improve Implementation of an Immersive Environment // Journal of Engineering and Applied Sciences, 2017, Vol. 12, No. 22, pp. 5944–5948.
[Zhigalova, 2019] Zhigalova O. P. (2019) Formation of the Educational Environment in the Conditions of Digital Transformation of Society // Scholarly Notes of Transbaikal State University, 2019, Vol. 14, No. 2, pp. 69–74.
[Zorić, et al., 2020] Zorić A., Obrenovic B., Akhunjonov U. (2020) Benefits of Educational Data Mining. 2020.