Application of Big Data Methods in E-Learning Systems
Natalia Sharonovaa, Iryna Kyrychenkob and Glib Tereshchenkob
a
    National Technical University "KhPI", Kyrpychova str. 2, Kharkiv, 61002, Ukraine
b
    Kharkiv National University of Radioelectronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine


                Abstract
                Analytics and Big Data play an important role in the future of higher education. This paper
                analyzes the practical use of e-learning technology tools to provide relevant information to
                teachers and students who are trying to optimize the learning process. Combining data
                processing with learning analytics is an aid that can greatly enhance higher education and
                chart the path for further development in the new educational era.

                Keywords
                E-learning technological tools, curriculum analysis, educational data acquisition, Big Data,
                learning management system

1. Introduction
    Currently, there is already a large volume of data on pupils who have access to an LMS [1]. The
growth in the number of students in new distance education systems is driving a new trend in the
educational field. The recent rise in popularity of MOOCs is an example of the new expectations
placed on university students. This tendency changes the behavior of the different educational roles:
both teachers and students have to respond to new methods and change their traditional ways of
teaching and learning. The phenomenon is not limited to public schools, which must adapt their
organizational and information structures to the demands of students so that they can gain access to
their academic programs.
    Data mining is widely used in the education industry to uncover its problems. Student
performance is of great concern in educational institutions, where performance may be influenced by
several factors. Forecasting it requires three components: the parameters that affect student
performance, data mining methods, and data mining tools. These parameters can be
psychological, personal and environmental. The research conducted in this paper is aimed at supporting
the quality of education at the institution by minimizing the adverse impact of these factors on student
success.
    Big Data is the automated fusion of structured data stored in repositories with unstructured data from
emerging sources such as social media, electronic devices, cameras, smart meters, and financial systems.
The McKinsey Global Institute, in turn, describes Big Data as "data sets that exceed
the capacity of traditional database software to record, process, handle, and analyze." Today, this
approach enables businesses to collect and interpret all data, regardless of its type, volume, or speed of
transmission, and to make more informed decisions based on that data.
    It is fair to say that there is still much to learn about how to manage Big Data. But one thing is
certain: conventional data-processing methods will not deliver adequate performance in Big Data
research [2]. The number of data sources, the volume of data, the processing time, and even the key
business models all contribute to a broad data space. Recommendations to keep using the same old
tools under these new conditions are not suitable for data analysis.


COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22–23, 2021, Kharkiv, Ukraine
EMAIL: nvsharonova@ukr.net (N. Sharonova); iryna.kyrychenko@nure.ua (I. Kyrychenko); hlib.tereshchenko@nure.ua (G. Tereshchenko)
ORCID: 0000-0002-8161-552X (N. Sharonova); 0000-0002-7686-6439 (I. Kyrychenko); 0000-0001-8731-2135 (G. Tereshchenko)
             ©️ 2021 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
    Data mining is the study and discovery by "machines" (algorithms, artificial intelligence) of hidden
knowledge in raw data that was previously unknown, non-trivial, practically useful, and interpretable by
humans [3].
    The main stages of solving problems using Data Mining methods are (a code sketch of this
workflow follows the list):
    1. setting the task of analysis;
    2. data collection;
    3. data preparation (filtering, supplementing, coding);
    4. choice of model (data analysis algorithm);
    5. selection of model parameters and training algorithm;
    6. model training (automatic search of other model parameters);
    7. analysis of the quality of training; if unsatisfactory, return to step 5 or step 4;
    8. analysis of identified patterns; if unsatisfactory, return to step 1, 4 or 5.
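    As an illustration of this workflow, the sketch below walks through steps 2-7 for a simple clustering
task; it assumes pandas and scikit-learn are available, and the file name students.csv and its columns
are hypothetical placeholders, not part of the study's data set.

# Sketch of the Data Mining workflow applied to a clustering task.
# Assumptions: pandas and scikit-learn are installed; "students.csv"
# and its column names are hypothetical placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# 2. Data collection
data = pd.read_csv("students.csv")

# 3. Data preparation: filter incomplete records, scale the features
features = data[["avg_attempts", "avg_task_complexity", "solved_share"]].dropna()
X = StandardScaler().fit_transform(features)

# 4-6. Choose a model, set its parameters and train it
model = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = model.fit_predict(X)

# 7. Analyze the quality of training; if unsatisfactory, return to steps 4-5
print("silhouette:", silhouette_score(X, labels))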
    The choice of data analysis method is based on some features of the source data. In our case, we can
distinguish the following features:
        No prior knowledge of the data being analyzed, since we are in the initial stages of analysis;
        The number of groups to which each sample object will be assigned is unknown in advance;
        Object partitioning must take place on a whole set of features, not on a single dimension.
    Based on these features, a clustering approach relying on the mathematical apparatus of cluster
analysis was selected for this study.
    Cluster analysis is a set of mathematical methods designed to form groups of "related" objects that
are relatively "distant" from one another, based on the distances or relationships between them.
    Clustering differs from classification in that the solution of the problem is possible without any prior
knowledge of the analyzed data.
    Cluster analysis has the advantage of allowing objects to be split not only by a single parameter
but by a whole set of attributes, and of handling a large amount of raw data of almost any kind.
    The task of clustering is to divide the studied set of objects into groups of "similar" objects, which
are called clusters [4]. A cluster in English means a bunch, a bundle, a group.
    Classification tasks involve assigning each data object to one (or more) of a set of predefined classes,
whereas a clustering task assigns each data object to one (or more) of classes that are not known in advance.
    Note a number of features inherent in the clustering problem. The solution depends heavily on
the nature of the data objects and their attributes, i.e. whether they are uniquely defined, accurately
quantified, or described only in probabilistic or fuzzy terms.
    The decision also depends heavily on the representation of the clusters and the predicted
relationships between the data objects and the clusters. That is, you need to consider such features as
the ability or inability to attach objects to multiple clusters. It is also necessary to define the very concept
of cluster membership: unambiguous (belonging / not belonging), probabilistic (belonging probability),
fuzzy (degree of belonging).

2. Choosing the algorithm
   Cluster analysis divides a group of objects G into m (m an integer) clusters (subsets) Q1, Q2, ..., Qm
based on the data contained in the set X, so that each object Gj belongs to one and only one subset of the
partition, objects belonging to the same cluster are similar, and objects belonging to different
clusters are heterogeneous.
   When clustering, the number of clusters generated is crucial. Clustering is designed to detect natural
local concentrations of objects. Consequently, the number of clusters is a parameter that, if
left undefined, can significantly complicate the choice of algorithm and, if fixed in advance, can
significantly affect the quality of the result.
   Usually nothing is known at the beginning of a data survey, so clustering algorithms are usually built
to enumerate candidate numbers of clusters and determine the optimal value.
   The number of methods of splitting a set into clusters is quite large. All of them can be divided into
hierarchical and non-hierarchical [4].
   Hierarchical clustering combines small clusters into large ones or splits large clusters into small
ones. Hierarchical algorithms are in turn divided into agglomerative and divisive ones.
    Agglomerative methods are characterized by sequential integration of the original elements and a
corresponding decrease in the number of clusters. At the beginning of the algorithm, all objects are
separate clusters. Initially, the most similar objects are clustered. The merge then proceeds until all the
objects form a single cluster.
    Divisive methods are defined by the sequential division of the initial cluster, composed of all items,
and the corresponding increase in the number of clusters. At the start of the algorithm, all objects belong
to a single cluster, which is then separated into smaller clusters in subsequent stages, resulting in a
sequence of splitting sets.
    Non-hierarchical algorithms try to group the data into clusters so that the objective function of the
partition algorithm reaches the extremum (minimum).
    A very important issue is the problem of selecting the required number of clusters. Sometimes the
number of clusters m can be chosen a priori. However, in the general case, this number is determined in
the process of splitting the set into clusters. Studies conducted by Fortier and Solomon found
that the number of clusters should be chosen so as to achieve the probability α that the best partition has
been found. The optimal number of partitions is thus a function of a given fraction β of the best, or in
some sense admissible, partitions among all possible ones [5]. The total scatter will be the greater, the
larger the fraction β of admissible partitions.
    Based on the information obtained, the COBWEB, DBSCAN, hierarchical clustering, X-Means and
EM algorithms were selected for the experiment.
    Cluster analysis of model data using the selected algorithms showed good results, which is why
these algorithms are used for the subsequent clustering of real data.
    The DBSCAN algorithm is an algorithm for clustering spatial data in the presence of noise,
proposed by Martin Ester, Hans-Peter Kriegel, and colleagues in 1996 as a solution to the problem of
splitting data into clusters of arbitrary shape. It is density-based: for a given set of points in some space,
it assigns to one group the points that lie close together (points with many neighbors) and marks as
outliers the points that lie in low-density areas (whose nearest neighbors are too far apart). DBSCAN is
one of the most common clustering algorithms as well as the most cited in the scientific literature. The
authors of DBSCAN have experimentally shown that the algorithm is able to recognize clusters of
different shapes, such as those in Figure 1.




Figure 1: Examples of arbitrary shape clusters recognized by DBSCAN

   The idea behind the algorithm is that inside each cluster there is a typical density of points (objects)
that is significantly higher than the density outside the cluster, and the density in noise regions is lower
than the density of any of the clusters. More specifically, each point of a cluster must have at least a
certain number of points in its neighborhood of a given radius, where that number is set by a threshold
value.
   For given parameter values, the algorithm grows a cluster as follows: it first selects as a seed an
arbitrary point that is a core point, and then places in the cluster the seed and all points density-reachable
from it.
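    A minimal sketch of this procedure, using scikit-learn's DBSCAN implementation rather than the
authors' original one (the eps radius, min_samples threshold and synthetic data are illustrative only):

# Minimal DBSCAN sketch: eps is the neighborhood radius, min_samples the
# threshold number of neighbors for a core (seed) point; label -1 marks noise.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two crescent-shaped clusters that centroid-based methods handle poorly
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "noise points:", int(np.sum(labels == -1)))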
    The EM algorithm is based on computing distances, i.e. on identifying areas that are more
"populated" than others. As the algorithm runs, the solution is iteratively improved, and it stops
once the required level of model accuracy is reached.
    The basis of the EM algorithm is the assumption that the investigated set of data can be modeled
using a linear combination of multidimensional normal distributions. It is assumed that the data in each
cluster is subject to a certain distribution law, namely, the normal distribution.
    The EM algorithm is an iterative algorithm, each iteration consists of two steps: a step of
mathematical expectation (E-step) and a maximization step (M-step) [4].
    The E-step calculates the expected value of the likelihood function, with hidden variables being
considered as observable. In the M-step, the maximum likelihood estimate is calculated, thus increasing
the expected likelihood calculated in the E-step. This value is then used for the E-step in the next
iteration. The algorithm runs to convergence. Here are the steps from a mathematical point of view. To
do this, consider the function:
        F(q, θ) = E_q[log L(θ; x, Z)] + H(q) = −D_KL(q ∥ p_Z|X(·|x; θ)) + log L(θ; x),                  (1)
where q is the probability distribution of the unobserved variables Z, p_Z|X(·|x; θ) is the conditional
distribution of the unobserved variables given the observed x and the parameters θ, H is the entropy,
and D_KL is the Kullback–Leibler divergence.
    Then the steps of the EM algorithm can be represented as:
   a) E (xpectation) step: choose q to maximize F:
                                         q^(t) = arg max_q F(q, θ^(t));
   b) M (aximization) step: choose θ to maximize F:
                                        θ^(t+1) = arg max_θ F(q^(t), θ).
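    This scheme is what scikit-learn's GaussianMixture implements for a mixture of multivariate
normal distributions; a minimal sketch follows (the number of components, the tolerance and the
synthetic data are illustrative):

# EM for a Gaussian mixture: the E-step computes responsibilities (the distribution
# q over the hidden cluster assignments Z), the M-step re-estimates the parameters
# theta; iterations stop once the gain in log-likelihood falls below tol.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, covariance_type="full",
                      tol=1e-4, max_iter=200, random_state=0)
gmm.fit(X)

print("converged:", gmm.converged_, "iterations:", gmm.n_iter_)
print("cluster probabilities of the first object:", gmm.predict_proba(X[:1]).round(3))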
    The X-Means algorithm is one of the most popular clustering methods. It is a generalization of
the k-means algorithm and uses it in its implementation. One of its main distinctive features is that it
does not require the exact number of clusters; only the desired range of values for the number of
clusters is specified.
    The basic idea of the algorithm is that at each iteration the center of mass of each cluster obtained
in the previous step is recalculated, and the vectors are then reassigned to clusters according to
which of the new centers is closer in the chosen metric.
    The algorithm stops when no iteration shifts the clusters' centers of mass. This occurs after a finite
number of iterations, since the number of possible partitions of a finite set is finite and the
total squared deviation V does not increase at any step, so looping is impossible.
    This algorithm uses the Bayesian information criterion for model selection [6], which follows from
the principle of maximum likelihood. This criterion is determined by the formula:
                                     𝐵𝐼𝐶 = −2𝑙𝑛(𝐿) + 𝑘𝑙𝑛(𝑛) ,                                          (2)
where L is the maximum value of the likelihood function of the observed sample with a known number
of parameters, k is the number of (estimated) parameters used, n is the number of objects in the sample.
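    X-Means itself is not part of the common Python libraries, but its selection idea, scoring candidate
partitions with BIC over a range of cluster counts and keeping the best one, can be sketched with
scikit-learn's GaussianMixture (a loose analogue under that assumption, not the X-Means
implementation; the range of k and the synthetic data are illustrative):

# BIC-based choice of the number of clusters: fit models over a range of k
# and keep the one with the lowest BIC, mimicking the X-Means selection idea.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=4, random_state=1)

best_k, best_bic = None, float("inf")
for k in range(2, 11):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic = gmm.bic(X)  # -2 ln(L) + k ln(n), as in formula (2)
    if bic < best_bic:
        best_k, best_bic = k, bic

print("selected number of clusters:", best_k)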
    Hierarchical clustering is a set of algorithms that either divide large clusters into smaller ones or
merge smaller clusters into larger ones; accordingly, divisive and agglomerative clustering are
distinguished. In this work, agglomerative Lance–Williams clustering was used.
    To calculate the distance R(W, S) between clusters W = U ∪ V and S, knowing the distances R(U, S),
R(V, S) and R(U, V), one can use the Lance–Williams formula, which generalizes most reasonable
approaches:
          R(U ∪ V, S) = α_U · R(U, S) + α_V · R(V, S) + β · R(U, V) + γ · |R(U, S) − R(V, S)| ,          (3)
where α_U, α_V, β, γ are numerical parameters.
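    For reference, SciPy's hierarchical clustering implements the standard linkage rules, each of which
corresponds to a particular choice of the parameters α_U, α_V, β, γ in this formula; a minimal sketch,
with Ward linkage, a cut into three clusters and synthetic data chosen purely for illustration:

# Agglomerative (Lance-Williams) clustering: linkage() merges the two closest
# clusters at every step, updating inter-cluster distances by the chosen rule.
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

Z = linkage(X, method="ward")                     # ward, single, complete, average, ...
labels = fcluster(Z, t=3, criterion="maxclust")   # cut the dendrogram into 3 clusters
print("cluster sizes:", [int((labels == c).sum()) for c in set(labels)])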
    The COBWEB algorithm is a classic incremental conceptual clustering method that defines clusters
as groups of objects belonging to one concept, a specific set of attribute-value pairs. It builds a
hierarchical clustering in the form of a tree: each node of this tree refers to a concept and contains a
probabilistic description of that concept, which includes the probability of belonging to the concept at
a given node and conditional probabilities of the form:
            CU = Σ_k Σ_i Σ_j P(A_j = U_ij) P(C_k | A_j = U_ij) P(A_j = U_ij | C_k).                      (4)
    The values are summed across all categories Ck, all properties Aj, and all property values Uij. The
value P(Aj = Uij | Ck) is called predictability: the probability that an object belonging to category Ck
has the value Uij for the property Aj. The higher this value, the more likely it is that two objects in the
same category have the same property values. The value P(Ck | Aj = Uij) is called predictiveness: the
probability that an object whose property Aj takes the value Uij belongs to the category Ck. The greater
this value, the less likely it is that objects not belonging to this category take the specified value.
    The value P(Aj = Uij) is a weighting factor that enhances the influence of the most common
property values. Taken together, these values mean that a high category utility implies a high
probability that objects in one category have the same properties and a low likelihood of those
properties occurring in objects of other categories.
    The tree-building algorithm uses a heuristic evaluation measure called category utility: the increase
in the expected number of correct guesses of attribute values when the category membership of objects
is known, relative to the expected number of correct guesses without this knowledge. To embed a new
object in the tree, the COBWEB algorithm iteratively traverses the tree in search of the "best" node to
which that object should be assigned.
    A node is selected by placing the object in each candidate node and calculating the category utility
of the resulting partition. The category utility is also calculated for the case where the object belongs
to a newly created node. As a result, the object is assigned to the node for which the category utility is
greatest.
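    A small sketch of the category-utility computation for nominal attributes follows; it transcribes the
probabilities above into the equivalent expected-correct-guesses form rather than reproducing the full
incremental COBWEB tree, and the toy data are hypothetical:

# Category utility for nominal data: compares the expected number of correctly
# guessed attribute values given the partition with the same expectation
# without it (the measure COBWEB maximizes when placing a new object).
from collections import Counter

def category_utility(partition):
    """partition: list of clusters, each a list of dicts attribute -> value."""
    objects = [obj for cluster in partition for obj in cluster]
    n, k = len(objects), len(partition)
    attrs = objects[0].keys()

    def sq_sum(group):
        # sum over attributes and values of P(A = v | group)^2
        total = 0.0
        for a in attrs:
            counts = Counter(obj[a] for obj in group)
            total += sum((c / len(group)) ** 2 for c in counts.values())
        return total

    baseline = sq_sum(objects)
    return sum(len(c) / n * (sq_sum(c) - baseline) for c in partition) / k

clusters = [
    [{"lang": "python", "year": 2}, {"lang": "python", "year": 2}],
    [{"lang": "c", "year": 4}, {"lang": "c", "year": 5}],
]
print(round(category_utility(clusters), 3))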
    As a result of the study of existing clustering algorithms, the EM algorithm was selected for the
experiment.

3. Choosing software
    One of the objectives of the study is the choice of software for the process of clustering and
subsequent visualization of the results.
    For this purpose, a great deal of work was done to survey existing statistical packages. All existing
programs can be divided into three main categories: private research projects implemented with
popular mathematical software packages, expensive commercial solutions focused on corporate
statistical research, and a small share of statistical packages that are freely available. In order to select
the mathematical package for the study, a number of existing statistical processing tools were considered.
    The Fuzzy Clustering and Data Analysis Toolbox, a software package for Matlab, provides three
categories of functions:
        clustering algorithms that split data into clusters using different approaches: K-means and K-
    medoid, hard clustering algorithms; FCMclust, GKclust and GGclust, fuzzy clustering
    algorithms;
        analysis functions that evaluate each fixed partition produced by an algorithm based on indices
    (Xie and Beni's, Dunn, Alternative Dunn, Partition index);
        visualization functions that implement Sammon's modified method of displaying data in a
    lower-dimensional space.
    This package is installed as a plug-in and does not provide a ready-made analysis interface, but it
allows the functions described above to be used when developing applications in Matlab [7].
    The Cluster Validity Analysis Platform (CVAP) is a software tool, also implemented in Matlab. Built
around a user-friendly graphical interface, it includes several cluster analysis algorithms (K-means,
hierarchical, SOM, PAM), as well as the most widely used indexes of their performance. Working in
this application, the user can not only load their own data but also save the results of the analysis. An
undoubted advantage is that the graphical part allows several algorithms to be compared on one index
at a time.
    SPSS Statistics is a paid, modular, fully integrated software package that covers all stages of the
analytical process and is focused on solving business problems and related research problems. Its
intuitive interface offers many statistical management features, and it includes clustering algorithms [8].
    RapidMiner is a machine learning and data processing environment that shields the user from the
grunt work. Instead, the user is asked to "draw" the entire desired data processing method as a chain
(graph) of operators and then execute it. RapidMiner displays the operator chain as an interactive graph
and as an XML expression (the main language of the system).
    More than 400 operators are now implemented in the system, among them:
        machine learning operators, which implement clustering, classification, regression and
    association-rule search algorithms;
        pre-processing operators (filtering, sampling, filling in gaps, dimensionality reduction);
        operators for working with features (feature selection and generation);
        meta-operators (for example, a multi-parameter optimization operator);
        quality assessment operators (cross-validation);
        visualization operators;
        data loading and storage operators (including work with special formats: arff, C4.5,
    csv, bibtex, databases, etc.).
    WEKA is written in Java at the University of Waikato (New Zealand) and provides the user with
the ability to pre-process data, solve clustering, classification, regression and association rule search
problems, as well as visualize data and results [9]. The program is very easy to learn (it probably has
the most intuitive interface among all programs of this type), is free and can be extended with new
means of pre-processing and data visualization.
    The input data can be represented as a matrix of feature descriptions. WEKA provides access to SQL
databases through Java Database Connectivity (JDBC) and can accept the result of an SQL query as
input.
    WEKA has the Explorer UI, but the same functionality is available through the Knowledge Flow
component interface and from the command line [10]. There is a separate Experimenter application for
comparing the predictive ability of machine learning algorithms on a given set of tasks.
    Explorer has several panels:
        Preprocess panel allows you to import data from a database, CSV file, etc., and apply filtering
    algorithms to them, for example, translate quantitative characters into discrete ones, delete objects
    and features by a given criterion;
        Classify panel allows you to apply classification and regression algorithms to a data sample,
    estimate the predictive ability of the algorithms, and visualize erroneous predictions, ROC curves,
    and the model itself where possible (in particular, decision trees);
        Associate panel is concerned with identifying meaningful relationships between features using
    association rules;
        Cluster panel gives access to the K-Means algorithm, the EM algorithm, COBWEB,
    DBSCAN and others;
        Select attributes panel gives access to feature selection methods;
        Visualize panel builds a scatter plot matrix, allows you to select and enlarge graphs,
    etc.
    The disadvantage of The Fuzzy Clustering and Data Analysis Toolbox and CVAP lies primarily in
their limited accessibility and the inability to analyze one's own algorithms. These non-commercial
applications are implemented mainly in Matlab, which automatically imposes a number of restrictions:
        the applications depend on the Matlab version and the additional libraries supplied;
        it is necessary to know their internal structure and rules of operation;
        the graphical and computational implementations are fixed;
        to analyze one's own algorithms, where possible at all, additional interaction mechanisms
    have to be created.
    CVAP is only supported by the Matlab application, despite its user-friendly graphical interface [11].
In order to use The Fuzzy Clustering and Data Analysis Toolbox, you need to write additional functions
that relate the algorithm and analysis features.
    In most cases, all such programs are personal research, often designed to demonstrate specific
methods and are therefore limited in functionality.
    Matlab itself is a commercial product that needs to be purchased and installed, which in itself is a
long, time-consuming process. Commercial products, including SPSS Statistics, are in turn actively
developed, but since they are focused mainly on statistical surveys in business, they include clustering
algorithms only as part of their statistical methods and usually lack specialized tools for analyzing the
operation of the algorithms themselves. The cost of such products is also quite high. For example, a
license for SPSS Statistics for one private user currently costs about forty thousand rubles. In addition,
such programs do not provide for the implementation and analysis of one's own algorithms.
   Free software suites (RapidMiner, WEKA) also impose a number of restrictions on data
processing [12]. These programs do not allow embedding one's own algorithms, and the number and
variety of the clustering algorithms they provide is limited.
   RapidMiner has very good visualization tools: there are many visualization methods and all the
graphics look great. However, the reasons for rejecting this software are the lack of a connection to the
FireBird database and the absence of the algorithms selected at the beginning of the study.
   Thus, after examining several statistical packages, the WEKA software package was selected for the
further clustering process, as it contains the selected algorithms and can connect to the database via a
URL. In addition, WEKA is one of the few products with an intuitive interface and translated technical
literature.
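   For completeness, the same WEKA clusterers can also be driven from Python; the sketch below
assumes the third-party python-weka-wrapper3 package (together with a local JVM) and a hypothetical
students.arff export of the workshop database, whereas in this study the standard Explorer interface
was used.

# Running WEKA's EM clusterer from Python via python-weka-wrapper3
# (assumed installed along with a JVM); "students.arff" is a hypothetical
# export of the workshop database.
import weka.core.jvm as jvm
from weka.core.converters import Loader
from weka.clusterers import Clusterer

jvm.start()
try:
    data = Loader(classname="weka.core.converters.ArffLoader").load_file("students.arff")
    em = Clusterer(classname="weka.clusterers.EM", options=["-N", "4"])  # 4 clusters
    em.build_clusterer(data)
    for inst in data:
        print(inst, "->", em.cluster_instance(inst))
finally:
    jvm.stop()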

4. Selection of objects for clustering of the remote workshop data and
   definition of their features
    Based on the analysis of the conceptual schema of the database of the remote workshop's checking
system, the main entities were identified: tasks, users and solutions. Therefore, it was decided to select
the following clustering objects:
        students (workshop users);
        tasks of the workshop;
        pairs "student - task".
    In order to determine the set of features for clustering, the attributes of the selected entities available
in the database were investigated. The following attributes are stored for tasks in the database:
        task identifier;
        the CPU time and memory limits for the task;
        the minimum percentage of unique code at which a solution of the task is considered
    unique;
        expert complexity of the task;
        the number of users who have solved the task;
        the number of users who tried to solve the task;
        the number of solutions submitted for the task.
    Each user of the system is assigned in the database an ID, a login and a password to log in; derived
data is also stored: the number of tasks solved by the user and the number of tasks the user has tried to
solve. Each attempt by a student to solve a task is recorded in the database, storing the following
information:
        student ID and task number;
        date and time of receipt of the solution by the checking system;
        the compiler used;
        attempt status (correct solution or error code);
        characteristics of a correct solution: the execution time of the program (query, script), the
    amount of memory used, the percentage of plagiarism.
    After analyzing the attributes of clustering objects stored in the database and selecting the most
significant features, a set of features for clustering was determined for each stage of the study.
    To cluster the users of the checking system, the following attributes were selected, which make it
possible to identify groups of students by level of preparation:
        user ID;
        relative indicator of the student's level of preparation;
        the average number of attempts to solve problems;
        the average complexity of the tasks;
        year of study (1-5, students of previous years are considered as one course "-1").
    To cluster the tasks of the checking system, the following attributes were selected, which make it
possible to identify groups of tasks by level of complexity:
       task identifier;
       a relative indicator of the degree of complexity of the task;
       the number of non-unique solutions;
       the number of partially correct solutions;
       the average number of attempts to solve the problem.
    For the clustering of "student-task" pairs, the following attributes were chosen, which make it
possible to determine whether a task is suitable for a given student (a clustering sketch over these
features follows the list):
       complexity of the task;
       relative indicator of the degree of complexity of the task;
       relative indicator of the student's level of preparation;
       the average complexity of the tasks solved by the student;
       the relative indicator with which the task was solved;
       the number of days between the first and last attempt to solve it.
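    As noted above, a minimal sketch of how such a feature set could be assembled and clustered with
the EM algorithm (here scikit-learn's GaussianMixture rather than WEKA; the CSV file and column
names are hypothetical placeholders mirroring the list above):

# Clustering "student-task" pairs by the features listed above.
# Assumptions: pandas/scikit-learn available; "student_task_pairs.csv" is a
# hypothetical export with one row per pair and the columns named below.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.mixture import GaussianMixture

pairs = pd.read_csv("student_task_pairs.csv")
feature_cols = [
    "task_complexity",            # expert complexity of the task
    "task_relative_complexity",   # relative indicator of task complexity
    "student_level",              # relative indicator of the student's preparation
    "student_avg_complexity",     # average complexity of tasks solved by the student
    "solution_score",             # relative indicator with which the task was solved
    "days_between_attempts",      # days between the first and last attempt
]

X = StandardScaler().fit_transform(pairs[feature_cols].fillna(0))
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
pairs["cluster"] = gmm.predict(X)
print(pairs.groupby("cluster")[feature_cols].mean().round(2))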

5. Conclusions
    We can draw a number of conclusions from this research that will assist teachers in improving both
the learning process and the teaching process. It is worth noting that the Collaborative and Storage
resources are the most widely used, since they provide a vast volume of data processed by both teachers
and students. To optimize efficiency and obtain new developments and enhancements of the learning
process, this data must be intelligently organized. Furthermore, we can infer from the usage of the
Evaluation instruments that teachers need to find alternative ways of assessment in order to improve
access to these tools.
    This part of the visualization is built on two attributes, SEM and GRANDTOTAL, which correspond
to the Y axis and the X axis of the outcome plot. The other attributes can only be used to shape a single
aspect of the visualization.
    The visualization diagram (Figure 2) is shown together with the details of each cluster.




Figure 2: Visualization of the relationship between two attributes

    As we can see in Figure 3, most boys were absent from the two tests and therefore had poor grades
in all subjects. Most of the girls with average grades who took the first test but missed the next one
also fell into cluster B.
Figure 3: Visualization of clustering result for unsatisfactory and satisfactory results

   Figure 4 shows that most boys had excellent grades in their subjects and took the second test.




Figure 4: Visualization of the clustering result for excellent results

   This article has presented data mining in an instructional setting, using association rule extraction
strategies to identify patterns of student failure. To examine pupil performance, association rule
research was extended to educational systems. The association rule extraction methodology is used in
this research to uncover elusive dynamics and to assess student success and trends. To find connections
between attributes, the EM algorithm is used.
   Student success was measured using academic and personal data gathered over the course of one
semester. After that, J48 classification algorithms were used. WEKA 3.8.2 was the data processing
software used in the trial. We can infer that the J48 classification system was the most suitable algorithm
for the data set based on the accuracy and classification errors.
   WEKA was used to apply the EM algorithm to the dataset in order to find an interpretation of average
student success based on some of the best rules. Data can be extended to include any of a student's
extracurricular activities and technical abilities, and various classification algorithms can be used to
forecast student success.

6. References
[1] Learning management system, Wikipedia, 2021. URL: https://en.wikipedia.org/wiki/Learning_management_system.
[2] V. Mayer-Schönberger, K. Cukier, Big Data: A Revolution That Will Transform How We Live,
     Work, and Think, 2013.
[3] R. Baker, G. Siemens, Educational data mining and learning analytics, The Cambridge handbook
     of the learning sciences, 2014.
[4] Expectation-maximization algorithm, Wikipedia, 2020. URL: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm.
[5] S. Rzheutsk, Experience in the application of clustering methods for analyzing the results of
     distance learning, in: Proceedings of the International Scientific and Practical Conference,
     Informatization of Engineering Education, 56, 2016, pp. 617–620.
[6] K-means clustering, Wikipedia, 2019. URL: https://en.wikipedia.org/wiki/K-means_clustering.
[7] O. Dashkevich, Analysis of Apache Kafka capabilities within the framework of secure Big Data
     streaming, in: Proceedings of the 7th International Scientific and Technical Conference, Information
     Systems and Technologies, 12, 2018, pp. 34–35.
[8] C. Lam, Hadoop in Action, 2010.
[9] Data Mining with WEKA MOOC – Material, Machine Learning at Waikato University, 2019.
     URL: https://www.cs.waikato.ac.nz/ml/WEKA/mooc/dataminingwithWEKA.
[10] Weka Tutorial, Tutorials Point, 2018. URL: https://www.tutorialspoint.com/weka/weka_quick_guide.htm.
[11] K. Smelyakov, M. Shupyliuk, V. Martovytskyi, D. Tovchyrechko, O. Ponomarenko, Efficiency of
     Image Convolution, in: Proceedings of the 8th IEEE International Conference on Advanced
     Optoelectronics and Lasers, CAOL'2019, Sozopol, Bulgaria, 2019, pp. 578–583.
[12] K. Smelyakov, O. Ponomarenko, A. Chupryna, D. Tovchyrechko, I. Ruban, Local Feature
     Detectors Performance Analysis on Digital Image, in: Proceedings of the IEEE International
     Scientific-Practical Conference Problems of Infocommunications, Science and Technology, PIC
     S&T’2019, Kyiv Ukraine, 2019, pp. 644-648.