=Paper=
{{Paper
|id=Vol-3691/paper43
|storemode=property
|title=Use of Data Mining to Identify Preferences for Humanistic Courses and Support the Management of University Extension of the Autonomous University of Aguascalientes
|pdfUrl=https://ceur-ws.org/Vol-3691/paper43.pdf
|volume=Vol-3691
|authors=Humberto Muñoz Bautista,Miguel Ortiz-Esparza,Jaime Muñoz Arteaga,Klinge Villalba-Condori
|dblpUrl=https://dblp.org/rec/conf/cisetc/BautistaEAV23
}}
==Use of Data Mining to Identify Preferences for Humanistic Courses and Support the Management of University Extension of the Autonomous University of Aguascalientes==
Use of Data Mining to Identify Preferences for Humanistic
Courses and Support the Management of University
Extension of the Autonomous University of Aguascalientes
Humberto Muñoz Bautista1, Miguel Ortiz-Esparza2, Jaime Muñoz Arteaga1, Claudia Acra-
Despradel3 and Klinge Villalba-Condori4
1 Universidad Autónoma de Aguascalientes, Av. Universidad # 940, Ciudad Universitaria, C.P. 20100, Aguascalientes, Ags.
México.
2 Center for Research in Mathematics, Quantum Knowledge City, 98160, Zacatecas, Mexico
3 Universidad Nacional Pedro Henríquez Ureña, Santo Domingo, Dominican Republic
4 Universidad Católica de Santa María, San José S/N, Arequipa, Perú
Abstract
Humanistic courses are a requirement of the Autonomous University of Aguascalientes for all its
students, who must take at least three courses from different disciplines during their career to graduate.
The courses offered are a way in which students from different careers can relate to each other, and in
this way, develop comprehensively. These courses are offered year after year with minimal change as
to which courses and at what times they are offered. The studies regarding the real demand of the
students are almost non-existent, and for the same reason the courses offered do not always serve to
meet the demand of the students, who find themselves in the need to take courses that they normally
do not attend. would sign up to meet the requirements. In addition to the needs of the students, the
problem of course management arises, where a process is carried out for the registration of courses to
be offered, publication and courses and management of written students.
Keywords
Humanist Courses, Schedule, Data mining 1
1. Introduction
For years the Autonomous University of Aguascalientes has aimed to create professionals with
comprehensive training in all aspects of their life, which is why it has insisted that its students
take courses outside their area and with students from other careers. so that in this way they have
knowledge in other areas and meet people from other careers with interests in common with
them. This is why the University made the decision to create humanistic training courses, which
are a set of courses that seek to expand the knowledge of university students in areas that are not
normally studied in their careers but that are of interest. of some students.
Every year a considerable number of courses are opened in two modalities, intensive courses,
which are opened in inter-semester periods for three weeks, and extensive courses, which are
courses that cover the entire semester. The investment made to be able to have this offer is too
great, from hiring teachers for each course to the materials consumed during them. The courses
that are opened year after year vary very little, since the study plans or the courses to be taught
are rarely radically modified; This is done with little or no consideration of the general opinion of
the university community, since despite the demand that this generates, the courses remain static
in terms of supply. This generates a large number of problems, since many courses are offered
CITIE 2022: International Congress on Trends in Educational Innovation, December 12–04, 2023, Zacatecas,
Zacatecas
hmuntista@gmail.com (H. Muñoz-Bautista); miguel.ortiz@cimat.mx (M. Ortiz-Esparza); jaime.munoz@edu.uaa.mx
(J. Muñoz-Arteaga); c.acra@unphu.edu.do (C. Acra-Despradel); kvillalba@ucsm.edu.pe (K. Villalba-Condori)
0000-0003-1720-0554 (H. Muñoz-Bautista); 0000-0001-8762-5780 (M. Ortiz-Esparza); 0000-0002-3635-7592 (J.
Muñoz-Arteaga); 0000-0002-6429-5675 (C. Acra-Despradel); 0000-0002-8621-7942 (K. Villalba-Condori)
© 2023 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
CEUR
ceur-ws.org
Workshop ISSN 1613-0073
Proceedings
that are only filled with students who enter only to meet the university's requirements and not
because they would really like to take the course; On the other hand, there are courses that have
excessive demand and of which only one or two groups are opened per period, regardless of the
fact that there are a large number of students who are left out of said courses due to lack of places
to take them.
It is for this reason that the decision was made to carry out a study, through surveys of current
students at the university, about which courses they would really like to take, at what times and
in what modality of those offered. This is with the intention of analyzing whether the courses that
are opened efficiently satisfy the demand of the student community, as well as to see which are
the courses that are really demanded by the students, which are those that are opened
unnecessarily, and which are They are the ones who should expand their offer. Everything is done
with the aim of the university improving the current offer of humanistic training courses so that
it really satisfies the demand of the students, and that this requirement really meets the purpose
for which it was created.
2. Related Work
The organization and search of metadata using developed software or specialized software
such as Weka is in constant development due to the growing demand for data to be processed,
which is why more precise and advanced approaches are increasingly required for the efficient
management of the data. information [10,12,18].
Regarding data search, many algorithms have been proposed that allow for agile queries of
large data sets. The improvement in search speed, as well as in obtaining information, is thanks
to indices that have been developed as B, B+ trees or hash tables, which are types of structures
through which data is represented in an orderly manner that allows insertions. and efficient
deletions thereof [9]. In the search and organization of data, the implementation of metadata is
of utmost relevance, since these provide very important information such as origin, creation
dates, formats, etc. Which allows the precision and efficiency of searches to increase [7].
The state of the art is constantly evolving because, as mentioned in previous points, the data
handled is increasing exponentially, which is why the development of increasingly sophisticated,
precise, agile and optimized algorithms is promoted. [18].
Currently, no field study has been carried out regarding the humanistic courses at the
Autonomous University of Aguascalientes, the few modifications that are made in the study plans
or in their offer are based solely on the registrations that are made during the previous periods.
This is one of the worst ways to carry out this analysis, since many students enter courses in
which they have no interest, but still enroll to fulfill the course requirements; this results in an
inefficient course offering for the university community.
2.1. Assignment of Schedule Problem
The assignment of schedules problem is a recurrent situation in all educational institutions
[3], where the schedule of the students is planned, assigning signatures, classrooms, and teachers,
with the objective of avoiding an overlap of hours. It corresponds to optimization problems in
computational complexity theory classified as NP-complete problems or NP-hard problems [5],
due to its complexity, resource limitations and number of restrictions. These problems requiring
the entry of applications with intelligent algorithms [17]
2.2. Weka
Weka is a software tool widely used in the field of data mining and machine learning. Weka,
named after "Waikato Environment for Knowledge Analysis", was developed at the University of
Waikato in New Zealand [11]. Weka provides a collection of machine learning algorithms for
classification, regression, clustering, and rule extraction tasks. Additionally, it includes tools for
data preprocessing, model evaluation, visualization, and data exploration.
2.3. Data Science: Algorithmic
Within data science we have different techniques for data processing, such as classification,
clustering and prediction algorithms, with the option of being able to combine algorithms with
each other to obtain a better result.; however regardless of the approach taken for it, asides from
the requirements of implementation to cover and the selection criteria according to the problem
to solve it is important to identify beforehand the area of implementation and the intrinsic
characteristics that it will provide.[6]
Within the renowned areas of computer science to cover the demands of the society the
branches of Artificial Intelligence and Data Science provide the basis for the development of
modern technologies and data driven based solutions [1,16]
The following table shows the algorithmic techniques taken into consideration [4, 2, 14, 15],
the model they are part of, a brief description, and remarks of the thought process regarding the
selection of the supervised learning model and the unsupervised model:
Table 1
Algorithmic techniques
Learning Model Description Remarks
Supervised Decision Tree Weighted decision Use the principle of
trees. Shannon's information
theory.
Supervised Random Forests Group of trees with High accuracy and
characteristics. training complexity.
They usually use the
result of the decision
trees for the input of
the next tree.
Unsupervised Expectation Iterative method for Ensures convergence
Maximization maximum likelihood of the likelihood
algorithm (EM) estimation of function. Helps when
parameters. variables are missing
and in poorly
conditioned problems.
Unsupervised K-means Grouping by distance Mainly the Euclidean
between data. distance is used to
obtain the clusters by
characteristics.
Unsupervised Hierarchical Clustering Create ascending or Generally, uses a
descending group greedy algorithm.
hierarchy.
Unsupervised Gaussian Mixture Probabilistic model Suffers in data
Models that representing scalability, making it
normally distributed unsuitable for large
data data sets
3. Problem Outline
A valuable tool in the field of data science is Weka software. Weka provides a wide range of
machine learning algorithms and tools for data preprocessing and analysis. Its user-friendly
environment and extensibility make it a popular choice for researchers and professionals in the
field of data mining. [8] To begin the study, a survey was first planned that could, with the smallest
number of questions, gather the necessary data to know the trends of university students
regarding humanistic courses, from which are the most requested to which are the most popular.
trends by major and sex for each course.
To carry out this study, 680 surveys were carried out randomly among students from all
possible centers and careers to form a statistical sample about which humanistic courses have a
real demand. The survey was designed to collect the following types of data:
• Edad [18-25]
• Sexo [M - F]
• Carrera [Tipificados]
• Turno [M - V]
• Horario del curso [M V S]
• Tipo del curso [I E A]
• Curso que tomaría [Tipificados]
General information about the students is requested, such as age; obtaining a range of between
18 and 25 years among the respondents; sex, career; Students could select from the university's
list of majors to avoid a problem in capturing the major's name and a subsequent problem in data
cleaning; shift; morning (M) or afternoon (F) according to your class schedule; course schedule;
The university offers morning (M), afternoon (V) and Saturday (S) schedules for humanist
courses; type of course; Within the courses there is an intensive (I) and extensive (E) modality,
so students could select the modality of their preference or both (A); and the course they would
take; the list of courses offered by the university is taken.
The surveys were carried out within the university city to cover the current university
population. It was sought that most of the students interviewed were within the first semesters
of their degrees, thus ensuring that the answers were as close to reality as possible, since during
this period, most students are studying or finishing to complete this requirement.
Once the 680 surveys were completed, the information obtained was cleaned, since there were
several entries with missing data, which were filled with the average data, and the information
was emptied into a database.
Once the complete database was available, we proceeded to choose which data were necessary
and which would only get in the way when carrying out a data mining process; It was determined
that the data chosen initially were those necessary to correctly carry out the study that was
wanted to be carried out and therefore no fields were eliminated.
The database was then moved to a file with an. arff extension, which briefly described the type
of data used and all the instances in the registry.
Using the weka program, the data was analyzed. The knowledge base was clustered using the
EM algorithm, because when this procedure was carried out through KMEANS the error obtained
was too large, which is why it was determined that the best way to classify this base was through
the EM algorithm.
Figure 1: Clustering of analyzed data
After carrying out said clustering, the results were analyzed using the graphs that were shown
to us as a result. Of these graphs, there are 3 that attract attention. The first is the one that shows
the sex (Y) with respect to the course they choose (X).
Figure 2: Data visualization of course selection by gender
From this graph it can be determined that the courses: Ballroom Dancing, Photography,
Gastronomy, Graphology, Yoga and Zumba, are the focuses of interest for the female population
of the University, since it is where about 75% of those interviewed said that I would take one of
those courses if possible. On the other hand, men only have a very weak focus of interest in the
Gastronomy course, since their population is distributed more homogeneously in all courses,
without having much inclination for anyone.
The next graph of interest was the one that shows the career (Y) with respect to the course
they choose (X).
Figure 3: Data visualization of course selection by carrer
In this graph it can be seen that the majors from the Economics and Administration center
have a tendency to enter the Gastronomy and Photography courses, while those from the other
centers have a greater tendency to enter the Ballroom Dance, Yoga courses. and Zumba.
Finally, the last graph of interest is the graph that shows the type of course (Y) with respect to
the course they choose (X).
Figure 4: Data visualization of course selection by modality
In this graph you can see how the most requested type of course, for Gastronomy, Ballroom
Dancing, Zumba and Yoga, is extensive and not intensive as would have been expected. However,
for the other courses, the intensive period is the most requested to carry it out.
4. Conclusions
After an exhaustive analysis of the different results obtained through the weka program to the
knowledge base, it was determined that currently the most requested courses by the university
community are the Gastronomy, Ballroom Dance and Zumba courses, but these courses They do
not open a sufficient number of places to satisfy the large number of demand they have, so
students have to look for other courses of less interest but that are also filled, such as Yoga,
Graphology and Art with clay; However, even with these courses, student demand continues to
be greater than the number of places needed, so finally students resort to courses that are not of
interest to them but still have places to take them.
If in the future it is planned to open more courses, it should first be considered to open more
groups of the most requested courses. If this is not possible due to lack of facilities or teachers,
courses should be opened that have activities related to the most requested courses for this
purpose. way to meet student demand.
References
[1] Ahmed, Z., Mohamed, K., Zeeshan, S., Dong, X.Q.: Artificial intelligence with multi-functional
machine learning platform development for better healthcare and precision medicine.
Database: The Journal of Biological Databases and Curation 2020 (2020).
https://doi.org/10.1093/DATABASE/BAAA010
[2] Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A., Aljaaf, A.J.: A systematic review on
supervised and unsupervised machine learning algorithms for data science. Supervised and
unsupervised learning for data science pp. 3–21 (2020)
[3] Baldonado, M., Chang, C.-C.K., Gravano, L., Paepcke, A.: The Stanford Digital Library Metadata
Architecture. Int. J. Digit. Libr. 1 (1997) 108–121
[4] Berry, M.W., Mohamed, A., Yap, B.W.: Supervised and unsupervised learning for data science.
Springer (2019)
[5] Bruce, K.B., Cardelli, L., Pierce, B.C.: Comparing Object Encodings. In: Abadi, M., Ito, T. (eds.):
Theoretical Aspects of Computer Software. Lecture Notes in Computer Science, Vol. 1281.
Springer-Verlag, Berlin Heidelberg New York (1997) 415–438
[6] Calderón-Reyes, J. E., Muñoz-Bautista, H., Alvarez-Rodriguez, F. J., Barba-Gonzalez, M. L., &
Cardona-Reyes, H. (2022, October). Data Science Based Methodology: Design Process of a
Correlation Model Between EEG Signals and Brain Regions Mapping in Anxiety.
In International Conference on Software Process Improvement (pp. 141-151). Cham:
Springer International Publishing.
[7] García, E. R., & García, F. J. E. (2017). Minería de Datos. Pearson Educación.
[8] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA
data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.
https://doi.org/10.1145/1656274.1656278
[9] Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques. Morgan
Kaufmann.
[10] Li, W., & Özsu, M. T. (2018). Encyclopedia of Database Systems. Springer.
[11] Martínez Abad, F. (2018). Aplicación de técnicas de minería de datos con software Weka.
[12] Oliva Córdova, L. M., Amado-Salvatierra, H. R., & Villalba Condori, K. O. (2019). An experience
making use of learning analytics techniques in discussion forums to improve the interaction
in learning ecosystems. In Learning and Collaboration Technologies. Designing Learning
Experiences: 6th International Conference, LCT 2019, Held as Part of the 21st HCI
International Conference, HCII 2019, Orlando, FL, USA, July 26–31, 2019, Proceedings, Part I
21 (pp. 64-76). Springer International Publishing.
[13] Paucar-Curasma, R., Villalba-Condori, K., Arias-Chavez, D., Le, N.-T., Garcia-Tejada, G., &
Frango-Silveira, I. (2022). Evaluación del Pensamiento Computacional utilizando cuatro
robots educativos con estudiantes de primaria en Perú. Education in the Knowledge Society
(EKS), 23. https://doi.org/10.14201/eks.26161
[14] Pinto, R. C., & Engel, P. M. (2015). A fast incremental gaussian mixture model. PloS
one, 10(10), e0139931.
[15] Sammaknejad, N., Zhao, Y., & Huang, B. (2019). A review of the expectation maximization
algorithm in data-driven process identification. Journal of process control, 73, 123-136.
[16] Sharma, S., Toshniwal, D.: Mr-ovntsa: a heuristics based sensitive pattern hiding approach
for big data. Applied Intelligence 50, 4241– 4260 (12 2020).
https://doi.org/10.1007/S10489-020-01749-6/FIGURES/6,
https://link.springer.com/article/10.1007/s10489-020-01749-6
[17] Van Leeuwen, J. (ed.): Computer Science Today. Recent Trends and Developments. Lecture
Notes in Computer Science, Vol. 1000. Springer-Verlag, Berlin Heidelberg New York (1995)
[18] Witten, I. H., Frank, E., & Hall, M. A. (2016). Data Mining: Practical Machine Learning Tools
and Techniques. Morgan Kaufmann.