=Paper= {{Paper |id=Vol-2254/10000191 |storemode=property |title=Software and hardware infrastructure for timetables scheduling in university |pdfUrl=https://ceur-ws.org/Vol-2254/10000191.pdf |volume=Vol-2254 |authors=Andrey Malikov,Vladimir Voronkin,Aleksey Shchegolev,Svetlana Pevchenko }} ==Software and hardware infrastructure for timetables scheduling in university== https://ceur-ws.org/Vol-2254/10000191.pdf

Software and hardware infrastructure for timetables
scheduling in university

Malikov Andrey Vladimir Voronkin Pevchenko Svetlana
amalikov@ncfu.ru vl.voronkin@raxperi.com spevchenko@ncfu.ru
Shchegolev Alexey
a.wegolev@gmail.com
North-Caucasus Federal University, Stavropol

Abstract
When making schedule of a large educational institution, we have to
work with big data, which requires a special organization of the software
and hardware infrastructure: usage of special data structures and in-
dexing, organization of data marts and analytical services, development
of new data manipulation algorithms. The paper examines certain as-
pects of the software and hardware infrastructure for business process
of training schedule formation in North Caucasus Federal University.

1 Introduction
Schedule formation is computationally complex problem of distributing finite sets of resources [1], usually non-
polynomically complex. The need for scheduling arises in various areas of human activity and technical systems:
the traffic schedule, the schedule of work of institutions and employees, the study schedule, multi-threaded
multiprocessing, the production line schedule, and so on.
There are two groups of methods and algorithms for automatic schedule formation can: deterministic and
stochastic [1]. Examples of deterministic algorithms are dynamic programming [2], optimization of the cost
function with preprocessing and without preprocessing [3], estimating the heuristics of multi-threaded tasks in
multiprocessors [4], etc. Examples of stochastic algorithms are genetic algorithms [5], various stochastic flow
algorithms [1], etc. Automatic scheduling algorithms work effectively in technical systems, for example, when
planning the distribution of tasks in multiprocessor systems, but they are of little use when allocating resources
in social systems, including study schedules. The schedule formation is characterized by a large number of
heuristics and internal agreements existing in any social system. It is also necessary to take into account the
availability of ”best practices” applied in various educational institutions, and often significant differences in the
regulatory framework and approaches to scheduling in different countries. Very often educational institutions,
especially high schools, uses software that allows to partially automate this process in such a way that the
machine decides the well-formalized computing tasks, and the person implements those processes of resource
allocation that the machine cannot perform due to the complexity of the algorithm being implemented and / or
the existing conflicting restrictions on the resources allocated.
There are many information systems, adapted for scheduling in educational institutions of the Russian Feder-
ation, for example, [6], [7], [8]. Some systems have partially automatic mode or a consulted mode of scheduling,

Copyright c by the paper’s authors. Copying permitted for private and academic purposes.
In: Marco Schaerf, Massimo Mecella, Drozdova Viktoria Igorevna, Kalmykov Igor Anatolievich (eds.): Proceedings of REMS 2018
– Russian Federation & Europe Multidisciplinary Symposium on Computer Science and ICT, Stavropol – Dombay, Russia, 15–20
October 2018, published at http://ceur-ws.org
but as a rule the main way of compiling is a manual method. In this case, known information systems have the
following limitations:

• require considerable time to prepare the initial data in the form of the department’s study load;
• poorly adapted to the processes of continuous change of the initial data in the study load or do not support
the temporality of the initial data;
• do not allow to make a schedule in the presence of individual learning paths for each student, including the
limitations when working with large data;
• not convenient system of parallel access of a large number of employees, who making the schedule (next –
schedule maker), to shared resources (classrooms, training groups, teachers) using the classical query model
of data manipulation.

The software and hardware infrastructure of training schedule formation of the North Caucasus Federal
University is considered, which allows solve these problems.

2 Problem Statement
The North Caucasus Federal University (NCFU) is the largest university in the North Caucasus, formed in 2012.
Currently, more than 25,000 students study at more than 400 educational programs in the NCFU. To ensure the
main business processes of educational, scientific and educational activities, the University uses the integrated
automated management system (IASY VUZ). In the database IASY VUZ, since 1994, information about more
than 500,000 students have been gathered. The total size of the managed dataexceeds 1TB.
Relational OLTP-database IASY VUZ contains more than 1000 tables and indexes based on B+trees. As a
result of the normalization, the metadata schema is represented by a tree:

• the root of the tree is the table kartapromat with students, which accumulates data on the features of each
lesson and schedule, assessments, attendance, etc.; the table contains more than 2 billion records;
• a subtree with educational programs is represented by tables with the description of educational programs,
schedules of the educational process, educational disciplines, lists of classes;
• a subtree with students is represented by tables with personal data of people, training groups, information
about students;
• a subtree with employees is represented by tables with the structure of educational institutions, states,
information about employees.

The presented scheme supports the individualization of the learning process of each student, associated with
the presence of elective disciplines in the curriculum, the availability of several alternative educational programs,
the ability of the federal university to create new own educational standards. The curriculums for different
education level and form contain from 2000 to 4000 separate lessons with an average of 1000 lessons per year.
As a result, it is required every year to schedule for p = 25000 ∗ 1000 = 25000000 records. With the modern
development of computer technology with such a large amount of initial data, the task of automatically or
manually compiling a timetable for an acceptable time seems difficult.
Heuristics are required, which can significantly narrow the area of data to be processed. The main approach is
based on the formation of student learning streams in accordance with the constraints imposed on the academic
disciplines, activities, classrooms used. The learning flow is a virtual aggregation of individual lessons of students
that can be held together when the conditions for the coincidence of the restrictions accepted in the educational
institution are met. Examples of such restrictions are:

• the coincidence of the name or content of the discipline;
• complete or partial coincidence of the calendar plans of the educational process;
• restrictions on the maximum number of students in the stream, for example, because of safety requirements;
• restrictions on the maximum number of students in the stream due to limitations of the classrooms.
3 Business-Process of Schedule Formation and Creating Study Streams
The business process of preparing the study schedule can be conditionally divided into the following stages:
1. The formation of an educational program;
2. Formation of a contingent of students on the educational program;
3. The distribution of students in elective disciplines;
4. Calculation of training streams;
5. The distribution of compulsory resources for educational streams: teachers;
6. Updating the data mart with the study load;
7. Distribution of shared resources by educational flows: classrooms, time;
8. Mapping of the timetable to individual learning trajectories and the formation of individual schedules for
students and teachers.
Figure 1 shows the main components of the database that provide the business process for scheduling.

Figure 1: The main components of the database that support the business process of scheduling

There are two models for calculating learning streams:
1. In the initial calculation mode, current regulations and restrictions are taken into account when forming
the stream. The main tasks at this stage: to minimize the number of streams within the limits of current
restrictions; as much as possible to reduce the number of student permutations between different streams
to avoid unnecessary checks of individual overlays in the schedule for each student.
2. In the regime of continuous updating of streams, changes in educational programs and movement of students
are taken into account. The main tasks at this stage: if possible, to preserve the initial set of streams when
students are added and removed during the year, including possibly with violation of certain restrictions;
when distributing new students through the streams, first of all, only information about the stream should
be taken into account, but not the norms of its formation. At this stage, the study streams have already
been checked by specialists and accepted for processing.
The flowchart for calculating the study streams is shown in Figure 2. The algorithm for calculating the study
streams is a modified version of the greedy algorithm, which makes it possible to significantly reduce the range
of enumeration of all possible combinations of student pooling into streams, but on the other hand does not
guarantee global optimization of the formation of streams. At each step of the algorithm, the problem of local
optimization is solved by comparing each new occupation of each student with the already formed streams,
sequentially checking from the smallest streams to the largest ones. If a suitable flow is found, the student is
added to it, otherwise a new stream is created from one person.
The algorithm can be effectively paralleled when partitioning the sorted list in step 2. At the same time,
sectioning should be carried out in such a way as to ensure that all the activities of one student and all potential
for the organization of students stream into one section fall into place. Examples of such sections may be institutes
or faculties in situations where interfaculty streams are not used in an educational institution. When using the
software and hardware infrastructure of the NCFU, the parallel implementation of the presented algorithm takes
8 hours if 15-20 processes are organized.
Figure 2: Flowchart for calculating study streams

4 Creating a Data Mart
In order to perform analytical services and organize automated workplaces for schedule makers, a data mart
with study streams organized in the database. The main implemented analytical services are:

• calculation of the load of basic resources: classrooms, teachers, training groups;
• calculation of hourly pay for teachers;
• optimization and load balancing of students and teachers;
• calculation of the efficiency of the use of scientific equipment and laboratories;
• optimization of the work of auxiliary services: food, security;
• optimization of the use of water, electricity, etc.

The data mart contains pre-processed aggregated data on study streams and their properties and is designed
to update the multidimensional cubes of the OLAP-module of the information system built according to the
MOLAP scheme with the update period – one time per day. The main measure is the number of students in
the subgroups, combined into a stream. The measurements are: educational programs, groups of students and
study streams.
The structure of the fragment of the data mart with a table of measures and tables for storing information
about the study streams and groups of students is shown in Figure 3.
The data mart does not store information about individual trajectories of students learning. In order for
the information system to track the overlays for each student, the data mart is supplemented with a table
0
matchF lows0 (kodp ot1, kodp ot2) to store the symmetric binary relation of the prohibited overlays in the schedule.
Each entry in the table contains all pairs of a binary relation, while only kodp ot1 < kodp ot2 pairs are stored
with symmetry taken into account. To fill the table 0 matchF lows0 , you need to match all the pairs of lessons
of students to each other, i.e. perform Cp2 = C25000000
2
comparisons. The computational complexity of pairwise
comparisons is described by a polynomial of the second degree. To reduce the computational complexity, the
list of activities can be compared to itself using a merge join, i.e. doublepass the sorted list. In this case, the
computational complexity corresponds to twice the length of the list, but you will also need to sort the list
and hash the result to obtain unique pairs of threads with shared students. Script in SQL to fill the table
0
matchF lows0 :

−−t h e temporary t a b l e #temp maps t o each f l o w k o d p o t a l i s t o f i t s s t u d e n t s
kod cont
−−t h e t a b l e dbo . k a r t a p r o m a t c o n t a i n s a l i s t o f a l l a c t i v i t i e s f o r a l l s t u d e n t s
s e l e c t d i s t i n c t s l 1 . kod pot , k1 . k o d c o n t into #temp
from s c h . s c h e d u l e L o g s l 1 inner j o i n dbo . kartapromat k1
on k1 . kod ng=s l 1 . i d S c h e d u l e−− time−s e n s i t i v e r e c o r d s
where g e t d a t e ( ) between s l 1 . d a t e B e g i n and s l 1 . dateEnd and
k1 . pr=1−− a s i g n o f e l e c t i v e c o u r s e s f o r s t u d e n t s
−− b u i l d B + t r e e
CREATE INDEX I X k o d c o n t ON #temp ( kod cont , k o d p o t ) ON [PRIMARY]
−− d e t e r m i n e t h e f l o w s i n which t h e r e i s a t l e a s t one common s t u d e n t
i n s e r t matchFlows ( kod pot1 , k o d p o t 2 )
s e l e c t d i s t i n c t t 1 . kod pot , t 2 . k o d p o t
from #temp t 1 ( n o l o c k ) inner j o i n #temp t 2 ( n o l o c k )
on t 1 . k o d c o n t=t 2 . k o d c o n t and t 1 . kod pot