Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021



DEVELOPING A TOOLKIT FOR TASK CHARACTERISTICS
PREDICTION BASED ON ANALYSIS OF QUEUE’S HISTORY
             OF A SUPERCOMPUTER
                            M. Rezaei1,a, A. Salnikov1,2, A. Shiryaev1
                 1
                     Moscow Institute of Physics and Technology, Moscow, Russia
                       2
                           Lomonosov Moscow State University, Moscow, Russia

                                   E-mail: a mahdirezaei_ai@yahoo.com


Empirical studies have repeatedly shown that in High-Performance Computing (HPC), users’ resource
estimations lack accuracy. Underestimating resources may cause a job to be terminated at any step of
the computation, so the resources already allocated to it are wasted; overestimating resources wastes
them as well. In this work, to utilize the overall HPC system effectively, we propose a new approach
to predicting the resources required by a newly submitted job, such as the number of CPUs and time
slots. The study focuses on predictive analytics tasks, including regression and classification. A
supervised machine learning system, comprising several models, was trained on statistical data,
including per-job and per-user features, collected from the reference queue systems. The results
indicate that adding more features to the dataset improves prediction accuracy. The possibility of
designing a plugin to apply our machine learning system in practice was also studied. A dynamically
connected SLURM SPANK plugin was created that adds the “--predict-time” option and takes control
of the srun and sbatch commands while they are executed. It was found that the plugin enables
practical use of our proposed machine learning system.

Keywords: machine learning, predictive analytics, decision making, HPC, job scheduling,
SLURM plugin.



                                                      Mahdi Rezaei, Alexey Salnikov, Alexander Shiryaev




                                                               Copyright © 2021 for this paper by its authors.
                      Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).








1. Introduction
         High-Performance Computing (HPC) applications are intrinsically compute- and data-intensive.
Because of the dynamicity of resources and on-demand user application requirements, job scheduling
in such environments is an NP-complete and complex problem [1]. Resource allocation is done by job
schedulers, but their major drawback is that the required resources must be known to the scheduler at
the time of job submission. Empirical studies have repeatedly shown that users’ resource estimations
lack accuracy [2], and there is no automatic system in HPC that can allocate resources effectively.
SLURM, a well-known job scheduler, has a mechanism that predicts only the starting time of a job;
however, this mechanism is very primitive and more often than not the time is overestimated. The
purpose of this work is to perform predictive analytics for resource allocation using machine learning
methods. Using algorithms, including machine learning algorithms, to predict the resources required
by jobs has been pursued by several previous studies [3-8]. Using historical data is a reasonable way
to improve the performance of schedulers and thus utilize the overall HPC system efficiently [9]. We
also created a plugin to study the possibility of applying our system on real clusters. The proposed
plugin is a dynamically connected SPANK plugin that adds the ‘--predict-time’ option and takes
control of the srun and sbatch commands while they are executed. The block diagram of the proposed
system is shown in [fig. 1].




                              Figure 1. Block diagram of the proposed system


2. Supervised machine learning and predictive analytics
        Machine learning plays an important role in predictive analytics, since algorithms can be
trained on a historical dataset. Dagnino et al. have already discussed the applicability of machine
learning to data analytics [10, 11]. Nowadays many applicable machine learning systems use
supervised learning. Different machine learning techniques and methods are available, so different
algorithms can be tried to find the best resource prediction for an HPC system [12]. The study focuses
on predictive analytics, including regression and classification tasks.







2.1 Regression algorithms
        Regression algorithms produce continuous outputs and are used to predict the required
resources. The following models were considered: Multilayer Perceptron (MLP), Random Forest
Regression (RFR), Lasso Regression (LR), K-Nearest Neighbor Regression (KNN), Ordinary
Least-Squares Regression (OLSR), Support Vector Regression (SVR), Ridge Regression (RR),
Polynomial Regression (PR), and CART Regression (CARTR).
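        For illustration only, and not the exact implementation used in this work, the listed regressors
could be instantiated with scikit-learn roughly as follows; the hyperparameters shown are placeholders,
not the values tuned in this study:

# A minimal sketch of the listed regression models using scikit-learn.
# Hyperparameters are illustrative, not the values tuned in this study.
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

regressors = {
    "MLP":   MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500),
    "RFR":   RandomForestRegressor(n_estimators=100),
    "LR":    Lasso(alpha=0.1),                      # Lasso Regression
    "KNN":   KNeighborsRegressor(n_neighbors=5),
    "OLSR":  LinearRegression(),                    # Ordinary Least-Squares Regression
    "SVR":   SVR(kernel="rbf"),
    "RR":    Ridge(alpha=1.0),
    "PR":    make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "CARTR": DecisionTreeRegressor(),               # CART Regression
}

Each model is fitted on the training part of the dataset and compared on the test part by its R2 score,
as reported in [fig. 2].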
2.2 Classification algorithms
          Classification algorithms are used when the output is a discrete label; here they are used to
predict whether a job will fail. The following models were considered: the Naive Bayes classifier,
kernel Support Vector Machines (SVM), and CART classification.
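        As above, a minimal sketch (assuming scikit-learn and a binary completed/removed label, not
the authors’ exact code) of the listed classifiers and their F1 evaluation:

# A minimal sketch of the listed classifiers; the job state from Table 1
# (completed / removed) serves as the label to be predicted.
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

classifiers = {
    "Naive Bayes": GaussianNB(),
    "Kernel SVM":  SVC(kernel="rbf"),
    "CART":        DecisionTreeClassifier(),
}

def evaluate(model, X_train, y_train, X_test, y_test):
    # Fit on the training split and report the F1-score on the test split.
    model.fit(X_train, y_train)
    return f1_score(y_test, model.predict(X_test), average="weighted")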
2.3 Data preparation
         To check the efficiency of our system, we used the collected statistics of the Blue Gene/P
system installed at the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow
State University. The statistics cover jobs run over almost 12 months in 2017 and contain about
112,000 instances. We used 80% of the dataset for training and 20% for testing. In the first phase, the
per-job features presented in Table 1 were used to train our machine learning models. In the second
phase, 7 metrics known as per-user features, shown in Table 2, were added to the dataset.
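        A minimal sketch of this preparation step, assuming the queue history has been exported to a
CSV file with the columns of Table 1 (the file name and the encoding choices are illustrative):

# A minimal data-preparation sketch: encode categorical per-job features
# and split the history into 80% training and 20% test data.
import pandas as pd
from sklearn.model_selection import train_test_split

jobs = pd.read_csv("queue_history.csv")   # hypothetical export of the 2017 statistics

# String-valued per-job features are mapped to integer codes.
for col in ["name", "user", "group", "task_class"]:
    jobs[col] = jobs[col].astype("category").cat.codes

X = jobs[["time_limit", "num_cpus", "name", "user", "group", "task_class"]]
y = jobs["required_time"]                 # or "required_cpu" for the CPU model

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)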
2.4 Machine learning experimental results
       Our findings, presented in [fig. 2a], show that in general the predictions by MLP, RFR, KNN,
PR, and CARTR are promising. After adding the per-user features to the dataset, most of the models
improved: RFR, KNN, and CARTR improved slightly, while the performance of MLP and PR
improved drastically. Another promising finding is that after adding the per-user features, all the
models became better at predicting the required time.
                                        Table 1. Per-job features
             Feature         Type       Description
             time_limit      Numeric    The time requested by the user for a job.
             num_cpus        Numeric    The number of CPUs requested by the user for a job.
             id              String     Task identifier, as assigned by our job scheduling system.
             name            String     Task name specified by the user.
             user            String     User name (extracted with getpwuid(euid)).
             group           String     User group (extracted with getgrgid(egid)).
             task_class      String     Class of the task. It is assigned automatically by the scheduling
                                        system, or by the user if allowed. It is used to split tasks into
                                        several classes; each class has its own priority.
             state           String     The status of the job: either completed or removed.
             required_time   Numeric    The time during which the job actually ran. This value is
                                        predicted for newly submitted jobs.
             required_cpu    Numeric    The number of CPUs actually used by the task during execution.
                                        This value is predicted for newly submitted jobs.

         As [fig. 2b] shows, RFR is the best model, and the highest increase in accuracy belongs to
the SVR model. From [fig. 2c] we can see that CART classification is the best model for
classification.







                                          Table 2. Per-user features
             Feature                      Type     Description
             used_portion_of_time_limit   Numeric  The average fraction of the requested time limit that
                                                   the user’s jobs actually used (a measure of how
                                                   reasonable the user’s time requests are).
             avg_aborted_task             Numeric  Percentage of aborted tasks submitted by the user.
             average_congestion           Numeric  The average occupancy of the system by the user.
             average_cpus                 Numeric  The average number of CPUs requested by the user.
             duration                     Numeric  The average waiting time of the user’s jobs in the queue.
             wait_time / time_limit       Numeric  The average ratio of the time spent in the queue to the
                                                   requested time.
             average_time_limit           Numeric  The average of the time limits set by the user.
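        A minimal sketch (not the authors’ pipeline) of how most of these per-user metrics could be
derived from the job history with pandas; the wait_time column and the aborted flag are assumptions
about the raw log, and average_congestion is omitted because it additionally requires system-wide
occupancy data:

# Aggregate the job history by user to obtain per-user features (Table 2),
# which are then joined back to every job record before retraining the models.
import pandas as pd

def per_user_features(jobs: pd.DataFrame) -> pd.DataFrame:
    jobs = jobs.assign(
        used_portion=jobs["required_time"] / jobs["time_limit"],
        aborted=(jobs["state"] == "removed").astype(float),
        wait_ratio=jobs["wait_time"] / jobs["time_limit"],
    )
    return jobs.groupby("user").agg(
        used_portion_of_time_limit=("used_portion", "mean"),
        avg_aborted_task=("aborted", "mean"),
        average_cpus=("num_cpus", "mean"),
        duration=("wait_time", "mean"),
        wait_time_over_time_limit=("wait_ratio", "mean"),
        average_time_limit=("time_limit", "mean"),
    )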




  Figure 2. (a) R2 values in the regression problem of predicting the number of required CPUs; (b) R2
 values in the regression problem of predicting the required time; (c) F1-scores of the classification models


3. Plugin
         To make our system applicable to SLURM, we need to create a plugin. Therefore, a
component was designed that connects to SLURM: it is able to collect statistics, analyze them and,
based on the analysis, build a model with which predictions can be made. Plugins may not only
complement but also modify the behavior of SLURM to optimize the system’s performance. SLURM
has a centralized SLURMctld manager for monitoring resources and tasks. Every compute server
(node) runs a SLURMd daemon that can be compared with a remote shell: it waits for a job, runs it,
returns its state, and waits for more work. In addition, SLURM provides several utilities for its users.
In this work we are mainly interested in two of them: srun, which starts a job, and sbatch, which
places a job in the queue, i.e. sends a batch script (instructions to SLURM on how to perform the job)
to SLURM.
3.1 Review of existing solutions
         The standard SLURM package has a mechanism that predicts only the starting time of a job;
however, this mechanism is very primitive and more often than not the time is overestimated. There is
also a software system for modeling the activity of computing-cluster users under SLURM, which
uses collected statistics to simulate the load on a model of a computing cluster controlled by SLURM.
This software lists several metrics used by cluster system administrators. The approach has been
tested on data from computing clusters of the Faculty of Computational Mathematics and Cybernetics
of Lomonosov Moscow State University and of NIKIET JSC [13]. However, this solution is not
suitable either, because it does not provide analysis and prediction. There are other systems [14] for
analyzing the efficiency of cluster usage; nevertheless, connecting such systems to SLURM would
waste a large amount of resources, whereas our proposed method only requires the development of a
component into which an analysis system can easily be embedded. In another work [15], various
metrics for clusters are studied, but the user would have to apply these metrics manually to predict the
run time of a job, so it is not a ready-made solution.
3.2 Statement of technical specifications
        The task consists of designing, developing, and testing the functionality of the component (or
components) connected to the SLURM queuing system. The component collects statistical data and
analyzes the flow of computational tasks. It is a system of two modules implemented as:
               • An add-on to SLURM that adds and handles the --predict-time option in the srun and
                 sbatch commands to predict the job run time.
               • A Linux daemon that handles HTTP requests, trains the model, etc. (hereinafter, the
                 main application; a sketch of such a daemon is given after this list). The application
                 must be able to connect custom algorithms according to a given template.
        In addition, special scripts are required to install these modules easily.
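        As an illustration of the second module, a minimal sketch of such a daemon in Python is given
below; the port, the JSON field names, and the model file path are assumptions, not the actual
interface of the main application:

# A minimal prediction daemon: it loads a previously trained regressor and
# answers HTTP POST requests (e.g. from the SPANK plugin handling
# --predict-time) with a predicted run time for the submitted job.
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

with open("model.pkl", "rb") as f:        # hypothetical path to the serialized model
    model = pickle.load(f)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        job = json.loads(self.rfile.read(length))
        # The request body is expected to carry the per-job features of Table 1.
        features = [[job["time_limit"], job["num_cpus"], job["user_code"]]]
        predicted = float(model.predict(features)[0])
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"predicted_time": predicted}).encode())

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PredictHandler).serve_forever()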


4. Conclusion and future work
        Our work has led us to conclude that adding new features to the dataset improves prediction
accuracy. An innovative solution to the resource allocation problem was found. The possibility of
writing a plugin to apply our machine learning system in practical applications was studied, and it was
found that such a plugin allows the practical use of machine learning algorithms in decision making.
However, the performance of this component still needs to be improved. In future work we will use
this component to evaluate our algorithms on a real cluster and to find the best prediction method.


References
[1] Jeffrey D. Ullman, NP-complete scheduling problems, J. Comput. System Sci. 10 (3) (1975) 384–
393.
[2] D. Tsafrir, Y. Etsion, and D. G. Feitelson, “Backfilling Using System-Generated Predictions
Rather than User Runtime Estimates,” IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 6, pp. 789-803,
2007.
[3] A. W. Moore, J. Schneider, and K. Deng, “Efficient Locally Weighted Polynomial Regression
Predictions,” in Proc. 14th Int. Conf. Machine Learning, 1997, pp. 236-244.
[4] A. W. Mu'alem, and D. G. Feitelson, “Utilization, Predictability, Workloads, and User Runtime
Estimates in Scheduling the IBM SP2 with Backfilling,” IEEE Trans. Parallel Distrib. Syst., vol. 12,
no. 6, pp. 529-543, 2001.
[5] W. Smith, I. Foster, and V. Taylor, “Predicting application run times with historical information,”
J. Parallel Distrib. Comput., vol. 64, no. 9, pp. 1007-1016, 2004.
[6] R. Albers, E. Suijs, and P. H. N. de With, “Triple-C: Resource usage prediction for semi-
automatic parallelization of groups of dynamic image-processing tasks,” in Proc. 23rd Int. Parallel
Distributed Processing Symp., 2009.







[7] R. Duan, F. Nadeem, J. Wang et al., “A Hybrid Intelligent Method for Performance Modeling and
Prediction of Workflow Activities in Grids,” in Proc. 2009 9th IEEE/ACM Intl. Symp. Cluster
Computing and the Grid, 2009, pp. 339-347.
[8] F. Nadeem, and T. Fahringer, “Using Templates to Predict Execution Time of Scientific
Workflow Applications in the Grid,” in Proc. 2009 9th IEEE/ACM Intl. Symp. Cluster Computing and
the Grid, 2009, pp. 316-323.
[9] W. Smith, I. Foster, and V. Taylor, “Predicting application run times with historical information,”
2004.
[10] A. Dagnino, K. Smiley, and L. Ramachandran, "Forecasting Fault Events in Power Distribution
Grids Using Machine Learning," Proceedings of the 24th International Conference on Software
Engineering and Knowledge Engineering (SEKE'12), San Francisco, CA, USA, pp. 458-463, July
2012.
[11] A. Dagnino and D. Cox, "Industrial Analytics to Discover Knowledge from Instrumented
Networked Machines", Proceedings of the 26th International Conference on Software Engineering and
Knowledge Engineering (SEKE’14), Vancouver, Canada, July 2014.
[12] Rodrigues, E. R., Cunha, R. L., Netto, M. A., and Spriggs, M. “Helping HPC users specify job
memory requirements via machine learning,” in Proceedings of the Third International Workshop on
HPC User Support Tools, pp. 6-13, IEEE Press, 2016.
[13] A software system for modeling the activity of users of a computing cluster based on the queuing
system SLURM // TRUE – Intellectual System for Thematic Research of Scientometric Data.
URL: http://omega.sp.susu.ru/books/conference/PaVT2015/short.html
[14] Leonenkov S., Zhumatiy S. (2019) Supercomputer Efficiency: Complex Approach Inspired by
Lomonosov-2 History Evaluation. In: Voevodin V., Sobolev S. (eds) Supercomputing. RuSCDays
2018. Communications in Computer and Information Science, vol 965. Springer, Cham.
[15] B. Song, C. Ernemann and R. Yahyapour, "User group-based workload analysis and modelling,"
CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005., Cardiff,
Wales, UK, 2005, pp. 953-961 Vol. 2, doi: 10.1109/CCGRID.2005.1558664.



