=Paper= {{Paper |id=Vol-2281/paper-09 |storemode=property |title=Comparison of Approaches to the Analysis of Supercomputers Usage Eciency by the Example of Lomonosov and Lomonosov-2 Supercomputers |pdfUrl=https://ceur-ws.org/Vol-2281/paper-09.pdf |volume=Vol-2281 |authors=Sergei Leonenkov,Sergey Zhumatiy }} ==Comparison of Approaches to the Analysis of Supercomputers Usage Eciency by the Example of Lomonosov and Lomonosov-2 Supercomputers== https://ceur-ws.org/Vol-2281/paper-09.pdf
    Comparison of Approaches to the Analysis of
      Supercomputers Usage Efficiency by the
     Example of Lomonosov and Lomonosov-2
                Supercomputers?

                      Sergei Leonenkov1,2 and Sergey Zhumatiy1
    1
        Research Computing Center of Lomonosov Moscow State University, Moscow,
                                       Russia
                 2
                   Lomonosov Moscow State University, Moscow, Russia
                           {leonenkov,serg}@parallel.ru



         Abstract. “Resource planning efficiency” of HPC-systems is usually de-
         fined as the utilization of its resources. The number of queued jobs in
         most modern supercomputer complexes is much bigger than the num-
         ber of jobs executed at the same moment of time. That high demand
         and the evolution of widely-used planning algorithms, which can boost
         utilization up to 0,95 - 1, allow system administrators to more prop-
         erly manage computational resources and not only meet the needs of
         cluster owners in maximizing utilization, but also improve customer ex-
         perience. We conducted a research of the two largest CIS supercomputer
         systems’ (Lomonosov and Lomonosov-2) usage history and proposed a
         new multi-metrics definition of “resource planning efficiency” concept. In
         this article, our goal was to compare both approaches and explain why
         the increased demand for computational resources poses new challenges
         to the creators of resource planning algorithms and how the proposed ap-
         proach will improve customer service. Discussed multi-metrics efficiency
         estimation approach is a part of a bigger project, which aims to pro-
         vide full jobs scheduling eco-system. We examined general architecture
         of this environment , which will allow to qualitatively change the system
         settings of the supercomputer job scheduler on the fly and adapt to the
         changing flow of jobs.

         Keywords: Lomonosov supercomputer · Lomonosov-2 supercomputer ·
         Resource Management · Supercomputer Job Scheduling Efficiency


1       Introduction

In general, supercomputers are expensive and consume a considerable amount
of energy. This poses a major problem of the efficiency of their usage. But what
is efficiency? In most cases, ”efficiency” means utilization of supercomputer re-
sources, however, our experience suggests that this indicator does not always
?
    Supported by RFBR (project No. 17-07-00719).
            Approaches to the Analysis of Supercomputers Usage Efficiency      77

prove to be accurate. On practice, there are nuances that must be taken into
account, such as the priorities of individual users or groups, the average size of
the job queue, the waiting time for jobs in the queue, and others. The work of a
supercomputer depends on the settings of a job scheduler, which are flexible and
can be changed by an administrator. The question that arises is how to assess if
a supercomputer works effectively at chosen settings. The utilization is no longer
a univocal indicator, as there are other factors that have to be considered.
    We proposed an efficiency (performance) metric that allows to combine sev-
eral metrics and carry out a comprehensive assessment of the work of a super-
computer (and job scheduler). In short, the proposed efficiency metric includes
several minor metrics that are important from our perspective. It is possible to
change the weight of each of them or supplement them by other metrics.
    In the article we provided a number of case studies using both the traditional
and the new metric, as well as interpretations of the proposed metric values in
some specific cases. In Section 2, we considered the features of Lomonosov-1 and
Lomonosov-2 supercomputers, which had inspired us to develop the proposed
metric. Section 3 discusses traditional and proposed versions of the efficiency
metric. In Section 4, we compared these metrics and analyzed the differences.
Section 5 describes our plans for further development of the proposed approach.


2     Background

2.1   Lomonosov and Lomonosov-2 Supercomputers

The two core high-performance computing systems of Moscow State University
are Lomonosov and Lomonosov-2 supercomputers.




                  Fig. 1. Lomonosov-2 supercomputer utilization
78     S. Leonenkov and S. Zhumatiy

    More than 900 scientific research groups (3,000 active accounts) were pro-
vided access to both supercomputers. More than 1,000 jobs are processed every
day. SLURM (Simple Linux Utility for Resource Management) and our self-
created external scheduler are used to manage all these jobs. To highlight the
big workload of the system, let’s review the overall Lomonosov and Lomonosov-
2 supercomputers utilization performance in the period of 4 years (article [2]).
Users submitted more than 820,000 jobs on Lomonosov supercomputer from
March 2014 to March 2017; the SLURM native backfill scheduler (article [4])
provided over 0,88 utilization.
    Same situation can be found on supercomputer Lomonosov-2, where our ex-
ternal scheduler provided over 0,92 utilization (Fig. 1). Figure 1 shows approx-
imately 300 days of Lomonosov-2 supercomputer usage. Red and yellow lines
represent busy CPUs on each day of that period. As it can be seen from Fig-
ure 1 overall utilization was less than 0,85 before January of 2017, but after
allowing much more users to run their jobs on Lomonosov-2 we have reached
our current utilization efficiency.
    The other important metric of the job flow for any supercomputer complex
is the average time of waiting for queued job to be started. For instance, this
parameter is more than 22 hours on Lomonosov supercomputer. Such a busy
queue provides a good opportunity to work on improving not only the resource
utilization, but also other metrics that will increase the quality of service for
supercomputer users.


2.2   Supercomputers Scheduling: Main Terms

Here we introduce several terms that allow us to simplify the description and
comparison of both approaches to the evaluation of the supercomputer’s schedul-
ing efficiency. Let’s set a strip with fixed width H, which shows resources uti-
lization of a computing system in time (H - number of supercomputer’s nodes).
The strip has an XY coordinate system (X corresponds to time, Y - number of
nodes).




                           Fig. 2. A strip and W slot
             Approaches to the Analysis of Supercomputers Usage Efficiency           79

   In the strip we set a slot W with length T, which represents a time interval.
Slot start coordinate is the coordinate of its bottom left angle (X0 , Y0 ) (see
Fig. 2). Job is a user’s program that has two states: it is either in a queue or is
being executed in computing resources.
Definition 1. Job is a set of elements Ji = {Xi , Ti , Hi , Ri , Ui , Qi }, where:
 – Xi - execution start time of a job in computing resources;
 – Ti - time length of job execution in computing resources;
 – Hi - number of computing nodes required to execute a job;
 – Ri - non-empty setup of j pairs (yij , hij ), which describes job allocation on
   nodes as a rectangle with bottom left anglePcoordinate (Xi, yij ), Ti execution
   time and hij number of nodes such that j hij = Hi (Fig. 3, 4);
 – Ui - identifier of a user associated with a job;
 – Qi - job queuing time.




                                     Fig. 3. Job

   Notations interpretation: a job is represented as a rectangle with set coordi-
nates of a bottom left angle, defined size (Hi and Ti are rectangle’s sizes on Y and
X axes respectively) and color (corresponds to user identifier), decomposition of
Ri among nodes (Fig. 4).




Fig. 4. Job Ji in a strip and Ri decomposition example. Both parts of that decompo-
sition are gray, this coloring indicates that these parts corresponds to the same user

    More detailed information about used notations (i.e what is jobs packing,
packing quality loss function and etc.) can be found in article: “Supercomputer
Efficiency: Complex Approach Inspired by Lomonosov-2 History Evaluation” by
Sergei Leonenkov and Sergey Zhumatiy in Springer CCIS, article [1] is still in
publishing.
80       S. Leonenkov and S. Zhumatiy

3     Evaluation of Supercomputer’s Efficiency
3.1    Utilization Approach
The most widely used resource management quality characteristic is the uti-
lization of computing nodes. The main goal of a supercomputer complex is to
minimize the idle resources. Until recently, we also used this resources plan-
ning efficiency indicator for Lomonosov and Lomonosov-2 supercomputers (as a
definition of ”usage efficiency”).

                                     |Z|
                                     X
        U tilization(Z, W ) = 1 −          (Hi ∗ (min(T, Xi + Ti ) − Xi )/(H ∗ T )   (1)
                                     i=1

In Formula 1 Z is a setup of jobs that was executed on start of slot W or queued
during slot W. Let’s suppose that sets Zstart and Zqueue represent executed jobs
on slot W start time and queued during whole slot W respectfully.

3.2    Advanced Approach
Basing on Lomonosov and Lomonosov-2 supercomputers usage history, we of-
fered a set of metrics, which allows to consider the task of CPU hours scheduling
efficiency more comprehensively, and a formula, which provides a means of com-
paring different settings of any scheduling algorithms. In addition to the already
described Utilization, we also want to use the following metrics: average start
time of the first job of users (Formula 2), average start time of jobs belonging
to a specific class(Formula 3), number of running jobs (Formula 4) and number
of users (Formula 5), whose jobs from Zqueue were started in chosen slot W.
                               UX
                                N um
           F U JST (Z, W ) =             minj⊂U Jobs(u) (Xj − Qj )/U N um(Z)         (2)
                                   u=1
                                                 X
               AV GST (Z, W, Class) =                 (Xj − Qj )/|Class|;            (3)
                                               i⊂Class

            StartedJobs(Z, W ) = (|Zstart | + |Zqueue | − |Z|)/(|Zqueue |)           (4)
     StartedU sers(Z, W ) = U N um(Zstart ) + U N um(Zqueue ) − U N um(Z)            (5)
   Finally, our proposed efficiency formula is representing a weighted sum of
5 chosen metrics (Formula 6). Additional limitation for weights (Formula 7) is
created to normalize efficiency value on [0,1].
                             5
                             X
            Ef f iciency =         P riorityCoef f icienti ∗ M etricsV aluei         (6)
                             i=1
                           5
                           X
                                 P riorityCoef f icienti = 1                         (7)
                           i=1
             Approaches to the Analysis of Supercomputers Usage Efficiency        81

4     Comparison of Two Supercomputer’s Efficiency
      Evaluation Approaches

This section compares two considered efficiency evaluation approaches: utiliza-
tion-based and multi-metrics.
    An important question that will directly influence the multi-metrics efficiency
function is the right choice of weights (priority coefficient). To use this approach
each supercomputer complex has to configure it independently. It is impossible
to find a universal set of weights for all complexes, as each owner of such system
has a unique flow of jobs launched by the clients, and sets his own narrow goals
when using the system. For example, now the utilization of the processor time
is the cornerstone in the management of HPC-systems, so it is not correct to
set the same coefficients for this metric and the other, as other metrics are more
prone to volatility. Setting a pair of new jobs for execution can significantly shift
the ”efficiency” in one direction while the utilization remains unchanged.
    All the cases that we examined in this article are considered using model
examples of sets of weights. In view of the complexity of interpreting the values
of the multi-metrics efficiency function, we will use sets of weights, where three
of the five weights are equal to zero.


4.1   Utilization Efficiency Metric on Today’s Supercomputers
      Workloads Does not Show the Quality of Customer Experience

Modern supercomputer resource planning techniques and algorithms have al-
ready achieved significant results in maximizing utilization. The usage history
of the two MSU facilities shows that the maximum possible utilization was not
achieved because of the factors that have no connection with the quality of the
algorithms, such as reservations for allocated accounts, system failures, etc. On
the other hand, there are other examples. Let’s suppose that the computing
field of the supercomputer is 100 percent occupied. When a certain amount of
resources is released, the scheduler needs to decide which job from the queue can
be added to this location. Let’s say there are two identical jobs launched by two
different users at different times, but one already has jobs on the account, and
the other does not. Usually, the scheduler, tuned to maximize utilization, will
launch the one that was queued before. The planner, which tries to maximize
our multi-metrics function with given weights, will launch the job, the author of
which does not yet have jobs on the account. An example of both scenarios is
given in Fig. 5.
    But what is the difference? The utilization for both launch scenarios is the
same, but in the case of the multi-metrics efficiency function, we get more users
whose jobs are on the account, which means that on average users will receive
the first results of their calculations faster. Thus, not only the owners of super-
computer systems, who spend a large amount of resources to support their work,
are satisfied, but also individual users, who can quickly move from waiting for
the first results to their analysis.
      Zqueue




82              S. Leonenkov and S. Zhumatiy




                                    Ending                      Ended




                                 Queue
                                                                Ended




Fig. 5. Utilization efficiency metric on todays supercomputers workloads does not show
quality of customer experience


4.2        The First User’s Job Start Time Metric Should be Used for
           Managing Faster Access to First Calculations Results
We have already reviewed multi-metrics resource planning efficiency based on
utilization and number of users, whose jobs are being executed at a given moment
of time, (with weight equal to 1/2 each). Let’s now discuss this efficiency function
but based on utilization and average first user’s job start time metric (with
weight equal to 1/2 each). This choice of metrics follows a similar scenario, like                       Вариант 2
the previous one, but the start time of the first user’s jobs is an additional
parameter, which the scheduler have to take into account in order to achieve
optimal job planning.                                        Цвет обозначает
             H  i
    This additional metric controls the location of the               user’s first jobs in the
                                                             пользователя (U )                i



queue, shifting them all to the very beginning of the        current
                                                     Qi - время постановки
                                                     в очередь;
                                                                           queue regardless of
                                                     Ri = {(y ,h )};
their(X ,yqueue
           )
                time. All this cannot
                                 T
                                      be tracked using   only        utilization.
                                                                    ij   ij

       i    i                                i
                                                                                                         Вариант 2

                            h1
4.3        Large Jobs Start Time Problem
           (Xi,y1
           )                                                                  Цвет обозначает
                                                                              пользователя (Ui)
Another problemh
                       that was noticed by the system administrators at Полоса
                             2                                                               the RCC
                                                              Qi - время постановки
MSU is that, when achieving the highest system utilization    в очередь;        rates,  the  scheduler
                                                              Ri = {(y ,h ),(y ,h ) };
sometimes(X ,y
               abuses large
                      X
                              size jobsT (these jobs move in the queue much more slowly
                                                 i
                                                                              1   1   1   1


                    i   2                i
                                                                                                           Opt(Zend2) - Opt(Zend1) = 0,03
than jobs)    of smaller sizes). This effect arises due to the fact that the scheduler is
trying to fill all available
                       Zstart  empty nodes of the system with small jobs. To cope with
this significant problem Chebyshev supercomputer managing policies contained
special day (each Thursday), when all accumulated large jobs were given highest
priority to start execution and all smaller jobs - lowest priority. We strongly
believe that there is no need in such unclear for users optimizations and our
proposed set of metrics can help scheduler to cope with described problem. To
solve this type of efficiency planning problem, we have added to the general list
the average start time of jobs of a specific
                                           (0,0)
                                                  class metric, where class            can
                                                                                  Окно W    beTdefined
                                                                                         длины
              Approaches to the Analysis of Supercomputers Usage Efficiency      83

as a class of jobs with a size from a certain interval. Additionally, this metric
can be used to boost a specific class, for example: jobs from specific group of
users, jobs with specific programming package or even jobs from specific user.


4.4     Multi-metrics Approach with Utilization Weight Equal to 0

As we have already mentioned, the selection of weights in multi-metrics effi-
ciency evaluation approach is a challenging task. We have reviewed three dif-
ferent convolutions and sets of weights (Sections 4.1-4.3). Each of the efficiency
calculation formulas included utilization metrics. But what goes wrong if the
utilization weight is set to 0? Let’s discuss the efficiency function based on the
number of users whose jobs are executing and average first user’s job start time
metrics (with weight equal to 1/2 each). The optimal algorithm for the scheduler
will be to set only the first job of each user and no longer place any jobs for exe-
cution in order, so that if a new user appears in the queue, he could immediately
get to the execution, thus retaining the optimal value of the effectiveness. In
this regard, the availability of utilization metric in determining the efficiency of
supercomputer resource planning is vital.



5     Future Work

At the heart of the future work is the desire to create a recommendation system
that will advise the system administrator on changing the current scheduler
settings [3] in order to maximize proposed multi-metrics efficiency function. The
general architecture of such system is presented on Fig. 6.


                               change settings           algorithms and
         scheduler
                                                            policies
               commit new settings
          system                                                  manage queue
        administrator
               new settings
      recommendation        estimate efficiency
                                                              queue
          system


                     Fig. 6. Recommendation system architecture

  Subsequently, this system should evolve into an autonomous cluster manage-
ment system.
84     S. Leonenkov and S. Zhumatiy

Acknowledgements

This material is based upon the work supported by Russian Foundation for
Basic Research (project No. 17-07-00719). The research is carried out using
the equipment of the shared research facilities of HPC computing resources at
Lomonosov Moscow State University [5, 6].


References
1. Leonenkov S., Zhumatiy S.: Supercomputer Efficiency: Complex Approach Inspired
   by Lomonosov-2 History Evaluation. Springer CCIS (2018)
2. V. Sadovnichy, A. Tikhonravov, Vl. Voevodin, and V. Opanasenko: ”Lomonosov”:
   Supercomputing at Moscow State University. In Contemporary High Performance
   Computing: From Petascale toward Exascale (Chapman and Hall/CRC Computa-
   tional Science), pp.283-307, Boca Raton, USA, CRC Press, 2013.
3. Leonenkov S., Zhumatiy S.: Introducing New Backfill-based Scheduler for SLURM
   Resource Manager, Procedia Computer Science. Volume 66, 2015, (pp 661-669).
4. SLURM Homepage, https://slurm.schedmd.com. Last accessed 14 September 2018
5. Lomonosov-2 supercomputer on TOP50 list, http://top50.supercomputers.ru/. Last
   accessed 14 September 2018.
6. Lomonosov — T-Platforms, http://www.top500.org/system/177421. Last accessed
   14 September 2018.