<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparison of Approaches to the Analysis of Supercomputer Usage Efficiency by the Example of the Lomonosov and Lomonosov-2 Supercomputers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergei Leonenkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergey Zhumatiy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Research Computing Center of Lomonosov Moscow State University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>76</fpage>
      <lpage>84</lpage>
      <abstract>
        <p>"Resource planning efficiency" of HPC systems is usually defined as the utilization of their resources. In most modern supercomputer complexes, the number of queued jobs is much larger than the number of jobs executing at any given moment. This high demand and the evolution of widely used planning algorithms, which can boost utilization up to 0.95-1, allow system administrators to manage computational resources more properly and not only meet the needs of cluster owners in maximizing utilization, but also improve customer experience. We studied the usage history of the two largest CIS supercomputer systems (Lomonosov and Lomonosov-2) and proposed a new multi-metrics definition of the "resource planning efficiency" concept. In this article, our goal is to compare both approaches and to explain why the increased demand for computational resources poses new challenges to the creators of resource planning algorithms and how the proposed approach will improve customer service. The discussed multi-metrics efficiency estimation approach is part of a bigger project, which aims to provide a full job scheduling ecosystem. We outline the general architecture of this environment, which will make it possible to change the settings of the supercomputer job scheduler on the fly and adapt to the changing flow of jobs.</p>
      </abstract>
      <kwd-group>
        <kwd>Lomonosov supercomputer</kwd>
        <kwd>Lomonosov-2 supercomputer</kwd>
        <kwd>Resource Management</kwd>
        <kwd>Supercomputer Job Scheduling Efficiency</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In general, supercomputers are expensive and consume a considerable amount
of energy, which makes the efficiency of their usage a major concern. But what
is efficiency? In most cases, "efficiency" means the utilization of supercomputer
resources; however, our experience suggests that this indicator does not always
prove to be accurate. In practice, there are nuances that must be taken into
account, such as the priorities of individual users or groups, the average size of
the job queue, the waiting time of jobs in the queue, and others. The work of a
supercomputer depends on the settings of its job scheduler, which are flexible and
can be changed by an administrator. The question that arises is how to assess whether
a supercomputer works effectively at the chosen settings. Utilization is no longer
an unequivocal indicator, as there are other factors that have to be considered.</p>
      <p>We propose an efficiency (performance) metric that combines
several metrics and carries out a comprehensive assessment of the work of a
supercomputer (and its job scheduler). In short, the proposed efficiency metric includes
several minor metrics that are important from our perspective. It is possible to
change the weight of each of them or to supplement them with other metrics.</p>
      <p>In this article we provide a number of case studies using both the traditional
and the new metric, as well as interpretations of the proposed metric values in
some specific cases. In Section 2, we consider the features of the Lomonosov and
Lomonosov-2 supercomputers, which inspired us to develop the proposed
metric. Section 3 discusses the traditional and the proposed versions of the efficiency
metric. In Section 4, we compare these metrics and analyze the differences.
Section 5 describes our plans for further development of the proposed approach.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Lomonosov and Lomonosov-2 Supercomputers</title>
        <p>The two core high-performance computing systems of Moscow State University
are Lomonosov and Lomonosov-2 supercomputers.</p>
        <p>
          More than 900 scientific research groups (3,000 active accounts) have been
provided access to both supercomputers, and more than 1,000 jobs are processed every
day. SLURM (Simple Linux Utility for Resource Management) and our
self-created external scheduler are used to manage all these jobs. To highlight the
heavy workload of the system, let us review the overall utilization of the Lomonosov and
Lomonosov-2 supercomputers over a period of 4 years (article [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]).
Users submitted more than 820,000 jobs on the Lomonosov supercomputer from
March 2014 to March 2017; the SLURM native backfill scheduler (article [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ])
provided over 0.88 utilization.
        </p>
        <p>A similar situation can be found on the Lomonosov-2 supercomputer, where our
external scheduler provided over 0.92 utilization (Fig. 1). Figure 1 shows
approximately 300 days of Lomonosov-2 supercomputer usage; the red and yellow lines
represent busy CPUs on each day of that period. As can be seen from
Figure 1, overall utilization was less than 0.85 before January 2017, but after
allowing many more users to run their jobs on Lomonosov-2 we reached
our current utilization level.</p>
        <p>The other important metric of the job flow for any supercomputer complex
is the average waiting time for a queued job to be started. For instance, this
parameter is more than 22 hours on the Lomonosov supercomputer. Such a busy
queue provides a good opportunity to work on improving not only resource
utilization, but also other metrics that will increase the quality of service for
supercomputer users.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Supercomputers Scheduling: Main Terms</title>
        <p>Here we introduce several terms that allow us to simplify the description and
comparison of both approaches to the evaluation of a supercomputer's
scheduling efficiency. Let us define a strip of fixed width H, which shows the resource
utilization of a computing system over time (H is the number of the supercomputer's nodes).
The strip has an XY coordinate system (X corresponds to time, Y to the number of
nodes).</p>
        <p>In the strip we define a slot W of length T, which represents a time interval.
The slot start coordinate is the coordinate of its bottom left corner (X0, Y0) (see
Fig. 2). A job is a user's program that has two states: it is either in the queue or
being executed on computing resources.</p>
        <p>Definition 1. A job is a set of elements J_i = {X_i, T_i, H_i, R_i, U_i, Q_i}, where:
- X_i is the execution start time of the job on computing resources;
- T_i is the time length of job execution on computing resources;
- H_i is the number of computing nodes required to execute the job;
- R_i is a non-empty set of j pairs (y_ij, h_ij), which describes the job allocation on
nodes as rectangles with bottom left corner coordinates (X_i, y_ij), execution
time T_i and number of nodes h_ij, such that sum_j h_ij = H_i (Fig. 3, 4);
- U_i is the identifier of the user associated with the job;
- Q_i is the job queuing time.</p>
        <p>Interpretation of the notation: a job is represented as a rectangle with given
coordinates of its bottom left corner, a defined size (H_i and T_i are the rectangle's sizes on the Y and
X axes, respectively), a color (corresponding to the user identifier), and a decomposition
R_i among nodes (Fig. 4).</p>
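        <p>Definition 1 maps naturally onto a small record type. The sketch below (all field names are our own, not from the paper) stores the six elements of J_i and checks the consistency condition sum_j h_ij = H_i:
```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Job:
    """One job J_i from Definition 1 (field names are illustrative)."""
    start: float                          # X_i: execution start time
    duration: float                       # T_i: execution time length
    nodes: int                            # H_i: number of nodes required
    allocation: List[Tuple[float, int]]   # R_i: pairs (y_ij, h_ij)
    user: str                             # U_i: user identifier
    queued: float                         # Q_i: job queuing time

    def allocation_is_consistent(self) -> bool:
        # Definition 1 requires that the allocated node counts sum to H_i.
        return sum(h for _, h in self.allocation) == self.nodes

# An 8-node job split into two 4-node rectangles at node offsets 0 and 16:
job = Job(start=10.0, duration=5.0, nodes=8,
          allocation=[(0.0, 4), (16.0, 4)], user="u1", queued=7.5)
```
This record is the shape assumed by the metric sketches later in the article.</p>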
        <p>
          More detailed information about the notation used (i.e., what job packing is,
the packing quality loss function, etc.) can be found in the article "Supercomputer
Efficiency: Complex Approach Inspired by Lomonosov-2 History Evaluation" by
Sergei Leonenkov and Sergey Zhumatiy in Springer CCIS; article [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is still in
press.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Evaluation of Supercomputer Efficiency</title>
      <sec id="sec-3-1">
        <title>Utilization Approach</title>
        <p>The most widely used resource management quality characteristic is the
utilization of computing nodes: the main goal of a supercomputer complex is to
minimize idle resources. Until recently, we also used this resource
planning efficiency indicator for the Lomonosov and Lomonosov-2 supercomputers (as the
definition of "usage efficiency").</p>
        <p>Utilization(Z, W) = ( sum_{i=1..|Z|} H_i * (min(T, X_i + T_i) - X_i) ) / (H * T) (1)</p>
        <p>In Formula 1, Z is the set of jobs that were executing at the start of slot W or queued
during slot W. Let the sets Z_start and Z_queue represent the jobs executing
at the start time of slot W and the jobs queued during the whole slot W, respectively.</p>
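        <p>To make Formula 1 concrete, here is a minimal sketch in which jobs are plain (H_i, X_i, T_i) tuples and the slot W is assumed to span [0, T] on the time axis (the paper works with absolute strip coordinates, so this origin is our simplifying assumption):
```python
# Formula 1 sketch: each job contributes its node count H_i times the part
# of its runtime that falls before the slot end, normalized by H * T.
def utilization(jobs, H, T):
    busy = sum(h_i * (min(T, x_i + t_i) - x_i) for (h_i, x_i, t_i) in jobs)
    return busy / (H * T)

# Two 2-node jobs on a 4-node machine over a 10-unit slot; the second job
# runs past the slot end, so only 5 of its 10 time units count.
print(utilization([(2, 0.0, 10.0), (2, 5.0, 10.0)], H=4, T=10.0))  # 0.75
```
</p>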
      </sec>
      <sec id="sec-3-2">
        <title>Advanced Approach</title>
        <p>Basing on Lomonosov and Lomonosov-2 supercomputers usage history, we
offered a set of metrics, which allows to consider the task of CPU hours scheduling
e ciency more comprehensively, and a formula, which provides a means of
comparing di erent settings of any scheduling algorithms. In addition to the already
described Utilization, we also want to use the following metrics: average start
time of the rst job of users (Formula 2), average start time of jobs belonging
to a speci c class(Formula 3), number of running jobs (Formula 4) and number
of users (Formula 5), whose jobs from Zqueue were started in chosen slot W.</p>
        <p>FUJST(Z, W) = ( sum_{u=1..UNum(Z)} min_{j in UJobs(u)} (X_j - Q_j) ) / UNum(Z) (2)</p>
        <p>AVGST(Z, W, Class) = ( sum_{j in Class} (X_j - Q_j) ) / |Class| (3)</p>
        <p>StartedJobs(Z, W) = (|Z_start| + |Z_queue| - |Z|) / |Z_queue| (4)</p>
        <p>StartedUsers(Z, W) = UNum(Z_start) + UNum(Z_queue) - UNum(Z) (5)</p>
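        <p>As a sketch of how these metrics could be computed (container choices and the reading of Z as the union of Z_start and Z_queue are our assumptions), Formulas 2 and 4 might look as follows:
```python
# Formula 2 sketch: average, over users, of the wait X_j - Q_j of each
# user's earliest-started job. jobs: tuples (user, start X_j, queued Q_j).
def fujst(jobs):
    first_wait = {}
    for user, x, q in jobs:
        w = x - q
        first_wait[user] = min(first_wait.get(user, w), w)
    return sum(first_wait.values()) / len(first_wait)

# Formula 4 sketch: with Z read as the union of Z_start and Z_queue, the
# numerator |Z_start| + |Z_queue| - |Z| counts the queued jobs that were
# also started within the slot; job IDs are stored in sets.
def started_jobs(z_start, z_queue):
    z = z_start | z_queue
    return (len(z_start) + len(z_queue) - len(z)) / len(z_queue)

print(fujst([("u1", 5.0, 0.0), ("u1", 8.0, 1.0), ("u2", 6.0, 2.0)]))  # 4.5
print(started_jobs({"a", "b", "c"}, {"c", "d"}))                      # 0.5
```
</p>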
        <p>Finally, our proposed efficiency formula represents a weighted sum of the
5 chosen metrics (Formula 6). An additional constraint on the weights (Formula 7) is
introduced to normalize the efficiency value to [0, 1].</p>
        <p>Efficiency = sum_{i=1..5} PriorityCoefficient_i * MetricsValue_i (6)</p>
        <p>sum_{i=1..5} PriorityCoefficient_i = 1 (7)</p>
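        <p>The weighted sum of Formulas 6 and 7 is a one-liner; the sketch below additionally assumes each metric value has already been normalized to [0, 1] (the paper does not fix a normalization for the time-based metrics):
```python
# Formulas 6-7 sketch: weighted sum of the 5 metric values, with the
# weights (PriorityCoefficient_i) required to sum to 1.
def efficiency(metric_values, weights):
    assert len(metric_values) == len(weights) == 5
    assert round(sum(weights), 9) == 1.0  # Formula 7 constraint
    return sum(w * m for w, m in zip(weights, metric_values))

# With all weight on utilization this reduces to the traditional metric:
print(efficiency([0.92, 0.5, 0.7, 0.6, 0.8], [1, 0, 0, 0, 0]))  # 0.92
```
</p>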
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Comparison of Two Supercomputer Efficiency Evaluation Approaches</title>
      <p>This section compares two considered e ciency evaluation approaches:
utilization-based and multi-metrics.</p>
      <p>An important question that directly influences the multi-metrics efficiency
function is the right choice of weights (priority coefficients). To use this approach,
each supercomputer complex has to configure it independently. It is impossible
to find a universal set of weights for all complexes, as each owner of such a system
has a unique flow of jobs launched by the clients and sets his own narrow goals
when using the system. For example, at present the utilization of processor time
is the cornerstone of HPC-system management, so it is not correct to
set the same coefficient for this metric and the others, as the other metrics are more
prone to volatility: setting a pair of new jobs for execution can significantly shift
the "efficiency" in one direction while the utilization remains unchanged.</p>
      <p>All the cases examined in this article are considered using model
examples of sets of weights. In view of the complexity of interpreting the values
of the multi-metrics efficiency function, we will use sets of weights where three
of the five weights are equal to zero.</p>
      <sec id="sec-5-1">
        <title>Utilization Efficiency Metric on Today's Supercomputer Workloads Does Not Show the Quality of Customer Experience</title>
        <p>Modern supercomputer resource planning techniques and algorithms have
already achieved significant results in maximizing utilization. The usage history
of the two MSU facilities shows that the maximum possible utilization was not
achieved because of factors that have no connection with the quality of the
algorithms, such as reservations for allocated accounts, system failures, etc. On
the other hand, there are other examples. Let us suppose that the computing
field of the supercomputer is 100 percent occupied. When a certain amount of
resources is released, the scheduler needs to decide which job from the queue can
be placed in this location. Say there are two identical jobs launched by two
different users at different times, but one user already has jobs running on the account and
the other does not. Usually, a scheduler tuned to maximize utilization will
launch the job that was queued first. A scheduler that tries to maximize
our multi-metrics function with the given weights will launch the job whose author
does not yet have jobs on the account. An example of both scenarios is
given in Fig. 5.</p>
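        <p>The two tie-break rules described above can be sketched as follows (function and field names are ours; this is an illustration of the decision, not the actual scheduler code):
```python
# Among queued jobs that fit the freed nodes, a utilization-maximizing
# scheduler picks the earliest-queued job, while the multi-metrics one
# prefers a user with nothing running on the account.
def pick_by_queue_time(candidates):
    return min(candidates, key=lambda j: j["queued"])

def pick_multi_metrics(candidates, running_users):
    # False sorts before True, so users without running jobs win;
    # ties are still broken by queue time.
    return min(candidates,
               key=lambda j: (j["user"] in running_users, j["queued"]))

queue = [{"user": "alice", "queued": 1.0},   # alice already has a running job
         {"user": "bob",   "queued": 2.0}]   # bob does not
print(pick_by_queue_time(queue)["user"])             # alice
print(pick_multi_metrics(queue, {"alice"})["user"])  # bob
```
Both choices occupy the same nodes, so utilization cannot distinguish them; only the user-oriented metrics can.</p>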
        <p>But what is the difference? The utilization for both launch scenarios is the
same, but in the case of the multi-metrics efficiency function we get more users
whose jobs are on the account, which means that on average users will receive
the first results of their calculations faster. Thus, not only the owners of
supercomputer systems, who spend a large amount of resources to support their work,
are satisfied, but also individual users, who can quickly move from waiting for
the first results to analyzing them.</p>
        <p>We have already reviewed multi-metrics resource planning efficiency based on
utilization and the number of users whose jobs are being executed at a given moment
of time (with a weight of 1/2 each). Let us now discuss this efficiency function
based on utilization and the average first user's job start time metric (with
a weight of 1/2 each). This choice of metrics follows a scenario similar to
the previous one, but the start time of users' first jobs is an additional
parameter that the scheduler has to take into account in order to achieve
optimal job planning.</p>
        <p>This additional metric controls the location of users' first jobs in the
queue, shifting them all to the very beginning of the current queue regardless of
their queue time. None of this can be tracked using only utilization.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Large Jobs Start Time Problem</title>
        <p>Another problem that was noticed by the system administrators at the RCC
MSU is that, when achieving the highest system utilization rates, the scheduler
sometimes disadvantages large jobs (these jobs move in the queue much more slowly
than jobs of smaller sizes). This effect arises because the scheduler
tries to fill all available empty nodes of the system with small jobs. To cope with
this significant problem, the managing policies of the Chebyshev supercomputer contained a
special day (each Thursday) when all accumulated large jobs were given the highest
priority to start execution and all smaller jobs the lowest. We strongly
believe that there is no need for such optimizations, which are unclear to users, and that our
proposed set of metrics can help the scheduler cope with the described problem. To
solve this type of efficiency planning problem, we have added to the general list
the average start time of jobs of a specific class, where a class can be defined
as the jobs with a size from a certain interval. Additionally, this metric
can be used to boost a specific class, for example: jobs from a specific group of
users, jobs with a specific programming package, or even jobs from a specific user.</p>
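        <p>A class-restricted AVGST (Formula 3) could be sketched as below; the node-count boundary defining "large" is an illustrative choice of ours, not a value from the paper:
```python
# Classify jobs by size and compute the average start wait per class.
def job_class(nodes):
    # Illustrative boundary: jobs needing 256+ nodes count as "large".
    return "large" if nodes >= 256 else "small"

def avg_start_time(jobs, cls):
    """Formula 3 over one class; job = (nodes H_i, start X_i, queued Q_i)."""
    waits = [x - q for nodes, x, q in jobs if job_class(nodes) == cls]
    return sum(waits) / len(waits)

jobs = [(512, 30.0, 0.0), (512, 40.0, 10.0), (16, 5.0, 4.0)]
print(avg_start_time(jobs, "large"))  # 30.0
print(avg_start_time(jobs, "small"))  # 1.0
```
Giving this per-class metric a nonzero weight lets the scheduler shrink the large-job gap continuously, instead of reserving a special weekday for large jobs.</p>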
      </sec>
      <sec id="sec-5-3">
        <title>Multi-metrics Approach with Utilization Weight Equal to 0</title>
        <p>As we have already mentioned, the selection of weights in the multi-metrics
efficiency evaluation approach is a challenging task. We have reviewed three
different convolutions and sets of weights (Sections 4.1-4.3). Each of the efficiency
calculation formulas included the utilization metric. But what goes wrong if the
utilization weight is set to 0? Let us discuss the efficiency function based on the
number of users whose jobs are executing and the average first user's job start time
metrics (with a weight of 1/2 each). The optimal strategy for the scheduler
would be to start only the first job of each user and no longer place any other jobs for
execution, so that if a new user appears in the queue, his job could immediately
start executing, thus retaining the optimal value of the effectiveness. In
this regard, the presence of the utilization metric in determining the efficiency of
supercomputer resource planning is vital.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Future Work</title>
      <p>
        At the heart of the future work is the desire to create a recommendation system
that will advise the system administrator on changing the current scheduler
settings [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in order to maximize the proposed multi-metrics efficiency function. The
general architecture of such a system is presented in Fig. 6.
      </p>
      <p>[Fig. 6 diagram: the recommendation system estimates the efficiency of algorithms and policies, recommends new scheduler settings, the system administrator commits them, and the scheduler manages the queue.]</p>
      <p>Subsequently, this system should evolve into an autonomous cluster
management system.</p>
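      <p>The core of the loop in Fig. 6 can be sketched as a search over candidate scheduler settings scored by the multi-metrics efficiency function (all names below are hypothetical; the real system would estimate efficiency by simulation on the recorded job flow):
```python
# Recommendation sketch: score each candidate setting with an efficiency
# estimator and propose the best one to the administrator.
def recommend(candidate_settings, estimate_efficiency):
    return max(candidate_settings, key=estimate_efficiency)

# Toy example: pick the backfill depth with the best (pre-computed) score.
scores = {10: 0.81, 50: 0.90, 100: 0.88}
best = recommend(scores.keys(), lambda depth: scores[depth])
print(best)  # 50
```
</p>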
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>
This material is based upon work supported by the Russian Foundation for
Basic Research (project No. 17-07-00719). The research was carried out using
the equipment of the shared research facilities of HPC computing resources at
Lomonosov Moscow State University [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Leonenkov</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhumatiy</surname>
            <given-names>S.</given-names>
          </string-name>
          : Supercomputer Efficiency:
          <article-title>Complex Approach Inspired by Lomonosov-2 History Evaluation</article-title>
          . Springer CCIS (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>V.</given-names>
            <surname>Sadovnichy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tikhonravov</surname>
          </string-name>
          , Vl. Voevodin, and
          <string-name>
            <given-names>V.</given-names>
            <surname>Opanasenko</surname>
          </string-name>
          <article-title>: "Lomonosov"</article-title>
          : Supercomputing at Moscow State University.
          <article-title>In Contemporary High Performance Computing: From Petascale toward Exascale (Chapman</article-title>
          and Hall/CRC Computational Science), pp.
          <fpage>283</fpage>
          -
          <lpage>307</lpage>
          , Boca Raton, USA, CRC Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Leonenkov</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhumatiy</surname>
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Introducing New Back ll-based Scheduler for SLURM Resource Manager, Procedia Computer Science</article-title>
          . Volume
          <volume>66</volume>
          ,
          <year>2015</year>
          , (pp
          <fpage>661</fpage>
          -
          <lpage>669</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          SLURM homepage, https://slurm.schedmd.com.
          <source>Last accessed 14 September 2018</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <article-title>Lomonosov-2 supercomputer on TOP50 list</article-title>
          , http://top50.supercomputers.ru/.
          <source>Last accessed 14 September 2018</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Lomonosov | T-Platforms, http://www.top500.org/system/177421.
          <source>Last accessed 14 September 2018</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>