        Dynamic Computing Resource Extension
              using COBalD/TARDIS

R. Florian von Cube, René Caspart, Max Fischer, Manuel Giffels, Eileen Kühn,
                       Günter Quast, and Matthias J. Schnepf


                       Karlsruhe Institute of Technology, Germany
 {florian.cube, rene.caspart, max.fischer, manuel.giffels, eileen.kuehn,
                guenter.quast, matthias.schnepf}@kit.edu



        Abstract. To dynamically increase computing power on demand, cloud
        providers, HPC clusters, and free institute resources can be used. In order
        to make these so-called opportunistic resources transparently available,
        the services COBalD and TARDIS are developed in a collaboration between
        the Institute of Experimental Particle Physics (ETP) and the Steinbuch
        Centre for Computing (SCC) at KIT. The opportunistic resources are
        integrated into an overlay batch system (OBS), which acts as a single
        point-of-entry for the users. Depending on the decisions of the OBS, the
        utilization, and the allocation of the additional compute resources, CO-
        BalD/TARDIS adjust the resource allocation at the various resource
        providers. To supply the software environment required for the jobs of the
        scientific communities, virtualization and containerization technologies
        are used on those heterogeneous resources.


        Keywords: Opportunistic Resources · Heterogeneous Resources · Overlay
        Batch System · High Energy Physics · Resource Scheduling.

        Copyright © 2020 for this paper by its authors. Use permitted under Creative
        Commons License Attribution 4.0 International (CC BY 4.0).




1     Computing Challenges in Physics


     With future experiments recording drastically increasing amounts of data,
more computing resources will be needed for analysis. In those analyses, physi-
cists want to exploit modern techniques which require specific hardware, such as
graphics cards. Many physics institutes have dedicated local worker nodes
which are made accessible to the end-users through a batch system. Those stat-
ically integrated resources are adapted to the physicists' needs, providing the
necessary software and hardware, but are usually provisioned for the average re-
source demand. With fluctuating demand in computing power, this can result
in long waiting times during peak loads.
     As the landscape in which scientific computing takes place shifts from com-
munity-specific dedicated computing clusters to shared science computing cen-
ters, HPC clusters, and cloud providers, those resources can already today be used for
mitigating load peaks or providing specific hardware. By backfilling them with jobs
from other communities, those resources can achieve higher efficiencies. The us-
age of the shared clusters, however, is often inconvenient for the end-user and
requires considerable effort, because of multiple identity providers, the need to pro-
vision software, and each cluster operating its dedicated workload management
system. With several such resources, the end-user has to assess which resource is
the most suitable and which resource has free capacity at the current time. If
a resource is only temporarily available, this decision gets even more challenging.



2      Single Point-of-Entry

For the end-user, usage through a single point-of-entry with dynamic integration
and provisioning of a heterogeneous resource pool, containing all resources in-
cluding the opportunistic ones, is more desirable. To enable transparent usage,
the opportunistic resources are integrated into an overlay batch system (OBS).
This can be, for example, the batch system of the institute or the collaboration.
By integrating the opportunistic resources into the OBS, the end-user only has
to interact with this layer instead of with each resource separately. Authentication
and authorization at the various opportunistic resources is done through proxy
users per institute or research group. With this, a separate authentication of the
end-user becomes dispensable.
     The OBS also takes care of the job-to-resource scheduling decisions and of
the accounting. However, when providing the user with a single point-of-entry to
a plethora of heterogeneous resources, care has to be taken that the job's runtime
environment is the same, no matter which resource it is scheduled to. For
this, containerization and virtualization technologies are used. With those, it is
assured that, e.g., specific software is installed.
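
As an illustration of this idea, the following minimal Python sketch wraps a job's
payload command in a Singularity container call, so that the same software image
is used regardless of the underlying resource. The image path and payload command
are placeholders, and the wrapper itself is not part of COBalD/TARDIS.

    import subprocess

    def run_payload_in_container(image_path, payload):
        """Run a job payload inside a Singularity container (illustrative only).

        image_path: container image providing the community software environment.
        payload:    the actual job command, e.g. ["python3", "analysis.py"].
        """
        # "singularity exec <image> <command>" executes the command inside the
        # container, so the job sees the same software stack on every resource.
        return subprocess.run(["singularity", "exec", image_path, *payload],
                              check=True)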



3      Resource Integration using TARDIS

The integration of the opportunistic resources is performed by TARDIS [11]
(Transparent Adaptive Resource Dynamic Integration System). TARDIS starts
so-called drones on the opportunistic resource: placeholder jobs that allocate
the resource and integrate it into the OBS. A drone runs either natively on the
resource, in a container, or in a virtual machine, and it is responsible for preparing
the resource to accept jobs from the OBS. Once this preparation is done, the
OBS can place jobs on the newly integrated resource.
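
Conceptually, each drone passes through a small life cycle, from being requested
at the provider to being available in the OBS and eventually being shut down. The
following sketch models such a life cycle with illustrative state names; the actual
states and transitions used by TARDIS may differ.

    from enum import Enum, auto

    class DroneState(Enum):
        # Illustrative drone life cycle; the real TARDIS state names may differ.
        REQUESTED = auto()    # drone submitted to the resource provider
        BOOTING = auto()      # placeholder job / virtual machine is starting up
        INTEGRATING = auto()  # drone registers itself with the OBS
        AVAILABLE = auto()    # OBS may now schedule jobs onto the drone
        DRAINING = auto()     # no new jobs, running jobs are allowed to finish
        SHUTDOWN = auto()     # resource is released at the provider

    # Allowed forward transitions of the sketched life cycle.
    TRANSITIONS = {
        DroneState.REQUESTED: {DroneState.BOOTING, DroneState.SHUTDOWN},
        DroneState.BOOTING: {DroneState.INTEGRATING, DroneState.SHUTDOWN},
        DroneState.INTEGRATING: {DroneState.AVAILABLE, DroneState.SHUTDOWN},
        DroneState.AVAILABLE: {DroneState.DRAINING},
        DroneState.DRAINING: {DroneState.SHUTDOWN},
        DroneState.SHUTDOWN: set(),
    }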



3.1     Overlay Batch-System Adapter
For the integration of the opportunistic resources into the OBS, TARDIS in-
terfaces the OBS through a BatchSystemAdapter. The adapter defines how
resources are added to the OBS and how they are managed. At the moment,
TARDIS supports HTCondor [12] as OBS. An adapter for the Slurm Work-
load Manager is currently under development in collaboration with the University
of Freiburg.
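
As a rough illustration of what such an adapter has to provide, the following
Python sketch defines a minimal interface; the method names are illustrative and
do not necessarily reproduce the actual TARDIS BatchSystemAdapter API.

    from abc import ABC, abstractmethod

    class BatchSystemAdapter(ABC):
        """Minimal, illustrative interface of an OBS adapter (hypothetical API)."""

        @abstractmethod
        async def integrate_machine(self, drone_uuid: str) -> None:
            """Make a drone visible to the overlay batch system."""

        @abstractmethod
        async def drain_machine(self, drone_uuid: str) -> None:
            """Stop new jobs from being scheduled onto a drone."""

        @abstractmethod
        async def disintegrate_machine(self, drone_uuid: str) -> None:
            """Remove a drone from the overlay batch system."""

        @abstractmethod
        async def machine_status(self, drone_uuid: str) -> str:
            """Return the drone's status as seen by the overlay batch system."""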



3.2     Resource Provider Interface
For the life-cycle management of the opportunistic resources, they are interfaced
through SiteAdapters. A SiteAdapter defines how the resource is allocated, how
the status of the drone can be monitored, and how drones can be terminated.
     At the moment, the HTCondor, MOAB [6], and Slurm [15] batch systems, as
well as the CloudStack [7] and OpenStack [14] cloud APIs are supported.
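
The following minimal sketch illustrates the kind of operations such a site adapter
has to cover; again, the method names are illustrative and not necessarily the
actual TARDIS SiteAdapter API.

    from abc import ABC, abstractmethod

    class SiteAdapter(ABC):
        """Illustrative interface of a resource-provider adapter (hypothetical API)."""

        @abstractmethod
        async def deploy_resource(self, spec: dict) -> str:
            """Request a new drone at the provider and return its identifier."""

        @abstractmethod
        async def resource_status(self, resource_id: str) -> str:
            """Query the provider for the current state of a drone."""

        @abstractmethod
        async def terminate_resource(self, resource_id: str) -> None:
            """Release the allocated resource at the provider."""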




4      Resource Mix Assessment with COBalD

To optimize the usage of the integrated opportunistic resources, the decisions of
the OBS are monitored by assessing how well the presently integrated resources
suit the current job mix. This assessment is done by COBalD [9] (COBalD, the
Opportunistic Balancing Daemon).
     For each resource component, like CPU, memory, or disk space, the ratio
of the assigned and the requested amount is determined. Here, requested
means requested by the drone, and assigned is what is assigned to the actual
jobs running on the resource. The maximum and minimum of these ratios are
called the allocation and the utilization, respectively:


\[
  \text{allocation} = \max\!\left(\frac{\text{CPU}_{\text{assigned}}}{\text{CPU}_{\text{requested}}},
  \frac{\text{Memory}_{\text{assigned}}}{\text{Memory}_{\text{requested}}}, \dots\right),
\]
\[
  \text{utilization} = \min\!\left(\frac{\text{CPU}_{\text{assigned}}}{\text{CPU}_{\text{requested}}},
  \frac{\text{Memory}_{\text{assigned}}}{\text{Memory}_{\text{requested}}}, \dots\right).
\]

By comparing those, the suitability of the drone to the current job mix can
be determined. Consider the following example, as shown in Fig. 1: a drone
requests a virtual machine with 20 CPU cores and 80 GB of memory at the
resource provider and runs jobs which sum up to 20 CPU cores, but only use
60 GB of memory. With this, the machine is fully allocated, as all CPU cores are
assigned and it cannot run any more jobs. However, the memory is only utilized
to 75%. This implies that the machine is not the optimal fit for the current job
mix.
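
The numbers of this example translate directly into the two metrics; the following
short, purely illustrative Python snippet computes them.

    # Example drone from the text: 20 CPU cores and 80 GB of memory requested,
    # while the jobs on the drone use 20 CPU cores but only 60 GB of memory.
    requested = {"cpu": 20, "memory_gb": 80}
    assigned = {"cpu": 20, "memory_gb": 60}

    ratios = [assigned[key] / requested[key] for key in requested]

    allocation = max(ratios)   # 1.0  -> the drone is fully allocated
    utilization = min(ratios)  # 0.75 -> only 75% of the memory is utilized

    print(f"allocation = {allocation}, utilization = {utilization}")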
     Depending on the utilization and the allocation of the drones, COBalD de-
termines the demand for the respective resource, that is, the number of drones
of that type that should be requested. The supply, on the other hand, is the
number of drones of the same type currently available in the OBS. The policy of
when to request more drones of a specific type is configurable with a Controller.
It increases the demand when drones of the specific type have a high allocation
and decreases it if the utilization is low.
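
A minimal sketch of such a policy, assuming purely illustrative thresholds and
step sizes rather than COBalD's actual controller implementation:

    def adjust_demand(demand, allocation, utilization,
                      high=0.9, low=0.5, step=1):
        """Adjust the drone demand for one resource type (illustrative policy).

        Increase the demand when drones of this type are highly allocated,
        decrease it when they are poorly utilized, otherwise keep it unchanged.
        """
        if allocation >= high:
            return demand + step
        if utilization <= low:
            return max(0, demand - step)
        return demand

    # Example: fully allocated drones -> request one more drone of this type.
    print(adjust_demand(demand=5, allocation=1.0, utilization=0.75))  # -> 6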
Fig. 1. Depiction of the metrics allocation and utilization, used by COBalD to
assess the suitability of the drone to the presently running jobs. The drone's resource
usage is plotted as CPU fraction versus memory fraction; the jobs are shown as the
darker rectangles, and the gray area is the fraction of a resource component not used
by any job. The measured values are directly influenced by the decisions of the overlay
batch system and can thus be used to react accordingly.




     COBalD uses the TARDIS resource-provider interfaces to determine the num-
ber of drones that should be requested at each of them. An overview of the
resource scheduling process, depending on the decisions of the OBS, is shown in
Fig. 2. Because of their modularity, COBalD/TARDIS are easily customizable
and can be adapted to virtually every setup.



5      Exemplary Setups

COBalD/TARDIS are used for resource scheduling in production environments
at several institutes, both within and outside of physics.



5.1     Institute of Experimental Particle Physics
At the Institute of Experimental Particle Physics (ETP) of the Karlsruhe Insti-
tute of Technology (KIT), research groups of, for example, the AMS, Belle, and
CMS collaborations share one local HTCondor batch system. For data analysis
and simulation, multiple resources are used:


 –  Normal desktop workstations, when not used interactively, and dedicated
    local worker nodes with 600 CPU cores are integrated in the OBS. All jobs
    run in Docker containers [8].
 –  The HPC cluster ForHLR II at KIT with 25,000 CPU cores is integrated
    using COBalD and TARDIS. The drones are Bash scripts, backfilled to
    the Slurm Workload Manager, which start up an HTCondor process. The
    jobs run in Singularity containers [13].

Fig. 2. Jobs are submitted to the overlay batch system (OBS). The OBS decides on
which drone to run the jobs. Depending on the metrics allocation and utilization of
the resources, COBalD manages the amount and type of resources needed. In the
depicted example, more resources would be requested at resource provider 1.


 –  The HPC cluster BWForCluster NEMO at the University of Freiburg
    is also integrated using COBalD and TARDIS. It is shared with several
    other communities and consists of 900 worker nodes, summing up to roughly
    20,000 CPU cores. The drones run as virtual machines, booted through
    OpenStack via the MOAB batch system. Within the virtual machines,
    provided by the HPC center, the jobs run inside Docker containers.

All resources are made available to the physicists at ETP completely transpar-
ently through the batch system. With this setup, several thousand CPU cores
are opportunistically integrated and used, as shown in Fig. 3.



5.2     WLCG Tier 1 Center GridKa
GridKa is one of the 13 Tier 1 centers of the Worldwide LHC Computing Grid
(WLCG) and is located at KIT. It provides compute and storage resources for
several high-energy and astroparticle physics collaborations. As shown in Fig. 4,
opportunistic resources are transparently made available to the collaborations
through one dedicated compute entry point, a so-called compute element (CE).
The resources are managed using COBalD and TARDIS. This also allows for the
dynamic integration of dedicated hardware, such as graphics cards for deep learning.



5.3     Fighting COVID-19
The Institute of Experimental Particle Physics and the Tier 1 center GridKa
contribute to the distributed computing projects Rosetta@home [4] and Fold-
ing@home [2] in the fight against COVID-19.
Fig. 3. The number of CPU cores used through the OBS at ETP. The resources at
the HPC clusters NEMO and ForHLR II are dynamically integrated using COBalD
and TARDIS. The number of statically integrated CPU cores in use fluctuates, as
those are freed whenever a machine is used interactively.




(Figure 4 schematic: the GridKa ARC CE cloud-arc-1-kit acts as the single point
of entry to an HTCondor OBS, which integrates the Bonn Tier 3 (BAF), Bonn
HPC (BONNA), KIT HPC (FORHLR2), KIT Tier 3 (TOPAS), and LMU Munich
OpenStack resources.)
Fig. 4. GridKa dynamically integrates opportunistic resources for several experiments
behind a single point-of-entry. Jobs from the different experiments are only dispatched
to their corresponding resources. This happens completely transparently to the experi-
ments. Figure from [10].
Both institutes use a COBalD/TARDIS setup with a placeholder batch system
that mimics a constant demand in order to allocate resources within their respec-
tive resource pools and provide them to the protein-research projects. The com-
puting payloads are pulled from the central management servers of the research
projects and then executed on the local resources. With this setup, ETP and
GridKa make a significant contribution [1,3,5].



6    Summary

Using shared science computing centers, HPC clusters, and resources at cloud
providers is a promising way to mitigate computing load peaks within research
groups and to provide specific hardware. It can also help to achieve higher effi-
ciencies on those resources. To dynamically schedule and integrate heterogeneous
resources on demand, COBalD and TARDIS provide the necessary toolkit. They
allow for optimizing the resource usage by reacting to the decisions of the overlay
batch system.
     For the end-users, a setup with COBalD and TARDIS allows for transparent
usage of a plethora of resources through one single point-of-entry. They are no
longer faced with the inconvenience and the effort of dealing with multiple iden-
tity providers or several batch systems. By integrating resources that provide
dedicated hardware, such hardware can easily be made available to the end-users
while maintaining fair usage.



References

 1. Fight COVID-19 Grafana (2020), https://grafana-sdm.scc.kit.edu/d/uMJtTojZk/fight-covid-19
 2. Folding@home – Fighting disease with a world wide distributed super computer.
    (2020), https://foldingathome.org/
 3. Folding@home stats report (2020), https://stats.foldingathome.org/team/250565
 4. Rosetta@home (2020), https://boinc.bakerlab.org/
 5. Rosetta@home: KIT-GridKa (2020), https://boinc.bakerlab.org/rosetta/show_user.php?userid=2127744
 6. Adaptive Computing: Moab HPC – Adaptive Computing (Jun 2020), https://adaptivecomputing.com/cherry-services/moab-hpc/
 7. Apache: Apache CloudStack: Open Source Cloud Computing (Jun 2020), http://cloudstack.apache.org/
 8. Docker Inc.: Empowering App Development for Developers | Docker (Jun 2020),
    https://www.docker.com/
 9. Fischer, M., Kuehn, E., Giffels, M., Schnepf, M., Kroboth, S., Freyermuth, O.:
    MatterMiners/cobald: New Plugin System (Apr 2020). https://doi.org/10.5281/zenodo.3752587
10. Giffels, M., et al.: Effective dynamic integration and utilization of heterogeneous
    compute resources. Proceedings of the 24th International Conference on Comput-
    ing in High Energy and Nuclear Physics (2020), to be published
11. Giffels, M., Schnepf, M., Kuehn, E., Kroboth, S., Caspart, R., von Cube, R.F.,
    Fischer, M., Wienemann, P.: MatterMiners/tardis: YAR (Yet Another Release)
    (Jun 2020). https://doi.org/10.5281/zenodo.3874847
12. HTCondor Team: HTCondor (May 2020). https://doi.org/10.5281/zenodo.3834815
13. Kurtzer, G.M., Sochat, V., Bauer, M.W.: Singularity: Scientific containers for
    mobility of compute. PLOS ONE 12(5), 1–20 (May 2017). https://doi.org/10.1371/journal.pone.0177459
14. OpenStack: Build the future of Open Infrastructure (Jun 2020), https://www.openstack.org/
15. SchedMD: SchedMD | Slurm Support and Development (Jun 2020), https://www.schedmd.com/