<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of the Impact of Various Local Data Caching Configurations on Tier2/Tier3 WLCG Sites</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksandr Alekseev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
          <xref ref-type="aff" rid="aff8">8</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephane Jezequel</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Kiryanov</string-name>
          <email>kiryanov@cern.ch</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexei Klimentov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatiana Korchuganova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
          <xref ref-type="aff" rid="aff8">8</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valery Mitsyn</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danila Oleynik</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serge Smirnov</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Zarochentsev</string-name>
          <xref ref-type="aff" rid="aff7">7</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Brookhaven National Laboratory</institution>
          ,
          <addr-line>Upton, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for System Programming RAS</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Joint Institute for Nuclear Research</institution>
          ,
          <addr-line>Dubna</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Laboratoire d'Annecy de Physique des Particules</institution>
          ,
          <addr-line>Annecy</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>NRC “Kurchatov Institute” - PNPI</institution>
          ,
          <addr-line>Gatchina</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>National Research Nuclear University MEPhI</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
<institution>Plekhanov Russian University of Economics</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff7">
          <label>7</label>
          <institution>Saint Petersburg State University</institution>
          ,
          <addr-line>Saint Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff8">
          <label>8</label>
          <institution>University Andres Bello</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the test implementation of various data caching scenarios and the lessons learned. In particular, we show how local data caches may be configured, deployed, and tested. In our studies, we use xCache, a special type of Xrootd server setup that caches input data for physics analysis. A relatively large Tier2 storage is used as the primary data source, together with several geographically distributed smaller WLCG sites configured specifically for this test. All sites are connected to the LHCONE network. The testbed configuration is evaluated using both synthetic tests and real ATLAS computational jobs submitted via the HammerCloud toolkit. The impact and realistic applicability of different local cache configurations are explained, covering both the network infrastructure and the configuration of the computing nodes.</p>
      </abstract>
      <kwd-group>
        <kwd>Federated Storage</kwd>
        <kwd>xCache</kwd>
        <kwd>WLCG</kwd>
        <kwd>DOMA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>HENP experiments are preparing for the HL-LHC era, which will bring an
unprecedented volume of scientific data. This data will need to be stored and processed by
collaborations, but the expected resources growth is nowhere near extrapolated
requirements of existing models both in storage volume and compute power. It is well
understood that computing models need to evolve. Such evolution includes multiple
aspects:
• Optimized data processing, squeezing the maximum from the available
CPU/GPGPU/FPGA resources.
• Optimized data storage: reduction of the number of copies, different data access
methods, full utilization of network resources.
• Cost optimization: no high-end expensive RAID setups, no underutilized CPUs
on storage servers, no HDDs with 90% free space on the worker nodes.
• Deployment optimization: scalability and containerization with on-demand
expansion into the cloud (both community and commercial).
• Operational cost optimization: more standardized solutions, lower requirements
on unique Grid expertise.</p>
    </sec>
    <sec id="sec-2">
      <title>Ongoing R&amp;D Projects</title>
      <p>WLCG and experiments have launched multiple R&amp;D projects to address HL-LHC
challenges:
• Data Lake. The aim is to consolidate geographically distributed data storage
systems connected by fast network with low latency. The Data Lake model as an
evolution of the current infrastructure bringing reduction of the storage and
operational costs.
• Intelligent Data Delivery Service (iDDS). The intelligent data delivery system
will deliver events as opposed to delivering bytes. This allows an edge service to
prepare data for production consumption, lets the on-disk data format evolve
independently of applications, and decreases the latency between the application and
the storage. The first implementation, in April-May 2020, targets Data Carousel and
active ML workflows.
• Hot/Cold storage. Data placement and data migration between “Hot” and “Cold”
storages using data popularity information.
• Data format and I/O. Evaluating new formats (e.g. parquet) and I/O performance
for HENP data.
• Third Party Copy. Improve bulk data transfers between sites and find a viable
replacement to the GridFTP protocol.
• Operations Intelligence. Reduce the HEP experiments computing operations
effort by exploiting anomaly detection, time series and classification techniques to
help the operators in their daily routines, and to improve the overall system
efficiency and resource utilization.
• Data Carousel. Use tape more effectively and actively in a distributed computing
context.</p>
    </sec>
    <sec id="sec-3">
      <title>Objectives of this work</title>
      <p>
        This research is conducted in collaboration with the European Data Lake Project,
which is part of the WLCG DOMA initiative [
        <xref ref-type="bibr" rid="ref1">1</xref>
]. We will show several possible ways
of optimizing remote data access from worker nodes in relatively small T2/T3
setups or in dynamically scaled containerized deployments for physics analysis
payloads. This kind of deployment implies heavy site-remote, read-biased
data I/O, and the time slot (t) allocated for an analysis job is normally split into three phases
(disregarding some overhead): input read (t1), compute (t2) and output write (t3).
Some analysis payloads can read and write data while performing computation, which
makes it hard to separate t1 from t2 and t2 from t3, but in any case at least some data
needs to be preloaded before computation can start. Here, we focus on optimizing
t1 and thus improving the CPU utilization of a compute resource.
      </p>
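      <p>The role of t1 can be made concrete with a minimal timing model. The sketch below is illustrative only: the function name and the sample numbers are our assumptions, not measurements from this work.</p>

```python
# Minimal model of the analysis-job time budget described above:
# input read (t1), compute (t2) and output write (t3).
def cpu_utilization(t1: float, t2: float, t3: float) -> float:
    """Fraction of the allocated slot actually spent computing."""
    return t2 / (t1 + t2 + t3)

# Illustrative numbers: shrinking a 600 s read phase to 60 s
# for the same 1800 s compute raises utilization noticeably.
slow = cpu_utilization(t1=600, t2=1800, t3=120)  # ~0.71
fast = cpu_utilization(t1=60, t2=1800, t3=120)   # ~0.91
```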
      <p>
In order to optimize the read time (t1) in cases where hardware and network
performance cannot be easily improved, caching systems are commonly used.
However, any kind of caching is only effective with a sufficient cache hit ratio. The
very first thing to check is the actual repeatability of read requests during a
standard physics workflow. Let us evaluate the typical number of read requests
to a single file (K) of ATLAS experiment data suitable for user analysis. Figure 1
shows the ATLAS derivation data sample popularity (number of usages) by users'
analysis tasks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        There is at least one dataset that was accessed 1170 times. On average, ATLAS
DAOD datasets consist of 50 files, which means that each file in this dataset was
accessed at least 20 times if the data popularity is evenly split between them. We take
K = 20 as the basis of our tests. In the end, we have to build a distributed data
processing system where the computing element (CE) is distinct and distant from the
primary storage system; computing tasks are submitted by users, and these tasks can
eventually request access to the same input file up to 20 times. The input file is
located on a primary storage and its size varies from 1 to 5 GB (the overview of ATLAS
data files is given in the HEPiX talk [
        <xref ref-type="bibr" rid="ref3">3</xref>
]). In this setting, the infrastructure and building
blocks of CEs can vary significantly between sites.
      </p>
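      <p>The value K = 20 follows from simple arithmetic on the numbers above; a quick sanity check (the variable names are ours):</p>

```python
# Per-file access estimate from the dataset popularity figures above.
dataset_accesses = 1170   # accesses to the most popular DAOD dataset
files_per_dataset = 50    # average number of files in an ATLAS DAOD dataset

per_file = dataset_accesses / files_per_dataset  # 23.4 accesses per file
K = 20  # conservative round figure adopted as the basis of the tests
```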
    </sec>
    <sec id="sec-4">
      <title>The test bed</title>
      <p>For the scheme explained above, we need to fix some specific
details, such as the data access protocol and the caching system. In our tests, we use
the xrootd protocol, which is widely employed by the LHC experiments and has the
important property of supporting redirects. The latter feature matters when building
distributed storage systems, including distributed caching systems. As caching
software we use xCache, which is, basically, a standard xrootd server configured
in a special way.</p>
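      <p>For reference, turning a stock xrootd server into an xCache takes only a few configuration directives. The fragment below is a hedged sketch: the origin host, paths and sizes are our placeholders, not the configuration used in these tests.</p>

```
# Illustrative xCache (xrootd proxy file cache) configuration sketch.
all.export /
ofs.osslib libXrdPss.so              # run this xrootd instance as a proxy
pss.cachelib libXrdFileCache.so      # enable the disk-caching plugin (xCache)
pss.origin storage.example.org:1094  # primary storage to fetch cache misses from
oss.localroot /var/cache/xcache      # local disk area holding cached data
pfc.blocksize 512k                   # cache granularity
pfc.diskusage 0.90 0.95              # purge from 95% disk usage down to 90%
```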
      <p>We have decided to exploit three data caching schemes shown in Fig. 2. There is
no universal solution due to the hardware (especially network) differences on
different sites. With these schemes, we tested three quite obvious scenarios:
1. A single dedicated cache server for sites having a modest external connectivity
(~1 Gbps) and a relatively good internal network for worker nodes (&gt;=10 Gbps).
2. A local isolated cache on each worker node for sites having a good external
connectivity (&gt;=10 Gbps), but modest internal network for worker nodes (~1 Gbps).
3. A shared cache between worker nodes for sites having external and internal
networks of the same relatively high quality (&gt;=10 Gbps) – this approach requires
some sort of service discovery.</p>
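      <p>The three scenarios can be summarized as a simple rule on the two bandwidths. A minimal sketch (the function name and its encoding are ours; the thresholds follow the list above):</p>

```python
def pick_cache_scheme(external_gbps: float, internal_gbps: float) -> int:
    """Map a site's network characteristics to caching scenarios 1-3 above."""
    good_ext = external_gbps >= 10  # ">=10 Gbps" vs "~1 Gbps" in the text
    good_int = internal_gbps >= 10
    if good_ext and good_int:
        return 3  # shared cache between worker nodes (needs service discovery)
    if good_ext:
        return 2  # isolated local cache on each worker node
    return 1      # single dedicated cache server

# A site with a 1 Gbps uplink but a 10 Gbps internal network -> scenario 1.
scheme = pick_cache_scheme(external_gbps=1, internal_gbps=10)  # -> 1
```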
      <p>
        At the first stage of the testing, scenarios 1 and 2 were implemented using
resources of JINR, PNPI and MEPhI (Fig. 3). JINR was used as a primary storage with
10 Gbps uplink while still having a local CE with 1 Gbps internal network. This CE
was used as a reference and no caching system was deployed there. Tests with JINR
CE were only carried out at the very beginning; later, such tests lost their value. The
PNPI CE located 520 km (~ 11 ms latency) from JINR has 10 Gbps internal network
and 10 Gbps uplink to primary storage. The MEPhI CE located 120 km (~1 ms
latency) from JINR has 1 Gbps internal network and 10 Gbps uplink to primary storage.
To obtain useful performance metrics, we needed appropriate tests.
In this case, the authors already had some experience in testing distributed storage
systems with both synthetic tests [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the HammerCloud toolkit used by the
ATLAS experiment [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], both of which were used for testing the EOS-based
distributed storage [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
• As a synthetic test, a simple file copy with the xrdcp tool was used.
• As a HammerCloud payload, a real-life Athena analysis task was submitted to the CEs.
      </p>
      <p>
        The first tests, which were reported at HEPiX Workshop [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], were conducted at
PNPI and JINR sites, using only the dedicated xCache. Figures 4, 5, 6 show the
results of HammerCloud tests with and without xCache (JINR is a primary storage, so
xCache was not used there). The results, as can be seen from the graphs:
• Wallclock at PNPI (t):
o Direct mean time = 2698 ± 577 s
o xCache mean time = 1934 ± 139 s
o Difference ~770 s, ~30%
• Download input files time at PNPI (t1):
o Direct mean time = 811 ± 574 s
o xCache mean time = 53 ± 137 s
o Difference ~760 s, ~95%
• Download input files time at JINR (t1):
o Direct mean time = 117 ± 17 s
      </p>
      <p>These results give an idea of the fundamental benefits of using xCache.</p>
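      <p>The quoted differences follow directly from the mean times listed above; a quick cross-check (values copied from those lists):</p>

```python
# Reproduce the quoted HammerCloud gains at PNPI from the mean times above.
wall_direct, wall_cache = 2698, 1934  # wallclock (t), seconds
t1_direct, t1_cache = 811, 53         # input download time (t1), seconds

wall_gain = wall_direct - wall_cache        # 764 s (~770 s)
wall_pct = 100 * wall_gain / wall_direct    # ~28% (quoted as ~30%)
t1_gain = t1_direct - t1_cache              # 758 s
t1_pct = 100 * t1_gain / t1_direct          # ~93% (quoted as ~95%)
# The wallclock gain is almost entirely the t1 gain, as expected.
```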
      <p>The following synthetic tests were carried out taking into account the average
number of requests to a single file (20) and the fact that the task can land on a random
worker node, which is important in the case of a local cache (case 2 in Fig. 2). Tests
were carried out in batches, since the load and the available bandwidth of the external
network vary over time, and it was necessary to compare the different caching schemes
under the same external network conditions.</p>
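      <p>Comparing schemes batch by batch removes the common network variation from the comparison. A minimal sketch of such per-batch statistics (the sample times are invented for illustration):</p>

```python
from statistics import mean, stdev

# Each batch holds copy times (seconds) for one scheme, measured under
# the same external-network conditions. Sample values are illustrative.
batch = {
    "direct":    [212.0, 198.0, 205.0, 220.0],
    "dedicated": [150.0, 142.0, 148.0, 151.0],
}

summary = {scheme: (mean(t), stdev(t)) for scheme, t in batch.items()}
for scheme, (m, s) in summary.items():
    print(f"{scheme}: {m:.1f} ± {s:.1f} s")
```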
      <p>Figures 7, 8, and 9 show the results of these tests. PNPI was tested with both
dedicated and local caches, while MEPhI was tested only with a dedicated cache because of a
shortage of local disk resources on its worker nodes. The results clearly show the benefit
of using a dedicated cache at both sites, which is somewhat unexpected for MEPhI, since
the local network there is worse than the external one and no improvement from
using the cache was anticipated. At the same time, the benefit from
using the local cache is minimal and within the margin of error.</p>
      <p>HammerCloud tests were carried out in two scenarios only: direct read and
dedicated cache, as there were technical problems registering a site with a local cache in
the ATLAS information system (AGIS). The tests themselves have also changed
since 2019; in particular, the template for test jobs was changed from HITS
(digitization and reconstruction) to Derivation (AOD and DAOD), which is more I/O-intensive
and has a larger input file size per event than HITS. Figures 10 and 11 show the
results of comparative tests using HammerCloud copy2scratch template (the input file
is entirely downloaded to the working node before execution) for PNPI and MEPhI,
respectively.</p>
      <p>It can be seen that in all cases the gain from using the cache is obvious, which is
expected, since in these tests there was no limit on the number of reads of a single
file. Note also that the gain in input file download time closely matches the overall gain in
total task execution time.</p>
      <p>Fig. 7. Synthetic tests at PNPI. Direct access on the left, local cache on the right.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and future work</title>
      <p>We have successfully completed the pilot (proof-of-concept, PoC) phase. During the PoC, we
configured and tested two types of xCache setup: a dedicated cache and a local cache.
We have shown the performance benefits of using xCache on smaller sites with
synthetic and real-life ATLAS analysis workloads. Together with the WLCG community, we
need to address the Data Lake challenge in a global context. The DOMA ACCESS
initiative is the first step in this direction. We will work closely with DOMA and
ATLAS to define the next steps; in particular, we are interested in testing our setup
with other HL-LHC R&amp;D projects, such as Data Carousel, QoS and hot/cold storage.</p>
      <p>As a result of this work, we have observed a clear benefit from a dedicated cache
even for a limited number of requests to a single file, while for the local cache the
benefit is doubtful at best. A dedicated cache, on the other hand, implies some
additional operational and hardware costs that might not be justified by the expected
performance benefits. The idea of a distributed cache on local nodes (case 3 in Fig. 2),
which the authors see as very promising, still needs to be explored. Our near-term
plans include the implementation and further evaluation of this idea.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <article-title>Data Organization, Management and Access (DOMA)</article-title>. https://iris-hep.org/doma.html, last accessed 2020/06/25.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. <article-title>ATLAS HEP-Google R&amp;D project</article-title>. <source>Technical Interchange Meeting</source>. https://indico.cern.ch/event/921179/contributions/3870250/subcontributions/307490/attachments/2042093/3420405/RnD_HEPGCP.pdf, last accessed 2020/06/25.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klimentov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zarochentsev</surname>
          </string-name>
          et al,
          <source>ATLAS Data Carousel Project. HEPiX Autumn Workshop</source>
          ,
          <fpage>14</fpage>
          -
          <lpage>19</lpage>
          Oct.
          <year>2019</year>
          , Amsterdam, Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klimentov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krasnopevtsev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ryabinkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zarochentsev</surname>
          </string-name>
          ,
          <article-title>Federated data storage system prototype for LHC experiments and data intensive science</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , v.
          <volume>1787</volume>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Schovancova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Di</given-names>
            <surname>Girolamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fkiaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mancinelli</surname>
          </string-name>
          ,
          <article-title>Evolution of HammerCloud to commission CERN Compute resources</article-title>
          , to appear
          <source>in proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics</source>
          , Sofia,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>X.</given-names>
            <surname>Espinal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klimentov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schovancova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zarochentsev</surname>
          </string-name>
          ,
          <article-title>Federated data storage evolution in HENP: data lakes and beyond</article-title>
          ,
          <source>ACAT</source>
          ,
          <fpage>10</fpage>
          -
          <lpage>15</lpage>
          Mar.
          <year>2019</year>
          , Saas-Fee, Switzerland.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>