<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Current Design and Implementation of the AstroDS Data Aggregation Service</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff1">
          <label>1</label>
          <institution>Skobeltsyn Institute of Nuclear Physics, Lomonosov Moscow State University</institution>
          , Moscow,
          <country country="RU">Russia</country>
          <email>nguyendmitri@gmail.com</email>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences</institution>
          , Irkutsk,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>AstroDS is a distributed storage system for cosmic ray astrophysics. Its primary goal is to gather data measured by the instruments of various physical experiments, such as TAIGA, TUNKA, and KASCADE, into a global storage and to provide users with a standardized, user-friendly interface for finding datasets that match given conditions. AstroDS consists of a set of distributed microservice components that communicate with each other over the Internet via a REST API. The core component of AstroDS is the Data Aggregation Service, which orchestrates the other components to provide access to data. Development of AstroDS started in 2019. This paper describes the current design and implementation of the Data Aggregation Service and the benefits it brings to the astrophysical community at this early stage.</p>
      </abstract>
      <kwd-group>
        <kwd>Distributed storage</kwd>
        <kwd>Data warehouse</kwd>
        <kwd>Data acquisition</kwd>
        <kwd>Astroparticle physics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Russian-German Astroparticle Data Life Cycle Initiative [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is an international project that aims to develop an open science system called ASTROPARTICLE.ONLINE, which enables users to publish, store, search, select, and analyze astroparticle data coming from various experiments located worldwide. One of the main parts of the system is the Astroparticle Physics Distributed Data Storage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (AstroDS), which gathers data measured by the instruments of physical experiments such as TAIGA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and KASCADE-Grande [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] into a global storage and allows users to search for a specific dataset and retrieve it via a standardized, storage-independent API.
      </p>
      <p>Supported by the Russian Science Foundation, grant #18-41-06003.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        AstroDS consists of a set of distributed microservices that interact with each other to give users a smooth data acquisition experience. One of the core services of AstroDS is the Data Aggregation Service (the Service), which coordinates the other services to fulfill data queries from users [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This paper describes the current design and implementation of the Service.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Design</title>
      <p>[Fig. 1. Components of the Data Aggregation Service: Input Query, Web UI, HTTP Server, Cache Manager, Core Controller, Repository Manager, Metadata Catalog, Storage Instances.]</p>
      <p>An overall view of the Service's components is presented in Fig. 1. On the
frontend side, the Service receives data lookup queries from users through the
Web User Interface. A query contains a set of search filters defined by the user.
For example, a query can contain IACT01, which is the code name of a facility of
the TAIGA experiment, two timestamps identifying the start and the end of a
certain period, and the weather condition at the experiment's site when the data
were collected.</p>
      <p>On the backend side, the Web User Interface is served by an HTTP Server.
The HTTP Server acts as a proxy layer that passes queries to different
instances of the Service. Within each instance of the Service, the Core Controller
is responsible for processing queries. Upon receiving a query, the Core Controller
first checks with the Cache Manager whether a response has already been generated
for the query. To uniquely identify a query, the Core Controller calculates the
MD5 hash of the query's content. A response to a query is a list of files
containing data that match the filters defined in the query.
If the response to a query is not cached, the Core Controller forwards the
query to the Metadata Catalog, which stores the metadata of all available datasets.
The Metadata Catalog, in turn, runs a search query against its database and
returns a response to the Core Controller. After receiving the response from
the Metadata Catalog, the Core Controller passes it to the Cache Manager for
caching and requests the Repository Manager to generate the files in the response
using data from the Storage Instances, each of which is mounted on the machine
where the Service is running. File generation is carried out asynchronously.</p>
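      <p>The lookup flow described above can be sketched as follows. This is only an illustrative sketch, not the Service's actual code: the function names, the dictionary-based cache, and the choice of sorted-key JSON as the canonical form of a query before MD5 hashing are all assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import hashlib
import json


def query_key(query: dict) -> str:
    """Return the MD5 hex digest of a query's content.

    Assumption: the query is serialized as JSON with sorted keys, so that
    two queries with the same filters always produce the same hash.
    """
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def handle_query(query, cache, search_catalog, request_files):
    """Hypothetical Core Controller step: check the cache, else ask the catalog.

    cache          -- a dict standing in for the Cache Manager
    search_catalog -- callable standing in for the Metadata Catalog
    request_files  -- callable standing in for the Repository Manager
    """
    key = query_key(query)
    if key in cache:
        # Cache hit: reuse the previously generated list of files.
        return cache[key]
    files = search_catalog(query)   # cache miss: search the metadata
    cache[key] = files              # store the response for later queries
    request_files(files)            # in the Service this step is asynchronous
    return files
```
]]></preformat>
      <p>In this sketch a second query with the same filters, even if they are listed in a different order, is answered from the cache without contacting the catalog again.</p>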
    </sec>
    <sec id="sec-3">
      <title>Implementation</title>
      <p>
        The Web UI was implemented as a single-page application based on React [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Material UI [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and packed into a Docker container. In the Web UI, the user creates a data query using the search menu. The query is expressed in the GraphQL query language [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The structure of the query reflects the structure of the entities that define the data of a physical experiment, such as the experiment site, facility name, instrument name, detector, and data channel. To compose a query, the user defines a list of filters and their corresponding values. Each facility of a physical experiment has its own set of filters. When the user chooses multiple facilities, only the filters common to all of them are shown; facility-specific filters are hidden.
      </p>
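      <p>The rule for showing filters when several facilities are selected amounts to a set intersection. The sketch below illustrates this in Python; the facility name FACILITY_B and all filter names are hypothetical placeholders, and the actual Web UI is a React application whose code is not shown in this paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def common_filters(filters_by_facility: dict, selected: list) -> list:
    """Return the filters available for every selected facility, sorted by name."""
    if not selected:
        return []
    filter_sets = [set(filters_by_facility[name]) for name in selected]
    # Only filters present in all selected facilities survive the intersection.
    return sorted(set.intersection(*filter_sets))


# Hypothetical filter sets; only IACT01 is a name taken from the text.
FILTERS = {
    "IACT01": ["start_time", "end_time", "weather"],
    "FACILITY_B": ["start_time", "end_time", "energy"],
}
```
]]></preformat>
      <p>With these placeholder sets, selecting only IACT01 shows all three of its filters, while selecting both facilities leaves just the two time filters.</p>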
      <p>Since the history of all queries made by the user is stored in the local storage
of the browser, the user can review the responses to earlier queries. While reviewing
the responses, the user can assemble a list of files from different queries simply by
selecting them. The selected files can later be downloaded as a single archive.
An illustration of the Web UI is shown in Fig. 2.</p>
      <p>
        The Core Controller was implemented as a microservice based on the Django framework [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and packed into a Docker container. All communication between the Web UI and the Core Controller goes through a set of REST API [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] endpoints, each of which is responsible for one type of query. Data lookup queries are formed using the GraphQL syntax and sent to the Core Controller as POST requests; the others are standard REST API requests. One exception is file generation queries. Since files are generated asynchronously, the Core Controller sends a short text message to the Web UI via the WebSocket channel [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] once the files for a query have been generated. The Web UI can also check the current status of a file generation query with a GET request at the initialization stage.
      </p>
      <p>All queries are cached by the Cache Manager to reduce the response time.
The cache was implemented as a large hash table in which the key is the MD5 hash
of a query and the value contains the response to the query and other metadata,
such as the creation time and the last-used timestamp. Queries are discarded from the cache
according to the Least Frequently Used (LFU) policy: the Cache Manager counts how often
a query is used, and those that are used least often are discarded first. Currently, the
expiration time of a query is seven days.</p>
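      <p>A minimal in-memory sketch of such a cache is given below. It is not the Cache Manager's actual code: the class name, the fixed capacity (assumed to be at least 1), the tie-breaking by last-used time, and the injectable clock are assumptions made for illustration, and the paper does not specify how expiration and LFU eviction interact.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import time

SEVEN_DAYS = 7 * 24 * 3600  # expiration time mentioned in the text, in seconds


class LFUCache:
    """Hash table keyed by query hash, with LFU eviction and expiration."""

    def __init__(self, capacity, ttl_seconds=SEVEN_DAYS, clock=time.time):
        self.capacity = capacity      # assumed bound on the number of entries
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so the sketch can be tested
        self.entries = {}             # key -> value, created, last_used, hits

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        now = self.clock()
        if now - entry["created"] > self.ttl:
            del self.entries[key]     # expired entries are dropped on access
            return None
        entry["hits"] += 1
        entry["last_used"] = now
        return entry["value"]

    def put(self, key, value):
        now = self.clock()
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Evict the least frequently used entry; ties go to the oldest use.
            victim = min(
                self.entries,
                key=lambda k: (self.entries[k]["hits"], self.entries[k]["last_used"]),
            )
            del self.entries[victim]
        self.entries[key] = {
            "value": value, "created": now, "last_used": now, "hits": 0,
        }
```
]]></preformat>
      <p>The keys here would be the MD5 hashes of the queries, and the values the lists of files returned by the Metadata Catalog.</p>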
      <p>The Repository Manager was implemented as a separate microservice with its
own REST API. After receiving a file generation request from the Core Controller,
the Repository Manager creates an empty CernVM-FS repository, copies the
original files from the storage instances to the newly created repository, and
publishes it. When a repository is ready, the Repository Manager sends a short
text message to the Core Controller via the WebSocket channel.</p>
      <p>
        To implement the communication via WebSocket, the RabbitMQ Message Broker [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] was used. Text messages are formatted using the STOMP protocol [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
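      <p>For illustration, a short text notification can be framed as a STOMP SEND frame as sketched below. The frame layout (command line, headers, blank line, body, terminating NUL byte) follows the STOMP 1.2 specification; the destination name and the message text are hypothetical, and the sketch omits the header-value escaping that the specification requires for values containing colons, backslashes, or line breaks.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def stomp_send_frame(destination: str, body: str) -> bytes:
    """Build a STOMP 1.2 SEND frame carrying a plain-text body."""
    payload = body.encode("utf-8")
    headers = [
        f"destination:{destination}",       # required header for SEND
        "content-type:text/plain",
        f"content-length:{len(payload)}",   # byte length of the body
    ]
    head = "SEND\n" + "\n".join(headers) + "\n\n"
    # A frame ends with a single NUL byte after the body.
    return head.encode("utf-8") + payload + b"\x00"


# Hypothetical destination and message, for illustration only.
frame = stomp_send_frame("/topic/files.ready", "ready")
```
]]></preformat>
      <p>A client library would normally build such frames itself; the sketch only shows what travels over the WebSocket channel.</p>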
    </sec>
    <sec id="sec-4">
      <title>Current Status</title>
      <p>Facilities of the physical experiments mentioned above (TAIGA and
KASCADE-Grande) have been successfully integrated into the AstroDS system. Using the Web
UI of the Data Aggregation Service, users can already make complex queries
that combine data from different facilities of the experiments. The
KASCADE-Grande experiment has its own service similar to the Repository Manager, so
the components of the Data Aggregation Service work with that service to generate
files. Work is ongoing to make the Data Aggregation Service stable
in production use. In the future, there are plans to integrate more physical
experiments into the AstroDS platform and to implement more filters that give
users a flexible way to search for data. There are also plans to conduct outreach
activities to increase the number of users.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Haungs</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bychkov</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubenskaya</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fedorov</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heiss</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazarina</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korosteleva</surname>
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostunin</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kryukov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikhailov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polgart</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polyakov</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Postnikov</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shigarov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shipilov</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Streit</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tokareva</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wochele</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wochele</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhurov</surname>
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Russian-German Astroparticle Data Life Cycle Initiative</article-title>
          .
          <source>Data</source>
          <year>2018</year>
          ,
          <volume>3</volume>
          , 56. https://doi.org/10.3390/data3040056
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kryukov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            <given-names>M.-D.</given-names>
          </string-name>
          :
          <article-title>A Distributed Storage for Astroparticle Physics</article-title>
          .
          <source>EPJ Web of Conferences</source>
          <year>2019</year>
          , vol.
          <volume>207</volume>
          , 08003. https://doi.org/10.1051/epjconf/201920708003
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Nguyen</surname>
            <given-names>M.-D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kryukov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubenskaya</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korosteleva</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bychkov</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikhailov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shigarov</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Data Aggregation in the Astroparticle Physics Distributed Data Storage</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <year>2019</year>
          , vol.
          <volume>2406</volume>
          , pp
          <fpage>84</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Budnev</surname>
            <given-names>N.</given-names>
          </string-name>
          et al.:
          <article-title>The TAIGA experiment: From cosmic-ray to gamma-ray astronomy in the Tunka valley</article-title>
          .
          <source>Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment</source>
          , February
          <year>2017</year>
          , vol.
          <volume>845</volume>
          , pp.
          <fpage>330</fpage>
          -
          <lpage>333</lpage>
          . https://doi.org/10.1016/j.nima.2016.06.041
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Apel</surname>
            <given-names>W.D.</given-names>
          </string-name>
          et al.:
          <article-title>The KASCADE-Grande Experiment</article-title>
          .
          <source>Nuclear Instruments and Methods in Physics Research Section A</source>
          , vol.
          <volume>620</volume>
          , April
          <year>2010</year>
          , pp.
          <fpage>202</fpage>
          -
          <lpage>216</lpage>
          . https://doi.org/10.1016/j.nima.2010.03.147
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Facebook Open Source: React, https://reactjs.org. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Material-UI, https://material-ui.com. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. The GraphQL Foundation: GraphQL, https://graphql.org. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Django Software Foundation: Django, https://www.djangoproject.com. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fielding</surname>
            <given-names>R.Th.</given-names>
          </string-name>
          :
          <article-title>Architectural Styles and the Design of Network-based Software Architectures</article-title>
          . https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Internet Engineering Task Force:
          <article-title>The WebSocket Protocol</article-title>
          , https://tools.ietf.org/html/rfc6455. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Pivotal Software, Inc: RabbitMQ Server Documentation, https://www.rabbitmq.com/admin-guide.html. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <article-title>STOMP Protocol Specification</article-title>
          , https://stomp.github.io/stomp-specification-1.2.html. Last accessed 28 Jun 2020
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>