<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MODERN SQL AND NOSQL DATABASE TECHNOLOGIES FOR THE ATLAS EXPERIMENT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D. Barberis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <email>Dario.Barberis@ge.infn.it</email>
          <on-behalf-of>on behalf of the ATLAS Collaboration</on-behalf-of>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Physics Department of the University of Genova and INFN Sezione di Genova</institution>
          ,
          <addr-line>Via Dodecaneso 33, 16146 Genova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>15</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Structured data storage technologies evolve very rapidly in the IT world. LHC experiments, and ATLAS in particular, try to select and use these technologies balancing the performance for a given set of use cases with the availability, ease of use and of getting support, and stability of the product. We definitely and definitively moved from the “one fits all” (or “all has to fit into one”) paradigm to choosing the best solution for each group of data and for the applications that use these data. This paper describes the solutions in use, or under study, for the ATLAS experiment and their selection process and performance measurements.</p>
      </abstract>
      <kwd-group>
        <kwd>Scientific Computing</kwd>
        <kwd>Databases</kwd>
        <kwd>BigData</kwd>
        <kwd>Hadoop</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        When software development started for ATLAS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the other Large Hadron Collider (LHC)
experiments about 20 years ago, the generic word "database" referred almost exclusively to relational
databases. There were very few options for storing large amounts of
structured data: Oracle [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] was fully supported by CERN-IT, including license costs; MySQL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] was
in its early stages, promising but not yet scaling to the expected data volumes and rates; or
one could build a new in-house system. So the choice was clear: fit everything into Oracle because of
the CERN system-level support, and develop the ATLAS applications to make use of Oracle's tools
for performance optimization. ATLAS hired two expert Oracle application developers, who
helped a great deal with application development and optimization.
      </p>
      <p>
        Having only one underlying technology helped to provide a robust and performant central
database service, managed jointly by CERN at the system level and ATLAS at the application level.
Many time-critical applications are now hosted by the CERN Oracle infrastructure:
        <list list-type="bullet">
          <list-item><p>The conditions database (COOL) [<xref ref-type="bibr" rid="ref4">4</xref>]</p></list-item>
          <list-item><p>AMI (ATLAS Metadata Interface) [<xref ref-type="bibr" rid="ref5">5</xref>] and COMA (Conditions Metadata) [<xref ref-type="bibr" rid="ref6">6</xref>]</p></list-item>
          <list-item><p>ProdSys/PanDA (distributed production system) [<xref ref-type="bibr" rid="ref7">7</xref>]</p></list-item>
          <list-item><p>Rucio (distributed data management system) [<xref ref-type="bibr" rid="ref8">8</xref>]</p></list-item>
          <list-item><p>AGIS (Grid information system) [<xref ref-type="bibr" rid="ref9">9</xref>]</p></list-item>
          <list-item><p>Glance (membership, authorship, speakers etc.) [<xref ref-type="bibr" rid="ref10">10</xref>]</p></list-item>
        </list>
All these applications grew in size and complexity over time and work well for the
Collaboration’s current usage; Oracle can be very fast if database schemas and queries are well
designed and optimized.
      </p>
      <p>
        On the other hand, having only one underlying technology forced some applications that have
no need of relational information into fixed schemas that may not be optimal; for
example, time-series measurements produced by the DCS (Detector Control System) can be more simply
represented by time-value pairs, and their data have to be compressed before being stored in Oracle
because of their huge sizes. In addition, Oracle schemas have to be carefully designed upfront and are
then hard to extend or modify. Data access to Oracle databases from Grid jobs was also far from
straightforward, and an interface system (Frontier [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) had to be adopted to allow over
300k jobs to run concurrently. So when data analytics tools that can deal
with huge amounts of less structured data started appearing on the open-source market, ATLAS groups started evaluating them for their needs.
      </p>
      <p>
        Towards the end of LHC Run1 in 2012 and during the shutdown period in 2013-2014 a
number of new structured data storage solutions ("NoSQL Databases") were tested as back-end
support systems for new applications, including Hadoop [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and the many associated tools and data
formats, Cassandra [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], MongoDB [
        <xref ref-type="bibr" rid="ref14">14</xref>
          ], etc. They are mostly key-value pair or column-oriented
storage systems.
The "Database Technical Evolution Group" (DB TEG) recommended that CERN deploy and support a Hadoop cluster
for new applications, with all associated tools [15]. In fact several Hadoop clusters were set up over
the years to avoid destructive interference between different applications, while both system
managers and application developers were learning the best practices for application design and
optimization. Figure shows the many tools currently provided by CERN-IT in the Hadoop
ecosystem.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Database usage by ATLAS in LHC Run2</title>
      <p>The database systems are used to support ATLAS data processing and analysis, as well as all
other collaboration activities. One can identify three major groups of data:
        <list list-type="bullet">
          <list-item><p>Conditions data. These are all the non-event data needed to reconstruct events, such as
detector hardware conditions (temperatures, currents, voltages, gas pressures and
mixtures, etc.), detector read-out conditions, detector calibrations and alignments, and
physics calibrations. All conditions data have associated intervals of validity and (for
derived data) versions. The COOL database is used for all conditions data.</p></list-item>
          <list-item><p>Physics metadata. This is information about datasets and data samples, provenance
chains of processed data with links to production task configurations, cross-sections and
configurations used for simulations, and trigger and luminosity information for real and
simulated data.</p></list-item>
          <list-item><p>Distributed computing data management and processing book-keeping. The distributed
production/analysis and data management systems produce and need to store a wealth of
metadata about the data that are processed and stored:</p>
            <list list-type="bullet">
              <list-item><p>Rucio (Distributed Data Management) has a dataset contents catalogue (list
of files, total size, ownership, provenance, lifetime, status etc.), a file
catalogue (size, checksum, number of events) and a dataset location catalogue
(list of replicas for each dataset), and keeps information on the activities of
data transfer tools and deletion tools, on storage resource status etc.</p></list-item>
              <list-item><p>ProdSys/JEDI/PanDA (Distributed Workload Management) stores lists of
requested tasks and their input and output datasets, software versions, lists of
jobs with status and running locations, lists of processing resources with their
status etc.</p></list-item>
            </list>
          </list-item>
        </list>
      </p>
      <p>Both systems use a combination of quasi-static and rapidly changing information, as
ATLAS runs over 1 million jobs/day using on average almost 300k job slots and moves
600 TB/day around the world. Oracle supports both systems very well as long as the tables and
the load do not grow indefinitely; "old" information is automatically copied to an archive
Oracle database and removed from the primary one.</p>
      <sec id="sec-2-1">
        <title>2.1 Oracle storage</title>
        <p>All this information is stored in three main Oracle RACs (Real Application Clusters),
respectively for ATLAS online, offline, and distributed computing applications, plus an archive
database, all with active stand-by replicas and back-ups. Selected users and processes have write
access; all users have read access. Read access normally goes through front-end web services as
direct access to Oracle from many processes could overload the servers: Frontier for access to
conditions data from production and analysis jobs, the AMI and COMA front-end servers for access
to metadata, and DDM and PanDA servers for access to dataset and production/analysis task
information. Figure shows a sketch of the Oracle RACs and the data flow between them, including
the replication to the active stand-by instances and the distribution of conditions data to IN2P3-CC in
Lyon (France), RAL in Oxfordshire (United Kingdom) and TRIUMF in Vancouver (Canada).</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 NoSQL storage</title>
        <p>The main distributed computing applications (Rucio and ProdSys/PanDA) have a very high
transaction rate, and the Oracle database deals with this large information flow very efficiently.
Applications such as monitoring and accounting, which only read from the database, are instead better
served by different storage systems, with the needed data extracted from Oracle and formatted
appropriately for the expected queries. Tasks that extract the relevant information from Oracle and
store it in Hadoop run continuously and provide input to several other tools, including dataset
popularity, task monitoring and data management accounting.</p>
        <p>
          ElasticSearch [
          <xref ref-type="bibr" rid="ref15">16</xref>
          ] has become popular in the last couple of years as a "quick" way to search
information, and it is now used by several distributed computing analytics applications. The
ElasticSearch storage must be filled with data extracted from logfiles or databases; interactive
tools can then be used to generate plots that are displayed with Kibana [
          <xref ref-type="bibr" rid="ref16">17</xref>
          ]. It is very useful for
monitoring and for finding out what is going on in case of unexpected failures by correlating information
from different sources; for example, if a Frontier server becomes unresponsive, we can look up
which jobs or tasks caused that and where they ran (or are running), and correlate this with the PanDA
status of that site. As ElasticSearch performance degrades when the amount of accumulated data
becomes large and the hardware is not sufficient for the data size and the tasks to be performed,
careful provisioning is needed (as for any other computing system!).
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 The first ATLAS NoSQL tool: EventIndex</title>
        <p>
          The ATLAS EventIndex [
          <xref ref-type="bibr" rid="ref17">18</xref>
          ] is the first application that was developed from the start with
modern structured storage systems, rather than a traditional relational
database, as its back-end. The design started in late 2012 and the system was in production at the start of LHC Run2
in Spring 2015. The EventIndex is designed to be a complete catalogue of ATLAS events,
covering all real and simulated data. Its main use cases are event picking ("give me this event in that
format and processing version"), counting and selecting events based on trigger decisions, production
completeness and consistency checks (data corruption, missing and/or duplicated events) and trigger
chain and derivation overlap counting. It contains event identifiers (run and event numbers, trigger
stream, luminosity block, bunch crossing number), trigger decisions and references (GUID plus
internal pointer) to the events at each processing stage in all permanent files generated by central
productions.
        </p>
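        <p>As an illustration of the record content listed above, the following minimal Python sketch models one EventIndex entry and an event-picking look-up. All class, field and function names (EventRecord, EventRef, pick_event) and all sample values are hypothetical and chosen only for this example; the real EventIndex keeps its data in Hadoop and Oracle, not in Python dictionaries.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EventRef:
    """Reference to one stored copy of an event: file GUID plus internal pointer."""
    guid: str
    pointer: int


@dataclass
class EventRecord:
    """Hypothetical in-memory model of one EventIndex entry."""
    run_number: int
    event_number: int
    trigger_stream: str
    lumi_block: int
    bunch_crossing: int
    trigger_decisions: frozenset
    # (data format, processing version) -> reference to the event in a file
    refs: Dict[Tuple[str, str], EventRef] = field(default_factory=dict)


def pick_event(index, run, event, data_format, version) -> Optional[EventRef]:
    """Event picking: return the reference to this event in that format and version."""
    record = index.get((run, event))
    if record is None:
        return None
    return record.refs.get((data_format, version))


# Toy index keyed by (run number, event number); all values are made up.
record = EventRecord(
    run_number=1, event_number=42, trigger_stream="physics_Main",
    lumi_block=7, bunch_crossing=100,
    trigger_decisions=frozenset({"HLT_example"}),
    refs={("AOD", "v1"): EventRef(guid="GUID-0001", pointer=5)},
)
index = {(record.run_number, record.event_number): record}
```
]]></preformat>
        <p>In this sketch, calling pick_event(index, 1, 42, "AOD", "v1") returns the stored reference, while an unknown event or processing version yields None.</p>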
        <p>
          The EventIndex has a partitioned architecture, following the data flow, sketched in Figure .
The Data Production component extracts event metadata from files produced at Tier-0 or on the Grid;
the Data Collection system [
          <xref ref-type="bibr" rid="ref18">19</xref>
          ] transfers EventIndex information from jobs to the central servers at
CERN; and the Data Storage units provide permanent storage for EventIndex data, fast access for the
most common queries and finite-time response for complex queries. The full information is stored
in Hadoop in MapFile format [
          <xref ref-type="bibr" rid="ref19">20</xref>
          ], with an internal catalogue in HBase [
          <xref ref-type="bibr" rid="ref20">21</xref>
          ] and also a copy to
HBase for event look-up; reduced information (only real data, no trigger) is copied to Oracle for
faster queries [
          <xref ref-type="bibr" rid="ref21">22</xref>
          ]. A monitoring system keeps track of the health of servers and the data flow [
          <xref ref-type="bibr" rid="ref22">23</xref>
          ].
        </p>
        <p>At the time of writing the Hadoop system stores 120 TB of real data and 36 TB of simulated
data, plus 154 TB of other data (input and transient data and archive). In Oracle we have over 100
billion event records, stored in a table of 2.2 TB with 2 TB of index space.</p>
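        <p>The figures quoted above can be combined in a short back-of-the-envelope calculation. The sketch below uses only the numbers given in the text; because the text says "over 100 billion" records, the per-record sizes are upper bounds, and the use of decimal units (1 TB = 10^12 bytes) is an assumption.</p>
        <preformat><![CDATA[
```python
TB = 10**12  # assumption: decimal terabytes

# Hadoop volumes quoted in the text, in TB
hadoop_tb = {"real": 120, "simulated": 36, "other": 154}
total_hadoop_tb = sum(hadoop_tb.values())

oracle_records = 100 * 10**9  # "over 100 billion", so this is a lower bound
table_bytes = 2.2 * TB
index_bytes = 2.0 * TB

# Upper bounds on the average size per event record in Oracle
bytes_per_record_table = table_bytes / oracle_records
bytes_per_record_total = (table_bytes + index_bytes) / oracle_records

print(f"Hadoop total: {total_hadoop_tb} TB")
print(f"Oracle table: at most about {bytes_per_record_table:.0f} bytes per record")
print(f"Oracle table + index: at most about {bytes_per_record_total:.0f} bytes per record")
```
]]></preformat>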
        <p>
          An active R&amp;D programme to explore different, and possibly better performing, data store
formats in Hadoop was started in 2016. The "Pure HBase" approach (database organized in columns
of key-value pairs) was one of the original options in 2013, but did not work in 2015 because of the
then poor performance of the CERN lxhadoop cluster (problem solved at the end of 2015); it is more
promising now as it shows good performance for event picking. The Avro [
          <xref ref-type="bibr" rid="ref23">24</xref>
          ] and Parquet [
          <xref ref-type="bibr" rid="ref24">25</xref>
          ] data
formats have been explored, with tests on full 2015 real data, and look promising (for different
reasons). Kudu [
          <xref ref-type="bibr" rid="ref25">26</xref>
          ] is a new technology in the Hadoop ecosystem, implementing a new
column-oriented storage layer that complements HDFS and HBase. It appears to be flexible enough to address a
wider variety of use cases, in particular as it can also be queried through SQL, placing it
midway between Oracle and the NoSQL world; tests are continuing this year in view of a possible
use in production in 2018 [
          <xref ref-type="bibr" rid="ref26">27</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evolution of Databases for Run3</title>
      <p>The continued usage of Oracle is fine for the time being, but we were warned by CERN that
the license conditions may change in the future, so some kind of diversification may be needed.
Some types of data and metadata fit naturally into the relational database model, but others fit much
less well, for example the large amounts of useful but static data on DDM datasets for accounting,
information on completed PanDA production and analysis tasks, event metadata and so on.</p>
      <p>As long as access to the data is done through an interface server, the user won't actually see
the underlying storage technology. In this way it is possible to keep only the "live" data in Oracle and
move the rest to different technologies. This also means that at some point in the future we could
change technology for the SQL database without too much trouble.</p>
      <sec id="sec-3-1">
        <title>3.1 A new Conditions Data Service for Run3</title>
        <p>
          CREST [
          <xref ref-type="bibr" rid="ref27">28</xref>
          ] is a new architecture for conditions data services for HEP experiments,
developed initially by CMS and ATLAS, and now considered by a number of other experiments. It is
based on the relational schema simplification introduced by CMS for Run2, with data identified by
type, interval of validity and version, and payload data in BLOBs. It will contain in its schema only
data used for event processing (no dump of raw information). The functions are partitioned: the
relational database is used only for payload data identification, but the payload can be anywhere,
including files in CVMFS [
          <xref ref-type="bibr" rid="ref28">29</xref>
          ]. A web server (with an internal cache) is used for interactions with the
relational database and data input, search and retrieval, and Frontier servers and Squids provide
access from Grid jobs and local caches. Figure shows a scheme of the component architecture and
data access paths.
        </p>
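        <p>The separation between payload identification and payload storage described above can be sketched as follows. This is a toy model, not the CREST implementation: the class and method names (ConditionsCatalogue, add, resolve), the open-ended intervals of validity (each entry is valid until the next one starts) and the example payload paths are all assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import bisect
from collections import defaultdict


class ConditionsCatalogue:
    """Toy stand-in for the relational part: it identifies payloads, it does not hold them."""

    def __init__(self):
        # (data type, version) -> sorted list of (since, payload_location)
        self._iovs = defaultdict(list)

    def add(self, data_type, version, since, payload_location):
        """Register a payload valid from `since` until the next entry starts."""
        bisect.insort(self._iovs[(data_type, version)], (since, payload_location))

    def resolve(self, data_type, version, time):
        """Return the location of the payload valid at `time`, or None."""
        entries = self._iovs.get((data_type, version), [])
        starts = [since for since, _ in entries]
        pos = bisect.bisect_right(starts, time) - 1
        if pos < 0:
            return None
        return entries[pos][1]


# Hypothetical payload locations; the payload itself could live anywhere.
catalogue = ConditionsCatalogue()
catalogue.add("alignment", "v1", since=100, payload_location="/cvmfs/example/align_100.blob")
catalogue.add("alignment", "v1", since=200, payload_location="/cvmfs/example/align_200.blob")
```
]]></preformat>
        <p>Here a look-up such as catalogue.resolve("alignment", "v1", 150) returns only the location of the payload; fetching the payload itself is a separate step, which is what allows it to be served from a cache or a file system.</p>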
        <p>The CREST system for ATLAS is under active development and will be in production for
the start of LHC Run3. By that time all existing conditions for Run1 and Run2 will have to be
transferred to the new system, to allow processing and analysis of all ATLAS data with the most
recent software suites.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Time series databases</title>
        <p>Time series are, for example, streams of DCS (Detector Control System) data, where for each
data type a raw data record consists of a time stamp and one or a few values. This information is
currently stored in Oracle using COOL, either after averaging over short time periods or by storing new
values only when they are sufficiently different from the previous ones. Data sizes can become enormous
compared to other data types, so much so that direct use of this information in reconstruction jobs is not
a good idea; it is much better to store this information in a system that is designed for time series and
has useful tools for averaging over predefined time intervals, threshold detection, and an integrated
display of the values as a function of time.</p>
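        <p>The two reduction strategies mentioned above, storing a value only when it differs sufficiently from the previous stored one and averaging over predefined time intervals, can be sketched in a few lines of Python. The function names, the threshold and the sample readings are hypothetical; this is not the COOL or InfluxDB implementation.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def deadband_filter(samples, threshold):
    """Keep a (time, value) sample only if it differs from the last kept value by more than threshold."""
    kept = []
    for time, value in samples:
        if not kept or abs(value - kept[-1][1]) > threshold:
            kept.append((time, value))
    return kept


def average_by_interval(samples, interval):
    """Average the values in consecutive time windows of length `interval`."""
    sums = defaultdict(lambda: [0.0, 0])
    for time, value in samples:
        window_start = (time // interval) * interval
        sums[window_start][0] += value
        sums[window_start][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


# Made-up readings as (time in seconds, value)
samples = [(0, 20.0), (10, 20.1), (20, 20.05), (30, 21.0), (40, 21.05), (50, 19.0)]

print(deadband_filter(samples, threshold=0.5))
print(average_by_interval(samples, interval=30))
```
]]></preformat>
        <p>With these sample readings the filter keeps three of the six records, while the averaging reduces them to one value per 30-second window.</p>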
        <p>
          CERN-IT decided to use InfluxDB [
          <xref ref-type="bibr" rid="ref29">30</xref>
          ] coupled with Grafana [
          <xref ref-type="bibr" rid="ref30">31</xref>
          ] initially for their internal
system monitoring and then also for the monitoring of WLCG site status and experiment distributed
computing tools. Given their positive experience, ATLAS started evaluations in the online and offline
contexts, including displaying the time series with Grafana. An example of data extracted from the
PanDA database in Oracle, stored as time series in InfluxDB and displayed with Grafana, is shown in
Figure .
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Metadata for Run3 and beyond</title>
        <p>A new effort to revise and harmonize metadata information and its storage and retrieval tools
started this year in ATLAS: the DCC (Data Characterization and Curation) project. It has three
complementary strands: the overall architecture, a top-down approach for dataset-related
metadata and dataset discovery based on the Data Knowledge Base, and a bottom-up
approach to event metadata based on the concepts of the Event WhiteBoard and Virtual Datasets.</p>
        <p>The Event WhiteBoard (EWB) project has been launched recently. It is an evolution of the
EventIndex concept, but with an event-oriented architecture, whereas the EventIndex has a
dataset-oriented internal storage organization. It will have one and only one logical record per event,
containing event identification and immutable information (trigger, luminosity block etc.); in addition,
for each processing step involving the event it will hold a link to the algorithm that produced it
(processing task configuration), pointers to outputs and flags for offline selections (derivations). An
important new feature of the EWB is the possibility for automatic processes and single users to
annotate single event records, adding key-value pairs (or similar formats) that can then be interpreted
and used for automatic selections or other actions. Figure shows the EWB component architecture
and the flow of data into and out of it.</p>
        <p>The problem is not intrinsically difficult, but the EWB will have to support 10 billion new
real and 35 billion simulated events per year for Run2, a factor of 3 more for Run3 and another factor
of 3 more for Run4. Work on the technology selection is starting now, with the aim of having a
prototype working at the Run2 scale by the end of 2018 that shows promise of scaling as required for
Run3, and the new EWB in operation during 2020.</p>
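        <p>The scale implied by these numbers can be worked out explicitly. The short sketch below assumes that the factor of 3 applies to the combined total of real and simulated events; the variable names are chosen only for this example.</p>
        <preformat><![CDATA[
```python
run2_real = 10 * 10**9        # new real events per year in Run2
run2_simulated = 35 * 10**9   # simulated events per year in Run2
growth_factor = 3             # quoted factor from one run period to the next

events_per_year = {"Run2": run2_real + run2_simulated}
events_per_year["Run3"] = events_per_year["Run2"] * growth_factor
events_per_year["Run4"] = events_per_year["Run3"] * growth_factor

for run, count in events_per_year.items():
    print(f"{run}: {count / 1e9:.0f} billion events per year")
```
]]></preformat>
        <p>Under that assumption the yearly load grows from 45 billion events in Run2 to 135 billion in Run3 and 405 billion in Run4.</p>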
        <p>Virtual Datasets (VDS) are not a new idea but with the new EWB technology it should be
possible to implement them. A VDS is a list of events that satisfy a number of conditions as
contained in the EWB. For example, the derivation step after reconstruction now writes out O(100)
streams with selected events; even if the events are “slimmed”, the amount of required disk space is
large. With VDSs, it will be enough to flag the selected events in the EWB, saving lots of storage,
and user analysis jobs will then read only those events.</p>
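        <p>A Virtual Dataset as described above is essentially a selection over EWB records rather than a physical copy of events. The following minimal sketch assumes a hypothetical record layout (dictionaries with run, event, flags and annotations keys) and made-up flag names; the actual EWB technology has not yet been selected.</p>
        <preformat><![CDATA[
```python
def virtual_dataset(ewb_records, required_flags):
    """Return the identifiers of the events whose flags include all required ones."""
    required = set(required_flags)
    return [
        (rec["run"], rec["event"])
        for rec in ewb_records
        if required.issubset(rec["flags"])
    ]


# Toy EWB content: one logical record per event, with selection flags and
# free-form key-value annotations. All names and values are made up.
ewb_records = [
    {"run": 1, "event": 10, "flags": {"DERIV_A", "DERIV_B"}, "annotations": {}},
    {"run": 1, "event": 11, "flags": {"DERIV_B"}, "annotations": {"checked_by": "user1"}},
    {"run": 2, "event": 5, "flags": set(), "annotations": {}},
]

print(virtual_dataset(ewb_records, ["DERIV_B"]))
```
]]></preformat>
        <p>The selection stores only event identifiers, so no event data are duplicated on disk; an analysis job would resolve each identifier to the corresponding event through its pointers.</p>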
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>ATLAS is always following technology developments in the database and structured data
storage fields. The lifetime of ATLAS computing tools and infrastructure is much longer than the
active lifetime of many open source products, and this fact poses very strong constraints on product
selection. In any case we need to continue the R&amp;D programs to make the best use of new upcoming
computing technologies, without neglecting ongoing operations of course.</p>
      <p>Continued collaboration with CERN-IT is essential for providing well-performing and robust
services to the Collaboration.</p>
      <p>The tool that is invisible to most users is the one that works without problems all the time!</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <collab>ATLAS Collaboration</collab> <year>2008</year> <article-title>The ATLAS Experiment at the CERN Large Hadron Collider</article-title>, <source>JINST</source> <volume>3</volume> S08003, doi:<pub-id pub-id-type="doi">10.1088/1748-0221/3/08/S08003</pub-id></mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Oracle: https://www.oracle.com</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] MySQL: https://www.mysql.com</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><surname>Valassi</surname> <given-names>A</given-names></string-name> et al. <year>2008</year> <article-title>COOL, LCG Conditions Database for the LHC Experiments: Development and Deployment Status</article-title>, CERN-IT-Note-2008-019 and <source>NSS 2008 Proceedings of the Medical Imaging Conference</source>, Dresden, Germany</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><surname>Albrand</surname> <given-names>S</given-names></string-name> <year>2010</year> <article-title>The ATLAS metadata interface</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>219</volume> 042030, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/219/4/042030</pub-id></mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><surname>Gallas</surname> <given-names>E J</given-names></string-name> et al. <year>2014</year> <article-title>Utility of collecting metadata to manage a large scale conditions database in ATLAS</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>513</volume> 042020, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/513/4/042020</pub-id></mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><surname>Maeno</surname> <given-names>T</given-names></string-name> et al. <year>2014</year> <article-title>Evolution of the ATLAS PanDA workload management system for exascale computational science</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>513</volume> 032062, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/513/3/032062</pub-id></mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><surname>Garonne</surname> <given-names>V</given-names></string-name> et al. <year>2014</year> <article-title>Rucio - The next generation of large scale distributed system for ATLAS Data Management</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>513</volume> 042021, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/513/4/042021</pub-id></mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><surname>Anisenkov</surname> <given-names>A</given-names></string-name> et al. <year>2011</year> <article-title>ATLAS Grid Information System</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>331</volume> 072002, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/331/7/072002</pub-id></mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><surname>Grael</surname> <given-names>F F</given-names></string-name> et al. <year>2011</year> <article-title>Glance Information System for ATLAS Management</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>331</volume> 082004, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/331/8/082004</pub-id></mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><surname>Barberis</surname> <given-names>D</given-names></string-name> et al. <year>2012</year> <article-title>Evolution of grid-wide access to database resident information in ATLAS using Frontier</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>396</volume> 052025, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/396/5/052025</pub-id></mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Hadoop and associated tools: http://hadoop.apache.org</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Cassandra: http://cassandra.apache.org</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] MongoDB: https://www.mongodb.com</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[16] ElasticSearch: https://www.elastic.co</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[17] Kibana: https://www.elastic.co/products/kibana</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[18] <string-name><surname>Barberis</surname> <given-names>D</given-names></string-name> et al. <year>2015</year> <article-title>The ATLAS EventIndex: architecture, design choices, deployment and first operation experience</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>664</volume> 042003, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/664/4/042003</pub-id></mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[19] <string-name><surname>Sánchez</surname> <given-names>J</given-names></string-name> et al. <year>2015</year> <article-title>Distributed Data Collection for the ATLAS EventIndex</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>664</volume> 042046, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/664/4/042046</pub-id></mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Favareto</surname>
            <given-names>A</given-names>
          </string-name>
          et al.
          <year>2016</year>
          <article-title>Use of the Hadoop structured storage tools for the ATLAS EventIndex event catalogue</article-title>
          ,
          <source>Phys. Part. Nuclei Lett</source>
          .
          <volume>13</volume>
          : 621, https://doi.org/10.1134/S1547477116050198
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[21] HBase: https://hbase.apache.org</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[22] <string-name><surname>Gallas</surname> <given-names>E J</given-names></string-name> et al. <year>2017</year> <article-title>An Oracle-based Event Index for ATLAS</article-title>, http://cds.cern.ch/record/2252389/files/ATLAS-COM-CONF-2017-011.pdf, to be published in <source>Proceedings of CHEP 2016</source>, San Francisco, USA, October 2016</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[23] <string-name><surname>Barberis</surname> <given-names>D</given-names></string-name> et al. <year>2016</year> <article-title>ATLAS EventIndex monitoring system using the Kibana analytics and visualization platform</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>762</volume> 012004, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/762/1/012004</pub-id></mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[24] Avro: https://avro.apache.org</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[25] Parquet: http://parquet.apache.org</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[26] Kudu: http://kudu.apache.org</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[27] <string-name><surname>Baranowski</surname> <given-names>Z</given-names></string-name> et al. <year>2017</year> <article-title>A study of data representation in Hadoop to optimise data storage and search performance for the ATLAS EventIndex</article-title>, http://cds.cern.ch/record/2244442/files/ATL-SOFT-PROC-2017-043.pdf, to be published in <source>Proceedings of CHEP 2016</source>, San Francisco, USA, October 2016</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[28] <string-name><surname>Barberis</surname> <given-names>D</given-names></string-name> et al. <year>2015</year> <article-title>Designing a future Conditions Database based on LHC experience</article-title>, <source>J. Phys.: Conf. Ser.</source> <volume>664</volume> 042015, doi:<pub-id pub-id-type="doi">10.1088/1742-6596/664/4/042015</pub-id></mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[29] CVMFS: https://cernvm.cern.ch/portal/filesystem</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[30] InfluxDB: https://www.influxdata.com</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[31] Grafana: https://grafana.com</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>