<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Analysis Framework for KCDC</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frank Polgart</string-name>
          <email>frank.polgart@kit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donghw</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Doris Wo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jurg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology, Institute for Nuclear Physics</institution>
          ,
          <addr-line>76021 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We will introduce a data analysis extension for the KASCADE Cosmic-ray Data Center (KCDC), based on the JupyterHub/notebook ecosystem. A user-friendly interface, easy access to data from KCDC, and modern analysis software are of special interest. This contribution will discuss the service architecture, followed by a brief usage example.</p>
      </abstract>
      <kwd-group>
        <kwd>Astroparticle Physics</kwd>
        <kwd>Open Data</kwd>
        <kwd>Data Analysis</kwd>
        <kwd>Jupyter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>The KASCADE Cosmic-ray Data Center (KCDC) [<xref ref-type="bibr" rid="ref2">2</xref>] is a public data shop developed at the Institute for Nuclear Physics at the Karlsruhe Institute of Technology. KCDC provides access to reconstructed cosmic-ray data taken by the KASCADE [<xref ref-type="bibr" rid="ref1">1</xref>] experiment. We identified improvements and extensions that will benefit our users:</p>
      <list list-type="bullet">
        <list-item><p>The data-sets can be very large, depending on the cut and selection criteria, and personal disk space and the bandwidth needed to download the desired data-set quickly become a bottleneck.</p></list-item>
        <list-item><p>Analyses usually rely on specialised frameworks and toolkits, some of which are very cumbersome to find, install, use and/or maintain.</p></list-item>
      </list>
      <p>One obvious solution is to bring the researcher to the data instead of bringing the data to the researcher.</p>
    </sec>
    <sec id="sec-2">
      <title>Design</title>
      <sec>
        <title>Requirements</title>
        <p>In order to find the best possible technology, we formulated the following requirements.</p>
        <list list-type="bullet">
          <list-item><p>Accessibility: we want people to actually use it; this should not be yet another prestige project.</p></list-item>
          <list-item><p>Usability: the environment should provide all of the tools necessary to conduct an analysis of KASCADE data.</p></list-item>
          <list-item><p>Administration: the entire software stack should be easy to install, maintain and update.</p></list-item>
          <list-item><p>Don't reinvent the wheel: most of the work has already been done.</p></list-item>
        </list>
      </sec>
      <sec id="sec-2-1">
        <title>Solution</title>
        <p>Accessibility: We decided to use the JupyterHub / Jupyter Notebook [<xref ref-type="bibr" rid="ref3">3</xref>] ecosystem for our analysis framework, as it is already a very common analysis tool across industry and research. Almost anyone who has done some data analysis in the last couple of years has come across or even used Jupyter notebooks.</p>
        <p>Usability: Python is the current go-to language for data analysis, and there is a plethora of third-party analysis packages. We also have to provide ROOT [<xref ref-type="bibr" rid="ref5">5</xref>], as many example analyses still rely on it and it supplies the standard file format in particle physics.</p>
        <p>Administration: Notebooks allow arbitrary code execution by design, so it is sensible to isolate each user's notebook server from the others and from the operating system. It should not come as a surprise that Docker [<xref ref-type="bibr" rid="ref4">4</xref>] was the best choice for us in terms of containerization, scalability, resource management and ease of use.</p>
        <p>Though some custom code is needed to attach Jupyter to KCDC, the majority of components are off-the-shelf industry standards. It is unlikely that we will have to invest heavily in in-house development to keep these components running, or even to extend and improve them over time.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Architecture</title>
        <p>Jupyter notebooks are a special kind of file format for cell-based, interactive
programming. The standard user interface is implemented as a web service
(notebook server); the important features in the context of this document are the
file-system-like contents manager and the notebook text-editor/execution
environment ("kernel" in IPython jargon).</p>
        <p>The notebook servers are Docker containers created on demand, each
accessible exclusively by a single user. They are proxied by another container
running the JupyterHub software, which handles authentication and the
creation and deletion of notebook servers.</p>
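        <p>As a rough illustration of this arrangement, a JupyterHub configuration that spawns one Docker container per user might look like the following fragment. This is a minimal sketch and not the KCDC deployment: it assumes the dockerspawner package is installed, and the image name, network name and resource limits are placeholders chosen for illustration. The file is read by JupyterHub itself, which provides get_config(), so it is not meant to be run as a standalone script.</p>
        <preformat>
```python
# jupyterhub_config.py -- illustrative sketch only, not the KCDC configuration
c = get_config()  # noqa: F821 (injected by JupyterHub when it loads this file)

# Start every single-user notebook server in its own Docker container.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Assumption: a placeholder image; a real setup would use an image that
# ships the Python and ROOT kernels.
c.DockerSpawner.image = "jupyter/base-notebook"

# Remove a user's container once their server is stopped.
c.DockerSpawner.remove = True

# Assumption: hub and user containers share this Docker network.
c.DockerSpawner.network_name = "jupyterhub-net"
c.DockerSpawner.use_internal_ip = True

# The hub must listen on an address reachable from the user containers.
c.JupyterHub.hub_ip = "0.0.0.0"

# Assumption: example per-user resource limits.
c.Spawner.mem_limit = "2G"
c.Spawner.cpu_limit = 1.0

# Authentication is left at the default here; an authenticator that checks
# KCDC credentials is outside the scope of this sketch.
```
        </preformat>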
        <p>Choosing Docker as virtualisation technology has many benefits, first and
foremost process isolation, which is a good idea when your service requires
arbitrary code execution. Realizing the service as a Docker stack, we not only gain
easy scalability for many users across multiple physical host machines, but also
a text-based infrastructure description, which makes deployment and version
control trivial.</p>
        <p>For simplicity, users can use their KCDC credentials for authentication.</p>
        <p>The contents manager has been extended for transparent access to the KCDC
download area, and it is comparatively simple to add even more data shops,
depending of course on the interfaces exposed by a data shop.</p>
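        <p>The idea of exposing several data sources through one file-system-like view can be sketched with a toy model. The class names DictBackend and RoutingContents below are hypothetical and do not correspond to the Jupyter contents manager API or to the KCDC implementation; the sketch only shows how a path prefix could select the backend that serves a request, which is why adding a further data shop mainly depends on the interface that shop exposes.</p>
        <preformat>
```python
"""Toy model of a contents manager that routes paths to several backends."""


class DictBackend:
    """Hypothetical read-only backend: file names mapped to bytes."""

    def __init__(self, files):
        self._files = dict(files)

    def listdir(self):
        return sorted(self._files)

    def read(self, name):
        return self._files[name]


class RoutingContents:
    """Dispatch 'mount/name' paths to the backend registered for 'mount'."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, backend):
        self._mounts[prefix] = backend

    def listdir(self, path=""):
        # The root lists the mount points; a mount lists its backend's files.
        if path == "":
            return sorted(self._mounts)
        return self._mounts[path].listdir()

    def read(self, path):
        # Split 'kcdc/events.root' into the mount 'kcdc' and the file name.
        prefix, _, name = path.partition("/")
        return self._mounts[prefix].read(name)


contents = RoutingContents()
contents.mount("home", DictBackend({"slides.ipynb": b"{}"}))
contents.mount("kcdc", DictBackend({"events.root": b"placeholder"}))

print(contents.listdir())
print(contents.listdir("kcdc"))
```
        </preformat>
        <p>In this model, supporting another data shop amounts to writing one more backend with the same two methods and mounting it under a new prefix.</p>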
        <p>The current default notebook is configured with kernels for both Python and
C++ (via the ROOT interpreter).</p>
        <p>A graphical representation of the proposed analysis framework with KCDC integration is shown in Fig. 1.</p>
        <p>This section is generated from the IPython notebook presented at DLC 2020 [<xref ref-type="bibr" rid="ref6">6</xref>], where it was used to showcase the capabilities of the analysis framework; it is used here to demonstrate how easy it is to convert an analysis or presentation into a paper.</p>
        <p>In the original presentation we also showed how to import data from KCDC
into the analysis framework. Although this is a key design aspect of the
described framework, the ease of use of this feature does not lend itself to an
enlightening description on paper, short of an uninspired screenshot or the mention
of "There's a button for that".</p>
        <p>Successful acquisition of the data is therefore presupposed in the following
analysis example.</p>
        <p>We are going to apply the KCDC example analysis to one of the preselected
data-sets, for reasons of reproducibility. After importing both the data file and the
analysis script, create a new notebook in the same directory.</p>
        <preformat>[1]: import os
     from zipfile import ZipFile
     ZipFile('KASCADE_SmallDataSample_wA_runs_0877-7417_ROOT.zip').extractall()
     os.listdir()

[1]: ['.ipynb_checkpoints',
      'KCDC_analyze_example.C',
      'slides.ipynb',
      'KASCADE_SmallDataSample_wA_runs_0877-7417_ROOT.zip',
      'info.txt',
      'events.root',
      'EULA.pdf']</preformat>
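        <p>The extraction step can also be written a little more defensively. The following is a minimal sketch, not part of the KCDC tooling: the helper name extract_sample is hypothetical, and the demonstration builds a small throw-away archive with placeholder content instead of using the KCDC data sample. It lists the archive members before extracting and unpacks them into a dedicated directory.</p>
        <preformat>
```python
import os
import tempfile
from zipfile import ZipFile


def extract_sample(zip_path, target_dir):
    """Extract a zip archive into target_dir and return the member names."""
    with ZipFile(zip_path) as archive:
        names = archive.namelist()
        # zipfile already strips absolute paths and '..' components while
        # extracting; this explicit check just fails loudly instead.
        for name in names:
            if os.path.isabs(name) or ".." in name.split("/"):
                raise ValueError(f"suspicious archive member: {name}")
        archive.extractall(target_dir)
    return names


# Self-contained demonstration with a throw-away archive (not KCDC data).
workdir = tempfile.mkdtemp()
demo_zip = os.path.join(workdir, "demo_sample.zip")
with ZipFile(demo_zip, "w") as archive:
    archive.writestr("events.root", b"placeholder")
    archive.writestr("info.txt", "placeholder text")

data_dir = os.path.join(workdir, "data")
extracted = extract_sample(demo_zip, data_dir)
print(sorted(os.listdir(data_dir)))
```
        </preformat>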
        <p>Along with preselected data-sets, KCDC also provides a basic example
analysis in C++ with ROOT. In order to run the KCDC example analysis, switch the
notebook kernel to C++ and enter:</p>
        <preformat>[1]: .L KCDC_analyze_example.C

[2]: run()

[2]: Input file:events.root
     KCDC-Entries read from files: 1080295
     KCDCM-Entries: 1080295
     Array Entries: 986577
     Calor Entries: 250981
     Grande Entries: 88259
     General Entries: 1080295
     KCDCN-Entries to be evaluated: 1080295
     processing event No: 0 of 1080295
     processing event No: 100000 of 1080295
     processing event No: 200000 of 1080295
     processing event No: 300000 of 1080295
     processing event No: 400000 of 1080295
     processing event No: 500000 of 1080295
     processing event No: 600000 of 1080295
     processing event No: 700000 of 1080295
     processing event No: 800000 of 1080295
     processing event No: 900000 of 1080295
     processing event No: 1000000 of 1080295
     Entries survived:: 1080295 out of 1080295
     general id &gt;0 :: 1080295
     array id &gt;0 :: 986577
     calorimter id &gt;0:: 250981
     grande id &gt;0 :: 88259
     (int) 0</preformat>
        <p>Switch back to Python, and use PyROOT to open and display the result.</p>
        <preformat>[1]: import ROOT
     f = ROOT.TFile('KCDC_Test.root')
     # Names of all objects (histograms) stored in the result file
     keys = [key.GetName() for key in f.GetListOfKeys()]
     c = ROOT.TCanvas("foo", "bar", 1920, 1080*len(keys)//4)
     # Two columns of pads, one pad per histogram
     c.Divide(2,len(keys)//2)
     c.SetLogy()
     pad = 0
     # Histograms that are drawn with a logarithmic y-axis
     logspectra = ['h6202', 'h6302', 'h7202']
     for key in keys:
         pad+=1
         c.cd(pad)
         if key in logspectra:
             ROOT.gPad.SetLogy()
         f.Get(key).Draw()
     c.Draw()

[1]: Welcome to JupyROOT 6.20/04</preformat>
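        <p>One detail of the pad layout is worth spelling out. The call c.Divide(2, len(keys)//2) uses floor division, so an odd number of histograms yields one pad too few. A minimal sketch of a helper that always provides enough pads is given below; the name grid_shape is hypothetical, and the snippet deliberately avoids ROOT so that it runs anywhere.</p>
        <preformat>
```python
import math


def grid_shape(n_plots, columns=2):
    """Return (columns, rows) with enough pads for n_plots histograms."""
    if n_plots == 0:
        # Keep one empty row so the canvas can still be divided.
        return (columns, 1)
    rows = math.ceil(n_plots / columns)
    return (columns, rows)


# Compare the pad count from floor division with the rounded-up version.
for n in (6, 7, 8):
    cols, rows = grid_shape(n)
    print(n, "plots: floor gives", 2 * (n // 2), "pads, ceil gives", cols * rows)
```
        </preformat>
        <p>In the notebook cell, the result could be passed on as c.Divide(*grid_shape(len(keys))).</p>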
        <p>The result of this example analysis shows some statistics of the data set (e.g.
events over run number) and some spectra (e.g. energy deposits in the detectors), among
other interesting things.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Outlook</title>
      <p>We consider the integration of our data shop and analysis framework a
successful technology demonstration. Going public and making it available to all
KCDC users is the current goal. We also plan to extend the functionality: it
is comparatively simple to add more off-site data shops, there is the possibility
of multi-core and GPU support, and we need to evaluate more third-party
analysis packages and notebook extensions. We also anticipate that user
feedback is going to be another driving force behind the long-term improvement of this
service.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.</given-names>
            <surname>Antoni</surname>
          </string-name>
          et al.;
          <article-title>The Cosmic-Ray Experiment KASCADE</article-title>
          ;
          <source>Nucl. Instr. and Meth. A</source>
          <volume>513</volume>
          (
          <year>2003</year>
          )
          <fpage>490</fpage>
          -
          <lpage>510</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Haungs</surname>
          </string-name>
          et al.;
          <article-title>The KASCADE Cosmic-ray Data Centre KCDC: Granting Open Access to Astroparticle Physics Research Data</article-title>
          ;
          <source>Eur. Phys. J. C</source>
          (
          <year>2018</year>
          )
          <volume>78</volume>
          :741; https://doi.org/10.1140/epjc/s10052-018-6221-2
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Kluyver</surname>
          </string-name>
          , Benjamin Ragan-Kelley, Fernando Perez, Brian E. Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica B. Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Damian Avila, Safia Abdalla, Carol Willing et al.;
          <article-title>Jupyter Notebooks - a publishing format for reproducible computational workflows</article-title>
          ;
          <source>Kluyver2016JupyterN</source>
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Dirk</given-names>
            <surname>Merkel</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Docker: lightweight Linux containers for consistent development and deployment</article-title>
          .
          <source>Linux J</source>
          .
          <year>2014</year>
          ,
          <volume>239</volume>
          ; Article 2 (
          <year>March 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Rene</given-names>
            <surname>Brun</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fons</given-names>
            <surname>Rademakers</surname>
          </string-name>
          ;
          <article-title>ROOT - An Object Oriented Data Analysis Framework</article-title>
          ; Proceedings AIHENP'96 Workshop, Lausanne, Sep. 1996,
          <source>Nucl. Inst. &amp; Meth. in Phys. Res. A</source>
          <volume>389</volume>
          (
          <year>1997</year>
          )
          <fpage>81</fpage>
          -
          <lpage>86</lpage>
          . See also http://root.cern.ch/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <source>IV International Workshop "Data life cycle in physics"</source>
          , DLC-2020; https://indico.scc.kit.edu/event/806/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>