Advanced Signal Reconstruction in Tunka-Rex
     with Matched Filtering and Deep Learning

   Pavel Bezyazeekov1 , Nikolai Budnev1 , Oleg Fedorov1 , Oleg Gress1 , Oleg
    Grishin1 , Andreas Haungs2 , Tim Huege2,3 , Yulia Kazarina1 , Matthias
   Kleifges4 , Dmitriy Kostunin5 , Elena Korosteleva6 , Leonid Kuzmichev6 ,
   Vladimir Lenok2 , Nima Lubsandorzhiev6 , Stanislav Malakhov1 , Tatiana
 Marshalkina1 , Roman Monkhoev1 , Elena Osipova6 , Alexander Pakhorukov1 ,
Leonid Pankov1 , Vasily Prosin6 , Frank Gerhard Schröder2,7 , Dmitry Shipilov1 ,
                          and Alexey Zagorodnikov1
                 1
                       Institute of Applied Physics ISU, Irkutsk, Russia
                2
                      Institut für Kernphysik, KIT, Karlsruhe, Germany
  3
     Astrophysical Institute, Vrije Universiteit Brussel, Pleinlaan 2, Brussels, Belgium
  4
     Institut für Prozessdatenverarbeitung und Elektronik, KIT, Karlsruhe, Germany
                                  5
                                    DESY, Zeuthen, Germany
            6
               Skobeltsyn Institute of Nuclear Physics MSU, Moscow, Russia
7
    Bartol Research Inst., Dept. of Phys. and Astron., Univ. of Delaware, Newark, USA


       Abstract. The Tunka Radio Extension (Tunka-Rex) is a digital antenna
       array operating in the frequency band of 30-80 MHz, measuring the radio
       emission of air-showers induced by ultra-high energy cosmic rays. Tunka-
       Rex is co-located with the TAIGA experiment in Siberia and consists
       of 63 antennas, 57 of them in a densely instrumented area of about
       1 km2 . The signals from the air showers are short pulses, which have
       a duration of tens of nanoseconds and are recorded in traces of about
       5 µs length. The Tunka-Rex analysis of cosmic-ray events is based on
       the reconstruction of these signals, in particular, their positions in the
       traces and amplitudes. This reconstruction suffers at low signal-to-noise
       ratios, i.e. when the recorded traces are dominated by background. To
       lower the threshold of the detection and increase the efficiency, we apply
       advanced methods of signal reconstruction, namely matched filtering and
       deep neural networks with autoencoder architecture. In the present work
       we show the comparison between the signal reconstructions obtained
       with these techniques, and give an example of the first reconstruction of
       the Tunka-Rex signals obtained with a deep neural networks.

       Keywords: Tunka-Rex · Matched filtering · Autoencoder · Denoising


1    Introduction
Cosmic rays (CR) are high-energy charged particles with extra-terrestrial origin.
The sources of ultra-high energy CR are still unknown due to complexity of
tracing charged particles deflected by galactic and extragalactic magnetic fields.
However, the energy spectrum and mass composition of CR can shed light on the
most powerful cosmic accelerators. Due to low flux of ultra-high energy CR it is
impossible to measure them directly (in space or higher layers of atmosphere),
and they are detected by large ground detectors measuring cascades produced
by their interaction with the atmosphere. These cascades, called air-showers,
consist of many secondary particles, including electrons and positrons, which
produce short radio pulses due to deflection in the Earth’s magnetic field. These
pulses have a broadband spectrum mostly in the MHz domain and a duration
of tens of nanoseconds [1].
    Tunka-Rex [2] is a sparse antenna array located at the TAIGA facility [3, 4]
in the Tunka valley (Eastern Siberia). It consists of 63 antennas measuring radio
emission from air showers in the frequency band of 30-80 MHz. Since Tunka-
Rex is placed in a relatively radio-quiet location, the main background is from
the Galaxy and has a power law behavior with an index of ≈ −2.5. However,
there are plenty of non-stationary background sources, which complicate the re-
construction of events with low energies and may distort the reconstruction of
the signal. This background is caused by hardware RFI (especially after several
upgrades of TAIGA facility) and by anthropogenic influence. A simple study of
the background shows that the noise in the Tunka-Rex traces has distinguish-
able features and cannot be approximated as white noise (see Fig. (1)). In this
work we discuss different approaches of signal reconstruction and background
suppression, particularly a standard one used in many experiments and more
sophisticated ones: matched filtering (MF) and autoencoder (AE). We discuss
their performance and future applications.


2    Signal reconstruction

The air-shower pulse at each antenna station is measured in two independent
channels and digitized in two traces of 1024 samples each with a rate of 200 MHz
(i.e. binsize of 5 ns). For the reconstruction of cosmic-ray air showers the two
main properties of radio pulses are used: the amplitude of the signal and its
arrival time. Details of this reconstruction and its application to cosmic-ray
science are given in Refs. [5, 6]. In this paper we focus on the signal reconstruction
itself and methods of its improvement.
    Before the reconstruction of the signals we perform several preprocessing
transformations. Spectra of the signals obtained with Fourier transform are cut
by a digital bandpass to 35-80 MHz and filtered with a median filter, which
removes narrow-band RFI and equalizes the noise using a sliding window of
3 MHz width. Then the traces are upsampled in order to increase the timing
resolution by factors of 4, 16, and 64 for the standard method, AE, and MF,
respectively. As last step, the electric field along the two polarization directions
in the plane perpendicular to the shower axis are reconstructed, namely v × B
(along the Lorentz force, where v is the direction of the air shower and B the
direction of the geomagnetic field) and v × v × B perpendicular to it. Since the
radio emission of air-showers is mostly in the v × B polarization we consider
only this one hereafter.
                            averaged signal                         before pulse:
                    400                                             RMS (mean) = 21.84
                            standard deviation
                                                                    RMS (aver.) = 4.09
                                                                    after pulse:
                    200
amplitude (µV/m)


                                                                    RMS (mean) = 24.56
                                                                    RMS (aver.) = 4.21

                      0


                   −200


                   −400

                     −1.5      −1.0              −0.5      0.0      0.5        1.0       1.5
                                                        time (µs)

Fig. 1. Average of 400 Tunka-Rex traces centered at the pulse. The black line indi-
cates the mean value, the shaded area indicates
                                             √ one standard deviation. The expected
reduction factor of the noise by averaging is 400 = 20, however it is significantly less
(about 5), moreover one can see prominent noise feature after the pulse.


    The main quantity used for the definition of the threshold of signal recon-
struction is the signal-to-noise ratio (SNR). The SNR is defined using the fol-
lowing formula:
                                  SNR = S 2 /N 2 ,                          (1)
where S is the amplitude of the peak of the signal envelope S = max(s(t)) inside
of the signal window, a 200 ns window defined by the trigger time. The envelope
s(t) is defined as
                             s(t) = u(t) + iH[u(t)] ,                        (2)
where u(t) is the initial trace of the electric field at the antenna as a function of
time t, and H denotes the Hilbert transformation.The noise level N is defined as
RMS in the noise window. SNRth is defined as minimal RMS by sliding the noise
window, a 500 ns window, scanning the trace in order to pass over occasional
RFI. We also use the additional definition of neighborhood SN Rneighb. for the
cases of overlapping signal with RFI (when noise is inside the signal window)
based on the estimation of the noise level around the signal (± 100 ns). Thus,
to keep the ratio of false positive detection at the level of 5%, the thresholds are
set as following: SNRth ≥ 16 and SNRneighb. ≥ 10.

2.1                  Signal and background dataset for simulation study
In this study we use a dataset of 650 000 samples of measured background
(2014-2017) recorded by Tunka-Rex and 25 000 CoREAS [7] simulations. The
air-shower pulse is randomly located within the signal window, summed with
                              −3                                                                                         3
                      6 ×10                                                                                    1.6 ×10
                                                               threshold = 854 µV/m
                                                               CC with noise                                   1.4
                      5
                                                                                                               1.2


                                                                                       True amplitude (µV/m)
Probability density

                      4
                                                                                                               1.0

                      3                                                                                        0.8

                                                                                                               0.6
                      2
                                                                                                               0.4
                                                                                                                                                               threshold
                      1
                                                                                                               0.2                                             signal w/noise
                                                                                                                                                               signal without noise
                      0                                                                                        0.0
                      0.2          0.4   0.6   √ 0.8     1.0        1.2       1.4                                0.0         0.5   1.0   1.5√      2.0   2.5     3.0      3.5     4.0
                                                Acc (µV/m)                      ×103                                                            Acc (µV/m)                      ×103

                                         √
Fig. 2. Left: Distribution of MF values Acc applied on traces of pure noise without
air-shower signal. The detection threshold
                                       √ (red line) is set to have < 5% of false positive
detections. Right: Correlation between Acc and the amplitude of the simulated signal.


noise and folded with the Tunka-Rex hardware response taking into account the
geometry of the air shower and the calibration of the detector.
    As was discussed in Ref. [6], the simulated signals reproduce real ones with
satisfactory accuracy. As shown below, the methods developed for simulated
pulses can be applied to the real data without additional tuning.


2.2                           Matched filtering

MF is a widely used technique of extracting signal from noisy data based on the
convolution of a signal template T with the time trace u(t):
                           X
                 scc [t] =   T [t0 − t]u[t], Acc = max(scc [t0 ]) ,         (3)
                                                           t0

where Acc is the maximum of the convolution, which defines the peak of the
filtered signal scc [t].
     MF is very effective for the detection of the signal position inside a trace con-
taminated significantly by white noise and can even detect signals with SNR  1.
However, MF is unable to reconstruct the pulse shape and can only give an ap-
proximate estimation of the amplitude of initial signal.
     For testing the filter we use a single template with a length of 60 ns obtained
from averaging √ over many CoREAS simulations. The threshold of signal detec-
tion is set to Acc = 854 µV/m (see Fig. (2), left), allowing for a 5% probability
of false positives (similar to our standard method [2]).


2.3                           Autoencoder

The AE [8] is an unsupervised convolutional network used for learning the coded
representation of the data and removing specific features from it. We train the
AE in order to learn noise-related features and clean the input traces leaving
                         8 filters       16 filters       32 filters                          8 filters       16 filters       32 filters


  Number of layers   3    0.34             0.28             0.27                          3    0.78             0.74             0.68


                                                                       Number of layers
                     4    0.24             0.31             0.27                          4    0.74             0.72             0.74


                     5    0.26            overfit          overfit                        5    0.92            overfit          overfit


                               Number of filters on 1st layer                                       Number of filters on 1st layer


Fig. 3. Performance of different configurations of the autoencoder (see main text for
explanation). Left: efficiency (fraction of reconstructed events Nrec /Ntot ). The rela-
tively low efficiency is due to the detection threshold of the AE at the very low SNRs.
Right: purity (fraction of events with properly reconstructed peak position, namely
< 5 ns from true value: Nhit /Nrec ). The reconstruction quality of the AE increases
with complexity. However, the AE has been overfitted when the number of degrees of
freedom exceeded the size of the training data.


cosmic-ray signals. The input array for our AE consists of 4096 values, what
corresponds to a trace length of 1280 ns and 0.3125 ns sampling in order to
contain the signal window of 200 ns as well as surrounding background. For the
minimization of the loss, we normalize the input data to the [0:1] range with a
baseline level at 0.5.
    The encoder, the encoding part of the AE, distinguishes features of noise
contained in the input data by applying a set of filters. The filters perform the
convolution of characteristic noise-related features with the input data, estimate
its contribution as a result of the convolution, and afterwards send it to the
max-pooling layer. The max-pooling layer performs a discrete downsampling of
the data and sends it to the next convolution layer with the next set of filters.
With each layer of the encoder, the data becomes more abstract and reduced in
size. The result of encoding is a map of features of the input data.
    After encoding, noise-related features are removed and the map of denoised
data proceeds to the next part of the AE, the decoder, which produces a reverse
reconstruction and returns a data array of the same dimension as the input. In
case of success, the resulting output is the denoised trace containing only the
air-shower pulse.
    We have explicitly selected a subsample with low amplitudes and low SNR
for training to test the possibility of lowering the threshold. We implemented
and trained our AE with Keras [9] and Tensorflow [10] in a uDocker container
with GPU support.
                          80
                                                                                             signal+noise
                          60                                                                 signal
                                                                                             denoised signal
                          40
      Amplitude (µV/m)
                          20

                           0

                         −20

                         −40

                         −60

                         −80
                           500            600         700          800           900     1000              1100
                                                                  t (ns)


                          150
                                                                                             signal+noise
                                                                                             signal
                          100                                                                denoised signal
      Amplitude (µV/m)


                           50


                               0


                         −50


                         −100


                            500                 600         700            800         900                 1000
                                                                  t (ns)


                          80

                          60

                          40
      Amplitude (µV/m)


                          20

                           0

                         −20

                         −40
                                   signal+noise
                         −60       signal
                                   denoised signal
                         −80
                           500                600           700            800         900                 1000
                                                                  t (ns)


Fig. 4. Examples of the AE performance on simulated data (from top to bottom):
1) correct identification (true positive, signal is perfectly denoised); 2) no identifica-
tion (false negative, due to heavy distortion by the noise), 3) double identification
(true+false positive, passing signal-like RFI).
                    103                                                                18
                             reco. (std.)                                                               reconstructed (std.), µ = 0.75◦
                             reco. (MF)                                                16               reconstructed (MF), µ = 0.90◦
                             reco. (AE)                                                                 reconstructed (AE), µ = 1.02◦
                                                                                       14
                             simulated
                         2
                    10                                                                 12
 number of events


                                                                    number of events
                                                                                       10

                                                                                        8
                         1
                    10                                                                  6

                                                                                        4

                                                                                        2

                    100                                                                 0
                     16.0     16.5         17.0       17.5   18.0                        0   1            2          3       4            5
                                     energy log(E/eV)                                            6   (Atrue, Arec) (degrees)


Fig. 5. Comparison of the performance of different methods on simulated data. Left:
distribution of the reconstructed events as function of the energy for the standard, MF
and AE pipelines; right: angular resolution using the different methods.


   We have tested several AE with different sizes defined by the depth (D) and
number of filters per layer (N ), where the configuration of the i-th (i = 1, ..., D)
encoding layer is defined as follows:

                                                Si = Smin × 2D−i , ni = 2i+N −1 ,                                                         (4)

where Si is the size of the i-th filter, ni is the number of filters per layer; Smin =
16 is the minimum size of a layer. To estimate the performance of the AE we
use a cross-entropy loss function with the following metrics:

 – efficiency Nrec. /Ntot. , the fraction of reconstructed events. This metric indi-
   cates how many non-zero traces are returned by the AE. However this metric
   alone is insufficient. To evaluate the performance of the AE one has to check
   also the purity.
 – purity Nhit /Nrec. , the fraction of events with a peak position reconstructed
   within 5 ns from the true position. This metric indicates the true positive
   detections and filters misreconstructed signals.

    Fig. (3) shows a comparison chart of the different AE settings used. Although
only a quarter of the testing sample is reconstructed (due to low SNR), the
purity of the reconstruction is very high, i.e. the reconstruction of background-
dominated signals is improved. Moreover, the performance of the AE increases
with the complexity of the network and we plan to train more sophisticated AE
with larger datasets (our present AE is limited by the relatively small training
set). Examples of the reconstruction are given in Fig. (4).
                        0.8
                                                                          expected pulse position
                                                                                       raw signal
                        0.6                                                      denoised signal

                        0.4
Amplitude (arbitrary)


                        0.2


                          0


                        -0.2


                        -0.4


                        -0.6


                        -0.8
                            300          400       500              600       700                   800
                                                           t (ns)


Fig. 6. Example of the autoencoder performance on a measured Tunka-Rex trace show-
ing successful denoising of the typical RFI after the signal, cf. Fig. (1).


2.4                            Comparison of the methods
We have integrated both, the MF and AE algorithms, with the Auger Offline
framework [11], used for the standard reconstruction of Tunka-Rex. This enables
us to check the performance of MF and AE integrated in Tunka-Rex reconstruc-
tion pipeline, and to use the identical benchmark for comparison between the
standard signal reconstruction, matched filtering, and denoising with the au-
toencoder. As metrics we selected the reconstruction of air-shower events (which
requires the reconstruction of the signal at several antenna station per event),
and the accuracy of the air-shower reconstruction. We performed this test on
simulations using a small subset of the Tunka-Rex library. The results of the
comparison in Fig. (5) show that both, MF and AE, are ready for the appli-
cation to the reconstruction of real data. Moreover, the threshold is slightly
lowered. With future improvements of MF (using a library of templates instead
of a single one, and better whitening of the noise) and the AE (developing a
more sophisticated network using a larger training dataset) we expect a fur-
ther improvement of the reconstruction. At the moment we are working on the
application of the AE on real Tunka-Rex data, an example is shown in Fig. (6).


3                        Discussion and conclusion
The signal reconstruction in the Tunka-Rex experiment is continuously improved
(see, for example, Refs. [12, 13]). One of the most advanced method, namely deep
learning, represented by the autoencoder technique is discussed in this work (it
is worth noticing that a similar technique was recently used for a similar prob-
lem [14]). We have shown that the methods described in this paper are ready for
the implementation for the reconstruction of real data, and we are currently
working on this reconstruction. Besides the application of these methods to
Tunka-Rex data we plan to apply them in a Tunka-Rex child engineering ex-
periment, Tunka-21cm, for RFI tagging and denoising, and to the Tien-Shan
array [15] recently upgraded with SALLA.


Acknowledgements

This work has been supported by the Russian Foundation for Basic Research
                           
(grants 18-32-00460 and 18-32-20220), by the Helmholtz grant HRSF-0027,
              
and by grant 18-41-06003 Russian Science Foundation (Section 2.3).


References

 1. Schröder, Frank G., “Radio detection of Cosmic-Ray Air Showers and High-Energy
    Neutrinos,” Prog. Part. Nucl. Phys., vol. 93, pp. 1–68, 2017.
 2. P. A. Bezyazeekov et al., “Measurement of cosmic-ray air showers with the Tunka
    Radio Extension (Tunka-Rex),” Nucl. Instrum. Meth., vol. A802, pp. 89–96, 2015.
 3. N. Budnev et al., “The TAIGA experiment: From cosmic-ray to gamma-ray as-
    tronomy in the Tunka valley,” Nucl. Instrum. Meth., vol. A845, pp. 330–333, 2017.
 4. D. Kostunin et al., “Tunka Advanced Instrument for cosmic rays and Gamma As-
    tronomy,” in 18th International Baikal Summer School on Physics of Elementary
    Particles and Astrophysics: Exploring the Universe through multiple messengers
    (ISAPP-Baikal 2018) Bolshie Koty, Lake Baikal, Russia, July 12-21, 2018, 2019.
 5. P. A. Bezyazeekov et al., “Radio measurements of the energy and the depth of
    the shower maximum of cosmic-ray air showers by Tunka-Rex,” JCAP, vol. 1601,
    no. 01, p. 052, 2016.
 6. P. A. Bezyazeekov et al., “Reconstruction of cosmic ray air showers with Tunka-
    Rex data using template fitting of radio pulses,” Phys. Rev., vol. D97, no. 12,
    p. 122004, 2018.
 7. T. Huege, M. Ludwig, and C. James, “Simulating radio emission from air showers
    with CoREAS,” AIP Conf.Proc., vol. 1535, p. 128, 2013.
 8. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and
    new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, pp. 1798–
    1828, Aug. 2013.
 9. F. Chollet, “keras.” https://github.com/fchollet/keras, 2015.
10. M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous sys-
    tems,” 2015. Software available from tensorflow.org.
11. P. Abreu et al., “Advanced functionality for radio analysis in the Offline soft-
    ware framework of the Pierre Auger Observatory,” Nucl.Instrum.Meth., vol. A635,
    pp. 92–102, 2011.
12. P. A. Bezyazeekov et al., “Imporoving reconstrucion methods for radio measure-
    ments with Tunka-Rex,” in 25th European Cosmic Ray Symposium (ECRS 2016)
    Turin, Italy, September 04-09, 2016, eConf C16-09-04.3, 2017.
13. D. Kostunin et al., “Improved measurements of the energy and shower maximum
    of cosmic rays with Tunka-Rex,” PoS, vol. ICRC2017, p. 400, 2017.
14. M. Erdmann, F. Schlüter, and R. Smida, “Classification and Recovery of Radio Sig-
    nals from Cosmic Ray Induced Air Showers with Deep Learning,” JINST, vol. 14,
    no. 04, p. P04005, 2019.
15. A. Beisenova et al., “Search for EAS radio-emission at the Tien-Shan shower instal-
    lation at a height of 3340 m above sea level,” EPJ Web Conf., vol. 145, p. 11003,
    2017.