Reconstruction of Radio Signals from Air-Showers with Autoencoder

Pavel Bezyazeekov1, Nikolay Budnev1, Oleg Fedorov1, Oleg Gress1, Oleg Grishin1, Andreas Haungs2, Tim Huege2,3, Yulia Kazarina1, Matthias Kleifges4, Dmitriy Kostunin5, Elena Korosteleva6, Leonid Kuzmichev6, Vladimir Lenok2, Nima Lubsandorzhiev6, Stanislav Malakhov1, Tatyana Marshalkina1, Roman Monkhoev1, Eleonora Osipova6, Alexandr Pakhorukov1, Leonid Pankov1, Vasiliy Prosin6, Frank Schröder2,7, Dmitriy Shipilov1, and Alexey Zagorodnikov1

1 Institute of Applied Physics ISU, Irkutsk, Russia
2 Institut für Kernphysik, KIT, Karlsruhe, Germany
3 Astrophysical Institute, Vrije Universiteit Brussel, Pleinlaan 2, Brussels, Belgium
4 Institut für Prozessdatenverarbeitung und Elektronik, KIT, Karlsruhe, Germany
5 DESY, Zeuthen, Germany
6 Skobeltsyn Institute of Nuclear Physics MSU, Moscow, Russia
7 Bartol Research Inst., Dept. of Phys. and Astron., Univ. of Delaware, Newark, USA

Abstract. The Tunka Radio Extension (Tunka-Rex) is a digital antenna array (63 antennas distributed over ≈ 1 km$^2$) co-located with the TAIGA observatory in Eastern Siberia. Tunka-Rex measures the radio emission of air showers induced by ultra-high energy cosmic rays in the frequency band of 30-80 MHz. The air-shower signal is a short (tens of nanoseconds) broadband pulse. Using the time positions and amplitudes of these pulses, we reconstruct the parameters of air showers and primary cosmic rays. The amplitudes of low-energy events ($E < 10^{17}$ eV) cannot be used for successful reconstruction because the background dominates. To lower the energy threshold of the detection and increase the efficiency, we use an autoencoder neural network that removes noise from the measured data. This work describes our approach to denoising raw data and to the subsequent reconstruction of air-shower parameters. We also present results of the reconstruction of low-energy events with the autoencoder.
Keywords: Tunka-Rex · Efficiency · Autoencoder · Denoising

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Cosmic rays (CR) are accelerated charged particles traveling through space. Most of them are protons; a minor fraction are heavier atomic nuclei (up to iron). The sources of CR are associated with stars at different evolutionary stages. The ultra-high energy CR ($> 10^{15}$ eV) carry information about the most powerful cosmic accelerators, and studying them is one of the important tasks of modern astrophysics. Due to the low flux of the ultra-high energy CR, it is impossible to measure them directly (in space or in the high layers of the atmosphere), so they are detected by sparse ground detectors measuring the cascades produced by their interaction with the atmosphere. These cascades, called air showers, consist of many secondary particles, including electrons and positrons, which produce short radio pulses due to their deflection in the Earth's magnetic field and the heterogeneity of the charge distribution in the shower. These pulses have a broadband spectrum, mostly in the MHz domain, and a duration of tens of nanoseconds [1].

Tunka-Rex [2] is a sparse antenna array located at the TAIGA facility [3, 4] in the Tunka Valley (Eastern Siberia). It consists of 63 antennas measuring radio emission from air showers in the frequency band of 30-80 MHz. Since Tunka-Rex is placed in a relatively radio-quiet location, the main background is from the Galaxy. However, there are plenty of non-stationary background sources, which may distort the air-shower pulse and complicate the reconstruction of events with low energies.

In this work, we present our progress towards the reconstruction of low-energy events by removing radio-frequency interference (RFI) from Tunka-Rex signal traces using an autoencoder (AE) neural network, and we discuss the performance of this approach.
2 Dataset

Tunka-Rex measures air-shower signals in two perpendicular polarizations and records them as traces of 1024 samples each at a sampling rate of 200 MHz. For the reconstruction of cosmic-ray air showers, two main properties of the radio pulses are used: the amplitude of the signal and its arrival time. Details of this reconstruction are given in Refs. [5, 6].

Before the reconstruction of the signals, we perform several preprocessing transformations. Spectra of the signals obtained with the Fourier transform are cut by a digital bandpass to 35-75 MHz and filtered with a median filter, which removes narrow-band RFI and equalizes the noise using a sliding window of 3 MHz width. Afterwards, the traces are upsampled in order to increase the timing resolution (by a factor of 16 in this study). Finally, the electric fields along the two polarization directions in the plane perpendicular to the shower axis are reconstructed, namely v × B (along the Lorentz force, where v is the direction of the air shower and B the direction of the geomagnetic field) and v × (v × B), perpendicular to it. Since the main contribution of the radio emission occurs in the v × B polarization, we consider only this one for the AE processing.

In this study, we use a dataset of 650 000 samples of the measured background (2014-2017) recorded by Tunka-Rex and 25 000 CoREAS [7] simulations. The air-shower pulse is randomly located within the signal window, summed with noise and folded with the Tunka-Rex hardware response, taking into account the geometry of the air shower and the detector calibration. As was discussed in Ref. [6], the simulated signals reproduce real ones with satisfactory accuracy. As shown below, the methods developed for simulated pulses can be applied to the real data without additional tuning.

3 Autoencoder (AE)

The AE [8] is an unsupervised convolutional network used for learning a coded representation of the data and removing specific features from it.
The structure of the AE can be described as follows:

1. Encoder. This part of the AE distinguishes the features of the noise contained in the input data by applying sets of filters. The filters perform the convolution of characteristic noise-related features with the input data, estimate their contribution from the result of the convolution, and pass it to the max-pooling layer. The max-pooling layer performs a discrete downsampling of the data and sends it to the next convolution layer with the next set of filters. With each layer of the encoder, the data becomes more abstract and reduced in size.
2. Coded representation. The central layer of the AE has the smallest size (1024 in this study) and contains an abstract code of the input data. Because of its small size, part of the input information is lost. The learning procedure tunes the AE to lose noise-related information and keep only cosmic-ray signals.
3. Decoder. After encoding, the noise-related features are removed and the map of denoised data proceeds to the decoding part of the AE, which performs the reverse reconstruction and returns a data array of the same dimension as the input. If successful, the resulting output is the denoised trace containing only the air-shower pulse, as shown in Fig. 1.

3.1 Configuration and training

The input array for our AE consists of 4096 values, which corresponds to a trace length of 1280 ns at 0.3125 ns sampling, chosen to contain the signal window of 200 ns as well as the surrounding background. To minimize the loss, we normalize the input data to the [0, 1] range with a baseline level at 0.5. We have explicitly selected a subsample with low amplitudes and a low SNR for training to find out whether the threshold may be lowered. We implemented and trained our AE with Keras [9] and TensorFlow [10] in a uDocker container with GPU support.
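The input geometry and normalization described above can be illustrated with a minimal sketch. The text fixes only the target range [0, 1] and the baseline at 0.5; the scaling rule used here (dividing by the largest absolute amplitude of a zero-baseline trace) is an assumption made for illustration, and the names `trace_length_ns` and `normalize_trace` are hypothetical.

```python
# Illustrative sketch only. The scaling rule (divide by the largest absolute
# amplitude) is an assumption, not necessarily the normalization used for
# the Tunka-Rex AE; all function names are hypothetical.

SAMPLING_NS = 5.0 / 16   # 200 MHz sampling (5 ns step) upsampled by a factor of 16
N_SAMPLES = 4096         # AE input length quoted in the text


def trace_length_ns(n_samples=N_SAMPLES, step_ns=SAMPLING_NS):
    """Return the duration covered by the AE input window in nanoseconds."""
    return n_samples * step_ns


def normalize_trace(trace):
    """Map a zero-baseline trace into [0, 1] with the baseline at 0.5."""
    peak = max(abs(x) for x in trace)
    if peak == 0.0:
        # A flat trace stays at the baseline level.
        return [0.5] * len(trace)
    return [0.5 + 0.5 * x / peak for x in trace]


print(trace_length_ns())  # 1280.0
```

With this convention the largest excursion of a trace reaches 0 or 1, so any amplitude can be quoted as a fraction of 0.5 above or below the baseline.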
After estimating the efficiency and accuracy of the reconstruction for various depths (numbers of convolutional layers) and numbers of filters per layer, we chose a 3-layer encoder with 8 filters per layer. The full reconstruction pipeline using the data denoised by the AE shows a precision comparable with the standard method [11].

Fig. 1. Example of the autoencoder performance on a measured Tunka-Rex trace showing successful denoising of the typical RFI after the signal. (Plot: amplitude in arbitrary units versus t in ns; curves: raw signal and denoised signal; marker: expected pulse position.)

3.2 Real data reconstruction

After a series of tests on the simulated data, we test the performance of the AE in application to the reconstruction of real low-energy events. For this study, we use a set of low-energy Tunka-133 [12] events (Fig. 2) with energies from $10^{16}$ to $10^{17}$ eV, which are inaccessible to reconstruction with the standard Tunka-Rex method [5]. The AE threshold was decreased from 0.395/0.500 to 0.200/0.500. The reconstruction pipeline is as follows:

1. Traces (single polarization) in the event are processed with the AE. Peaks of the envelopes of the denoised traces are saved as assumed air-shower timestamps.
2. The shower front and arrival direction are reconstructed using these timestamps.
3. Quality cuts are applied. The arrival direction reconstructed with the AE is cross-checked against that of the host experiment Tunka-133, and only events with a difference < 5° pass. An additional cut on the geomagnetic angle, α > 60°, selects events with the maximal contribution of the geomagnetic effect.
4. The traces inside each event are shifted according to the AE timestamps, summed and normalized to the number of input traces in order to increase the SNR (Fig. 3).

The result of processing an event with this pipeline is the amplitude S of the coherent sum at the AE timestamps and the mean distance r of the input stations.
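The last pipeline step (shift, sum, normalize) can be sketched as follows. This is a simplified illustration rather than the Tunka-Rex implementation: it assumes that timestamps are already expressed as integer sample indices and that samples shifted out of the window are replaced by zeros. The names `shift_trace`, `coherent_sum` and `align_to` are hypothetical.

```python
# Illustrative sketch: integer-sample shifts and zero padding are simplifying
# assumptions; function and parameter names are hypothetical.

def shift_trace(trace, shift):
    """Shift a trace left by `shift` samples (right if negative), zero-padded."""
    n = len(trace)
    out = [0.0] * n
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            out[i] = trace[j]
    return out


def coherent_sum(traces, peak_indices, align_to):
    """Align each trace so its AE peak sits at `align_to`, then average.

    Dividing by the number of traces corresponds to the normalization
    to the number of input traces mentioned in the text.
    """
    if not traces or len(traces) != len(peak_indices):
        raise ValueError("need one peak index per trace")
    n = len(traces[0])
    total = [0.0] * n
    for trace, peak in zip(traces, peak_indices):
        aligned = shift_trace(trace, peak - align_to)
        total = [t + a for t, a in zip(total, aligned)]
    return [t / len(traces) for t in total]


# Two toy traces whose pulses sit at samples 1 and 3 are aligned at sample 2.
summed = coherent_sum(
    [[0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0]],
    peak_indices=[1, 3],
    align_to=2,
)
```

In such an average, aligned pulses keep their amplitude while uncorrelated noise is reduced roughly as one over the square root of the number of traces, which is the motivation for summing coherently.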
Using S and r, we extrapolate the lateral distribution function:

$$ S_0 = \frac{S}{\exp[\eta_0 (r - r_0)]}, \tag{1} $$

where $S_0$ is the amplitude at the distance $r_0$ from the shower axis and $\eta_0 = -227.793 \cdot 10^{-5}$ m$^{-1}$ is a correction factor. In this way we calculate the amplitude $S_{180}$ (at 180 m from the axis), which has the best correlation with the air-shower energy. After that we reconstruct the energy E using the single antenna method:

$$ E = S_{180} \cdot \kappa, \tag{2} $$

where $\kappa = 868 \cdot 10^{-6}$ EeV · m/V. In this way we reconstruct the set of low-energy events. 83 events passed the amplitude threshold and the arrival direction cross-check; 13 of them survived the α and SNR cuts. The results of this reconstruction are shown in Fig. 4.

Fig. 2. Left: angular distribution of low-energy events used in this study. Right: distribution of shower cores over the surface of the detector. (Plot legend: Tunka-Rex station, shower core, B field.)

4 Discussion and conclusion

The performance of the Tunka-Rex AE has been tested on real data. The reconstruction shows that the arrival direction of low-energy events can be recovered with the AE, but the energy precision is for now relatively low (≈ 26%). We have illustrated the possibility of reconstructing low-energy events, and there is still room for improvement before the efficient application to the Tunka-Rex data processing.

We plan to improve the AE by testing different loss metrics and network architectures with a bigger dataset. Future work also includes modifying the input trace normalization in order to preserve the information on the absolute signal amplitude in the denoised trace. This will enable us to validate our technique on the Tunka-133 + Tunka-Rex data and check its performance in application to the data measured by the Tunka-Rex + Tunka-Grande experiments.
In addition to the task of lowering the threshold, we also plan to check the application of this technique to removing air-shower pulses from the raw data flow within the frame of the Tunka-21cm experiment [13].

Fig. 3. Example event with E = 30 PeV. Top: radio traces recorded at different stations in an event. Bottom: coherent sum of traces. The dotted line shows the peak of the denoised trace corresponding to the air-shower signal timestamp. (Panels: signal traces with SNR = 18.05, 12.26 and 9.77; coherent sum with SNR = 31.48.)

Fig. 4. Left: AE energy vs Tunka-133 energy. Right: histogram of relative energy deviation.

Acknowledgements

The work of P. Bezyazeekov on the section "Real data reconstruction" is supported by the Russian Foundation for Basic Research "Mobility" program grant 19-32-50147. The work has been supported by the Russian Foundation for Basic Research grant 18-32-20220, the Helmholtz grant HRSF-0027, the Russian Federation Ministry of Science and Higher Education (project FZZE-2020-0024), the Mathematical Center in Akademgorodok under agreement No 075-2019-1675 with the Ministry of Science and Higher Education of the Russian Federation, and Irkutsk State University grant 091-19-213. P. Bezyazeekov thanks the community of the Institute for Nuclear Research, where this study has been carried out, and personally G. Rubtsov.

References

1. F. G. Schröder, "Radio detection of Cosmic-Ray Air Showers and High-Energy Neutrinos," Prog. Part. Nucl. Phys., vol. 93, pp. 1–68, 2017.
2. P. A. Bezyazeekov et al., "Measurement of cosmic-ray air showers with the Tunka Radio Extension (Tunka-Rex)," Nucl. Instrum. Meth., vol. A802, pp. 89–96, 2015.
3. N. Budnev et al., "The TAIGA experiment: From cosmic-ray to gamma-ray astronomy in the Tunka valley," Nucl. Instrum. Meth., vol. A845, pp. 330–333, 2017.
4. D. Kostunin et al., "Tunka Advanced Instrument for cosmic rays and Gamma Astronomy," in 18th International Baikal Summer School on Physics of Elementary Particles and Astrophysics: Exploring the Universe through multiple messengers (ISAPP-Baikal 2018) Bolshie Koty, Lake Baikal, Russia, July 12-21, 2018, 2019.
5. P. A. Bezyazeekov et al., "Radio measurements of the energy and the depth of the shower maximum of cosmic-ray air showers by Tunka-Rex," JCAP, vol. 1601, no. 01, p. 052, 2016.
6. P. A. Bezyazeekov et al., "Reconstruction of cosmic ray air showers with Tunka-Rex data using template fitting of radio pulses," Phys. Rev., vol. D97, no. 12, p. 122004, 2018.
7. T. Huege, M. Ludwig, and C. James, "Simulating radio emission from air showers with CoREAS," AIP Conf. Proc., vol. 1535, p. 128, 2013.
8. Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, pp. 1798–1828, Aug. 2013.
9. F. Chollet, "keras." https://github.com/fchollet/keras, 2015.
10. M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. Software available from tensorflow.org.
11. P. Bezyazeekov et al. (Tunka-Rex collab.), "Advanced Signal Reconstruction in Tunka-Rex with Matched Filtering and Deep Learning," CEUR Workshop Proceedings, vol. 2406, pp. 7–16, 2019.
12. V. V. Prosin et al., "Primary CR energy spectrum and mass composition by the data of Tunka-133 array," EPJ Web Conf., vol. 99, p. 04002, 2015.
13. D. Kostunin et al., "Quest for detection of a cosmological signal from neutral hydrogen with a digital radio array developed for air-shower measurements," PoS, vol. ICRC2019, p. 320, 2020.