3D Radio Holographic Images Synthesis and Filtration on Multiprocessor Computing Systems Alexey A. Kalmykov1 , Vadim A. Dobryak1 , Andrey A. Kalmykov1 , Anton S. Kurilenko1 , Elena N. Akimova1,2 , Aliya F. Skurydina1,2 , Vladimir E. Misilov1,2 1 Ural Federal University, Yekaterinburg, Russia kaa@iidt.ru, dobryak1958@gmail.com, andrey-kalmykov@yandex.ru, 2 Krasovskii Institute of Mathematics and Mechanics, Ural Branch of RAS, Yekaterinburg, Russia aen15@yandex.ru, {afinapal, out.mrscreg}@gmail.com Abstract. The work describes an algorithm for 3D radio holographic images synthesis in a subsurface sounding system designed by the au- thors. The algorithm was parallelized and implemented for multicore pro- cessors and NVIDIA GPU. Computational complexity of the algorithm was estimated. Numerical experiments were performed for estimating the algorithm performance and efficiency. Experiments were carried out on multicore processors with vector instructions set and NVIDIA GPU. The developed system can locate inhomogeneity (size of which can be a few centimeters) in doublelayered reinforced concrete at the depth of a few tens of centimeters. Keywords: 3D images, subsurface radiolocation, fast Fourier transform, parallel computations, linear filtration, CUDA, OpenMP 1 Introduction A number of problems, which are solved using the subsurface sounding [1], is growing today. It is stipulated by increased availability of locators quality, as well as, by increased performance of modern computational systems. Application of the parallel or quasi-parallel systems allows one to perform the sounding and image synthesis in a fraction of a second. The wide range of problems includes high-performance passenger screening systems, searching for the hidden objects in buildings, computer vision systems, archeology, etc. The 3D radio holographic subsurface locator developed by authors is described in [2–6]. The basic operat- ing principles include: application of the sounding signal with a linear frequency modulation, correlation filter based on processing the reflected signal, an inter- nal coherence of the system, and radio holographic synthesis of 3D images. All these principles allow the system to obtain high spatial resolution by using ultra- wideband sounding signal, high angular resolution by using aperture synthesis with portable antennas, and MIMO-lines or matrices. Further modification of the image synthesis algorithm considering refraction was developed. Now for 37 practical application of the system, it is essential to use effective software and hardware for fast processing of radio signals and showing results in the real time. The authors see two ways of using hardware to implement fast signal process- ing. The first one is to use specialized computers with signal processors and programmable logic devices. The second one is to use general-purpose personal computers with high-performance coprocessors. Combination of these two ways can be used [7, 8]. In this paper, we estimate performance of one parallel algo- rithm for 3D image synthesis using the CPU and GPU, estimate the resolution obtained, and propose the way to its increasing. 2 Image synthesis algorithm The algorithm comprises the following steps [3]: 1. There is a set of beat-frequency waveforms {sk (t)}, k = 1, K, for different positions {(xk , yk , zk )} of antenna. 2. Compute the complex spectrums of these waveforms using the Fourier transform Sk (f ) = F {sk (t)}. (1) The delay is τ = f · Tm /∆f , so, we get a set {Sk (τ )}. Here, Tm is the modulation period and ∆f is the frequency deviation. The algorithmic complexity of this step is O(N · log(N ) · K), where N is the number of quantization steps for one signal. 3. Perform phasing for the whole set Sk0 (τ ) = Sk (τ ) · e−jφ0 (τ ) , (2) where φ0 (τ ) = 2π(f0 − ∆f /2) · τ . Complexity of this step is O(N · K). 4. For all coordinates of the required volume (x, y, z), calculate the total intensity X I(x, y, z) = k Sk0 (τ )k, (3) K where τk is the total delay of the echo from the (x, y, z) point considering possible refractions at the media interfaces. The complexity of this step is O(M ·K), where M = X · Y · Z is the total number of voxels in the synthesized image, X, Y, Z are the voxel resolutions for each coordinate. The total algorithmic complexity of the algorithm is O(K(N · log(N ) + M )). To achieve the desired resolution of synthesized images, we have to impose cer- tain restrictions onto all components of this expression. Application of the system with the serial sounding allows one to use parallelization for each single signal processing (on each step of the Fourier transform computation and accumula- tion) and to find values independently for different points using T threads. Thus, the complexity can be reduced to O( K T (N · log(N ) + M )). 38 3 Parallelization and numerical experiments results Application of the multithreaded system allows one to process the data for each thread independently. So, application of T threads reduces time complexity by T times. However, this also increases space complexity by requirement to store copy of the data for each thread. Combination of these methods can be used. In the experiment, the synthesized image has the following parameters: the resolution in voxels is 100 × 100 × 300, the beat-frequency waveforms set K has the length of 2048, and the number of quantization steps is 216. Both the serial and parallel methods were applied to the same image. Table 1 shows the computation times for the serial method and parallel one with four threads. It is seen that the computation time increases linearly with the number of signals. Table 1. Computation time versus the number of signals for different implementations of the algorithm Execution time, sec Number K of signals Serial signal processing Parallel signal processing 2048 30 22 4096 55 48 6044 76 60 8192 100 85 Another way to increase the implementation performance is to use opti- mization for the certain processor architecture. For example, contemporary Intel processors support the vector instructions sets. This allows one to increase the performance of floating-point operations significantly. The results of the automatically vectorized program for the Intel Core i7 processor using the Intel C++ Compiler 14.0 are shown in Table 1. The thread parallelization was performed using the OpenMP technology. The number of signals is K = 2048. The algorithm was also implemented for GPU using the NVIDIA CUDA technology. The experiments were carried out using the NVIDIA GTX 780Ti (2880 cores, 875 MHz). Table 3 shows that this implementation allows one to produce an image with 15 frames per second. The experiments were also performed using a PC with the Intel Core i7 3610M (2.4 GHz, 9 logical cores) processor with 16 GB of RAM. The algorithm parameters were: M = 106 and N = 262144. The time shown in tables is the minimal time of 1000 experiments with the same conditions. The FFT algorithm was taken from the FFTW and NVIDIA cuFFT libraries. 39 Table 2. Computation time versus the number of threads; the number of signals is 2048 Execution time, sec Number T of threads without AVX with AVX 1 109 78 2 60 34 3 38 24 4 32 21 5 28 14 6 20 13 7 19 12 8 18 10 Table 3. Computation time versus the number of signals for NVIDIA GTX 780Ti Number K of signals Execution time, sec 1536 0.15 2048 0.12 2816 0.07 4 Linear filtration The problem of the low-sized subsurface object visualization motivates creating devices with characteristics extremely close to their theoretical physical limits. Some of the fundamental bases of those limits are a wavelength of the probing signal, aperture sizes, and signal losses in the probed mediums. At the same time, methods of additional image filtering facilitate reaching these limits, which is useful in solving various applied problems. For example, consider a problem of probing different building structures. The purpose of linear filtering consists of highlighting the low-sized objects on a background of high-intensity reflections from reinforcements and wall surfaces. Would the filtering method based on the minimal prior information (e.g. that the wall is considered to be flat, the reinforcement bars to be long, etc.) be still effective? 40 Fig. 1. Images (left) and their spectra (right). The principle of the filter operation is the following. Figure 1 (top) shows the simulated two-dimensional image of the reinforcement lattice and the absolute value of its spatial frequency spectrum. It can be shown that the absolute value does not depend on the shift of the lattice with respect to the picture frame. The shift occurs only in the phase- frequency spectrum. The fluctuations of the absolute values of frequency compo- nents depend on the lattice’s step size. The significant volume of spatial harmonic curves for a rectangular reinforcement lattice is concentrated along the coordi- nate axes. With rotation of the lattice, the spectrum rotates about the vertical axis. Horizontally oriented ribs correspond to the harmonics located alongside one of the axes, and the vertically oriented ribs correspond to the harmonics located alongside the other one. Extending this conclusion to the third dimension, we can be assured that any orthogonal plain in the 3D image corresponds to the spectrum concentrated alongside a single plain as well. 41 This fact suggests using a linear 3D filter with the spatial-frequency spectrum that equals 0 for any coordinates that have at least one component at zero and equals 1 in any other case. The filter is supposed to be effective in clearing any plains and lines from the images that are orthogonal or parallel to any of the coordinate axes. However, the rate of distortion of intensity level and the form of low-sized objects remain vital. Figure 1 (bottom) shows the simulated image of a low-sized object and its spectrum. As we can see, the significant part of spectrum of the object’s image is spread along all the frequencies. This fact proves that our filter can be effective. Figure 2 shows summary images and their spectra before (top) and after (bottom) filtering. The object’s intensity is slightly reduces and the cross-like beams appeared, but these are admissible distortions. Fig. 2. Images before and after filtering (left) and their spectra (right). 42 The algorithm for the filter is implemented in the frequency domain by using forward and backward Fourier transform with transitional multiplication of the spectrum by the spatial-frequency function (see Fig. 3). The complexity of the Fourier transform is O(M 3 · log(M )), the complexity of multiplication is O(M 3 ). Thus, the algorithmic complexity of filtration is O(M 3 ). Fig. 3. Filtration in spatial frequency domain. Figure 4 shows images before and after filtering. The probed object is the ferroconcrete wall with double-layered reinforcements of 0.3 m width. Specific attenuation is 35 dB/m, permittivity equals 7. The distance between the wall and the scanner’s aperture is 1.2 m. The objects that we want to be highlighted are metal plates with size of 3 cm located in the reinforcement ribs junction on one of the ribs and between ribs on equal distance from them and at the depth from 10 to 20 cm. The filtering proves significant reduction of intensity of images of the wall’s plain and reinforcement ribs. 5 Conclusion Analysis of algorithmic complexity of the algorithm suggested for synthesis of 3D radio holographic images was carried out. Application of the serial and parallel processing was considered. The experiments were carried out for performance increasing by application of the parallel computing and vector instructions set. It was shown that by using the GPUs, it is possible to achieve performance sufficient for the real-time synthesis of the images. We have shown possibility of using the secondary linear filtering for the radio-frequency 3D images to reduce the masking effect of wall surfaces and reinforcement lattice as well as highlight the low-sized masked objects. The developed system can locate inhomogeneity (size of which can be a few centimeters) in double-layered reinforced concrete at the depth of a few tens of centimeters. 43 Fig. 4. Images before (top) and after (bottom) filtering. Acknowledgment This work was partially supported by Russian Foundation for Basic Research (Project No. 15-01-00629). References 1. Grinyov, A. U.: Questions of subsurface radiolocation (in Russian). Radiotechnika, Moscow (2005) 2. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Georadar with 3D images synthesis (in Russian). In: Proceedings of XI international scientific and technical conference Physics and technical applications of wave processes. UrFU, Yekaterinburg (2012) 3. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Theory and practice of three-dimensional radio frequency visualization of objects (in Rus- sian). In: The 23rd Int. Crimean Conf. Microwave & Telecommunication Technology (CriMiCo2013), pp. 11691170. Sevastopol (2013) 4. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Application of MIMO-lines in the problem of three-dimensional radio frequency visualizations of subsurface objects (in Russian). In: The 23rd Int. Crimean Conf. Microwave & Telecommunication Technology (CriMiCo2013), pp. 11951196. Sevastopol (2013) 44 5. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Addi- tional focusing in the problem of three-dimensional radio frequency visualizations of subsurface objects (in Russian). In: The 23rd Int. Crimean Conf. Microwave & Telecommunication Technology (CriMiCo2013), pp. 11851186. Sevastopol (2013) 6. Kalmykov, Al. A., Dobryak, V. A., Kalmykov, An. A., Kurilenko, A. S., Akimova, E. N., Skurydina, A. F., Misilov, V. E.: Parallel algorithms for synthesizing and processing of 3D radio holographic images (in Russian). In: Proceedings of the 10th Annual International Scientific Conference on Parallel Computing Technologies, pp. 521529, CEUR Workshop Proceedings, Aachen (2016) 7. Barkhatov, A. V., Kozlov, A. S.: Fast calculation of time-frequency function in radar using GPUs (in Russian). Radiolokatsiya i radionavigatsiya. 5, 4246 (2015) 8. Ampilov, O. V., Pyatkin, A. K., Topchiev, S. A., Nikitin, M. V.: Digital signal processing in coherent monopulse radar (in Russian). OAO Radiophysica, Moscow (2005)