3D Radio Holographic Images Synthesis and
Filtration on Multiprocessor Computing Systems

      Alexey A. Kalmykov1 , Vadim A. Dobryak1 , Andrey A. Kalmykov1 ,
      Anton S. Kurilenko1 , Elena N. Akimova1,2 , Aliya F. Skurydina1,2 ,
                            Vladimir E. Misilov1,2
                  1
                   Ural Federal University, Yekaterinburg, Russia
      kaa@iidt.ru, dobryak1958@gmail.com, andrey-kalmykov@yandex.ru,
    2
      Krasovskii Institute of Mathematics and Mechanics, Ural Branch of RAS,
                               Yekaterinburg, Russia
            aen15@yandex.ru, {afinapal, out.mrscreg}@gmail.com


       Abstract. The work describes an algorithm for 3D radio holographic
       images synthesis in a subsurface sounding system designed by the au-
       thors. The algorithm was parallelized and implemented for multicore pro-
       cessors and NVIDIA GPU. Computational complexity of the algorithm
       was estimated. Numerical experiments were performed for estimating the
       algorithm performance and efficiency. Experiments were carried out on
       multicore processors with vector instructions set and NVIDIA GPU. The
       developed system can locate inhomogeneity (size of which can be a few
       centimeters) in doublelayered reinforced concrete at the depth of a few
       tens of centimeters.

       Keywords: 3D images, subsurface radiolocation, fast Fourier transform,
       parallel computations, linear filtration, CUDA, OpenMP


1    Introduction
A number of problems, which are solved using the subsurface sounding [1], is
growing today. It is stipulated by increased availability of locators quality, as well
as, by increased performance of modern computational systems. Application of
the parallel or quasi-parallel systems allows one to perform the sounding and
image synthesis in a fraction of a second. The wide range of problems includes
high-performance passenger screening systems, searching for the hidden objects
in buildings, computer vision systems, archeology, etc. The 3D radio holographic
subsurface locator developed by authors is described in [2–6]. The basic operat-
ing principles include: application of the sounding signal with a linear frequency
modulation, correlation filter based on processing the reflected signal, an inter-
nal coherence of the system, and radio holographic synthesis of 3D images. All
these principles allow the system to obtain high spatial resolution by using ultra-
wideband sounding signal, high angular resolution by using aperture synthesis
with portable antennas, and MIMO-lines or matrices. Further modification of
the image synthesis algorithm considering refraction was developed. Now for
                                                                                  37

practical application of the system, it is essential to use effective software and
hardware for fast processing of radio signals and showing results in the real time.
The authors see two ways of using hardware to implement fast signal process-
ing. The first one is to use specialized computers with signal processors and
programmable logic devices. The second one is to use general-purpose personal
computers with high-performance coprocessors. Combination of these two ways
can be used [7, 8]. In this paper, we estimate performance of one parallel algo-
rithm for 3D image synthesis using the CPU and GPU, estimate the resolution
obtained, and propose the way to its increasing.


2   Image synthesis algorithm

The algorithm comprises the following steps [3]:
   1. There is a set of beat-frequency waveforms {sk (t)}, k = 1, K, for different
positions {(xk , yk , zk )} of antenna.
   2. Compute the complex spectrums of these waveforms using the Fourier
transform
                                 Sk (f ) = F {sk (t)}.                           (1)

The delay is τ = f · Tm /∆f , so, we get a set {Sk (τ )}. Here, Tm is the modulation
period and ∆f is the frequency deviation. The algorithmic complexity of this
step is O(N · log(N ) · K), where N is the number of quantization steps for one
signal.
    3. Perform phasing for the whole set

                             Sk0 (τ ) = Sk (τ ) · e−jφ0 (τ ) ,                   (2)

where φ0 (τ ) = 2π(f0 − ∆f /2) · τ . Complexity of this step is O(N · K).
    4. For all coordinates of the required volume (x, y, z), calculate the total
intensity
                                           X
                            I(x, y, z) = k    Sk0 (τ )k,                     (3)
                                                K

where τk is the total delay of the echo from the (x, y, z) point considering possible
refractions at the media interfaces. The complexity of this step is O(M ·K), where
M = X · Y · Z is the total number of voxels in the synthesized image, X, Y, Z
are the voxel resolutions for each coordinate.
    The total algorithmic complexity of the algorithm is O(K(N · log(N ) + M )).
To achieve the desired resolution of synthesized images, we have to impose cer-
tain restrictions onto all components of this expression. Application of the system
with the serial sounding allows one to use parallelization for each single signal
processing (on each step of the Fourier transform computation and accumula-
tion) and to find values independently for different points using T threads. Thus,
the complexity can be reduced to O( K   T (N · log(N ) + M )).
38

3    Parallelization and numerical experiments results

Application of the multithreaded system allows one to process the data for each
thread independently. So, application of T threads reduces time complexity by T
times. However, this also increases space complexity by requirement to store copy
of the data for each thread. Combination of these methods can be used. In the
experiment, the synthesized image has the following parameters: the resolution
in voxels is 100 × 100 × 300, the beat-frequency waveforms set K has the length
of 2048, and the number of quantization steps is 216. Both the serial and parallel
methods were applied to the same image. Table 1 shows the computation times
for the serial method and parallel one with four threads. It is seen that the
computation time increases linearly with the number of signals.


Table 1. Computation time versus the number of signals for different implementations
of the algorithm

                                             Execution time, sec
     Number K of signals
                              Serial signal processing   Parallel signal processing
             2048                      30                          22
             4096                      55                          48
             6044                      76                          60
             8192                      100                         85


    Another way to increase the implementation performance is to use opti-
mization for the certain processor architecture. For example, contemporary Intel
processors support the vector instructions sets. This allows one to increase the
performance of floating-point operations significantly.
    The results of the automatically vectorized program for the Intel Core i7
processor using the Intel C++ Compiler 14.0 are shown in Table 1. The thread
parallelization was performed using the OpenMP technology. The number of
signals is K = 2048.
    The algorithm was also implemented for GPU using the NVIDIA CUDA
technology. The experiments were carried out using the NVIDIA GTX 780Ti
(2880 cores, 875 MHz). Table 3 shows that this implementation allows one to
produce an image with 15 frames per second.
    The experiments were also performed using a PC with the Intel Core i7
3610M (2.4 GHz, 9 logical cores) processor with 16 GB of RAM. The algorithm
parameters were: M = 106 and N = 262144. The time shown in tables is the
minimal time of 1000 experiments with the same conditions. The FFT algorithm
was taken from the FFTW and NVIDIA cuFFT libraries.
                                                                                 39

Table 2. Computation time versus the number of threads; the number of signals is
2048

                                             Execution time, sec
                Number T of threads
                                           without AVX     with AVX
                          1                   109             78
                          2                    60             34
                          3                    38             24
                          4                    32             21
                          5                    28             14
                          6                    20             13
                          7                    19             12
                          8                    18             10


 Table 3. Computation time versus the number of signals for NVIDIA GTX 780Ti

                     Number K of signals     Execution time, sec
                              1536                  0.15
                              2048                  0.12
                              2816                  0.07


4   Linear filtration

The problem of the low-sized subsurface object visualization motivates creating
devices with characteristics extremely close to their theoretical physical limits.
Some of the fundamental bases of those limits are a wavelength of the probing
signal, aperture sizes, and signal losses in the probed mediums. At the same
time, methods of additional image filtering facilitate reaching these limits, which
is useful in solving various applied problems.
    For example, consider a problem of probing different building structures.
The purpose of linear filtering consists of highlighting the low-sized objects on a
background of high-intensity reflections from reinforcements and wall surfaces.
Would the filtering method based on the minimal prior information (e.g. that
the wall is considered to be flat, the reinforcement bars to be long, etc.) be still
effective?
40


                   Fig. 1. Images (left) and their spectra (right).


    The principle of the filter operation is the following. Figure 1 (top) shows the
simulated two-dimensional image of the reinforcement lattice and the absolute
value of its spatial frequency spectrum.
    It can be shown that the absolute value does not depend on the shift of
the lattice with respect to the picture frame. The shift occurs only in the phase-
frequency spectrum. The fluctuations of the absolute values of frequency compo-
nents depend on the lattice’s step size. The significant volume of spatial harmonic
curves for a rectangular reinforcement lattice is concentrated along the coordi-
nate axes. With rotation of the lattice, the spectrum rotates about the vertical
axis.
    Horizontally oriented ribs correspond to the harmonics located alongside one
of the axes, and the vertically oriented ribs correspond to the harmonics located
alongside the other one. Extending this conclusion to the third dimension, we
can be assured that any orthogonal plain in the 3D image corresponds to the
spectrum concentrated alongside a single plain as well.
                                                                                     41

    This fact suggests using a linear 3D filter with the spatial-frequency spectrum
that equals 0 for any coordinates that have at least one component at zero and
equals 1 in any other case. The filter is supposed to be effective in clearing any
plains and lines from the images that are orthogonal or parallel to any of the
coordinate axes. However, the rate of distortion of intensity level and the form
of low-sized objects remain vital.

   Figure 1 (bottom) shows the simulated image of a low-sized object and its
spectrum. As we can see, the significant part of spectrum of the object’s image is
spread along all the frequencies. This fact proves that our filter can be effective.

   Figure 2 shows summary images and their spectra before (top) and after
(bottom) filtering. The object’s intensity is slightly reduces and the cross-like
beams appeared, but these are admissible distortions.


       Fig. 2. Images before and after filtering (left) and their spectra (right).
42

   The algorithm for the filter is implemented in the frequency domain by using
forward and backward Fourier transform with transitional multiplication of the
spectrum by the spatial-frequency function (see Fig. 3). The complexity of the
Fourier transform is O(M 3 · log(M )), the complexity of multiplication is O(M 3 ).
Thus, the algorithmic complexity of filtration is O(M 3 ).


                   Fig. 3. Filtration in spatial frequency domain.


    Figure 4 shows images before and after filtering. The probed object is the
ferroconcrete wall with double-layered reinforcements of 0.3 m width. Specific
attenuation is 35 dB/m, permittivity equals 7. The distance between the wall
and the scanner’s aperture is 1.2 m. The objects that we want to be highlighted
are metal plates with size of 3 cm located in the reinforcement ribs junction on
one of the ribs and between ribs on equal distance from them and at the depth
from 10 to 20 cm.
    The filtering proves significant reduction of intensity of images of the wall’s
plain and reinforcement ribs.


5    Conclusion

Analysis of algorithmic complexity of the algorithm suggested for synthesis of 3D
radio holographic images was carried out. Application of the serial and parallel
processing was considered. The experiments were carried out for performance
increasing by application of the parallel computing and vector instructions set.
It was shown that by using the GPUs, it is possible to achieve performance
sufficient for the real-time synthesis of the images. We have shown possibility of
using the secondary linear filtering for the radio-frequency 3D images to reduce
the masking effect of wall surfaces and reinforcement lattice as well as highlight
the low-sized masked objects. The developed system can locate inhomogeneity
(size of which can be a few centimeters) in double-layered reinforced concrete at
the depth of a few tens of centimeters.
                                                                                     43


               Fig. 4. Images before (top) and after (bottom) filtering.


Acknowledgment This work was partially supported by Russian Foundation
for Basic Research (Project No. 15-01-00629).

References
1. Grinyov, A. U.: Questions of subsurface radiolocation (in Russian). Radiotechnika,
   Moscow (2005)
2. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Georadar
   with 3D images synthesis (in Russian). In: Proceedings of XI international scientific
   and technical conference Physics and technical applications of wave processes. UrFU,
   Yekaterinburg (2012)
3. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Theory
   and practice of three-dimensional radio frequency visualization of objects (in Rus-
   sian). In: The 23rd Int. Crimean Conf. Microwave & Telecommunication Technology
   (CriMiCo2013), pp. 11691170. Sevastopol (2013)
4. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Application
   of MIMO-lines in the problem of three-dimensional radio frequency visualizations
   of subsurface objects (in Russian). In: The 23rd Int. Crimean Conf. Microwave &
   Telecommunication Technology (CriMiCo2013), pp. 11951196. Sevastopol (2013)
44

5. Dobryak, V. A., Kalmykov, Al. A., Kalmykov, An. A., Kurilenko, A. S.: Addi-
   tional focusing in the problem of three-dimensional radio frequency visualizations
   of subsurface objects (in Russian). In: The 23rd Int. Crimean Conf. Microwave &
   Telecommunication Technology (CriMiCo2013), pp. 11851186. Sevastopol (2013)
6. Kalmykov, Al. A., Dobryak, V. A., Kalmykov, An. A., Kurilenko, A. S., Akimova,
   E. N., Skurydina, A. F., Misilov, V. E.: Parallel algorithms for synthesizing and
   processing of 3D radio holographic images (in Russian). In: Proceedings of the 10th
   Annual International Scientific Conference on Parallel Computing Technologies, pp.
   521529, CEUR Workshop Proceedings, Aachen (2016)
7. Barkhatov, A. V., Kozlov, A. S.: Fast calculation of time-frequency function in radar
   using GPUs (in Russian). Radiolokatsiya i radionavigatsiya. 5, 4246 (2015)
8. Ampilov, O. V., Pyatkin, A. K., Topchiev, S. A., Nikitin, M. V.: Digital signal
   processing in coherent monopulse radar (in Russian). OAO Radiophysica, Moscow
   (2005)