=Paper=
{{Paper
|id=Vol-1987/paper15
|storemode=property
|title=A GPU-enabled Black-box Optimization in Application to Dispersion-based Geoacoustic Inversion
|pdfUrl=https://ceur-ws.org/Vol-1987/paper15.pdf
|volume=Vol-1987
|authors=Vadim Bulavintsev,Pavel Petrov,Mikhail Posypkin,Oleg Zaikin
}}
==A GPU-enabled Black-box Optimization in Application to Dispersion-based Geoacoustic Inversion==
<pdf width="1500px">https://ceur-ws.org/Vol-1987/paper15.pdf</pdf>
<pre>
    A GPU-enabled Black-box Optimization in Application
        to Dispersion-based Geoacoustic Inversion

                                         Vadim Bulavintsev
              Matrosov Institute for System Dynamics and Control Theory SB RAS
                                          Irkutsk, Russia
                                    v.g.bulavintsev@gmail.com
                                            Pavel Petrov
        Il’ichev Pacific Oceanological Institute FEB RAS, Far Eastern Federal University
                                        Vladivostok, Russia
                                         petrov@poi.dvo.ru
                                         Mikhail Posypkin
                           Dorodnicyn Computing Centre FRC CS RAS
                                          Moscow, Russia
                                      mposypkin@gmail.com
                                             Oleg Zaikin
              Matrosov Institute for System Dynamics and Control Theory SB RAS
                                          Irkutsk, Russia
                                       zaikin.icc@gmail.com


                                                        Abstract
                       In this study, a GPU-enabled implementation of an algorithm for the
                       solution of a real-life optimization problem arising in the geoacoustic
                       inversion is proposed. In the inversion algorithm, a single-hydrophone
                       recording of a pulse acoustic signal is used for the estimation of the
                       waveguide parameters from the dispersion data. We devised and imple-
                       mented a hierarchical optimization scheme where and eﬃcient GPU-
                       enabled implementation of the objective function calculation is com-
                       bined with the CPU implementation of the search algorithm. The
                       GPU-enabled implementation turned out to be several dozen times
                       faster than the pure CPU one. Exploiting this advantage we were able
                       to successfully solve the considered problem in reasonable time.


1    Introduction
Black-box optimization methods can be employed to solve real-life problems from diﬀerent areas (e.g., see
[Evtushenko et al., 2016, Gornov et al., 2016]). In this study, we apply black-box optimization for solving one
problem arising in geoacoustic inversion. The notion of geoacoustic inversion refers to a collection of tech-
niques that can be used for the reconstruction of geoacoustical waveguide structure from the sound pressure
Copyright ⃝
          c by the paper’s authors. Copying permitted for private and academic purposes.
In: Yu. G. Evtushenko, M. Yu. Khachay, O. V. Khamisov, Yu. A. Kochetov, V.U. Malkova, M.A. Posypkin (eds.): Proceedings of
the OPTIMA-2017 Conference, Petrovac, Montenegro, 02-Oct-2017, published at http://ceur-ws.org


                                                             95
measurements [Katsnelson et al., 2012]. A typical geoacoustical waveguide in a shallow sea is comprised by
a water column and several layers of bottom sediments. Normally measurements for the geoacoustic inver-
sion are performed using expensive receiver arrays. However, recently it was shown that single-hydrophone
recording of a broadband pulse signal can be also successfully used for estimating the acoustical parameters
of sea bottom [Bonnel & Chapman, 2011, Bonnel et al., 2012, Bonnel et al., 2013]. The implementation of the
method from [Bonnel & Chapman, 2011] in practice can be thought of as a solution of an minimization prob-
lem, where every evaluation of the objective function requires numerous solutions of an acoustic spectral prob-
lem [Petrov, 2014, Zaikin & Petrov, 2016]. In the present paper we use GPU-enabled black-box optimization to
solve the inversion problem mentioned above.
   Let us give a brief outline of the paper. In Section 2 we describe an black-box optimization algorithm, used
to solve the problem. In Section 3 we describe a problem of geoacoustic inversion. In Section 4 we describe
two implementations of the proposed algorithm, and show the results of computational experiments on the test
problem. In the rest of the paper we draw conclusions.

2    New Black-box Optimization Algorithm
In black-box optimization problems derivatives are either unavailable or very hard to compute. Thus, solution
methods should not rely on derivatives. There are a lot of local search techniques which can be used for this
purpose: Hooke-Jeeves method [Hooke & Jeeves, 1961], pseudo-gradient approaches [Evtushenko et al., 2016]
and a variety of coordinate descent techniques. We developed a simple modiﬁcation of the coordinate descent
method presented in Fig. 1.
 1 while maxi=1,...,n δi > ε do
 2     for i = 1 to n do
 3        x′ = x + δi ei
 4        if f (x′ ) < f (x) then
 5            x := x′
 6            δi := αδi
 7        else
 8            x′ = x − δi ei
 9            if f (x′ ) < f (x) then
10                 x := x′
11                 δi := αδi
12            else
13                 δi := βδi
14            end
15        end
16     end
17 end
                                            Algorithm 1: ASN Search
   The proposed algorithm (ASN search) uses asymmetric neighborhood adaptation. It tries to improve the
objective function f (x) by shifting by some value δi along each coordinate vector ei in both directions. In the
case of success (failure) the respective step size is increased (decreased) by multiplying it on α > 1(β < 1). When
approaching minimum values of δi usually decrease. The method terminates when the maximal component of
the vector δ becomes less that the predeﬁned ε > 0.
   The choice of initial δ and ε signiﬁcantly aﬀects the performance. For the problem under consideration the
best performance were obtained with δi = 10−1 , i = 1 . . . , n, ε = 10−4 .

3    Test Problem
The algorithm suggested in Section 2 is now applied to the solution of the following (synthetic) problem of
geoacoustic inversion.
   Consider a homogeneous 2D geoacoustic waveguide {(r, z)|z > 0} [Jensen et al., 2011], where the sound speed
and density depend on the depth z but do not depend on r (the range of the receiver [Jensen et al., 2011]).
The layer 0 ≤ z ≤ h represents the water colum, where the sound speed c(z) is a continuous function of depth,
                                3
while the density is ρw = 1 g/cm . The halfspace x > h is a penetrable (liquid) bottom, where the sound speed


                                                        96
                     Table 1: Search space and the true values of the unknown parameters.
                              Parameter True value Min value Max value
                              cb , m/s     1700         1550         1850
                                       3
                              ρb , g/cm    1.7          1.1          2
                              R, m         7000         6850         7150
and density are cb and ρb respectively. In this study, the value cb , ρb are considered unknown media parameters
(i.e. we have no data about the acoustic properties of the bottom). Our goal is to estimate these quantities by
analyzing some acoustic data.
    Assume that a pulse acoustic signal is emitted by a point source located at R = 0, z = zs and
received by a hydrophone at the distance R from the source. The exact value of the receiver range
R is also considered unknown. Performing some time-frequency analysis of the received signal (see e.g.
[Bonnel & Chapman, 2011, Bonnel et al., 2012, Petrov, 2014, Zaikin & Petrov, 2016]), we can obtain the arrival
times of diﬀerent modal components of the signal τ exp (f, m) as the functions of frequency f (here m is the mode
number [Jensen et al., 2011, Bonnel & Chapman, 2011]).
    Using the mode theory of sound propagation in shallow-water waveguides we can also compute the theoretical
arrival times τ th (f, m) for any given set A = (cb , ρb , R) of the values of the unknown parameters. Introducing a
ﬁtness that quantiﬁes the agreement between the theoretical and experimental arrival times
                                            ∑M ∑N m th                                2
                                                             τ (fn , A) − τmexp
                                                                                (fn )
                                   E(A) = m=1 n=1 ∑mM                                   ,                        (1)
                                                               m=1 Nm

we turn the geoacoustic inversion problem into a global minimization problem (note that in practice we use
the discrete set of frequencies {f1 , f2 . . . , fNm } and take into account M trapped modes [Jensen et al., 2011]).
Indeed, the vector Amin which denotes the solution of the problem E(A) → min for certain search space, contains
such values of unknown parameters that make the best match with experimental data.
   For our test case we simulated the signal at the receiver using the mode theory, then we applied a warping-based
algorithm to the resulting time series, and subsequently used the resulting syntetic dispersion data τ exp (f, m) as
the input for our inversion algorithm. The sound speed proﬁle in the water column and the pulse signal emitted
by the source was the same as in [Zaikin & Petrov, 2017]. The true values of the unknown parameters and the
search space (a cuboid) are presented in Table 1.

4   Computational Experiments
For many classes of computational problems, graphics processing units (GPUs) show signiﬁcant speedup over
central processing units (CPUs). In particular, GPUs can help to solve global optimization problems (e.g.,
see [Barkalov & Gergel, 2016]). However, to actually obtain this speed-up, the implementation of an algorithm
should be thoroughly adapted for GPU architecture [Bulavintsev, 2015]. Fortunately, in our approach to geoa-
coustic inversion problem the base algorithms are well ﬁt for a GPU. The complete process of solving the geoa-
coustic inversion problem could be represented as a hierarchy of the subproblems solved by the corresponding
sub-algorithms (see Table 2).

       Table 2: Hierarchy of algorithms used in the process of solving the geoacoustic inversion problem.
           Level                Problem                                   Algorithm
             1       Search for lowest residue point                     ASN Search
             2          Calc. residue for a point        Calc. residue over individual frequencies
                    Calc. modal group velocities for
             3                                                    Numerical diﬀerentiation
                           a single frequency
                     Sturm-Liouville problem on a      Calc. eigenvalues of a tridiagonal symmetric
             4
                                  mesh                 matrix (bisection algorithm [Demmel, 1997])
   In our computational experiments we considered the geoacoustic inversion problem described in Section 3.
So, the ASN search was applied for minimizing the objective function (1).
   The GPU programming model is based on data parallelism. To achieve peak eﬃciency, modern GPUs should
simultaneously run about 10 000 computational threads [Corporation, 2017, Bulavintsev, 2015]. In the GPU-
based implementation of our search procedure, we exploit data parallelism found in the parallel calculation of


                                                        97
       Table 3: Algorithm performance for various platforms and ﬂoating-point precisions formats.
                  Platform                    Precision   Residue    Time, s. Num. points points/s.
 CPU (alglib [Bochkanov & Bystritsky, 2016])   FP64      0.0128206     4678          800          0.1710
                                               FP64      0.0132451     1212          639          0.5272
                    CPU
                                               FP32      0.0209403      487          517          1.0616
                                               FP64      0.0131659      109          717          6.5779
                    GPU
                                               FP32      0.0215154       46          701         15.2391


                                  Table 4: Mixed-mode algorithm performance.
                      Stage I                                   Stage II
                       Final                                 Initial      Final                     Total
        Platform                  Time, s.    Platform                                  Time, s.
                      residue                               residue      residue                   time, s.
                                            CPU (alglib) 0.0137885 0.0130473             1446        1492
          GPU
                     0.0215154      46          CPU       0.0133772 0.0131325            1099        1145
         (FP32)
                                                GPU       0.0134315 0.0130635             81         127

modal group velocities for a single point. Level 1 sub-algorithm executes on the CPU, while level 2-4 sub-
algorithms execute on the GPU (Table 2). A typical residue calculation for a single search space point requires
calculation of modal group velocities for thousands of frequencies. Thus, data parallelism exposed in this way
should be enough to eﬃciently exploit modern GPUs.
   Traditionally, geoacoustics computations are performed in double precision ﬂoating point arithmetics (FP64).
However, modern consumer-grade GPUs suﬀer a great (for some devices, up to 32 times [Corporation, 2017])
drop in performance when using FP64 instructions instead of their single-precision (FP32) counterparts. This
fact led us to investigate the consequences of changing the algorithm to FP32. We considered the model problem
of geoacoustic inversion with 3 parameters (R, ρb , cb ) (see Section 3). The results are presented in Table 3. All
experiments were conducted on the Intel Core i7 930 CPU and the Nvidia GTX 1050 GPU, with the use of CUDA
8.0 SDK and GCC 5.4.0 compiler. The NVCC compiler was forced to produce native Compute Capability 6.1
code. Both NVCC and GCC compilers were run with -O3 optimization ﬂag.
   The most computationally intensive part of our algorithm is the bisection sub-algorithm at level no. 4 in
Table 2. We implemented it as described in [Demmel, 1997]) and [Corporation, 2017], producing single- and
double-precision versions, algorithmically identical on both GPU and CPU platforms. For reference, we provide
the double-precision version based on the AlgLib library [Bochkanov & Bystritsky, 2016], that we used in our
previous work [Zaikin et al., press]. The AlgLib library includes a state-of-the-art implementation of the bisection
algorithm, thoroughly tuned to produce the most accurate results possible with the modern CPUs ﬂoating point
units (FPUs). Our CPU-based implementation of bisection algorithm can not boast such accuracy. However, it
is notably faster than AlgLib due to its simplicity. That is why, our and AlgLib-based algorithm’s results are
diﬀerent. The discrepancy between the outputs (residue) of the same algorithm on the CPU and GPU is the
result of the diﬀerent implementations of ﬂoating-point units on these platforms.
   Table 3 clearly indicates the superiority of the GPU performance, even in FP64. The FP32 mode of the GPU
is 150 times faster than the FP64 mode, though the former’s ﬁnal residue (0.0215154) is much worse than the
latter one’s (0.0131284). However, recomputation of this ﬁnal point in FP64 gives the result (0.0134315, see
Table 4) that is on par with that of the FP64 CPU implementation.
   This leads us to suggest the mixed-mode, two-stage algorithm, that uses the ”coarse” FP32 computation mode
during Stage I, and then reﬁnes its result in Stage II by resuming the search in the ”ﬁne” FP64 computation
mode. The performance of this algorithm is presented in Table 4. For reference, the ”Initial residue” column of
Table 4 shows residue of the best point found by Stage I recomputed by various Stage II algorithms. It is easy to
see that the GPU-based FP32 algorithm for Stage I is best complemented for Stage II by the GPU-based FP64
algorithm. The total runtime of this mixed-mode algorithm is 127 seconds. This is faster than runtime of the
pure GPU-based FP64 algorithm (777 seconds), and its ﬁnal residue (0.0130635) is almost equal to that of the
pure CPU-based FP64 (alglib [Bochkanov & Bystritsky, 2016]) algorithm (0.0130471). The coordinates of the
points found in our experiments are available in Table 5.
   The FP64 performance of our GPU-based algorithm is only 2,5 times lower than the performance of our
FP32 algorithm on the same GPU (see Table 3). However, according to [Corporation, 2017] this performance
drop should be at least 4x times for the GTX1050 GPU used in our experiments. This means that the GPU


                                                        98
                   Table 5: Coordinates of the points found by the examined search algorithms.
                    Platform         Precision           Residue      R         ρb         cb
                                              Single-Stage algorithm
                   CPU alglib          FP64             0.0128206 7016.07 1.74473 1736.05
                                       FP64             0.0132451 7020.62 1.68634 1734.21
                      CPU
                                       FP32             0.0209403 7018.21 1.44446 1742.41
                                       FP64             0.0131659 7019.37 1.70663 1735.22
                      GPU
                                       FP32             0.0215154 7022.99 1.71911 1720.55
                                               Two-Stage algorithm
                   GPU (FP32) ->CPU alglib (FP64) 0.0130473 7015.68 1.70976 1734.97
                      GPU (FP32) ->CPU (FP64)           0.0131325 7020.25 1.71881 1731.43
                      GPU (FP32) ->GPU (FP64)           0.0130635 7019.3 1.76232 1730.56
performance is likely to be bottlenecked by suboptimal multiprocessor occupancy [Corporation, 2017] and/or
excessive memory transfers, rather than the FP64 FPUs of the device. The higher GPU performance we perceived
in our previous experiments with the parallelization scheme based on the simultaneous computing of residues
for many search space points supports this theory. This means that a thorough optimization of the GPU code
could increase the GPU performance of our geoacoustic inversion algorithm at least 2-3 times.
   The source code of our application is available online1 .

5     Conclusions
In the present paper, we suggested a modiﬁcation of a coordinate descent algorithm. This algorithm was
implemented in two versions. The ﬁrst one is designed for launching on a CPU. In the second one, the algorithm
itself works on a CPU, while the calculation of the objective function is performed on a GPU. Both versions
were applied for solving one synthetic problem of geoacoustic inversion. Being little less accurate (in the sense
of found solutions), the GPU-based version turned out to be much faster, than the CPU one.

Acknowledgements
This study was partially supported by the Council for Grants of the President of the Russian Federation (grants
No. MK-2262.2017.5, No. NSh-8081.2016.9, No. NSh-8860.2016.1), the Russian Foundation for Basic research
(grants No. 16-05-01074 a, No. 16-07-00155 a and No. 17-07-00510 a), Presidium of RAS programs I.33, I.5, and
the POI FEB RAS Program “Nonlinear dynamical processes in the ocean and atmosphere”.

References
[Barkalov & Gergel, 2016] Barkalov, K. and Gergel, V. (2016). Parallel global optimization on GPU. Journal of
  Global Optimization, 66(1):3–20.

[Bochkanov & Bystritsky, 2016] Bochkanov, S. and Bystritsky, V. (2016). Alglib - a cross-platform numerical
  analysis and data processing library. ALGLIB Project. Novgorod, Russia.

[Bonnel & Chapman, 2011] Bonnel, J. and Chapman, N. R. (2011). Geoacoustic inversion in a dispersive waveg-
  uide using warping operators. The Journal of the Acoustical Society of America, 130(2):EL101–EL107.

[Bonnel et al., 2013] Bonnel, J., Dosso, S. E., and Chapman, N. R. (2013). Bayesian geoacoustic inversion of
  single hydrophone light bulb data using warping dispersion analysis. The Journal of the Acoustical Society of
  America, 134(1):120–130.

[Bonnel et al., 2012] Bonnel, J., Gervaise, C., Nicolas, B., and Mars, J. I. (2012). Single-receiver geoacoustic
  inversion using modal reversal. The Journal of the Acoustical Society of America, 131(1):119–128.

[Bulavintsev, 2015] Bulavintsev, V. (2015). An evaluation of CPU vs. GPU performance of some combinatorial
  algorithms for cryptoanalysis. Vestnik Yuzhno-Ural’skogo Gosudarstvennogo Universiteta. Seriya “Vychislitel-
  naya Matematika i Informatika”, 4(3):67–84.
    1 https://github.com/mposypkin/acouwater/tree/f gpu


                                                          99
[Corporation, 2017] Corporation, N. (2017). Cuda c programming guide.

[Demmel, 1997] Demmel, J. W. (1997). Applied numerical linear algebra. SIAM.
[Evtushenko et al., 2016] Evtushenko, Y. G., Lurie, S. A., Posypkin, M. A., and Solyaev, Y. O. (2016). Ap-
  plication of optimization methods for ﬁnding equilibrium states of two-dimensional crystals. Computational
  Mathematics and Mathematical Physics, 56(12):2001–2010.

[Gornov et al., 2016] Gornov, A. Y., Zarodnyuk, T. S., Finkelstein, E. A., and Anikin, A. S. (2016). The method
  of uniform monotonous approximation of the reachable set border for a controllable system. Journal of Global
  Optimization, 66(1):53–64.
[Hooke & Jeeves, 1961] Hooke, R. and Jeeves, T. A. (1961). “ direct search” solution of numerical and statistical
  problems. J. ACM, 8(2):212–229.
[Jensen et al., 2011] Jensen, F. B., Kuperman, W. A., Porter, M. B., and Schmidt, H. (2011). Computational
   ocean acoustics. Springer, New-York et al.
[Katsnelson et al., 2012] Katsnelson, B. G., Petnikov, V. G., and Lynch, J. (2012). Fundamentals of Shallow
  Water Acoustics. Springer US, New-York et al.
[Petrov, 2014] Petrov, P. S. (2014). A method for single-hydrophone geoacoustic inversion based on the modal
  group velocities estimation: Application to a waveguide with inhomogeneous bottom relief. In Proceedings of
  the International Conference Days on Diﬀraction 2014, pages 186–191.

[Zaikin & Petrov, 2016] Zaikin, O. and Petrov, P. (2016). Algorithm of reconstruction of the sound speed proﬁle
  in a shallow-water geoacoustic waveguide from modal dispersion data. Optoelectronics, Instrumentation and
  Data Processing, 52(3):259–265.
[Zaikin & Petrov, 2017] Zaikin, O. and Petrov, P. (2017). Application of iterative hill climbing to the sound
  speed proﬁle inversion in underwater acoustics. In Proc. of The Fifth International Workshop on Mathematical
  Models and their Application, volume 173 of IOP Conf. Series: Materials Science and Engineering, pages 1–8.

[Zaikin et al., press] Zaikin, O., Petrov, P., Posypkin, M., Bulavintsev, V., and Kurochkin, I. (in press). Us-
  ing volunteer computing for sound speed proﬁle estimation in underwater acoustics. In Proc. of The Third
  International conference on BOINC: Fundamental and Applied Science and Technology.


                                                      100

</pre>