Intelligent High-Performance Computing for Big Data Processing in
                 Fiber Optical Measuring Networks

                        Elena V. Zakasovskaya¹, Valentin S. Tarasov¹,², Nadezhda I. Denisova³

     ¹Vladivostok State University of Economics and Service, Vladivostok, Russia, elena.zakasovskaya@vvsu.ru
                   ²Far Eastern Federal University, Vladivostok, Russia, valentin.tarasov@vvsu.ru
                 ³Saint Petersburg University, St Petersburg, Russia, denisovanadezda0@gmail.com


                                                         Abstract
                       The paper deals with the problem of reconstructing the parameters of
                       physical fields using distributed information and measurement systems
                       in cases where the measurement lines are laid incompletely.
                       High-performance computing is typically used for solving advanced
                       problems and performing research through computer modeling. The rise of
                       Big Data has changed the entire perspective on data and data handling.
                       The ever-growing analytical needs of Big Data can be satisfied with
                       extremely high-performance computing models. A new combined
                       algorithm is presented, which consists in "optimizing the geometry" of
                       the measuring network with a view to subsequently applying a complex
                       of neural networks. The possibility of choosing and using an appropriate
                       neural network from a complex of several pre-trained networks is considered.

1        Introduction
    The computerization of almost all areas of modern life has a great impact on people and on most of their
activities, and it contributes to the development of information technology. In line with the current trend in the
development of measuring instruments, when large amounts of information must be collected and processed, it is
necessary to use not a large number of separate measuring instruments but complex devices such as information and
measuring systems [1].
    Information measuring systems (IMS) are used to solve a wide range of applied problems; however, their main
purpose is to provide continuous monitoring of large-scale and spatially inhomogeneous multidimensional physical
fields [2].
    An important aspect of IMS operation is the process of collecting information, which determines the type of
measuring network. The topology of the communication system depends on the choice of network technology and, as a
consequence, on the scope of application, the types of input signals, the types of measurements and the functional
properties of the components. Examples of IMS with fundamentally different network topologies are:
    •     Fiber-optic measuring networks (FOMN) based on fiber-optic information and measuring networks;
    •     Information-measuring systems based on wireless sensor networks (WSN).
    Present-day science-intensive production cannot do without constant monitoring and control of the dynamics of a
range of parameters of distributed physical fields (PFs). Distributed information-measuring systems are intended to
solve this problem.
    Information and measuring systems based on wireless sensor networks have great potential. Such low-power
communication devices can be deployed over the entire area of almost any physical space, ensuring continuous
monitoring of physical phenomena in real time, processing and transferring the collected information, and coordinating
actions with other nodes of the network. However, it is impossible to fully deploy intelligent distributed IMS using
wireless technologies in critical infrastructures, primarily because of the lack of network technologies that meet
information security requirements [3].
    FOMN are of the greatest interest. These systems have various topologies and organization and are constructed, for
example, on a fiber-optical element base [1]. One of the fundamental parts of a distributed fiber optical measuring
system (DFOMS) is the distributed fiber optical measuring network, which is responsible for collecting measurement
information about the parameters of the PF under study [2].


    Copyright © 2019 for the individual papers by the papers' authors. Copyright © 2019 for the volume as a collection by its
editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
   In: Sergey I. Smagin, Alexander A. Zatsarinnyy (eds.): V International Conference Information Technologies and High-
Performance Computing (ITHPC-2019), Khabarovsk, Russia, 16-19 Sep, 2019, published at http://ceur-ws.org

               Organization of Effective Work of High-Performance Computing Systems
______________________________________________________________________________________________

   An FOMN is a set of fiber-optical measuring lines (MLs) [1, 2] laid on the surface under study according to a
certain scheme. Reconstructing the parameters of distributed PFs from the characteristics of the optical radiation
passing through the FOMN is therefore a problem of tomography [4]. To restore the distribution function of a PF by
means of an FOMN, MLs are laid along 2-4 directions.
   With complete data, in other words, with projections of sufficient quality over the entire 180-degree angular range,
high-quality reconstructions are known to be obtainable [4-6]. For comparison, obtaining quality images in industrial
tomography requires p = 10^2-10^3 directions.
   Standard analytical methods are unsuitable for fiber-optical tomography, since direct application of the inverse
operator does not provide a unique stable solution. This is characteristic of any limited-data problem in tomography.
There is therefore good reason to consider other algorithms and, possibly, their combination. Restoring PF functions
using an FOMN can be broken into several steps: sampling, receiving and processing projection data, and back
projection.
   The existing success and great prospects for the development of information and measuring systems are largely
due to the fact that sensor networks built on the same principles can be used in completely different areas of human
activity. However, wireless sensors have a number of limitations that have a negative impact on the provision of
information security in the transmission of data within and outside the network.

2        Notation and Standard Definitions
   Let $f(x_1, x_2)$ be the distribution function of a PF parameter on a planar surface. Throughout this paper we
assume that f is infinitely differentiable and has compact support. The 2D Radon transform $\mathcal{R}$ maps a density
function f to its line integrals. The objective of tomography is to produce an accurate image of the interior of an
object from a finite number of scanned views.
   Mathematically, the problem is to reconstruct f given the measurements $g(\theta, s)$. The immediate objective
is a comprehensive description of the projection function $g = \mathcal{R}f$.
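   For reference, the Radon transform mentioned above can be written out explicitly. The block below uses the standard parallel-beam parametrization, which is an assumption here: the angle θ and the offset s are the two arguments of the projection function g, while the unit normal (cos θ, sin θ) and the integration variable t are introduced only for illustration.

```latex
g(\theta, s) = (\mathcal{R} f)(\theta, s)
  = \int_{-\infty}^{\infty}
    f\bigl(s\cos\theta - t\sin\theta,\; s\sin\theta + t\cos\theta\bigr)\, dt
```

   Because f has compact support, the integral is taken over a finite segment of each line.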
   Let the index i determine the i-th scanning direction $\theta_i$, and let the index j determine the samples $s_{ij}$
in the selected i-th direction. A pair of indices (i, j) then corresponds to a straight line $L_{ij}$ along which the
region is scanned, and the projection value along the straight line $L_{ij}$ can be written as

$$ g_{ij} = \mathcal{R}_{ij} f = \int_{L_{ij}} f(x_1, x_2)\, dl, \tag{1} $$

where $\mathcal{R}$ is the Radon transform of the function f, and dl is the length element along the straight line
$L_{ij}$. The pairs of numbers $(\theta_i, s_{ij})$ determine the parallel scanning scheme on the plane.
   Let us partition the region under study $S \subset \mathbb{R}^2$ into smaller cells so that

$$ S = \bigcup_{k=1}^{N} S_k. $$

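   We consider the function f to be constant in each cell $S_k$ and equal to $f_k$; the symbol f also denotes the
matrix corresponding to this partition:

$$ f = \begin{pmatrix}
f_1 & f_2 & \cdots & f_m \\
f_{m+1} & f_{m+2} & \cdots & f_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
f_{(n-1)m+1} & f_{(n-1)m+2} & \cdots & f_{nm}
\end{pmatrix}
\;\Leftrightarrow\;
F = \left( f_1 \; f_2 \; \cdots \; f_m \; f_{m+1} \; \cdots \; f_{2m} \; \cdots \; f_{nm} \right)^{T}. \tag{2} $$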

   The elementary cells $S_k$ are referred to as image elements. With f constant in each cell, the integral
equations (1) are transformed into a system of linear algebraic equations (SLAE), whose matrix form is

$$ AF = G, \tag{3} $$

where A is the matrix of the system and G is the column of projection data.
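   As an illustration of system (3), the sketch below assembles A for the simplest parallel scheme with two mutually perpendicular scanning directions (one measuring line per grid row and one per grid column). The function name `projection_matrix`, the unit cell weights and the row-by-row ordering of F are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def projection_matrix(n, m):
    """Build A for row-wise and column-wise scanning of an n x m grid.

    F is the matrix f flattened row by row (length n*m), as in (2).
    The first n rows of A each sum one grid row, the last m rows each
    sum one grid column. Every crossed cell gets weight 1 (assumed
    unit cell size).
    """
    A = np.zeros((n + m, n * m))
    for i in range(n):                 # horizontal measuring lines
        A[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                 # vertical measuring lines
        A[n + j, j::m] = 1.0
    return A

n, m = 4, 5
f = np.arange(n * m, dtype=float).reshape(n, m)   # arbitrary test field
A = projection_matrix(n, m)
G = A @ f.ravel()                                 # system (3): A F = G
print(A.shape, np.linalg.matrix_rank(A))
```

   For a 4×5 grid this gives 9 equations for 20 unknowns, and the rank of A is n + m - 1 = 8 because the row sums and the column sums add up to the same total. This is a small-scale picture of why the system is underdetermined.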


3        Optimization of FOMN Geometry
    A specific feature of fiber-optic tomography problems is that the FOMN provides a data acquisition scheme with an
ultra-small number of views. As a rule, in such an FOMN the number of measuring lines is smaller than the number of
monitored cells, so the SLAE (3) is underdetermined.
    Because the input data have a large dimension, preprocessing is needed that selects the most significant
parameters by reducing the number of free variables in the SLAE (3).
    In this context, FOMN optimization consists in deleting those rows and columns along the edges of the matrix f
whose elements sum to zero (the "trimming" of the matrix f). Knowing the size of the matrix f and the values of the
column of projection data, one can always check whether the matrix f has such a row or column. The rows and columns
with this property are then removed from the matrix f, and the matrix (2) and the column of projection data are
modified accordingly.


   As a result of the Trimming Algorithm, a new matrix f of size n'×m' is formed, with n' ≤ n, m' ≤ m. In this way
the algorithm selects the candidate regions in which the required "objects" are located.
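   The trimming step described above can be sketched as follows. This is a minimal illustration under stated assumptions: only two perpendicular projections (row sums and column sums) are available, the field is non-negative so that a zero projection value implies an empty row or column, and the names `trim_bounds`, `trim` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def trim_bounds(g, tol=0.0):
    """Return (start, stop) of the slice left after dropping the leading
    and trailing entries of projection g that do not exceed tol."""
    keep = np.flatnonzero(np.asarray(g) > tol)
    if keep.size == 0:
        return 0, 0                      # nothing above the threshold
    return int(keep[0]), int(keep[-1]) + 1

def trim(g_rows, g_cols, tol=0.0):
    """Trim edge rows/columns using only the projection data.

    Returns the bounds (r0, r1, c0, c1) of the retained region and the
    correspondingly shortened projection vectors. Interior zeros are
    kept: only the edges are cut.
    """
    g_rows, g_cols = np.asarray(g_rows), np.asarray(g_cols)
    r0, r1 = trim_bounds(g_rows, tol)
    c0, c1 = trim_bounds(g_cols, tol)
    return (r0, r1, c0, c1), g_rows[r0:r1], g_cols[c0:c1]

# Small check: a 6 x 6 field with one non-zero 2 x 4 block.
f = np.zeros((6, 6))
f[2:4, 1:5] = 1.0
bounds, g_r, g_c = trim(f.sum(axis=1), f.sum(axis=0))
print(bounds)
```

   The returned bounds play the role of the record of deleted rows and columns, so the original dimensions can be restored later.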


    Figure 1: a) investigated function $z(x, y) = e^{-0.5((x-6)^2 + (y-11)^2)} + e^{-0.5((x-16)^2 + (y-7)^2)}$,
    b) and c) projection graphs, d) the subdomain obtained as a result of applying the "trimming" procedure.

   Let us give an example of applying the above method to a specific distribution of a parameter of a physical field,
which is given in analytical form by the function

$$ z(x, y) = e^{-0.5((x-6)^2 + (y-11)^2)} + e^{-0.5((x-16)^2 + (y-7)^2)}. $$

   Figure 1 shows the projection data for two mutually perpendicular scanning directions. The graphs show that, in both
cases, many of the first and last projection values are zero; it is therefore advisable to preprocess the data by
trimming the region at its edges.
   As a result of applying the FOMN Trimming Algorithm, the measuring network of size n×m = 30×30 is transformed
into a network of size n'×m' = 7×11 (Fig. 1).
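   The example can be explored numerically with the short sketch below. Several things are assumptions: the grid nodes are taken as x, y = 1..30, rows are indexed by y and columns by x, and projection values below a hypothetical threshold `tol` are treated as zero (a sampled Gaussian is never exactly zero in floating point). The size of the trimmed region depends on these choices, so the sketch is not expected to reproduce the dimensions reported above.

```python
import numpy as np

# Assumed grid: nodes x, y = 1..30 (rows indexed by y, columns by x).
y, x = np.mgrid[1:31, 1:31]
z = (np.exp(-0.5 * ((x - 6) ** 2 + (y - 11) ** 2))
     + np.exp(-0.5 * ((x - 16) ** 2 + (y - 7) ** 2)))

g_rows = z.sum(axis=1)    # one projection value per grid row
g_cols = z.sum(axis=0)    # one projection value per grid column

def edge_bounds(g, tol):
    """Slice bounds after cutting edge entries of g that are <= tol."""
    keep = np.flatnonzero(g > tol)
    return int(keep[0]), int(keep[-1]) + 1

tol = 1e-3                # hypothetical threshold for "zero" projections
r0, r1 = edge_bounds(g_rows, tol)
c0, c1 = edge_bounds(g_cols, tol)
trimmed = z[r0:r1, c0:c1]
print("trimmed size:", trimmed.shape)
print("share of total retained:", trimmed.sum() / z.sum())
```

   Raising or lowering `tol` shrinks or enlarges the retained region, which is the trade-off between a smaller network and the amount of signal that is cut off at the edges.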
   After the Trimming Algorithm described above has been applied to a region (and, correspondingly, to the matrix f),
one can apply regular recovery procedures such as FBP [4] and ART [7], the special UQC algorithms developed by the
authors [8-9], and neural network algorithms [8, 10] to restore the functions under study.
   After the function has been recovered for the trimmed n'×m' region, the original n×m dimensions are restored using
a list that contains information on the deleted rows and columns of the matrix f.
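   Restoring the original dimensions can be sketched as zero padding. In this hedged example the information on deleted rows and columns is assumed to be stored as the bounds (r0, r1, c0, c1) of the retained region; the function name `restore` and this storage format are illustrative choices, not taken from the source.

```python
import numpy as np

def restore(f_trimmed, bounds, shape):
    """Embed the recovered n' x m' block into a zero field of the
    original n x m shape, using the stored trimming bounds."""
    r0, r1, c0, c1 = bounds
    f_full = np.zeros(shape, dtype=float)
    f_full[r0:r1, c0:c1] = f_trimmed   # deleted rows/columns stay zero
    return f_full

f_trimmed = np.ones((2, 4))            # stand-in for a recovered block
f_full = restore(f_trimmed, (2, 4, 1, 5), (6, 6))
print(f_full.shape)
```

   Filling the removed rows and columns with zeros is consistent with the trimming criterion, since only rows and columns whose elements sum to zero were deleted.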

4        Complex of Neural Networks
   High-performance computing technology focuses on developing parallel processing algorithms and systems by
incorporating both administration and parallel computational techniques.
   The next stage is the neural network processing of the projection data obtained as a result of optimizing the
FOMN geometry:

$$ SN = \begin{pmatrix} NN_{n_1, m_1} \\ \vdots \\ NN_{n_i, m_i} \\ \vdots \\ NN_{n_K, m_K} \end{pmatrix}
\;\rightarrow\; NN_{n_i, m_i}. $$
   Let the FOMN, after the procedure described in Section 3 has been applied, have dimensions n'×m', with n' ≤ n,
m' ≤ m. In most cases the sizes n and m do decrease, because the spatial frequency b = π imposes restrictions on the
size of the objects under study. In the extreme case, the neural network for the entire region has to be used.


   The resulting sizes n' and m' are not known in advance. Therefore, the question naturally arises: what size of
neural network should be used?
   The answer is given by the approach proposed in this paper, which consists in the following:
   1. Several neural networks of different sizes are trained in parallel (independently of each other).
   Denote by NN(ni, mi) a neural network of size ni × mi, i.e. a neural network intended for processing an FOMN of
the corresponding size.
   Denote by SN the set of all K pre-trained neural networks of the form NN(ni, mi):

$$ SN = \{ NN(n_1, m_1), \ldots, NN(n_i, m_i), \ldots, NN(n_K, m_K) \}, \qquad
n_1 \le \ldots \le n_i \le \ldots \le n_K, \quad m_1 \le \ldots \le m_i \le \ldots \le m_K, \tag{4} $$

$$ (n_i, m_i) \ne (n_j, m_j); \quad i \ne j, \quad 1 \le i, j \le K. \tag{5} $$

   2. To process the projection data of the trimmed n'×m' FOMN (n' ≤ n, m' ≤ m), we choose from the set SN of the
form (4) a neural network of suitable size, i.e. NN(ni, mi), for which

$$ n_{i-1} < n' \le n_i, \quad m_{i-1} < m' \le m_i, \quad 1 \le i \le K. \tag{6} $$

   From conditions (5) and (6) it follows that the neural network NN(ni, mi) is the network of the smallest
dimension with which it is possible to process the projection data of a measuring network of size n'×m'.
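   The selection rule can be sketched in a few lines. The example assumes the square network sizes listed in Table 1 and implements a first-fit search over the ordered set SN: it returns the first network that is large enough in both dimensions. This is a practical relaxation of the condition stated above, since for non-square trimmed regions no single index i need satisfy both inequalities; the function name `select_network` is hypothetical.

```python
# Sizes (n_i, m_i) of the pre-trained networks, ordered as in (4).
SN_SIZES = [(3, 3), (5, 5), (7, 7), (10, 10), (15, 15), (20, 20), (30, 30)]

def select_network(sizes, n_trim, m_trim):
    """Return the size of the smallest network in the ordered list that
    can process a trimmed measuring network of size n_trim x m_trim."""
    for n_i, m_i in sizes:
        if n_trim <= n_i and m_trim <= m_i:
            return (n_i, m_i)
    raise ValueError("no pre-trained network is large enough")

print(select_network(SN_SIZES, 3, 3))     # fits the smallest network
print(select_network(SN_SIZES, 7, 11))    # needs a larger square network
```

   When the trimmed region is smaller than the selected network, the projection data would have to be padded to the network size; how this is done is not specified in the source.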

5        Using RBF Networks
   In this work, neural networks of the radial basis type are used. In an earlier article [10], the authors
investigated the possibility of using radial basis function neural networks (RBFNN).
   The information produced by the measuring network, represented by the vector G, is a set of tomographic data from
which the neural network must reconstruct the vector F. Thus, the neural network must perform the transformation
F = A^{-1}(G), having previously been trained on a set of training pairs {(G, F)}.
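   To make the idea of learning the mapping from G to F concrete, here is a minimal radial basis function network written with NumPy. It is a generic textbook construction and only a sketch: the source does not specify the architecture, so the Gaussian hidden units centered on the training inputs, the `spread` parameter, the least-squares output layer and the class name `RBFNet` are all assumptions.

```python
import numpy as np

class RBFNet:
    """Gaussian RBF network: one hidden unit per training input,
    linear output layer fitted by least squares."""

    def __init__(self, spread=1.0):
        self.spread = spread

    def _hidden(self, G):
        # Squared distances between every input and every center.
        d2 = ((G[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.spread ** 2))

    def fit(self, G, F):
        G, F = np.asarray(G, float), np.asarray(F, float)
        self.centers = G                          # centers = training inputs
        Phi = self._hidden(G)
        self.W, *_ = np.linalg.lstsq(Phi, F, rcond=None)
        return self

    def predict(self, G):
        G = np.atleast_2d(np.asarray(G, float))
        return self._hidden(G) @ self.W

# Toy training pairs (G_k, F_k); real pairs would satisfy A F_k = G_k.
G_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
F_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
net = RBFNet(spread=1.0).fit(G_train, F_train)
print(np.round(net.predict(G_train), 6))
```

   With one hidden unit per training pair the network reproduces the training outputs, which also shows why the cost of training grows quickly with the size of the training set.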
   To create a training page (training set), the authors used a reinforcement method of selecting training pairs, in
which pairs of the form (Gi, Fi) with AFi = Gi were considered.
   When creating the RBFNN training pairs in [10], Gaussian-type functions were used, with parameters chosen at the
lattice points of the corresponding scanning scheme, together with Gaussian pairs.
   For example, for a 5×5 field, a training page of 3325 training pairs was created, on which the RBFNN was trained.
It was shown experimentally that the constructed network restores the spatial distribution function of the studied
physical quantity with an error at a single point of no more than 2% and has good predictive capabilities. However,
it was noted that, for high-dimensional FOMN problems, this recovery method runs into serious difficulties in training
the network because of the very large size of the training pages. It therefore became necessary to look for better
ways of using neural networks.
   One way to optimize the processing of information is to use a set of pre-trained neural networks of various
dimensions.
   The choice of reference functions should depend on the spectrum width b of the function f(x, y) under study.
Functions of Gaussian type

$$ z(x, y) = e^{-a_i((x - c_i)^2 + (y - b_i)^2)} \tag{7} $$

can be used here, since they take values significantly different from zero only in a zone around a certain center.
    To analyze the neural network method of solving the problem using RBFNN, this work considers the tomographic
problem of restoring the functions of a PF from the information coming from an information-measuring system of size
30×30.
    It was assumed that the reference impact on the field has the form of a smooth function with a limited effective
spectrum width b equal to the conditional spectral unit π. All values of the function are assumed to be non-negative
and normalized.
    In this work, the same three types of reference distributions of a physical quantity are used as in [10]. The first
and second types belong to the regular method, and the third belongs to the random method. We describe them in more
detail.
    Type I. The reference field distributions in this case are single Gaussians of the form (7), whose centers are located
at the nodes of the measuring network. It was found that the optimal parameters for learning are ai parameters, which
take values.
    Type II. Analytically, these functions can be represented as

$$ z(x, y) = e^{-a_1((x - c_1)^2 + (y - b_1)^2)} + e^{-a_2((x - c_2)^2 + (y - b_2)^2)}, \tag{8} $$

provided that the supports (carriers) are at least 2π/b. These are Gaussian pairs with non-intersecting supports.



   Type III. Reference distributions of this type were obtained using a randomization process with normalization.
Each random set of integers a1, a2, b1, c1, b2, c2, each ranging from 1 to N, was assigned a function of the
form (8). Each vector was normalized before inclusion in the training page.
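   The construction of training pairs from the reference distributions can be sketched as follows, for Types I and III. Everything beyond the formulas (7) and (8) is an assumption: the grid nodes are x = 1..m and y = 1..n, the projections G are the row and column sums for two perpendicular scanning directions, "normalization" is taken to mean scaling the field to a maximum of 1, and all function names are hypothetical.

```python
import numpy as np

def gaussian(n, m, a, c, b):
    """Single Gaussian of the form (7) at grid nodes x = 1..m, y = 1..n."""
    y, x = np.mgrid[1:n + 1, 1:m + 1]
    return np.exp(-a * ((x - c) ** 2 + (y - b) ** 2))

def normalize(f):
    """Scale a non-negative field so that its maximum is 1 (assumed)."""
    peak = f.max()
    return f / peak if peak > 0 else f

def training_pair(f):
    """Return (G, F): projections (row sums, column sums) and flattened f."""
    G = np.concatenate([f.sum(axis=1), f.sum(axis=0)])
    return G, f.ravel()

def type_one(n, m, a):
    """Type I: one Gaussian centered at every node of the network."""
    return [training_pair(normalize(gaussian(n, m, a, c, b)))
            for b in range(1, n + 1) for c in range(1, m + 1)]

def type_three(n, m, N, count, seed=0):
    """Type III: Gaussian pairs (8) with random integer parameters 1..N."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(count):
        a1, a2, b1, c1, b2, c2 = rng.integers(1, N + 1, size=6)
        f = gaussian(n, m, a1, c1, b1) + gaussian(n, m, a2, c2, b2)
        pairs.append(training_pair(normalize(f)))
    return pairs

page = type_one(5, 5, a=0.5) + type_three(5, 5, N=5, count=10)
print(len(page), page[0][0].shape, page[0][1].shape)
```

   Type II pairs could be generated in the same way by summing two Gaussians whose centers are kept far enough apart; the separation rule is left out here because its exact form is not given.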

6         Numerical Modeling
   To analyze the neural network method of solving the problem using the RBFNN complex, an information-measuring
network of size 30×30 was considered.
   For each radial basis neural network belonging to the set SN, Table 1 presents the following characteristics:
   - the dimensions (ni, mi), which correspond to the geometry of the measuring network processed by the radial basis
neural network NN(ni, mi),
   - the total volume of the training page (TP), which includes types I-III,
   - the average training time over a series of several (from 10 to 15) computational experiments,
   - the value of the normalized mean square error (MSE) over the entire training page,
   - the number of impacts recognized by the neural network NN(ni, mi),
   - the predictive capabilities, i.e. recognition by the network of types of impacts that do not belong to the
training page,
   - the quality of training, an averaged characteristic that reflects, among other things, the presence of artifacts
caused by an insufficient volume of the training page.
   The results in Table 1 show that the quality of training decreases as the size of the network grows.

                  Table 1: Characteristics of radial basis type neural networks belonging to the set SN

    No.   Size        TP        Training    MSE over the    Number of   Predictive   Training
          (ni, mi)    volume    time        entire TP       impacts     properties   quality
    1.    3×3         1390      5 sec       2×10^-27        3           +            high
    2.    5×5         3325      2 min       2.8×10^-16      4           +            high
    3.    7×7         4850      3-4 min     1.3×10^-27      4           +            high
    4.    10×10       8500      22 min      1.8×10^-27      3           +            medium
    5.    15×15       9102      22 min      3.5×10^-27      3           ±            medium
    6.    20×20       9264      30 min      3.4×10^-29      3           ±            medium
    7.    30×30       9000      25 min      10^-25          2           –            low

         Table 2: Results of data processing by the RBFNN complex for reference functions of the form (7)

    No.   ai      n' × m'    Size in %    Processing time    MSE
    1.    0.5     3×3        1 %          0.0554             1.9×10^-4
    2.    0.1     5×5        2.77 %       0.0823             0.0062
    3.    0.05    7×7        5.44 %       0.0879             0.0102

         Table 3: Results of data processing by the RBFNN complex for reference functions of the form (8)

    Distance between          n' × m'    Size in %    MSE
    Gaussian centers
    4                         3×5        1.6 %        1.9×10^-4
    5                         4×5        2.2 %        1.87×10^-4
    11                        7×13       10 %         0.0181
    20                        14×14      22 %         0.0017
    34                        25×25      69.5 %       0.0024
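    The "Size in %" column in Tables 2 and 3 appears to be the share of the full 30×30 network occupied by the trimmed region. The snippet below computes that ratio; the interpretation and the helper name `size_percent` are assumptions, although the values agree with the tables up to rounding.

```python
def size_percent(n_trim, m_trim, n=30, m=30):
    """Area of the trimmed n' x m' region as a percentage of the n x m network."""
    return 100.0 * n_trim * m_trim / (n * m)

for dims in [(3, 3), (5, 5), (7, 7), (3, 5), (14, 14), (25, 25)]:
    print(dims, round(size_percent(*dims), 2))
```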

    From the above results, it follows that processing a region with the Trimming Algorithm followed by collective
processing by neural networks yields a large gain in accuracy. This is explained by the localization of the site of
impact on the network and by the use of a neural network that is, as a rule, of lower dimension and is therefore
trained more efficiently and quickly. At the same time, the mean square error for the elements of the training page
decreases by a factor of 15 to 20.
   Figure 2 shows the results of processing an impact of the form $z(x, y) = e^{-0.5((x-6)^2 + (y-19)^2)}$.
   After localization of the impact, a region of size 3×3 was obtained, which is processed by the well-trained
neural network NN(3, 3). Finally, the original dimensions of the region were restored (Fig. 2b). Fig. 2c shows the
result of restoring the function under study using the neural network of maximum size, NN(30, 30). The quality



of training of NN(30, 30) is low, which explains the appearance of artifacts even when restoring a single impact
on the measuring network.
   Tables 2 and 3 present the results of the proposed method for reference impacts on the measuring network in the
form of single impacts (7) and double impacts (8), respectively.


    Figure 2: a) investigated function $z(x, y) = e^{-0.5((x-6)^2 + (y-19)^2)}$, b) the result of recovery using
    the complex of neural networks SN, c) the result of restoration using NN(30, 30)

7         Conclusion
   Information driven economy relies on the actionable insights extracted from data analytics. The era of data
revolution has paved way to the need of convergence of paradigms like High Performance Computing and Big Data
processing. The amalgamation of these paradigms is a herculean task involving various aspects like data management
and computing efficiency. This has given rise to evolution of the data storage technologies and computing models.
   The article presents a new combined algorithm for processing projection data in order to reconstruct information
received from the fiber-optic measuring lines of an FOMN.
   The algorithm consists in the sequential execution of two processes:
   1. Pre-processing of the measurement information by localizing the sites of impact on the FOMN,
   2. Application of a complex of neural networks for processing measuring systems of various geometries.
   From the above results it follows that:
   1. Processing with the region trimming procedure followed by collective processing by neural networks gives a
gain in accuracy, largely due to the localization of the site of impact on the network and to the use of a neural
network that is, as a rule, of lower dimension and is therefore trained better and faster. At the same time, the
value of the MSE for the elements of the training page decreases by a factor of 15 to 20.
   2. The reduction of the MSE and of the processing time depends mainly on how radically the computational process
has been optimized by the preprocessing and on the complexity of the function being restored.

References
1. Kulchin, Yu.N.: Distributive Fiber Optical Measuring System. Fizmatlit, Moscow, 272 p. (2001)

2. Kulchin, Yu.N., Vitrik, O.B., Kirichenko, O.V., Petrov, Yu.S.: Multidimensional signal processing by using fiber
        optic distributed measuring network. Quantum Electronics, Vol. 20, No. 5, pp. 711-714 (1995)

3. Zakasovskaya, E.V., Tarasov, V.S., Glushchenko, A.A.: Information security issues in the distributed information
        measurement system. ICIEAM, St. Petersburg, Russia, May 16-19 (2017)

4. Natterer, F.: Mathematics of Computerized Tomography. John Wiley & Sons Ltd., N. Y., 288 p. (1986)

5. Herman, G.T.: Projections-Based Image Reconstruction. In: "Basics of Reconstructive Tomography", Mir, Moscow,
        352 p. (1983) (in Russian)

6. Mel'nikov, V.I., Meshkov, S.V.: Theory of activated rate processes: Exact solution of the Kramers problem. J.
        Chem. Phys. 85, pp. 1018-1027 (1986)

7. Zakasovskaya, E.V., Fadeev, V.V.: Restoration of point influences by the fiber-optical network in view of a priori
        information. SPIE Proc. APCOM, Vol. 6675 (2007)



8. Zakasovskaya, E.V., Tarasov, V.S.: Optical fiber imaging based tomography reconstruction from limited data.
        Computer Methods in Applied Mechanics and Engineering, Vol. 328, pp. 542-543 (2018)

9. Kulchin, Yu.N., Zakasovskaya, E.V.: Artifacts suppression in limited data problem for parallel fiber optical
        measuring systems. Optical Memory & Neural Networks, Vol. 18, No. 3, pp. 171-180 (2009)

10. Kulchin, Yu.N., Zakasovskaya, E.V.: Application of Radial Basis Function Neural Network for Information
        Processing in Fiber Optical Distributed Measuring Systems. Optical Memory & Neural Networks
        (Information Optics), Vol. 17, No. 4, pp. 317-327 (2008)

