Proceedings of the VIII International Conference "Distributed Computing and Grid-technologies in Science and Education" (GRID 2018), Dubna, Moscow region, Russia, September 10-14, 2018

VIRTUAL TESTBED AS A CASE FOR BIG DATA

A.B. Degtyarev 1,a, A.V. Bogdanov 1, V.V. Korkhov 1, I.G. Gankevich 1, Yu.V. Pylnev 2, A.V. Eibozhenko 2

1 Department of Computer Modeling and Multiprocessing Systems, St. Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034, Russia
2 Engineering company «NEOTECH MARINE», 23 Zastavskaya str., St. Petersburg, 196084, Russia

E-mail: a a.degtyarev@spbu.ru

Complex modeling of the behavior of marine objects under real external excitation is a problem of prime importance. At present, the accuracy of direct simulation of phenomena with known physics is comparable to the accuracy of results obtained in model experiments in towing tanks. The creation of such a marine virtual testbed is particularly relevant for full-featured simulators and for testing the knowledge bases of onboard intelligent systems. Such an integrated environment is a complex information object that combines the features of an enterprise system with those of a high-performance modeling tool. An integrated environment built on these basic principles is designed to solve the following problems in real time:
1. Collection and analysis of information on the current state of a dynamic object (DO) and the environment, and remote monitoring of the state of objects.
2. Evaluation and coordination of joint actions of DOs, proceeding from current conditions, with the aim of solving a common problem optimally.
3. Centralized decision-making support for DO control operators in non-standard situations, and organization of information support for the interaction of decision-makers during ongoing operations.
4. Computer modeling of possible scenarios of situation development with the aim of selecting the optimal control strategy.
5. Centralized control of technical means.
6. Cataloging and accumulation of information in dynamic databases.
The modern architecture of computer systems (especially GPGPU) allows direct full-featured simulation of a marine object in real time. Efficient mapping onto a hybrid architecture even makes it possible to compute ahead of real time under various scenarios. The report discusses the general concept of high-performance virtual testbed development and the experience of creating full-featured simulators on this basis.

Keywords: virtual testbed, Big Data, complex system, naval hydrodynamics

© 2018 Alexandr B. Degtyarev, Alexander V. Bogdanov, Vladimir V. Korkhov, Ivan G. Gankevich, Yury V. Pylnev, Anatoly V. Eibozhenko

1. Introduction

Recently more and more technical objects are designed and modeled on computers, and this tendency seems to be becoming dominant. Nevertheless, there are situations that are very important for humanity and cannot yet be modeled properly and reliably. The most obvious cases are ship modeling and design and nuclear power plant safety. We shall try to analyze this problem from the point of view of recent advances in fundamental informatics. The key element of the modeling approach is its mathematical foundation. In most cases the different parts of the phenomenon under consideration are described by different mathematical approaches.
Thus, constructing the mathematical model of the object is a separate problem in itself. Matching the different mathematical pieces is extremely important when producing the code. Very often the scales of the phenomena vary to such a degree that asymptotic methods are of great help [1, 2, 3]. In most cases we have to deal with what is called "complex system simulation" [4]. Usually separate software packages have been developed for the different parts of the objects under consideration, but they were produced with different approaches for different platforms and are usually optimized for different software environments. The process of unifying all these approaches is called "consolidation", and it is a separate problem in itself [5, 6, 7]. Even when consolidation is achieved, the modeling procedure remains a difficult one due to the difference of scales and the necessity of storing large amounts of data from one iteration to another. At this stage we claim that this problem is most effectively solved by borrowing some ideas from the so-called "Big Data approach" [6]. Moreover, it seems that complex system simulation is a very effective case for "Big Data" analysis. It is very important that this approach makes it possible to visualize intermediate results in the process of computation. This matters not only for parameter correction, but may also be a key element in modifying the model and consolidating the procedure. We shall try to illustrate the main ideas of our approach with the case of ship modeling realized in the "virtual testbed" project at St. Petersburg State University.

2. Mathematical models

The concept of a virtual testbed is a unified consideration of a technical object from its design to its disposal. This concept combines the use of all heterogeneous information about an object, methods of storage and modeling, and the extraction and acquisition of new knowledge. The general structure of a virtual testbed is shown in Figure 1. Here A is a block of models, B is a block of control and interpretation, C is a block of dataware, and D is a block of applied information technology. The block of models contains different kinds of models: models for estimation of the dynamic object, navigation and operational situation (M1); control models (M2), including on-board systems models (M) and models of dynamic object element/system interaction (MU); and a block of planning (M3), where MP are models of long-term and operative planning and MS are control models for components of the virtual testbed. In block B we have DS, the dialog system for virtual testbed control; LS, the local control systems; SA, a block of scenarios; and RB, a block of practical recommendations. The dataware block in general represents a distributed database of the various components of the virtual testbed (DB), obtained through consolidation. Block D combines virtual reality components (VR), mathematical components of information technology (soft computing, multiagent systems, etc. – SC&MAS) and components of the distributed computing environment (Big Data, cloud computing, hybrid computer systems, etc. – GRID).

Already now, information technology makes it possible to fully compare the real results of human activity with their virtual counterparts. This means that it is possible to organize an integrated system in which all the accumulated data are linked to each other and used when necessary. For example, drawings of a technical object created during the design process are later used to implement the simulator of the object.
The data obtained as a result of direct simulation are used to form the knowledge base of an on-board intelligent system, etc. So we see that this is a Big Data problem.

Figure 1. Structure of virtual testbed

It is obvious that such an approach, in which the simulation results can be fully matched with the behavior of a technical object in natural conditions, requires high accuracy and adequacy of the models used. For example, for naval hydrodynamics problems we can apply the approach of the direct computational experiment [8, 9]. The algorithmic implementation in this case is based on computational schemes mixing Lagrangian and Eulerian approaches. This is expressed, similarly to the "large particles" and "finite volume" methods, in the double integration of the first-order equations of motion. Thus, the time cycle of the computing experiment is divided into three conditional stages:
Stage 1 – kinematic parameters are calculated for the centers of the large fluid particles, using the current data at the fixed nodes of the Eulerian coordinates;
Stage 2 – the Lagrangian (large deformable) fluid particles undergo free motion, redistributing the internal properties of the original Euler cells to the adjacent space;
Stage 3 – the laws of conservation of mass and energy are enforced, which is achieved by deformation of the shifted fluid particles; the next step is reinterpolation of the flow characteristics back to the initial nodes of the fixed Euler computational mesh (a minimal code sketch of this time cycle is given at the end of this section).
The described algorithmic approach, mainly because of the proposed splitting of the solution by physical processes, permits the application of explicit numerical schemes at the first two stages. In this case, it is possible to substantially increase the effectiveness of the computing procedures through:
1. natural parallelization of the computation process;
2. adaptive correction of the mesh area if needed;
3. dynamic reconstruction of the solution in accordance with the transformation of the fluid flow in time.

Another approach [10] for simulating sea wind waves is based on fast mathematical models that are well suited for numerical calculations. Such models are not based on rigorous physical principles, as in the previous case. However, if preliminary studies make it possible to prove their physical adequacy [11, 12], they are also suitable for use as part of a virtual testbed. Such models are often based on an extensive statistical generalization of natural data, and therefore require careful preparation of the initial information, which is reflected in the parameters of the model used. In the considered problem of sea wave modeling, in order to adequately reproduce external influences, all conditions of wave formation are taken into account: the geographical area, the season, and the alternation of storms characteristic of the given area. The advantage of such models, due to their simplicity, is the possibility of relatively simple increases in their performance [10, 13, 14].
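To make the structure of the computation cycle concrete, the following is a minimal C++ sketch of the three-stage time step of the direct computational experiment. The data layout and the simplified stage bodies (nearest-node sampling, explicit Euler advection, node averaging) are illustrative assumptions, not the actual Virtual testbed code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Particle {                 // large deformable fluid particle (Lagrangian)
    double x = 0.0, z = 0.0;      // position of the particle centre
    double u = 0.0, w = 0.0;      // velocity components
};

struct Node {                     // fixed node of the Euler mesh
    double u = 0.0, w = 0.0;      // flow velocity at the node
};

// helper: index of the Euler node nearest to coordinate x (placeholder binning)
std::size_t nearest_node(double x, double dx, std::size_t n) {
    return static_cast<std::size_t>(std::clamp(x / dx, 0.0, static_cast<double>(n - 1)));
}

// Stage 1: kinematic parameters at the particle centres are taken from the
// current data at the fixed Euler nodes (placeholder: nearest-node sampling).
void stage1_kinematics(const std::vector<Node>& mesh, std::vector<Particle>& p, double dx) {
    for (auto& q : p) {
        const Node& n = mesh[nearest_node(q.x, dx, mesh.size())];
        q.u = n.u;
        q.w = n.w;
    }
}

// Stage 2: free motion of the Lagrangian particles, redistributing their
// properties to the adjacent space (explicit Euler step).
void stage2_advection(std::vector<Particle>& p, double dt) {
    for (auto& q : p) { q.x += q.u * dt; q.z += q.w * dt; }
}

// Stage 3: conservation is restored and the flow characteristics are
// reinterpolated back to the fixed Euler mesh (placeholder: node averaging).
void stage3_reinterpolation(const std::vector<Particle>& p, std::vector<Node>& mesh, double dx) {
    std::vector<int> count(mesh.size(), 0);
    for (auto& n : mesh) n = Node{};
    for (const auto& q : p) {
        const std::size_t i = nearest_node(q.x, dx, mesh.size());
        mesh[i].u += q.u; mesh[i].w += q.w; ++count[i];
    }
    for (std::size_t i = 0; i < mesh.size(); ++i)
        if (count[i]) { mesh[i].u /= count[i]; mesh[i].w /= count[i]; }
}

// one time cycle of the computing experiment, split by physical processes
void time_step(std::vector<Node>& mesh, std::vector<Particle>& p, double dt, double dx) {
    stage1_kinematics(mesh, p, dx);       // explicit, trivially parallel over particles
    stage2_advection(p, dt);              // explicit, trivially parallel over particles
    stage3_reinterpolation(p, mesh, dx);  // returns to the fixed Euler mesh
}

int main() {
    std::vector<Node> mesh(10, Node{1.0, 0.0});   // uniform flow as a toy initial state
    std::vector<Particle> particles(4);
    for (std::size_t i = 0; i < particles.size(); ++i) particles[i].x = static_cast<double>(i);
    time_step(mesh, particles, /*dt=*/0.1, /*dx=*/1.0);
}
```

Each stage loops independently over particles or nodes, which is what makes the first two stages naturally parallel and well suited to explicit schemes.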
3. Consolidation technology

Thus, a virtual testbed is designed to solve complex modeling problems and to work with large amounts of data. The main aspects of a virtual testbed are the following:
• computing machinery – hardware;
• uniform information environment – grid, middleware;
• program repository – libraries;
• system integration – principles of testbed operation;
• concept of real-time systems.

Grid data management can be regarded as distributed database management seen from a different point of view. We based our work on object-oriented database management systems and their specific features [15]. We compared the approaches to find what an efficient Data Grid has in common with databases, so that it can manage the large amounts of data stored in object-oriented databases. Data Grids are still new to the database research community; based on our test results, identifying the characteristics and requirements of Data Grids, and how they can be met in the most efficient way, provides a reliable basis for such an effort. Optimizing data replication and access to data over a wide area network (WAN) is not addressed sufficiently in database research. A database management system (DBMS) normally provides a single data access method: for instance, a data server sends data to a client. For a Data Grid such a single access method may not be optimal. Using an ODBMS, some of the restrictions can be identified and possible solutions suggested. The Globus project provides tools for Grid computing such as job scheduling, and is also working on a Data Grid effort that provides fast and efficient file transfer, a replica catalogue for managing files, and replica management functionality built on top of these file services. In the Grid community there is a general tendency to deal with replication at the file level, i.e. a single file is the lowest granularity of replication [4]. The advantage is that the internal structure of a file does not need to be known to the replication manager that is responsible for replicating files from one site to other sites over the WAN.

Middleware comprises all the services and applications necessary for efficient management of the data sets and files within the Data Grid, while providing users with quick access to the data and files. Data access services work hand in hand with the data transfer services to provide security, access control for management, and data transfer within the Data Grid. Security services provide mechanisms for the authentication of users to ensure that they are properly identified; password-based authentication is one such mechanism. Authorization services are the mechanisms that control what a user is able to access after being identified by the authentication process. Research from both communities can be combined by introducing a replication middleware layer that manages file replication taking each site of the Data Grid into account, while data are managed locally with a database management system. The middleware is responsible for site-to-site replication and synchronization, while the DBMS handles transactions on local data. In Grids there are many tools for monitoring applications and network parameters, which can be used to fill this gap. The hybrid solution is a replication middleware that bridges database and Grid research, with stricter guarantees for update synchronization and transparent access to data. Replica synchronization also comes at a performance cost.
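As an illustration of file-level replication bookkeeping of the kind discussed above, the following is a minimal C++ sketch of a replica catalogue used by a hypothetical replication middleware. The class, method, file and site names are assumptions for illustration and do not correspond to the Globus API or to any particular DBMS.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

class ReplicaCatalogue {
public:
    // register a replica of a logical file stored at a given site
    void add_replica(const std::string& logical_name, const std::string& site) {
        replicas_[logical_name].insert(site);
    }

    // the middleware asks where a file can be read from; a real implementation
    // would rank the sites using monitored network parameters
    std::vector<std::string> locate(const std::string& logical_name) const {
        auto it = replicas_.find(logical_name);
        if (it == replicas_.end()) return {};
        return {it->second.begin(), it->second.end()};
    }

    // site-to-site replication: the middleware copies the file and records the
    // new location; consistency of local data is left to the site's DBMS
    void replicate(const std::string& logical_name,
                   const std::string& from_site, const std::string& to_site) {
        // transfer(from_site, to_site, logical_name);  // placeholder for a file transfer service
        (void)from_site;
        replicas_[logical_name].insert(to_site);
    }

private:
    std::map<std::string, std::set<std::string>> replicas_;  // logical name -> sites
};

int main() {
    ReplicaCatalogue catalogue;
    catalogue.add_replica("waves/run.dat", "site-a");
    catalogue.replicate("waves/run.dat", "site-a", "site-b");
    for (const auto& site : catalogue.locate("waves/run.dat"))
        std::cout << site << '\n';
}
```

The design choice mirrors the text: replication is tracked at file granularity, so the catalogue never needs to know the internal structure of the files it manages.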
However, the replication middleware has to allow several relaxations of the concepts of transparent data access and data consistency. In this paper we assume that such a replication middleware is used for the Data Grid.

Our project is carried out at the Saint Petersburg State University research center and can be divided into four modules as follows:
• network module;
• random dynamic process;
• data grid replication;
• data security.

In our project the clients and the main servers operate over a computer network on separate hardware. A server machine runs one or more high-performance server programs and shares resources with clients. A client also shares data from these resources; clients initiate communication sessions with servers, which await incoming requests. Packets are delivered to the destination node using a randomization process for packet deliveries in order to minimize the probability that packets are eavesdropped on a specific link. In the first step of this process, the previous next hop used by the source node is identified. The process then randomly picks a neighboring node as the next hop for the current packet transmission; excluding the previous next hop from this selection avoids transmitting two consecutive packets over the same link. We combined data partitioning schemes with dynamic replication to achieve data security and access performance in Data Grid processing. The data partitions need to be properly allocated to achieve actual performance benefits in the replication process. Our project design provides the following advantages:
• data can be secured;
• coordinated sharing of data from various resources, together with various services for distributed and data-intensive computing;
• replication techniques are used to improve data availability and to reduce client response times and communication costs;
• there is no single point of failure in the system.

4. Some ideas of the Big Data approach

As mentioned in the previous sections of the article, in the considered case we are dealing with both data processing and complex computations. In the first case we should consider the data as a whole; in the second case data are exchanged between different branches of the computation. We need to combine these components. Moreover, in the process of modeling this must be done at least twice: in preprocessing and in postprocessing. As a result, we get a huge amount of heterogeneous data that changes the original state of knowledge. Under different conditions it is necessary to work with these data and this knowledge in different ways. So within the framework of the virtual testbed we see that there are different kinds of Big Data, as was proposed in [16], and for every kind of situation we must have a different tool. In [16] a new definition of "Big Data" is given, characterizing it as the situation in which the conditions of the CAP theorem become relevant. The CAP theorem [17] is a heuristic statement that in any realization of distributed computations it is impossible to provide all three of the following properties: consistency, availability and partition tolerance. We define different kinds of Big Data as appropriate combinations of C, A and P at different stages of the computation. The problems and features of virtual testbed implementations fully confirm this approach.
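The following minimal C++ sketch illustrates how different kinds of Big Data can be labelled by the subset of CAP properties that a given stage of the computation keeps. The stage names and the particular trade-offs assigned to them are illustrative assumptions, not prescriptions of the virtual testbed.

```cpp
#include <iostream>
#include <string>

struct CapProfile {
    bool consistency;
    bool availability;
    bool partition_tolerance;
};

// By the CAP theorem at most two of the three properties can be guaranteed,
// so each stage of the pipeline declares which pair it needs (assumed values).
CapProfile profile_for(const std::string& stage) {
    if (stage == "preprocessing")  return {true,  false, true};   // CP: consistent input data
    if (stage == "simulation")     return {false, true,  true};   // AP: keep the solvers running
    if (stage == "postprocessing") return {true,  true,  false};  // CA: single-site analysis
    return {false, false, false};
}

int main() {
    for (const std::string stage : {"preprocessing", "simulation", "postprocessing"}) {
        const CapProfile p = profile_for(stage);
        std::cout << stage << ": C=" << p.consistency
                  << " A=" << p.availability
                  << " P=" << p.partition_tolerance << '\n';
    }
}
```

Such a labelling makes explicit which storage and replication tool is appropriate at each stage, in the spirit of "a different tool for every kind of situation".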
5. Numerical example

One of the fundamental functions of the virtual testbed in the considered case is to simulate ship motion under the impact of ocean waves, and the mathematical formulae and numerical methods used for that purpose are especially favorable to implement in a programme that runs on a GPU. These implementations use
• linear memory access patterns, which are needed to coalesce memory loads and stores (i.e. they help vectorise the code for the GPU),
• a large number of floating point calculations, including transcendental mathematical functions, which are slow to execute on a CPU, and
• geometrical transformations (rotation and translation), which are built-in operations on a GPU but not on a CPU.
It is possible to implement every numerical solver in the Virtual testbed to run on a GPU, eliminating altogether the costly transfer of large multidimensional arrays and vector fields between CPU and GPU memory. This is one of the goals of the project, but it has not been achieved yet: only the most demanding numerical solvers have been rewritten for the GPU to speed up the programme. One of these solvers is presented in the following paragraphs.

The feature that distinguishes the Virtual testbed from similar ship motion simulation programmes is the use of an autoregressive moving average (ARMA) model for wavy surface generation. Unlike linear models, which represent the wavy surface as a weighted sum of cosines (and in turn allow the velocity potential to be generated directly), this model represents each point of the surface as a weighted sum of points that precede it in time and space. To compute the velocity potential we derived a formula that works for any discretely given wavy surface and, for linear models, gives the same velocity potential field as the traditional approach. To compute the wave pressure force acting on a ship hull, we decompose the hull into triangular panels and sum the individual forces acting on each panel (a minimal sketch of this summation is given after Figure 2). The ARMA model was found to be slow on the GPU but very fast on the CPU due to its non-linear memory access pattern; however, the new velocity potential formula allowed us to use fast Fourier transforms to implement that part efficiently on the GPU. The wave pressure computation involves geometrical transformations and a large number of floating point calculations, which made it easy to rewrite for the GPU. These optimisations resulted in a tenfold speed-up for the velocity potential and a fivefold speed-up for the wave pressure on a computer with an AMD FX-8370 CPU and a GeForce GTX 1060 6GB GPU.

Figure 2. Example of naval virtual testbed realization
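As an illustration of the wave pressure computation described above, the following is a minimal C++ sketch that decomposes the total force into contributions of triangular hull panels and sums them. The data types and the analytic pressure field used in the example are assumptions for illustration; the actual Virtual testbed computes the pressure from the velocity potential and runs this summation on the GPU.

```cpp
#include <array>
#include <functional>
#include <iostream>
#include <vector>

struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
};

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Panel { std::array<Vec3, 3> v; };   // triangular hull panel

// Total pressure force: for each panel, the pressure at the centroid times the
// oriented area vector (half the cross product of the two edge vectors).
Vec3 wave_pressure_force(const std::vector<Panel>& hull,
                         const std::function<double(const Vec3&)>& pressure) {
    Vec3 total{0.0, 0.0, 0.0};
    for (const Panel& p : hull) {
        const Vec3 centroid = (p.v[0] + p.v[1] + p.v[2]) * (1.0 / 3.0);
        const Vec3 area = cross(p.v[1] - p.v[0], p.v[2] - p.v[0]) * 0.5; // normal times area
        total = total + area * pressure(centroid);
    }
    return total;
}

int main() {
    // one submerged panel with a hydrostatic pressure field rho*g*(-z) as a toy check
    std::vector<Panel> hull = {
        Panel{{Vec3{0, 0, -1}, Vec3{1, 0, -1}, Vec3{0, 1, -1}}}
    };
    const Vec3 f = wave_pressure_force(hull, [](const Vec3& p) { return 1025.0 * 9.81 * -p.z; });
    std::cout << f.x << ' ' << f.y << ' ' << f.z << '\n';
}
```

Each panel contributes independently, so this summation parallelises naturally over panels, which is what makes it straightforward to port to a GPU.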
6. Acknowledgement

The work was supported by St. Petersburg State University (project id 26520170) and partly by the Russian Foundation for Basic Research (RFBR), grants #17-29-04288 and #16-07-00886.

7. Conclusion

It is clear that the Big Data approach, although not the only one, is of great help for large complex problems. It is especially important to be able to visualize the behavior of technical objects at intermediate stages of the calculations. This makes it possible to correct both the model and the chosen parameters without additional expenditure of computer time. We hope that very soon it will be possible to show realistic technical objects in their true natural environment.

References

[1] Van Dyke M. Perturbation Methods in Fluid Mechanics. NY: Academic Press, 1964. 229 p.
[2] Nayfeh A.H. Perturbation Methods. WILEY-VCH, 2004. 425 p.
[3] Morse P.M., Feshbach H. Methods of Theoretical Physics. NY: McGraw-Hill, 1953.
[4] Bogdanov A., Degtyarev A., Korkhov V. New approach to the simulation of complex systems // EPJ Web of Conferences, 2016, vol. 108, 01002.
[5] Bogdanov A., Degtyarev A., Korkhov V. Desktop supercomputer: what can it do? // Physics of Particles and Nuclei Letters, vol. 14, is. 7, 2017, pp. 985-992.
[6] Gankevich I., Gaiduchok V., Korkhov V., Degtyarev A., Bogdanov A. Middleware for big data processing: test results // Physics of Particles and Nuclei Letters, vol. 14, is. 7, 2017, pp. 1001-1007.
[7] Bogdanov A., Degtyarev A., Korkhov V., Gaiduchok V., Gankevich I. Virtual supercomputer as basis of scientific computing. In: Horizons in Computer Science Research, vol. 11, 2015, pp. 159-198.
[8] Bogdanov A., Khramushin V. Tensor arithmetic, geometric and mathematic principles of fluid mechanics in implementation of direct computational experiments // EPJ Web of Conferences, 2016, vol. 108, 02013.
[9] Degtyarev A., Khramushin V. Coordinate systems, numerical objects and algorithmic operations of computational experiment in fluid mechanics // EPJ Web of Conferences, 2016, vol. 108, 02018.
[10] Degtyarev A., Gankevich I. Simulation of standing and propagating sea waves with three-dimensional ARMA model. In: The Ocean in Motion: Circulation, Waves, Polar Oceanography, ed. by M.G. Velarde et al., Springer, 2018, pp. 249-278.
[11] Degtyarev A.B. New approach to wave weather scenarios modeling. In: Fluid Mechanics and its Applications, vol. 97, 2011, pp. 599-617.
[12] Boukhanovsky A., Rozhkov V., Degtyarev A. Peculiarities of computer simulation and statistical representation of time-spatial metocean fields // LNCS, vol. 2073, 2001, pp. 463-472.
[13] Bogdanov A.V., Degtyarev A.B., Khramushin V.N. High performance computations on hybrid systems: will "grand challenges" be solved? // Computer Research and Modeling, vol. 7, is. 3, 2015, pp. 429-437 (in Russian).
[14] Degtyarev A., Gankevich I. Hydrodynamic pressure computation under real sea surface on basis of autoregressive model of irregular waves // Physics of Particles and Nuclei Letters, vol. 12, is. 3, 2015, pp. 389-391.
[15] Bogdanov A., Thurein Kyaw Lwin, Stankova E. Storage database system in the cloud data processing on the base of consolidation technology // LNCS, vol. 9158, 2015, pp. 311-320.
[16] Bogdanov A. et al. Big Data as the future of information technology // Book of abstracts of the 8th Int. Conf. "Distributed Computing and Grid-technologies in Science and Education", Dubna, 2018, p. 25.
[17] Brewer E.A. Towards robust distributed systems // Proceedings of the XIX Annual ACM Symposium on Principles of Distributed Computing, Portland, OR: ACM, 2000, vol. 19, no. 7.