<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modeling of Non- stationary Processes on High-performance Computers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Duchenko</string-name>
          <email>duchenkostud@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Khimich</string-name>
          <email>khimich505@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alla Nesterenko</string-name>
          <email>alla.nest1958@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Popov</string-name>
          <email>alex50popov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Glushkov Institute of Cybernetics of National Academy of Sciences of Ukraine</institution>
          ,
          <addr-line>Academic Glushkov Avenue, 40, Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Mathematical modeling is currently the main tool of studying objects, processes and phenomena of various natures. The resolution of applied problems often requires studying of mathematical models of different processes. In particular, mathematical modeling is essential for simulation of non-stationary processes when analyzing the strength of complex structures and heat transfer processes. Non-stationary problems of strength analysis of complex structures or their components arise when addressing issues such as extending the lifespan of structures, determining their reliability, and identifying the properties and behavior of complex structures subjected to combined loads (force, thermal). Nowadays, these problems are among the most resource-intensive problems in mathematical modeling. This is due to increasing demand on the quality of design solutions, the necessity of performing calculations for complex and unique structures, and the use of new structural materials etc. Improving the quality and efficiency of mathematical modeling is only possible through the use of fundamentally new and detailed models and by shifting from simulating individual elements to studying the object as a whole. The use of detailed computer models leads to a significant increase in the size of discrete models and the corresponding computational problems, which exceed the capabilities of modern personal computers and workstations. Therefore, in the mathematical modeling of non-stationary processes of various natures it is important to develop new methods and algorithms for parallel computing on high-performance heterogeneous computers, in particular those of hybrid architecture. To effectively utilize such high-performance systems, it is required to develop newgeneration computer algorithms and software. The development of algorithmic and software tools is a key direction in creating high-performance computational resources. This paper proposes a methodology for solving computational problems (Cauchy problems) of mathematical modeling of non-stationary processes on highperformance computers with parallel organization of calculation, using the example of mathematical modeling in continuum mechanics.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Mathematical modeling</kwd>
        <kwd>non-stationary processes</kwd>
        <kwd>high-performance computing</kwd>
        <kwd>Cauchy problems for systems of ordinary differential equations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, mathematical modeling is one of the main tools for researching objects, processes, and
phenomena of various natures. The need to study mathematical models of various processes often arises
when solving applied problems. Thus, the application of such mathematical models is essential for the
research, assessment and diagnostics of the stress-strain state of various structures such as buildings,
welded and other constructions. Non-stationary problems of strength analysis of complex structures or
their components arise when addressing issues such as extending the lifespan of structures, determining
their reliability, and identifying the properties and behavior of complex structures subjected to combined
loads (force, thermal). In addition, such models are used in the researching of thermal processes that occur
during welding of complex structures, various manufacturing processes, as well as in nanostructures to
optimize the cooling of electronic systems, etc.</p>
      <p>Solving computational problems related to the modeling of non-stationary processes in strength
analysis of complex structures and heat transfer processes requires significant computational resources.
This is due to increase in demand for the quality of design solutions, the need to perform calculations for
complex and unique structures, the use of new construction materials, etc. Ensuring the reliability of
computer simulation results is another resource-intensive factor, which depends on the accuracy of the
mathematical models used, as well as accuracy of the data provided, the size of the computational
problems etc. Improvement in quality and efficiency of mathematical modeling is possible only through
the use of new detailed models and shift from modeling separate stages of processes to modeling the entire
process as a whole. The use of detailed computer models leads to a significant increase in the size of
discrete (finite-dimensional) models and the corresponding computational problems.</p>
      <p>Considering these problems in such a setting leads to computational problems involving huge volumes
of data, for example, matrices with orders exceeding 10⁷, and increasing the adequacy of computer
modeling is usually accompanied by an exponential increase in research costs.</p>
      <p>Modern personal computers and workstations lack the resources necessary to implement these tasks.
Nowadays, the increase in computing performance is achieved through parallelization of calculations,
based on the use of heterogeneous computers with, in particular hybrid architecture.</p>
      <p>In computers with hybrid architecture, both MIMD and SIMD models of parallel computation are used,
and multi-core processors are complemented by co-processor accelerators. Nvidia and AMD, as leading
companies in high-performance hardware, have proposed the use of graphics processing units (GPUs,
graphics cards) as such accelerators.</p>
      <p>The use of heterogeneous computing systems is one of the promising directions in the development of
high-performance computing. A number of such hybrid computers of varying performance levels have
been created, ranging from personal hybrid supercomputers for local use to high-performance clusters. It
is necessary to develop next-generation computer algorithms and software to effectively utilize such
highperformance systems. The development of algorithmic and software tools is a priority in building
highperformance computational resources.</p>
      <p>This article proposes a methodology for solving computational problems (Cauchy problems) of
mathematical modeling of non-stationary processes on high-performance computers with parallel
organization of computation, using the example of mathematical modeling in the continuum mechanics.
2. Mathematical models of non-stationary processes</p>
      <p>Mathematically, the problem of calculating the stress-strain state (a dynamic problem in the theory of
elasticity), using the principle of virtual displacements, can be formulated as a variational problem [1]: find
a vector-function of displacements that, for any vector-function (any admissible
displacement), satisfies the integral identity
where is the boundary condition operator, is the specific heat capacity, is the thermal
conductivity of the material, is the volumetric power of the heat source.</p>
      <p>The main approach to solving such problems are numerical methods. To do this, the original problem in
an infinite-dimensional space is replaced with its finite-dimensional (discrete) analogue. Most often,
projection methods are used for this purpose, for example, the finite element method or finite difference
method [2, 3]. As a result of discretization, a Cauchy problem for a system of ordinary differential
equations (ODEs) is obtained.</p>
      <p>For a non-stationary strength analysis problem using the finite element method, a second-order system
of ODEs is obtained:
,
,</p>
      <p>For the heat transfer problem, using the finite element method or the finite difference method, we
obtain the following matrix Cauchy problem for a first-order system of ODEs.</p>
      <p>
        ,
,
Here, is the desired n-dimensional solution vector of the system of ODEs (e.g., displacement values
for (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) or temperature values for (
        <xref ref-type="bibr" rid="ref4">4</xref>
        )), is an n-dimensional right-hand side vector (e.g., loads or heat
source power), and , , are n×n matrices with sparse structure, where n is the dimension of
the systems of ODEs (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) or (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ).
      </p>
      <p>
        Discrete problems (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ), (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ) exhibit a number of features, in particular:
• high order of matrix dimensions in the discrete problems (up to tens of millions);
• the matrices have a sparse structure (e.g., banded, envelope, skyline, etc.);
• the elements of the matrices and vectors are computed with errors, caused by initial data
inaccuracies, discretization errors, and computational errors when evaluating these elements on a
computer.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Solving Cauchy problems for systems of ODEs</title>
      <p>For the numerical solution of Cauchy problems for systems of ODEs, depending on their properties, a
variety of time integration methods exist. Direct integration methods are used, particularly Runge–Kutta
methods of various orders for first-order systems, and the Wilson-θ method of for second-order systems.
Additionally, for second-order systems, some software tools (see, for example, [2]) utilize the Fourier
method based on expanding the desired functions in terms of the structure’s natural vibration modes.</p>
      <p>Runge–Kutta methods are also used to solve second-order systems of ODEs by introducing auxiliary
functions . This transforms the system into a first-order system of ODEs of twice the size:
,</p>
      <p>Let’s consider solving the Cauchy problem for a first-order system of ODEs using the Runge–Kutta
method:
where is the desired vector-function solution.</p>
      <p>
        Among the Runge–Kutta methods, the fourth-order Runge–Kutta method is most commonly used, as it
provides the required accuracy with relatively low computational complexity. This method is implemented
over the time interval using the formulas:
(
        <xref ref-type="bibr" rid="ref4">4</xref>
        )
where
,
      </p>
      <p>,
,
i = 0,1,2,… the upper index is the point number, lower index is the component of the vector.</p>
      <p>Thus, the majority of arithmetic operations are performed to compute the right-hand side of the vector
function .</p>
      <p>
        The Wilson-θ method is used for the numerical solution of second-order systems of ODEs with initial
conditions (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ). To apply it, we rewrite the problem (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) as:
with initial conditions
,
      </p>
      <p>Here, U is the vector of the values of the function u at the nodes of the finite element mesh.</p>
      <p>In the Wilson-θ method, the time interval Т is divided into n equal subintervals . At each time
interval , і = 0,1,..,n-1, a linear change in acceleration is assumed (the second derivative
is a linear function of time).</p>
      <p>It is assumed that at the initial moment of time , the vectors , , are known. Before
starting integration over time, the integration step h is determined, required constants, the effective
stiffness matrix and its decomposition are computed.</p>
      <p>
        Then the integration of the system (
        <xref ref-type="bibr" rid="ref8">8</xref>
        )−(
        <xref ref-type="bibr" rid="ref9">9</xref>
        ) over time at point
the scheme:
- computation of the right-hand side of the system of linear algebraic equations
is performed according to
using
- using the
system
      </p>
      <p>;
- computation of the approximate solution
formulas:
decomposition of matrix
, computation of</p>
      <p>as a solution of the
, and its first and second derivatives using following</p>
      <p>Thus, this method consists of several sub-tasks requiring a significant number of arithmetic operations:
forming the matrix , its decomposition, and repetitive solving of N systems of equations of the
form .</p>
      <p>
        Another method used to solve the initial value problem (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) is the Fourier method. This method is based
on decomposition by the eigenvectors of the algebraic eigenvalue problem (AEVP):
      </p>
      <p>
        An approximate solution to problem (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) is represented as a linear combination of several eigenvectors
corresponding to the smallest eigenvalues of problem (10). That is, if
is a solution of AEVP (10), then assuming
(
        <xref ref-type="bibr" rid="ref7">7</xref>
        )
(
        <xref ref-type="bibr" rid="ref8">8</xref>
        )
(
        <xref ref-type="bibr" rid="ref9">9</xref>
        )
where
,
,
obtain (under certain assumptions regarding the damping matrix С [2]) a system that decomposes into
equations independent with respect
to :
      </p>
      <p>
        A significant contribution to the solution of problem (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) is usually provided by only about 20–30 terms
corresponding to the smallest eigenvalues. Thus, we obtain a system of second-order ordinary
differential equations that are mutually independent. The solution to this system can often be found
analytically, or it can be easily integrated using, for example, the fourth-order Runge–Kutta method. Since
such a problem exhibits inherent parallelism, it can be efficiently implemented on a parallel computer.
      </p>
      <p>The most computationally intensive part of this approach is solving the partial AEVP, which is best
done using subspace iteration methods [4].</p>
      <p>
        It is worth noting that both the Wilson θ-method and the Fourier method allow for multiple solutions
of the Cauchy problem for second-order system of ODEs (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) with different right-hand sides (i.e., different
loads) .
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Features of parallel algorithms for Cauchy problems for systems of ODEs</title>
      <p>For effective implementation of mathematical modeling, it is advisable to use high-performance
computing tools — both hardware and corresponding software. Modern high-performance computing
systems with parallel processing architectures — ranging from personal computers with a single multi-core
processor and multiple graphic accelerators to supercomputers with a massive number of processing units
of various architectures — utilize different models of parallel computation, dynamic computing
environment, network technologies, etc. [5].</p>
      <p>As the capabilities of modern computers continue to grow, approaches to designing parallel algorithms
are also evolving. These algorithms must take into account both the properties of the problem and the
architectural features of the computing resources, including memory structure of the processing units,
interconnections between them, synchronization of computations and data exchanges, and other relevant
factors.</p>
      <p>Depending on the method used to solve Cauchy problems for systems of ODEs, as mentioned above,
the computational load varies across different types of operations. In most algorithms, these are linear
algebra operations: matrix-vector and matrix-matrix operations, matrix decompositions, etc.</p>
      <p>For example, in the 4th-order Runge–Kutta method, the most computationally intensive operation in
terms of resource consumption and runtime is the multiplication of a sparse matrix by a vector. In the
Wilson-θ method, it is the decomposition of a symmetric positive-definite matrix and the repeated
solution of the corresponding system of linear algebraic equations. In the Fourier method — solving the
AEVP.</p>
      <p>As previously mentioned, the matrices involved in real-world computational problems often have very
large orders and sparse structures, such as banded or envelope etc [6]. These factors influence the selection
of methods, algorithms, and tools for solving the Cauchy problems for systems of ODEs. Therefore, in
order to choose the optimal parallel solution algorithm and reduce the required computational resources
(runtime, memory etc.), it is necessary to identify the data structure (both of the original matrix and, if
necessary, of the decomposition matrices). To enhance efficiency, the matrix structure needs to be
optimized if required. To recognize sparse data structures, this article recommends the use of neural
networks and machine learning. Structural regularization algorithms [6] allow transformation of arbitrary
matrix structures into one of the regular sparse forms (block-banded, block-skyline, block-diagonal with
framing, etc.). The solution algorithm is determined based on the resulting data structure.</p>
      <p>As mentioned above, solving Cauchy problems for systems of ODEs involves processing of extremely
large data volumes. It is possible to efficiently execute large volumes of homogeneous operations by using
block versions of methods and algorithms. Such approach consolidates a significant number of operations
into matrix-matrix or matrix-vector operations with dense blocks, which are obtained from structural
regularization of the data. To implement these operations, it is recommended to use software tools
provided by hardware developers, which are optimized for performing block matrix operations on the
corresponding processing units [7, 8].</p>
      <p>Important stages of development a parallel algorithm for hybrid systems include selection of an
efficient data representation method and consideration of the specific features of architecture of hybrid
computers. The increasing number of processors in parallel computers and the creation of new hybrid
heterogeneous architectures significantly increase communication overhead and reduce process efficiency.
Therefore, taking into account the computer architecture is a necessary condition in designing a parallel
algorithm.</p>
      <p>The data is distributed among processing units based on the choice of the parallel solution algorithm.
When designing algorithms for sparse matrix problems, it is crucial to choose the appropriate storage and
representation methods for non-zero elements. These methods are determined by the sparse matrix's
structure and the requirement of the solution algorithm. Data distribution and storage schemes are used to
ensure compact data representation, fast access and processing of large datasets, and minimization of data
exchange between processing units.</p>
      <p>The amount of calculations for each processing units in use should be approximately the same — this
will ensure uniform computational load (balancing) of the processing units. Additionally, subtask
distribution among processing units should aim to minimize the number of information dependencies
(communication interactions) among subtasks.</p>
      <p>Thus, to efficiently solve Cauchy problems for systems of ODEs arising in the mathematical modeling
of non-stationary processes, it is recommended to use intelligent software tools [4, 9].</p>
      <p>To summarize, the design of parallel algorithms for solving Cauchy problems for systems of ODEs
intended for implementation on computers of hybrid architecture, requires to follow the steps below:
• recognize sparse data structures of the matrices of the given problem and, if necessary, perform
their structural regularization;
• identify the architecture that is most efficient for solving the given problem;
• divide the problem into subtasks, identify information dependencies between them, and select the
appropriate parallelization environment;
• take into account the memory structure of the processing units to ensure high algorithm
performance;
• distribute data and computations across processing units to ensure balanced load on the computer's
computing elements.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>The paper proposes the basic principles for efficient mathematical modeling of non-stationary
processes of various natures in a variable computational environment on modern high-performance hybrid
computer systems. During development of algorithmic and software tools for solving computational
problems (Cauchy problems for systems of ODEs) in the modeling of non-stationary processes it is
necessary to:
• consider the architecture and technical features of hardware and the specifics of the software tools
developed by hardware manufacturers;
• consider the mathematical properties of the problem being solved that ensure the reliability of
computational results;
• provide automatic selection of an efficient variable computational environment (a number and
types of processor devices, their interconnections, synchronization of computations and data
exchanges, memory types, etc.);
• utilize elements of artificial intelligence (in particular, artificial neural networks, machine learning,
etc.);
• utilize multi-precision arithmetic and variable-precision arithmetic;
• utilize block versions of corresponding algorithms which involve the allocation of subtasks with
large volumes of homogeneous arithmetic operations.</p>
      <p>It is recommended to use these principles to develop or modify algorithmic and software tools for
resolution of a wide range of problems of computer modeling of processes, phenomena and objects from
various subject areas on the latest high-performance computer systems with parallel organization of
calculations. In the future, it is advisable to develop algorithms using the NVIDIA-library NCCL (NVIDIA
Collective Communication Library) for collective communications, which easily integrates into modern
parallel programs. This library is designed for high efficient organization of data exchange between
graphic processing units, taking into account the topology of the computing system.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Timoshenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Goodier</given-names>
            , Theory of elasticity, M.,
            <surname>Nauka</surname>
          </string-name>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Gorodetsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. D.</given-names>
            <surname>Evzerov</surname>
          </string-name>
          , Computer models of structures, FACT, Kyiv,
          <year>2007</year>
          , 394 p.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K. J.</given-names>
            <surname>Bathe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <article-title>Numerical Methods in Finite Element Analysis, Prentice-Hall, Inc</article-title>
          . Englewood Cliffs, New Jersey,
          <year>1976</year>
          , 528 p.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.N.</given-names>
            <surname>Khimich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.N.</given-names>
            <surname>Molchanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A V.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.V.</given-names>
            <surname>Chistyakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.F.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <article-title>Parallel algorithms for the solving of computational mathematics problems</article-title>
          , Naukova Dumka, Kyiv,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <fpage>Top500</fpage>
          -
          <article-title>The list</article-title>
          . URL: http://www.top500.org/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>George</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. W-H Liu</surname>
          </string-name>
          ,
          <source>Computer Solution of Large Sparse Positive Definite Systems</source>
          , Prentice-Hall, Inc. Englewood Cliffs, New Jersey,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Intel</given-names>
            <surname>® Math Kernel</surname>
          </string-name>
          <article-title>Library (Intel® MKL)</article-title>
          . URL: https://software.intel.com/en-us/intel-mkl
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>CUBLAS</surname>
          </string-name>
          <article-title>Library documentation</article-title>
          . URL: https://docs.nvidia.com/cuda/cublas/index.html
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>О.</given-names>
            <surname>М. Khimich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Chistjakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Sidoruk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Ershov</surname>
          </string-name>
          ,
          <article-title>Adaptive computer technologies for solving problems of computational and applied mathematics</article-title>
          ,
          <source>in: Cybernetics and Systems Analysis</source>
          ,
          <year>2021</year>
          , volume
          <volume>57</volume>
          (
          <issue>6</issue>
          ), pp.
          <fpage>990</fpage>
          -
          <lpage>997</lpage>
          . https://doi.org/10.1007/s10559-021-00424-z.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>