<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Efficient Parallel Computing of Long Term Reliable Trajectories for the Lorenz System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Hristov</string-name>
          <email>ivanh@fmi.uni-so</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radoslava Hristova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefka Dimova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Armyanov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikolay Shegunov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Igor Puzynin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taisia Puzynina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zarif Sharipov</string-name>
          <email>zarif@jinr.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zafar Tukhliev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>JINR, Laboratory of Information Technologies</institution>
          ,
          <addr-line>Dubna</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sofia University, Faculty of Mathematics and Informatics</institution>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
      </contrib-group>
      <fpage>78</fpage>
      <lpage>88</lpage>
      <abstract>
        <p>In this work, we propose an efficient parallelization of the multiple-precision Taylor series method with variable stepsize and fixed order. For a given level of accuracy, the optimal variable stepsize leads to a higher order of the method than the optimal fixed stepsize does. Although the order of the method is then greater than in the fixed-stepsize case, and hence the computational work per step is greater, the reduced number of steps gives less overall work. In addition, the greater order of the method is beneficial in that it increases the parallel efficiency. As a model problem, we use the paradigmatic Lorenz system. With 256 CPU cores of the Nestum cluster, Sofia, Bulgaria, we succeeded in obtaining a correct reference solution in the rather long time interval [0,11000]. To get this solution we performed two large computations: one with 4566 decimal digits of precision and a 5240-th order method, and a second one, for verification, with 4778 decimal digits of precision and a 5490-th order method.</p>
      </abstract>
      <kwd-group>
        <kwd>Parallel Computing</kwd>
        <kwd>Multiple Precision</kwd>
        <kwd>Variable Stepsize Taylor Series Method</kwd>
        <kwd>Lorenz System</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The multiple-precision Taylor series method is an affordable and very efficient
numerical method for integrating some classes of low-dimensional dynamical
systems when high precision is required [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The method gives a new
powerful tool for the theoretical investigation of such systems.
      </p>
      <p>
        A numerical procedure for computing reliable trajectories of chaotic systems,
called Clean Numerical Simulation (CNS), is proposed by Shijun Liao in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
applied for different systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The procedure is based on the multiple-precision
Taylor series method. The main concept of CNS is the critical predictable
time Tc, which is a kind of practical Lyapunov time. Tc is defined as the time for
decoupling of two trajectories computed by two different numerical schemes. The
CNS works as follows. An optimal fixed stepsize is chosen. Then estimates of the
required order of the method N and the required precision (the number of exact
decimal digits K of the floating-point numbers) are obtained. The optimal order N
is estimated by computing the Tc – N dependence by means of the numerical
solutions for fixed, large enough K. The estimate of K is obtained by computing the
Tc – K dependence by means of the numerical solutions for fixed, large enough N.
This estimate of K is in fact an estimate for the Lyapunov exponent [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For a given
Tc the solution is then computed with the estimated N and K, and after that one more
computation with higher N and K is performed for verification. The choice of N and
K ensures that the round-off error and the truncation error are of the same order.
      </p>
      <p>
        When very high precision and very long integration interval are needed, the
computational problem can become large. In this case, the parallelization of the
Taylor series method is an important task and needs to be carefully developed. The
first parallelization of CNS is reported in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and later improved in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A pretty
long reference solution for the paradigmatic Lorenz system, namely in the time
interval [0,10000], obtained in about 9 days and 5 hours by using the computational
resource of 1200 CPU cores, is given in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, no details of the
parallelization process are given in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In our recent work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] we reported in detail
a simple and efficient hybrid MPI + OpenMP parallelization of CNS for the Lorenz
system and tested it for the same parameters as those in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The results show very
good efficiency and very good parallel performance scalability of our program.
      </p>
      <p>
        This work can be regarded as a continuation of our previous work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
where fixed stepsize is used. Here we make a modification of CNS with a variable
stepsize and fixed order following the simple approach given in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Although
the used order of the method is greater than that in the case of fixed stepsize, and
hence the computational work per step is greater, the reduced number of steps
gives less overall work. In addition, the greater order of the method is beneficial
in the sense that it increases the parallel efficiency. With 256 CPU cores of the
Nestum cluster, Sofia, Bulgaria, we succeeded in obtaining a correct reference solution in
[0,11000], and in this way we improve the results from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To obtain this
solution we performed two large computations: one with 4566 decimal
digits of precision and a 5240-th order method, and a second one, for
verification, with 4778 decimal digits of precision and a 5490-th order method.
The computations lasted ≈ 9 days and 18 hours and ≈ 11 days and 7
hours, respectively. Let us note that the improvement of the numerical algorithm
does not change the parallelization strategy from our previous work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where
the parallelization process is explained in more detail. The difference from the
previous parallel program is one additional OpenMP single section with
negligible computational work, which computes the optimal stepsize.
      </p>
      <p>It is important to mention that although our test model is the classical Lorenz
system, the proposed parallelization strategy is rather general – it could be
applied as well to a large class of chaotic dynamical systems.</p>
    </sec>
    <sec id="sec-2">
      <title>Taylor series method and CNS for the Lorenz system</title>
      <p>
        We consider as a model problem the classical Lorenz system [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]:
dx/dt = σ(y − x), dy/dt = Rx − y − xz, dz/dt = xy − bz, (1)
where R = 28, σ = 10, b = 8/3 are the standard Saltzman parameters. For these
parameters, the system is chaotic. Let us denote by xi, yi, zi, i = 0, ..., N the
normalized derivatives (the derivatives divided by i!) of the approximate solution at the
current time t. Then the N-th order Taylor series method for (1) with stepsize τ is
x(t + τ) ≈ x0 + x1τ + ... + xNτ^N, and similarly for y(t + τ) and z(t + τ). (2)
      </p>
      <p>The i-th Taylor coefficients (the normalized derivatives) are computed as
follows. From system (1), by applying the Leibniz rule for the derivatives of the product of two
functions, we obtain the following recursive procedure for computing xi+1, yi+1, zi+1 for
i = 0, ..., N-1:
xi+1 = σ(yi − xi)/(i+1),
yi+1 = (Rxi − yi − Σj=0..i xi−j zj)/(i+1), (3)
zi+1 = (Σj=0..i xi−j yj − bzi)/(i+1).</p>
      <p>
        To compute the (i+1)-st coefficients in the Taylor series we need all previous
coefficients from 0 to i. This algorithm for computing the coefficients of
the Taylor series is called automatic differentiation, or also algorithmic
differentiation [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. It is obvious that we need O(N²) floating point operations for
computing all coefficients. The subsequent evaluation of the Taylor series with Horner’s rule
needs only O(N) operations.
      </p>
      <p>
        Let us now explain how we choose the stepsize τ. We use a variable stepsize
strategy, which makes the method much more robust than in the fixed stepsize
case. We use a simple strategy taken from [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which ensures both the
convergence of the Taylor series and the minimization of the computational work per
unit time. If we denote the vector of the normalized derivatives of the solution
by Xi = (xi, yi, zi) and take a safety factor 0.993, then the stepsize τ is determined
by the last two terms of the Taylor expansions [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]:
τ = 0.993 · min( ‖XN−1‖∞^(−1/(N−1)), ‖XN‖∞^(−1/N) ). (4)
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] the order of the method is determined by the local error tolerance.
However, we do not work explicitly with some local error tolerance and, we do
not use any explicit dependence between the local and the global error. Instead
of this, as in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we compute an a priori estimate of the needed order of the
method for a reliable solution. As said before, the critical predictable time Tc is
defined as the time for decoupling of two trajectories computed by two different
numerical schemes (in this case – by different N). The solutions are computed
with large enough precision to ensure that the truncation error is the leading
one. As a criterion for decoupling time, we choose the time for establishing
only 30 correct digits. The obtained Tc – N dependencies for fixed stepsize
τ = 0.01 and for variable stepsize are shown in Figure 1. As seen from this figure,
the computational work for one step in the case of variable stepsize is ≈ 80%
greater than in the case of fixed stepsize: (2.98/2.22)² ≈ 1.80. However, the
reduced number of steps gives less overall work. In addition, the greater order of
the method is beneficial in the sense that it increases the parallel efficiency. The
reason is that with increasing the order N of the method, the parallelizable part
of the work becomes relatively even larger than the serial part and the parallel
overhead part.
      </p>
      <p>Similarly, we compute an a priori estimate of the needed precision by means
of computing the Tc – K dependence. In this case, we compare the solutions for
different K and large enough N. We obtain the dependence Tc = 2.55K – 81, which
is, as expected, the same for fixed and for variable stepsize.</p>
    </sec>
    <sec id="sec-3">
      <title>Parallelization of the algorithm</title>
      <p>
        The improvement of the numerical algorithm does not change the parallelization
strategy from our previous work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where the parallelization process is
explained in more detail. However, as we will see, the variable stepsize not only
decreases the computational work for a given accuracy, but also gives a higher
parallel efficiency.
      </p>
      <p>
        Let us store the Taylor coefficients in the arrays x, y, z of lengths N+1. The
values of xi are stored in x[i], those of yi in y[i] and those of zi in z[i]. As explained
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the crucial decision for parallelization is to make a parallel reduction
for the two sums in (3). However, in order to reduce the remaining serial part
of the code and hence to improve the parallel speedup from Amdahl’s law,
we should utilize some limited, but important, additional parallelism. We compute x[i+1],
y[i+1], z[i+1] in parallel. Moreover, we compute x[i+1] in advance, before
computing the sums in (3), when during the reduction process some of the
computational resource is free. In the same way we compute in advance Rx[i] – y[i] from
the formula for y[i+1] and bz[i] from the formula for z[i+1]. These
computations are taken in advance because multiplication is much more expensive than
the other operations used; division by an integer, for example, is comparatively
cheap. The three evaluations by Horner’s rule for the new x[0], y[0], z[0] are also
      </p>
      <p>
        In this work we consider a hybrid MPI + OpenMP strategy [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], i.e.,
every MPI process creates a team of OpenMP threads. For multiple precision
floating-point arithmetic, we use the GMP library (the GNU Multiple Precision Arithmetic Library)
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The main reason to consider a hybrid strategy, rather than a pure MPI one,
is that OpenMP performs slightly better than MPI on one computational node.
For packing and unpacking of the GMP multiple precision types for the MPI
messages, we rely on the tiny MPIGMP library of Tomonori Kouya [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>It is important to note that for our problem the pure OpenMP parallelization
has its own importance. First, the programming with OpenMP is easier, because it
avoids the usage of libraries like MPIGMP. Second, since the algorithm does not
allow domain decomposition, the memory needed for one computational node
is multiplied by the number of the MPI processes per that node, while OpenMP
needs only one copy of the computational domain and thus some memory is
saved.</p>
      <p>The sketch of our parallel program is given in Figure 2. Every thread gets
its id and stores it in tid and then the loop with index i is performed. Every MPI
process takes its portion – the first and the last index controlled by the process.
After that the directive #pragma omp for shares the work for the loop between
threads.</p>
      <p>
        Although OpenMP has a built-in reduction clause, we cannot use it,
because we use user-defined types for multiple-precision numbers and
user-defined operations. Instead, a manual reduction is done by applying a standard
tree-based parallel scheme. We use containers for the partial sums of every thread and
these containers are shared. The containers are stored in the array sum. We have
in addition an array of temporary variables tempv for storing the intermediate
results of the multiplications. To avoid false sharing, a padding strategy is
applied [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. At the point where each process has computed its partial sums, we
perform MPI_ALLREDUCE between the master threads [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. It is useful to
regard MPI_ALLREDUCE as a continuation of the tree-based reduction
process, which starts with the OpenMP reduction. Communications between
master threads are overlapped with some computations for x[i+1], y[i+1], z[i+1]
that can be taken before the computation of the sums in (3) is finished. When
the MPI_ALLREDUCE is finished, we compute in parallel the remaining
operations for x[i+1], y[i+1], z[i+1].
      </p>
      <p>
        In between the block which computes the Taylor coefficients and the block
which computes the new values of x[0], y[0], z[0] in parallel, we compute the
new optimal stepsize within an omp single section. While the block for
computing the Taylor coefficients is O(N²) and the block for evaluation of the
polynomials is O(N), this block is only O(1), hence its work is negligible. Let us
note that the GMP library does not offer a power function for the computations
from formula (4). The good thing is that we do not need to compute the stepsize
with multiple precision and double precision is enough. Therefore, we use the C
standard library function pow in double precision. We do a normalization of the
large GMP floating point numbers in order to work in the range of the standard
double precision numbers. The C-code in terms of GMP library of our hybrid
MPI + OpenMP program can be downloaded from [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        Let us mention that if one half of the OpenMP threads computes one of the
sums in (3) and the other half computes the other sum, one could also expect some
small performance benefit, because for small indexes i fewer threads will be
idle and the deviation from the perfect load balance between threads will be
smaller. However, this approach is not general, because it strongly depends on the
number of sums for reduction (two in the particular case of the Lorenz system)
and the number of available threads.
#pragma omp parallel private(i,j,tid)
{
mpf_mul(tempv[pad*tid],x[i-j],z[j]);
mpf_add(sum[pad*tid],sum[pad*tid],tempv[pad*tid]);
mpf_mul(tempv[pad*tid],x[i-j],y[j]);
mpf_add(sum[pad*tid+1],sum[pad*tid+1],tempv[pad*tid]);
}
//Explicit OpenMP Parallel Reduction for log(p) additions
// The result of the reduction is in sum[0] and sum[1]
....................................................
#pragma omp barrier
....................................................
//MPI_ALLREDUCE for sum[0] and sum[1]
//Communication is only between master threads
//Communication is overlapped with some computations
//for x[i+1],y[i+1],z[i+1] that can be taken in advance
.....................................................
#pragma omp barrier
.....................................................
// The remaining computations
// for y[i+1],z[i+1] independently in parallel
...................................................
#pragma omp barrier
// Setting sum[pad*tid] and sum[pad*tid+1] to zero
...................................................
}
#pragma omp single
{
// Computing the optimal stepsize
// from x[N-1],y[N-1],z[N-1],x[N],y[N],z[N]
}
// One step forward with Horner’s rule
#pragma omp sections
{
// Computing the new x[0],y[0],z[0]
// independently in three parallel sections
}
}
      </p>
    </sec>
    <sec id="sec-4">
      <title>Computational resources. Performance and numerical results</title>
      <p>
        The preparation of the parallel program and the many tests are performed in
the Nestum Cluster, Sofia, Bulgaria [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and in the HybriLIT Heterogeneous
Platform at the Laboratory of IT of JINR, Dubna, Russia [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The large
computations for the reference solution in the time interval [0,11000] and the
presented results for the performance are from the Nestum cluster. Nestum is a
homogeneous HPC cluster based on two-socket nodes. Each node consists of 2
x Intel(R) Xeon(R) E5-2698 v3 (Haswell) processors, i.e., 32
cores per node at 2.3 GHz. We used the Intel C++ compiler version 17.0, GMP library
version 6.2.0, OpenMPI version 3.1.2 and compiler optimization options -O3
-xhost.
      </p>
      <p>
        We use the same initial conditions as those in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], namely x(0) = -15.8, y(0)
= -17.48, z(0) = 35.64, in order to compare with the benchmark table in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We
computed a reference solution in the rather long time interval [0,11000] and
repeated the benchmark table up to time 10000. Computing this table by two
different stepsize strategies is a good demonstration that Clean Numerical Simulation
(CNS) is a correct and valuable approach for computing reliable trajectories of
chaotic systems.
      </p>
      <p>We performed two large computations with 256 CPU cores (8 nodes in
Nestum). The first computation is with 4566 decimal digits of precision and 5240-th
order method (5% reserve from the a priori estimates). The second computation is
for verification – with 4778 decimal digits of precision and 5490-th order method
(10% reserve from the a priori estimates). The first computation lasted ≈ 9 days
and 18 hours and the second ≈ 11 days and 7 hours. The overall speedup with 256
cores for the first computation is 162.8, for the second – 164.6.</p>
      <p>By estimating the time needed for the same accuracy and with fixed
stepsize 0.01, we conclude that by applying the variable stepsize strategy we gain a 2.1x
speedup. There are two reasons for this speedup – less overall work and increased
parallel efficiency. Although the work per step in the case of variable stepsize
increases by ≈ 80%, the average stepsize is ≈ 0.034 and thus the overall work is
≈ 53% of the work in the case of fixed stepsize 0.01. In addition, the parallel
efficiency increases from 55.5% up to 63.6% for the first computation and from
56.2% up to 64.3% for the second. This is because by increasing the order of the
method N, we increase the amount of the parallel work, which mitigates the
impact of the serial work and the parallel overhead work.</p>
      <p>
        As we computed the reference solution with some reserve of the estimated N
and K, we actually obtain the solution with some more correct digits. The
reference solution with 60 correct digits at every 100 time units can be seen in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
The reference solution at t = 11000 is:
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>The parallelized version of the multiple-precision Taylor series method, and in particular
the Clean Numerical Simulation, should be used with a variable stepsize strategy
as a better alternative to the fixed stepsize one. An important observation is that
the variable stepsize not only decreases the computational work for a given accuracy,
but also gives a higher parallel efficiency.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>We are grateful for the opportunity to use the computational resources of the Nestum
cluster, Sofia, Bulgaria. We would like to give our special thanks to Dr. Stoyan
Pisov for his great help in using the Nestum cluster and Prof. Emanouil
Atanassov from IICT, BAS for valuable discussions and important remarks on the
parallelization process. We also thank the Laboratory of Information Technologies
of JINR, Dubna, Russia for the opportunity to use the computational resources
of the HybriLIT Heterogeneous Platform. The work is supported by a grant of
the Plenipotentiary Representative of the Republic of Bulgaria at JINR, Dubna,
Russia.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Barrio</surname>
          </string-name>
          , R.:
          <article-title>Performance of the Taylor series method for ODEs/DAEs</article-title>
          .
          <source>Applied Mathematics and Computation 163.2</source>
          ,
          <fpage>525</fpage>
          -
          <lpage>545</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barrio</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>Breaking the limits: the Taylor series method</article-title>
          .
          <source>Applied Mathematics and Computation 217.20</source>
          ,
          <fpage>7940</fpage>
          -
          <lpage>7954</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On the reliability of computed chaotic solutions of non-linear differential equations</article-title>
          .
          <source>Tellus A: Dynamic Meteorology and Oceanography 61.4</source>
          ,
          <fpage>550</fpage>
          -
          <lpage>564</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On the numerical simulation of propagation of micro-level inherent uncertainty for chaotic dynamic systems</article-title>
          .
          <source>Chaos, Solitons &amp; Fractals</source>
          <volume>47</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On the clean numerical simulation (CNS) of chaotic dynamic systems</article-title>
          .
          <source>Journal of Hydrodynamics, Ser. B 29.5</source>
          ,
          <fpage>729</fpage>
          -
          <lpage>747</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jing</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Over a thousand new periodic orbits of a planar three-body system with unequal masses</article-title>
          .
          <source>Publications of the Astronomical Society of Japan 70.4</source>
          ,
          <issue>64</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>On the relation between reliable computation time, float-point precision and the Lyapunov exponent in chaotic systems</article-title>
          .
          <source>arXiv preprint arXiv:1410.4919</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Computational uncertainty and the application of a high-performance multiple precision scheme to obtaining the correct reference solution of Lorenz equations</article-title>
          .
          <source>Numerical Algorithms 59.1</source>
          ,
          <fpage>147</fpage>
          -
          <lpage>159</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Clean numerical simulation for some chaotic systems using the parallel multiple-precision Taylor scheme</article-title>
          .
          <source>Chinese Science Bulletin 59.33</source>
          ,
          <fpage>4465</fpage>
          -
          <lpage>4472</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>On the mathematically reliable long-term simulation of chaotic solutions of Lorenz equation in the interval [0, 10000]</article-title>
          .
          <source>Science China Physics, Mechanics and Astronomy 57.2</source>
          ,
          <fpage>330</fpage>
          -
          <lpage>335</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hristov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , et al.:
          <article-title>Parallelizing multiple precision Taylor series method for integrating the Lorenz system</article-title>
          .
          <source>arXiv preprint arXiv:2010.14993</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Jorba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A software package for the numerical integration of ODEs by means of high-order Taylor methods</article-title>
          .
          <source>Experimental Mathematics 14.1</source>
          ,
          <fpage>99</fpage>
          -
          <lpage>117</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lorenz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Deterministic nonperiodic flow</article-title>
          .
          <source>Journal of the Atmospheric Sciences 20.2</source>
          ,
          <fpage>130</fpage>
          -
          <lpage>141</lpage>
          (
          <year>1963</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Methods and applications of interval analysis</article-title>
          .
          <source>Society for Industrial and Applied Mathematics</source>
          (
          <year>1979</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Gropp</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.:
          <article-title>Using MPI: portable parallel programming with the message-passing interface</article-title>
          (Vol.
          <volume>1</volume>
          ). MIT Press (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jost</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Der Pas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using OpenMP: portable shared memory parallel programming</article-title>
          (Vol.
          <volume>10</volume>
          ). MIT Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. GNU GMP library, https://gmplib.org/,
          last accessed
          <year>2021</year>
          /06/24
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Kouya</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          : BNCpack, http://na-inet.jp/na/bnc/, last accessed
          <year>2021</year>
          /06/24
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kouya</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A Brief Introduction to MPIGMP &amp; MPIBNCpack</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Nikolaevskaya</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>MPIBNCpack library</article-title>
          .
          <source>Studies in Computational Intelligence</source>
          <volume>397</volume>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>134</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kouya</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Performance Evaluation of Multiple Precision Numerical Computation using x86 64 Dualcore CPUs</article-title>
          .
          <source>FCS2005 Poster Session</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22. Article source code, https://github.com/rgoranova/hpcvss, last accessed
          <year>2021</year>
          /06/24
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. Nestum Home Page, http://hpc-lab.sofiatech.bg/,
          last accessed
          <year>2021</year>
          /06/24
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24. HybriLIT Home Page, http://hlit.jinr.ru/,
          last accessed
          <year>2021</year>
          /06/24
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>