<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OPENMP COMPUTING OF A REFERENCE SOLUTION FOR COUPLED LORENZ SYSTEM ON [0,400]</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>I. G. Hristov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. D. Hristova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. N. Dimova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. R. Armyanov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N. G. Shegunov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I. V. Puzynin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. P. Puzynina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Z. A. Sharipov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Z. K. Tukhliev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ivan Hristov</institution>
          ,
          <addr-line>Radoslava Hristova, Stefka Dimova, Petar Armyanov, Nikolay Shegunov, Igor Puzynin, Taisiya Puzynina, Zarif Sharipov, Zafar Tukhliev</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Joint Institute for Nuclear Research, MLIT</institution>
          ,
          <addr-line>Dubna</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sofia University “St. Kliment Ohridski”, FMI</institution>
          ,
          <addr-line>Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>5</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Obtaining a long term reference trajectory on the chaotic attractor for the coupled Lorenz system is a difficult task, due to the sensitive dependence on the initial conditions. Using the standard doubleprecision floating point arithmetic, we cannot obtain a reference solution longer than 2.5 time units. Combining OpenMP parallel technology together with GMP library (GNU multiple precision library), we parallelize the Taylor series algorithm for the coupled Lorenz system and obtain a reference solution in the rather long time interval - [0,400]. We performed two large computations, each using one CPU-node (32 cores), based on two Intel® Haswell processors. First computation was with 2158 decimal digits of precision and 2480-th order method, and second computation was for verification with 2254 decimal digits of precision and 2580-th order method. The needed time for the second (larger) computation was 6.3 days. The parallel speedup when using these 32 cores is 23.1 with parallel efficiency 72.1 %.</p>
      </abstract>
      <kwd-group>
        <kwd>Variable step-size Taylor series method</kwd>
        <kwd>Clean numerical simulation</kwd>
        <kwd>Multiple precision</kwd>
        <kwd>OpenMP parallel technology</kwd>
        <kwd>Coupled Lorenz system</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>(1)</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        To compute a reliable long term trajectory for a chaotic dynamic system, we need a multiple
precision floating point arithmetic library in order to deal with the sensitive dependence on the initial
conditions. This is not enough, we also need a numerical method that steps efficiently at the level of
the high precision, i.e. we need a method that allows arbitrary high order of accuracy. In this paper we
use a numerical procedure called “Clean numerical simulation” to obtain verified long term
trajectories for chaotic dynamical systems [1]. The procedure is based on the multiple precision Taylor
series method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the case of very long time of integration, the computational problem can become
pretty large and we need a parallelization of the algorithm. Although in our recent work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] we
reported a general MPI+OpenMP parallelization for the classical Lorenz system, pure OpenMP
parallelization has its own importance and deserves special attention. In this paper we combine
OpenMP parallel technology together with GMP library [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to parallelize the Taylor series method and
compute a reliable trajectory on the chaotic attractor for the more difficult coupled Lorenz system. The
work can be regarded as a continuation and improvement of the results in our previous work [5].
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. The model problem – coupled Lorenz system</title>
      <p>
        We consider as a model problem the coupled Lorenz system from [
        <xref ref-type="bibr" rid="ref5">6</xref>
        ]:
=  ( −  ),
=    −  –  −    ,
= 
− 
=  ( −  ),
=  (  −  –  ) +    ,

= 
− 
where a = 10, b = 8/3, c = 10, rs = 28, rf = 45,   = 10-2,   = 10. The first three and last three equations
are called slow and fast dynamics respectively. The first three equations (without the last term in the
second equation) exactly represent the classical Lorenz system. For the above parameters the system
has a chaotic attractor, which means a sensitive dependence on the initial conditions. This dependence
is described by the relation  ( ) ~  (0)  , where λ is the maximum Lyapunov exponent and  ( ) is
the distance between two adjacent trajectories. Solving the above relation with respect to t, we obtain
the following relation for the predictability horizon (the Lyapunov time T):  ~ 1

). Here tol is
our tolerance and ε is the round-off unit (precision). For the coupled Lorenz system λ = 11.5. If we use
the standard double precision (ε = 2-53) and tolerance tol = 10-3, then T ~ 2.5. This means that with the
general double-precision arithmetic we cannot obtain a reliable solution longer than 2.5 time units. Let
us mention that for the classical Lorenz system λ = 0.905 and hence the coupled Lorenz system is
      </p>
      <p>(

much harder to simulate.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Taylor series algorithm for coupled Lorenz system</title>
      <p>
        Let us denote the normalized i-th derivatives (the derivatives divided by i!) at point t with xi,
yi, zi, Xi, Yi, Zi. Then the N-th order Taylor series method with step-size  for system (1) is [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
 ( +  ) ≈  0 + ∑ =1     ,
      </p>
      <p>( +  ) ≈  0 + ∑ =1     ,
 ( +  ) ≈  0 + ∑ =1     ,
 ( +  ) ≈  0 + ∑ =1     ,
 ( +  ) ≈  0 + ∑ =1</p>
      <p>( +  ) ≈  0 + ∑ =1</p>
      <p>
        The normalized derivatives are computed with the so called automatic differentiation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. If
we apply the Leibniz rule for the derivatives of the product of two functions to system (1), we obtain
the following procedure for computing them. For i = 0, … , N – 1 we compute:
  +1 =  +1
  +1 =  +1 
  +1 =
1
1
1
 +1
(∑ =0   −   −    )
 (∑
      </p>
      <p>=0   −   −    )
  +1 =  +11
 (  −   ),   +1 =  +11 (    −   − ∑</p>
      <p>These are the formulas for the automatic differentiation. After computing the normalized
derivatives (the Taylor coefficients) up to the N-th order, we evaluate the Taylor polynomials by
Horner’s rule. The pseudocode for the Taylor series algorithm for coupled Lorenz system is
straightforward:
while (time &lt; T) {
for (i = 0; i&lt;N; i++) {
s1=s2=s3=s4=s5=0.0;
for (j=0; j&lt;=i; j++) {
}
s1+= x[i-j]*z[j]; s2+= x[i-j]*y[j];
s3+= X[i-j]*Y[j]; s4+= X[i-j]*Z[j];
s5+= X[i-j]*y[j];
}
}
// Computing x[i+1], y[i+1], z[i+1],
// X[i+1], Y[i+1], Z[i+1] from formulas (2)
// Computing the optimal step-size τ
.................................................................
// One step forward with Horner's rule
// for the new x[0], y[0], z[0], X[0], Y[0], Z[0]
.................................................................
time+=tau;
(2)
(3)</p>
      <p>
        It is obvious that we need O(N2) floating point operations for computing the Taylor
coefficients and that parallel reduction for the sums in the inner loop is the crucial part of the
parallelization. To make the method more robust, we choose a simple variable step-size strategy taken
from [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ]:
 = 0.9293 min {(
      </p>
      <p>1
||UN−1||∞)
1
N−1</p>
      <p>1
, (||UN||∞) }
1
N
where Ui = (xi, yi, zi, Xi, Yi, Zi ). This choice of  ensures both the convergence of the Taylor series,
and the minimization of the computational work per unit time.</p>
    </sec>
    <sec id="sec-5">
      <title>4. OpenMP parallelization of the algorithm</title>
      <p>
        OpenMP [
        <xref ref-type="bibr" rid="ref1">8</xref>
        ] has its own importance for the above algorithm, because: (I) OpenMP is simpler
than MPI since the communication between threads is realized by the shared memory and we do not
need to learn special libraries for packaging and unpackaging of multiple precision numbers. (II)
OpenMP is slightly faster than pure MPI, most likely because of the additional overhead for grouping
GMP data for MPI massages. (III) OpenMP uses less memory, since the algorithm does not allow
domain decomposition and the computational domain has to be multiplied by the number of MPI
processes. Although OpenMP has a build-in reduction clause, we cannot use it, because we use
userdefined types for multiple precision numbers and user-defined operations. Thus, we have to do the
reduction manually. To avoid false sharing, a padding strategy is applied for the shared containers for
the partial sums of the threads [
        <xref ref-type="bibr" rid="ref1">8</xref>
        ]. The main points of the parallelization are: (I) For given i make an
explicit parallel reduction for the sums in (2). (II) After computation of sums for given i, compute each
formula for xi+1, yi+1, zi+1, Xi+1, Yi+1, Zi+1 independently in parallel. (III) After computation of all the
derivatives up to the N-th, compute the variable step-size in a single section. (IV) At last use Horner’s
rule for evaluation of all 6 components of the solution independently in parallel. It is important to note
that our approach for parallelization can be simply applied to a large class of chaotic dynamical
systems, because we usually have formulas like (2), which comes from the automatic differentiation
rules. The sketch of the OpenMP code for one time step in terms of GMP library reads as follows:
#pragma omp parallel private(i,j,tid)
{
tid = omp_get_thread_num();
for (i = 0; i&lt;N; i++) { //N - the order of the method
# pragma omp for schedule (static)
for (j=0; j&lt;=i; j++) {
mpf_mul(tempv[pad*tid],x[i-j],z[j]);
mpf_add(sum[pad*tid],sum[pad*tid],tempv[pad*tid]);
// The same computations for the other 4 sums
}
// Explicit tree based parallel Reduction
......................................................
#pragma omp sections
{
// Computing x[i+1],y[i+1],z[i+1],X[i+1],Y[i+1],Z[i+1]
// independently in 6 parallel sections
}
.......................................................
      </p>
      <p>// Setting elements of the array "sum" to zero
}
#pragma omp single
{
}
#pragma omp sections
{
// Computing the variable step-size from (3)
// One step forward with the Horner's rule
// independently for each 6 components
}</p>
      <p>}</p>
    </sec>
    <sec id="sec-6">
      <title>5. Performance and numerical results</title>
      <p>The preparation of the parallel program and the many tests are performed in the HybriLIT
platform at MLIT, JINR [9] and in the Nestum cluster, Sofia, Bulgaria [10]. Following Shijun Liao
from [1], we first computed a priori estimations for the needed order N of the method and the needed
decimal digits of precision K. Let Tc be the practical Lyapunov time defined by the time at which the
Euclidean distance between two adjacent trajectories becomes more then 10-30. We computed the
following linear Tc-K and Tc-N dependencies. For fixed step-size 0.001: Tc=0.1961N-6,
Tc=0.1998K-6. For variable step-size: Tc=0.1734N-6, Tc=0.1998K-6 (the same as for the fixed
stepsize as expected). Using these estimations we computed a reference solution in the rather long time
interval [0, 400]. We took as initial conditions those from paper [11] in order to compare with the
benchmark table there up to time=100. We performed two large computations, each using one
CPUnode based on two Intel® Haswell processors (32 cores) in Nestum cluster: one computation with
K=2158 and N=2480 and a second computation for verification with K=2254 and N=2580. We obtain
numerically the following step-sizes for the second (larger) computation: min=0.001156,
max=0.008854, avg=0.002855. The estimated speedup for the serial program when using variable
instead of fixed step-size strategy is 2.855(0.1734/0.1961)2~2.23. The measured speedup is very close
to that - 2.20. Because the parallel efficiency with variable step-size is slightly better (lager order of
the method gives more computational work per step), the parallel program with variable step-size is
2.32 faster than the analogical fixed step-size parallel program. The needed time for the second
computation using one node (32 cores) in Nestum cluster is 6.3 days. The parallel speedup when using
these 32 cores is 23.1 with parallel efficiency 72.1%. As we compute the reference solution with some
reserve of the estimated N and K, we actually obtain the solution with some more correct digits. The
parallel program and the reference solution with 60 correct digits at every 10 time units can be seen in
[12].</p>
    </sec>
    <sec id="sec-7">
      <title>6.Conclusions</title>
      <p>OpenMP parallelization of the multiple precision Taylor series method with variable step-size
for the coupled Lorenz system is realized. A very good parallel efficiency for one CPU-node is
observed. An important observation is that the variable step-size not only makes the method more
robust and decreases the computational work, but also increases the parallel efficiency, compared to
the fixed step-size case. Our approach for parallelization can be simply applied to a large class of
chaotic dynamical systems. OpenMP has some advantages vice MPI, the most important of which is
its simplicity, and it should be the preferred choice in the case of using a moderately large
computational resource.</p>
    </sec>
    <sec id="sec-8">
      <title>7. Acknowledgements</title>
      <p>We greatly thank for the opportunity to use the computational resources of the HybriLIT
Platform and Nestum cluster. The work is supported by a grant of the Plenipotentiary Representative
of the Republic of Bulgaria at JINR, Dubna, Russia.
[12] https://github.com/rgoranova/coupledlorenz</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [8] [9] [1]
          <string-name>
            <surname>Liao</surname>
            ,
            <given-names>Shijun.</given-names>
          </string-name>
          <article-title>"On the reliability of computed chaotic solutions of non-linear differential equations."</article-title>
          <source>Tellus A: Dynamic Meteorology and Oceanography 61.4</source>
          (
          <year>2008</year>
          ):
          <fpage>550</fpage>
          -
          <lpage>564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Barrio</surname>
            ,
            <given-names>Roberto.</given-names>
          </string-name>
          <article-title>"Performance of the Taylor series method for ODEs/DAEs."</article-title>
          <source>Applied Mathematics and Computation 163.2</source>
          (
          <year>2005</year>
          ):
          <fpage>525</fpage>
          -
          <lpage>545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Hristov</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ivan</surname>
          </string-name>
          , et al.
          <article-title>"On the efficient parallel computing of long term reliable trajectories for the Lorenz system</article-title>
          .
          <source>" arXiv preprint arXiv:2101.06682</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] https://gmplib.org
          <source>/ (accessed 02.08.21)</source>
          [5]
          <string-name>
            <surname>Dimova</surname>
          </string-name>
          , Stefka, Hristov, Ivan, et al.
          <article-title>"OpenMP parallelization of multiple precision Taylor series method." arXiv preprint arXiv:</article-title>
          <year>1908</year>
          .
          <volume>09301</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Boffetta</surname>
          </string-name>
          ,
          <string-name>
            <surname>Guido</surname>
          </string-name>
          , et al.
          <article-title>"An extension of the Lyapunov analysis for the predictability problem</article-title>
          .
          <source>" Journal of the Atmospheric Sciences 55.23</source>
          (
          <year>1998</year>
          ):
          <fpage>3409</fpage>
          -
          <lpage>3416</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Jorba</surname>
            , Angel, and
            <given-names>Maorong</given-names>
          </string-name>
          <string-name>
            <surname>Zou</surname>
          </string-name>
          .
          <article-title>"A software package for the numerical integration of ODEs by means of high-order Taylor methods."</article-title>
          <source>Experimental Mathematics 14.1</source>
          (
          <year>2005</year>
          ):
          <fpage>99</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Chapman</surname>
            , Barbara,
            <given-names>Gabriele</given-names>
          </string-name>
          <string-name>
            <surname>Jost</surname>
          </string-name>
          , and
          <string-name>
            <surname>Ruud Van Der Pas</surname>
          </string-name>
          .
          <article-title>Using OpenMP: portable shared memory parallel programming</article-title>
          . Vol.
          <volume>10</volume>
          . MIT press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          http://hlit.jinr.
          <source>ru/ (accessed 02.08.21)</source>
          [10] http://hpc-lab.
          <source>sofiatech.bg/ (accessed 02.08</source>
          .21)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>