<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>High-performance simulations of continuously variable transmission dynamics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S.G. Orlov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N.B. Melnikova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu.G. Ispolov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N.N. Shabrov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University ITMO</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Peter the Great St. Petersburg Polytechnic University</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>41</fpage>
      <lpage>48</lpage>
      <abstract>
        <p>The paper describes a parallel computational model simulating continuously variable transmission (CVT) dynamics. A specific feature of the CVT model is the combination of relatively small problem sizes (about 1000 unknowns) and high computational costs (up to several weeks of sequential computation for a thirty-second simulation period). The main source of computational complexity is the calculation of non-linear contact forces acting between transmission parts at each step of the explicit time integration procedure. Below we analyze the simulation workflow and the runtime load distribution among the application modules. Based on the profiling data, we present a task-parallel multithreaded implementation of the model over shared memory and discuss steps towards further parallelization.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The present paper focuses on the design and implementation of a high-performance
computational model of continuously variable transmission (CVT) dynamics. CVTs have been used extensively
in the automotive industry for the last few decades. Numerical simulations of CVT dynamics are commonly
employed in engineering practice at the design stage. However, at a high level of model
complexity and realism, simulations become very time-consuming. Depending on the model complexity, it
can take from several weeks to several months of sequential computation to simulate a thirty-second
period of model time.</p>
      <p>
        A number of CVT models with different levels of realism and
complexity have been developed. These include: (a) primitive low-dimension chain models [1]; (b) discrete multi-body
mechanical models [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]; (c) continuous FEM models [4]. An important requirement for chain models is the
ability to represent each pin and plate separately, hence low-dimension chain models similar to [1]
were not considered in our research. Our chain models differ from those described in [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ] in that they
are more detailed and consider the 3D motion of the chain. Continuous FEM models are extremely
heavyweight due to the large number of contact interactions in the system; their application is, in fact,
limited to detailed static simulations of local contact interactions between typical CVT parts [
        <xref ref-type="bibr" rid="ref5">4, 5</xref>
        ]
coupled to a multi-body system analysis [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Our original CVT model has been specially designed to keep an optimal balance between
model realism and computational costs. The reliability of this model has been validated during more than
ten years of successful application in a real industrial environment. However, even this relatively
lightweight model is quite time-consuming: for a time step h = 2.1·10<sup>−8</sup> s and model time T = 1 s,
a serial simulation takes about 60 hours on an Intel G2130 CPU @ 3.2 GHz.</p>
      <p>A parallel implementation of the CVT model has naturally become our next step in building a truly
fast application. Our final goal is to reduce typical computational times by a factor of 100, so that a simulation
completes within 30-40 minutes. At the current stage, task-based OpenMP parallelization has
been performed for the most time-consuming portion of the code (see Sections 4-5 for details).</p>
      <p>The rest of the paper is organized as follows: Section 2 introduces the basic principles of CVT
operation; Section 3 describes the mathematical foundations of our CVT model; Section 4 considers numerical
implementation issues and serial performance analysis; Section 5 presents parallel benchmarking
results; and, finally, Section 6 summarises conclusions and future work directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. CVT mechanics basics</title>
      <p>A schematic view of the CVT is presented in Figure 1. Two pairs of conical sheaves (pulleys) are
installed on two shafts (driving and driven) and connected by a steel chain. The shafts are mounted on elastic
supports. One pulley in each pair is able to move along its axis; the other one is fixed. The axial
motion of the pulleys provides an infinite number of gear ratios within a fixed range. In addition, there is a
hydraulic system controlling the moving sheaves, and a stabilization system for the CVT gear ratio.</p>
      <p>The chain consists of rocker pins and plates (Figure 2). The driving torque is transmitted by
friction forces at the contact points between pins and sheaves.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Mathematical model of CVT dynamics</title>
      <p>The dynamics of a holonomic system of elastic bodies can be described using standard
mathematical formalism of Lagrangian mechanics [6]. In application to our CVT model, this can be written
in the following form:</p>
      <p>dt q q
Where L  T   is Lagrangian (sum of kinetic and potential energies),
T  T (q, q) is the kinetic energy,   (q) is the potential energy.</p>
      <p>d L</p>
      <p>L
</p>
      <p> Q,
Q  

q</p>
      <p> Q applied is the vector of generalized forces, (q, q)  0 is the Rayleigh dissipative
function, and Qapplied is the applied forces vector.</p>
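      <p>As a minimal illustration of (1), not taken from the CVT model itself, consider a hypothetical single-degree-of-freedom oscillator with mass m, stiffness k and damping coefficient c (these three symbols are assumptions introduced only for this example). With T = m q̇²/2, Π = k q²/2 and Φ = c q̇²/2, the Lagrange equation reduces to the familiar damped oscillator equation:</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[L = \tfrac{1}{2} m \dot{q}^2 - \tfrac{1}{2} k q^2, \quad
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = m\ddot{q}, \quad
\frac{\partial L}{\partial q} = -k q, \quad
-\frac{\partial \Phi}{\partial \dot{q}} = -c\dot{q}
\;\Longrightarrow\;
m\ddot{q} + c\dot{q} + k q = Q_{\text{applied}}]]></tex-math>
        </disp-formula>
      </p>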
      <p>Substituting the expressions for the potential and kinetic energies into (1) gives a system of ordinary
differential equations with a non-linear right-hand side:</p>
      <p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[A\ddot{q} = F(t, q, \dot{q})]]></tex-math>
        </disp-formula>
      </p>
      <p>where A = A(t, q) is a symmetric positive-definite inertia matrix and the right-hand side
vector F is the sum of elastic, damping and external forces.</p>
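      <p>To make the structure of system (2) concrete, the sketch below computes accelerations from A q̈ = F for a small dense symmetric positive-definite matrix using a Cholesky factorization. This is only an illustration under assumptions: the text does not state which decomposition the simulator uses, and the names choleskyFactor and choleskySolve, as well as the dense row-major storage, are hypothetical.</p>
      <preformat><![CDATA[
```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense row-major n x n matrix stored in a flat vector (assumed layout).
using Matrix = std::vector<double>;
using Vector = std::vector<double>;

// In-place Cholesky factorization A = L * L^T of a symmetric positive-definite
// matrix. On return the lower triangle of `a` holds L; the strict upper
// triangle is left untouched.
void choleskyFactor(Matrix& a, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
        if (d <= 0.0) throw std::runtime_error("matrix is not positive definite");
        const double ljj = std::sqrt(d);
        a[j * n + j] = ljj;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
            a[i * n + j] = s / ljj;
        }
    }
}

// Solves (L * L^T) x = f using the factor produced by choleskyFactor.
Vector choleskySolve(const Matrix& l, std::size_t n, Vector f) {
    for (std::size_t i = 0; i < n; ++i) {          // forward: L y = f
        for (std::size_t k = 0; k < i; ++k) f[i] -= l[i * n + k] * f[k];
        f[i] /= l[i * n + i];
    }
    for (std::size_t i = n; i-- > 0;) {            // backward: L^T x = y
        for (std::size_t k = i + 1; k < n; ++k) f[i] -= l[k * n + i] * f[k];
        f[i] /= l[i * n + i];
    }
    return f;
}
```
]]></preformat>
      <p>In an explicit scheme, a solve of this kind would be needed each time the right-hand side is evaluated, which is consistent with the inertia matrix decomposition appearing as a separate item in the runtime profile.</p>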
      <p>The CVT chain consists of many separate solid bodies, namely rocker pins and plates (Figure 2); their
total number is over 1000, and there are many contact interactions between these bodies.
Moreover, it is essential to take local deformations of the bodies into account in the mathematical model in
order to predict the behaviour of the entire system correctly.</p>
      <p>Mathematical expressions for the potential and kinetic energies of the parts of our CVT
model have been published in [7]. In short, we assume that pins, pulleys and shafts have inertia, and
plates do not. The masses of the plates are added to the pins connected by those plates (see Figure 3a for a
cross-sectional view of pins connected by plates). The pins experience extension and bending; we
neglect pin torsion. The plates (Figure 3c,d) work mostly in extension, but also in bending and
torsion. The shafts work mostly in torsion and, to a lesser degree, in extension and bending. The two halves
of each pin roll over each other (Figure 3b). Contacts between pin halves and contacts between pin ends and
pulleys are modelled with the Hertz theory of contact [8].</p>
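      <p>For orientation, the sketch below evaluates the classical Hertz normal force for a sphere-on-sphere point contact, F = (4/3) E* √R δ<sup>3/2</sup>, with zero force when the bodies are separated. The contact geometry actually used for pin halves and pin ends is not specified here and may differ; the function names, parameters and any numerical values are assumptions made for illustration only.</p>
      <preformat><![CDATA[
```cpp
#include <cmath>

// Effective (combined) Young's modulus E* of two elastic bodies:
// 1/E* = (1 - nu1^2)/E1 + (1 - nu2^2)/E2.
double effectiveModulus(double e1, double nu1, double e2, double nu2) {
    return 1.0 / ((1.0 - nu1 * nu1) / e1 + (1.0 - nu2 * nu2) / e2);
}

// Effective radius of curvature: 1/R = 1/R1 + 1/R2.
double effectiveRadius(double r1, double r2) {
    return 1.0 / (1.0 / r1 + 1.0 / r2);
}

// Hertz normal force for a point contact with penetration `delta`.
// Returns 0 for delta <= 0: the contact is unilateral (gap open).
double hertzNormalForce(double eStar, double rEff, double delta) {
    if (delta <= 0.0) return 0.0;
    return (4.0 / 3.0) * eStar * std::sqrt(rEff) * delta * std::sqrt(delta);
}
```
]]></preformat>
      <p>The force grows as the 3/2 power of the penetration, which is one source of the non-linearity of the right-hand side of (2).</p>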
    </sec>
    <sec id="sec-4">
      <title>4. Numerical implementation</title>
      <sec id="sec-4-1">
        <title>4.1 Time integration scheme</title>
        <p>The well-known four-stage Runge-Kutta time integration scheme was used to integrate ODE
system (2). The scheme is explicit; it is stable in the region of the (hλ) plane shown in Figure 4 (here λ is the
maximal eigenvalue) [9]. The highest frequencies in the model correspond to axial tension of the pins;
their values are typically about 10<sup>7</sup>–10<sup>8</sup> rad/s, which requires a time integration step of less than
10<sup>−7</sup>–10<sup>−8</sup> s.</p>
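        <p>The classical RK4 step and a simple stability check can be sketched as follows. The sketch assumes that system (2) has been rewritten in first-order form y' = f(t, y), and it relies on the known property that, for purely imaginary eigenvalues λ = iω (undamped oscillations), the RK4 stability region extends to |hω| = 2√2 ≈ 2.83 on the imaginary axis. The names State, Rhs, rk4Step and rk4AmplificationOnImagAxis are hypothetical and are not part of the authors' code.</p>
        <preformat><![CDATA[
```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<double>;
using Rhs = std::function<State(double, const State&)>;

// Returns y + a * k (element-wise).
static State axpy(const State& y, double a, const State& k) {
    State r(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) r[i] = y[i] + a * k[i];
    return r;
}

// One step of the classical four-stage Runge-Kutta scheme for y' = f(t, y).
State rk4Step(const Rhs& f, double t, const State& y, double h) {
    const State k1 = f(t, y);
    const State k2 = f(t + 0.5 * h, axpy(y, 0.5 * h, k1));
    const State k3 = f(t + 0.5 * h, axpy(y, 0.5 * h, k2));
    const State k4 = f(t + h, axpy(y, h, k3));
    State r(y.size());
    for (std::size_t i = 0; i < y.size(); ++i)
        r[i] = y[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    return r;
}

// |R(i*h*omega)| for the RK4 stability polynomial
// R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24; the step is stable if the value <= 1.
double rk4AmplificationOnImagAxis(double h, double omega) {
    const std::complex<double> z(0.0, h * omega);
    const std::complex<double> r =
        1.0 + z + z * z / 2.0 + z * z * z / 6.0 + z * z * z * z / 24.0;
    return std::abs(r);
}
```
]]></preformat>
        <p>For a hypothetical undamped mode with ω = 10<sup>8</sup> rad/s, a step h = 2.1·10<sup>−8</sup> s gives hω = 2.1 and an amplification factor below one, whereas h = 3·10<sup>−8</sup> s gives hω = 3 and an amplification factor above one, so the scheme would diverge.</p>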
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Software implementation</title>
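        <p>The functionality of our CVT dynamics simulator currently includes:
          <list list-type="bullet">
            <list-item><p>building various CVT models from a set of standard parts in a graphical pre-processor;</p></list-item>
            <list-item><p>the computational core: integration of ODE systems and solution of the generalized eigenvalue problem;</p></list-item>
            <list-item><p>visualization utilities (2D, 3D, video recording);</p></list-item>
            <list-item><p>building plots and spectrograms;</p></list-item>
            <list-item><p>logging of user actions;</p></list-item>
            <list-item><p>a help system.</p></list-item>
          </list>
        </p>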
        <p>The code is written in C++ using an object-oriented approach. The graphical user interface
is built on the Qt5 library. The application is cross-platform and runs on both Windows and
Linux. Other software technologies used in the code are:
          <list list-type="bullet">
            <list-item><p>V8 (JavaScript implementation);</p></list-item>
            <list-item><p>Doxygen (for developers’ and users’ documentation systems);</p></list-item>
            <list-item><p>om (in-house object model);</p></list-item>
            <list-item><p>cmake (meta-building system).</p></list-item>
          </list>
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Work load distribution in a serial mode</title>
        <p>Both serial and parallel simulations were run on a computational node of the TESLA supercomputing
cluster in the Laboratory of Virtual Reality (St. Petersburg State Polytechnic University). The
node is equipped with a six-core Intel Xeon X5660 CPU @ 2.80 GHz, 8 GB of RAM with 3 memory
channels and 500 GB of disk storage. The operating system is Ubuntu Linux; the C++ compiler is GNU
v.4.5.1 with OpenMP v.3.0.</p>
        <p>According to the profiling results for the sequential code, the “heaviest” computational module
(58% of computational time) is the one calculating contact forces between pins and plates
(the “ChainForces” module in Figure 5). Calculation of chain-pulley contact forces takes 18% of
computational time. Inertia matrix decomposition and constraint elimination take 7% and 5% of the time,
respectively. Computation of other forces (shaft supports, pulley control forces, et cetera) takes about
2% of computational time. “Rk4Step” in Figure 5 denotes the load of the whole
simulation (100%, naturally). The remaining 9% of the total elapsed time is not attributed to the listed
modules; it is spent on I/O, memory operations and other overheads.</p>
        <p>Figure 5. Load distribution in serial code: portion of computational time (%) for the modules Rk4Step, ChainForces, Chain-Pulley Contact Forces, Inertia Matrix Decomposition, Eliminate Constraints and Other Forces.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4 Sources of parallelism</title>
        <p>The logical workflow of the simulation is shown in Figure 6. The modules within the group of
concurrent tasks are fully independent and, in principle, can be run simultaneously. The workflow steps
connected by arrows are logically sequential and pass their results to the next element in the
computational sequence. After each time integration step, all contact pairs in the model are checked for
gap opening: if the normal reaction in a contact pair becomes negative, the gap is open and the
contact pair will not be active in the next integration step.</p>
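        <p>The gap-opening check described above can be sketched as follows. The ContactPair structure and its fields are hypothetical; the sketch shows only the rule that a pair with a negative normal reaction is deactivated for the next step, and it does not model how an open contact closes again.</p>
        <preformat><![CDATA[
```cpp
#include <cstddef>
#include <vector>

// Hypothetical record for one contact pair.
struct ContactPair {
    double normalReaction;  // normal reaction computed at the current step
    bool active;            // whether the pair takes part in the next step
};

// Deactivates every active pair whose normal reaction is negative (gap open).
// Returns the number of pairs deactivated by this call.
std::size_t updateContactStatuses(std::vector<ContactPair>& pairs) {
    std::size_t opened = 0;
    for (ContactPair& p : pairs) {
        if (p.active && p.normalReaction < 0.0) {
            p.active = false;
            ++opened;
        }
    }
    return opened;
}
```
]]></preformat>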
        <p>Figure 6. Logical workflow of the simulation. Blocks: calculate chain forces; calculate pin-pulley contact forces; calculate all other forces; calculate and decompose the inertia matrix; eliminate constraints; perform an RK4 stage; at the last stage of RK4, update the state vector and check contact statuses. The figure also marks a group of concurrent tasks and the time integration loop.</p>
        <p>Besides task concurrency, there is a second, even more important, source of parallelism based on
data locality, which results naturally from the locality of contact interactions between transmission parts.
Typically, the total number of chain pins is nPins = 80-100; each pin interacts with a set of 20-30 plates and
with two neighboring pins. The computation of contact forces between a pin and a pulley is completely
independent of neighboring pins. Thus, one pin (with the set of calculations associated with it)
can serve as a grain of parallelism in this application.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5 OpenMP parallelization algorithm</title>
        <p>The locality of interactions in a CVT chain has been used in the OpenMP parallelization of the chain
forces calculation module. The chain was split into np parts, where np is the number of cores; np
varied from 1 to 6. Globally, the simulation was controlled by a master thread, which invoked a group of
parallel threads each time the “chain forces” module was called, namely four times per time step in the RK4
method (Figure 7).</p>
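        <p>One way to split the pins into np contiguous parts is shown below. It is a sketch, not the authors' code: the helper name splitPins and the half-open range convention [begin, end) are assumptions, and the remainder is spread over the first parts so that part sizes differ by at most one pin.</p>
        <preformat><![CDATA[
```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Splits pin indices 0..nPins-1 into np contiguous half-open ranges
// [begin, end). The first (nPins % np) ranges get one extra pin.
std::vector<std::pair<std::size_t, std::size_t>>
splitPins(std::size_t nPins, std::size_t np) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (np == 0) return ranges;
    const std::size_t base = nPins / np;
    const std::size_t extra = nPins % np;
    std::size_t begin = 0;
    for (std::size_t part = 0; part < np; ++part) {
        const std::size_t size = base + (part < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```
]]></preformat>
        <p>For nPins = 80 and np = 6 this gives part sizes 14, 14, 13, 13, 13 and 13, so the load imbalance between threads is at most one pin.</p>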
        <preformat>
… // initialize simulation
for (i = 0; i &lt; nsteps; i++) {        // time integration loop
    for (k = 0; k &lt; 4; k++) {         // loop over RK4 stages
        #pragma omp parallel sections   // fork threads
        {
            #pragma omp section
            {
                CalcChainForces(0, nPins/np);
            }
            #pragma omp section
            {
                CalcChainForces(nPins/np + 1, 2*nPins/np);
            }
            …
        }                               // join threads
        … /* Do the rest of work: calculate other forces, update
             and decompose inertia matrix, eliminate constraints,
             perform one RK4 stage */
    } // end of RK4-stage loop; update state vector, check contact statuses
} // end of time integration loop
…
</preformat>
        <p>Parallel efficiency benchmarking results are discussed in Section 5.2.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Simulation results</title>
      <sec id="sec-5-1">
        <title>5.1 Test case description and basic dynamics analysis results</title>
        <p>The test case simulated the dynamics of a CVT over a model time period of 1.118 s. The time integration
step was constant and equal to 10<sup>−8</sup> s. The simulation results have been visualized and
analyzed in the in-house post-processing component of our software system.</p>
        <p>The local behavior of CVT components, as well as global characteristics (such as efficiency), can be
observed through many postprocessor values. Figure 8 shows some examples (plate and link tension
force, pin axial force, chain torsion angle, support force and CVT efficiency).</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Parallel performance benchmarking</title>
        <p>In the parallel performance tests, the simulated model time was 0.1 s and the time integration step was 10<sup>−8</sup>
s. In serial mode, this test takes about an hour on the Intel Xeon X5660 (see Table 1 for computational
times).</p>
        <p>OpenMP parallelization reduced the computational time almost by half (by a factor of 1.9, to be precise)
when using all 6 cores (see Figure 9b, Table 1). The parallel portion of the code (chain forces
calculation) scales almost perfectly, giving speedups of 1.8, 3.5 and 5.5 on two, four and six cores,
respectively. The sequential portion of the code naturally slows down the whole program.</p>
        <p>According to Amdahl’s law, the maximal speedup on six cores of a code with a 42% portion of sequential
work is 1.94:
          <disp-formula>
            <tex-math><![CDATA[S_{\max} = \frac{100\%}{42\% + 58\% / 6} \approx 1.94]]></tex-math>
          </disp-formula>
        </p>
        <p>We obtained a speedup of 1.9; hence, the OpenMP parallelization of the chain forces calculation
module was almost ideal, with minor losses due to synchronization and memory bandwidth.</p>
        <p>Our next optimization steps will include parallelization of the module computing chain-pulley
contact forces and of the inertia matrix decomposition module.</p>
        <p>Figure 9. Speedup versus number of cores: (a) parallel section speedup; (b) global speedup (parallel + sequential parts).</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and future work</title>
      <p>We have developed a discrete model of continuously variable transmission dynamics and
implemented it in an object-oriented software tool. The model has been successfully validated during more
than ten years of application in a real industrial environment. The most time-consuming portion
of the code has been parallelized on a 6-core CPU with OpenMP in the task-parallel paradigm. The
obtained code speedup (1.9) is close to the maximal speedup (1.94) estimated by Amdahl’s law for
a code with a 42% portion of sequential calculations.</p>
      <p>Our next steps will naturally include parallelizing the remaining modules and exploiting the
functional concurrency of the workflow, according to Figure 6. Then, in order to obtain a speedup of about
100, we are going to implement MPI parallelization with local data storage (one pin per process)
and organize ring communications between processes (pins). A latency of 1 microsecond is expected to
allow MPI communications to be hidden behind computations: according to our benchmarks, the
serial computations for one pin take 6 microseconds at each time integration step on the Intel Xeon X5660 CPU
@ 2.80 GHz. Additionally, we will perform OpenMP parallelization of the nested loops over
plates and study the efficiency of this hybrid parallelization.</p>
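      <p>The ring topology mentioned above amounts to modular arithmetic on process ranks. The sketch below is plain C++ without any MPI calls; the names RingNeighbours and ringNeighbours are hypothetical, and in an MPI program the returned ranks would serve as the source and destination of the neighbour exchange.</p>
      <preformat><![CDATA[
```cpp
// Left and right neighbours of one process in a closed ring.
struct RingNeighbours {
    int left;
    int right;
};

// Neighbours of process `rank` in a ring of `size` processes
// (assumes size >= 1 and 0 <= rank < size).
RingNeighbours ringNeighbours(int rank, int size) {
    return RingNeighbours{(rank + size - 1) % size, (rank + 1) % size};
}
```
]]></preformat>
      <p>With one pin per process, this matches the interaction pattern in which each pin exchanges data with its two neighboring pins only.</p>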
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The authors thank the Russian Fund of Fundamental Research for financial support under
Grant № 13-07-12077.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Bullinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ulbrich</surname>
          </string-name>
          ,
          <article-title>Elastic modelling of bodies and contacts in continuous variable transmissions</article-title>
          ,
          <source>Multibody Syst. Dyn</source>
          .
          <volume>13</volume>
          ,
          <fpage>175</fpage>
          -
          <lpage>194</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>L.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ulbrich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <article-title>New model of a CVT rocker-pin chain with exact joint kinematics</article-title>
          ,
          <source>J. Comput. Nonlinear Dyn</source>
          .
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <fpage>143</fpage>
          -
          <lpage>149</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>L.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ulbrich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <article-title>Optimization of the joint geometry of a rocker pin chain</article-title>
          ,
          <source>Machine Dyn. Problems</source>
          <volume>29</volume>
          (
          <issue>4</issue>
          ),
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>M.T.</given-names>
            <surname>Lates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.G.</given-names>
            <surname>Velicu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Papuc</surname>
          </string-name>
          .
          <article-title>Multiscale modeling of chain-guide contact by using tests and FEM</article-title>
          .
          <source>11th World Congress on Computational Mechanics WCCM XI</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Th.</given-names>
            <surname>Geier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Foerg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ulbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brandsma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>van der Velde</surname>
          </string-name>
          ,
          <article-title>Simulation of a push belt CVT considering uni- and bilateral constraints</article-title>
          ,
          <source>Journal of Applied Mathematics and Mechanics</source>
          <volume>86</volume>
          (
          <issue>10</issue>
          ), pp.
          <fpage>795</fpage>
          -
          <lpage>806</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>V.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <source>Mathematical Methods of Classical Mechanics, 2nd edition</source>
          (Springer, New York,
          <year>1989</year>
          ),
          <source>Chap. 3.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>Journal of Applied Mathematics and Mechanics</source>
          <volume>94</volume>
          (
          <issue>11</issue>
          ), pp.
          <fpage>917</fpage>
          -
          <lpage>922</lpage>
          . WILEY-VCH Verlag GmbH &amp; Co. KGaA, Weinheim. DOI 10.1002/zamm.201300249 (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>K.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , Contact Mechanics (Cambridge University Press, Cambridge,
          <year>1987</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Hairer</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Wanner</surname>
          </string-name>
          ,
          <source>Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems</source>
          , p.
          <fpage>17</fpage>
          . Springer-Verlag Berlin Heidelberg (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>