=Paper= {{Paper |id=Vol-1482/041 |storemode=property |title=High-performance simulations of continuously variable transmission dynamics |pdfUrl=https://ceur-ws.org/Vol-1482/041.pdf |volume=Vol-1482 }} ==High-performance simulations of continuously variable transmission dynamics== https://ceur-ws.org/Vol-1482/041.pdf
      Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org



          High-performance simulations of continuously variable
                        transmission dynamics
                   S.G. Orlov¹, N.B. Melnikova¹,², Yu.G. Ispolov¹, N.N. Shabrov¹
    ¹Peter the Great St. Petersburg Polytechnic University, ²National Research University ITMO

           The paper describes a parallel computational model simulating continuously variable
           transmission (CVT) dynamics. A specific feature of the CVT model is the combination of
           a relatively small problem size (about 1000 unknowns) and high computational cost (up to
           several weeks of sequential computation for a thirty-second simulation period). The
           main source of computational complexity is the calculation of non-linear contact
           forces acting between transmission parts at each step of the explicit time integration
           procedure. Below we analyze the simulation workflow and the runtime load distribution
           among the application modules. Based on the profiling data, we present a task-parallel
           multithreaded shared-memory implementation of the model and discuss steps towards
           further parallelization.


1. Introduction
     The present paper focuses on the design and implementation of a high-performance computational
model of continuously variable transmission (CVT) dynamics. CVTs have been used extensively
in the automotive industry over the last decades. Numerical simulations of CVT dynamics are commonly
employed in engineering practice at the design stage. However, at a high level of model complexity
and realism, simulations become very time-consuming. Depending on the model complexity, a
thirty-second model time period can take from several weeks to several months of sequential
computation.
     A number of CVT models with different levels of realism and complexity have been developed.
These include: (a) primitive low-dimension chain models [1]; (b) discrete multi-body mechanical
models [2,3]; (c) continuous FEM models [4]. An important requirement for our chain models is the
ability to represent each pin and plate separately, hence low-dimension chain models similar to [1]
were not considered in our research. Our chain models differ from those described in [2,3] in that they
are more detailed and consider 3D motion of the chain. Continuous FEM models are computationally
very heavy because of the large number of contact interactions in the system; their application is, in
fact, limited to detailed static simulations of local contact interactions between typical CVT parts
[4, 5] coupled to a multi-body system analysis [5].
     Our original CVT model has been specially designed to keep a reasonable balance between
model realism and computational cost. The reliability of this model has been validated during more
than ten years of successful application in a real industrial environment. However, even this relatively
lightweight model is quite time-consuming: for a time step $h = 2.1 \cdot 10^{-8}$ s and model time
$T = 1$ s, a serial simulation takes about 60 hours on an Intel G2130 CPU at 3.2 GHz.
     A parallel implementation of the CVT model has naturally become our next step towards a really
fast application. Our final goal is to reduce typical computation times by a factor of 100, so that a
simulation completes within 30-40 minutes. At the current stage of the work, a task-based OpenMP
parallelization has been performed for the most time-consuming portion of the code (see Sections 4-5
for details).
     The rest of the paper is organized as follows: Section 2 introduces the basic principles of CVT
operation; Section 3 describes the mathematical foundations of our CVT model; Section 4 considers
numerical implementation issues and serial performance analysis; Section 5 presents parallel
benchmarking results; and, finally, Section 6 summarises conclusions and future work directions.

2. CVT mechanics basics
     A schematic view of the CVT is presented in Figure 1. Two pairs of conical sheaves (pulleys) are
installed on two shafts (driving and driven) and connected by a steel chain. The shafts rest on elastic
supports. One pulley in each pair can move along its axis, while the other one is fixed. The axial
motion of the pulleys provides an infinite number of gear ratios within a fixed range. In addition,
there is a hydraulic system controlling the moving sheaves and a stabilization system for the CVT
gear ratio.




                     Figure 1. A schematic view of continuously variable transmission
     The chain consists of rocker pins and plates (Figure 2). The driving torque is transmitted by
friction forces at the contact points between pins and sheaves.




                    Figure 2. CVT chain: pairs of contacting pins connected with plates

3. Mathematical model of CVT dynamics
     The dynamics of a holonomic system of elastic bodies can be described using the standard
mathematical formalism of Lagrangian mechanics [6]. Applied to our CVT model, it can be written
in the following form:

     $$\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = Q, \qquad (1)$$

     where $L = T - \Pi$ is the Lagrangian (the difference between the kinetic and potential energies),
$T = T(q, \dot{q})$ is the kinetic energy, and $\Pi = \Pi(q)$ is the potential energy.
$Q = -\frac{\partial \Phi}{\partial \dot{q}} + Q_{applied}$ is the vector of generalized forces,
$\Phi(q, \dot{q}) \ge 0$ is the Rayleigh dissipative function, and $Q_{applied}$ is the vector of
applied forces.
     Substituting the expressions for the potential and kinetic energies into (1) gives a system of
ordinary differential equations with a non-linear right-hand side:

     $$A\ddot{q} = F(t, q, \dot{q}), \qquad (2)$$

     where $A = A(t, q)$ is a symmetric positive-definite inertia matrix and the right-hand side
vector $F$ is the sum of elastic, damping and external forces.
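
Because the inertia matrix in (2) is symmetric and positive definite, the accelerations can be obtained at each evaluation of the right-hand side by factorizing $A$ and solving a linear system. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a dense Cholesky factorization $A = LL^T$ (the paper only mentions an inertia matrix "decomposition" without naming the method), and `solveSpd`, `Matrix` and `Vector` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Solve A * x = f for a symmetric positive-definite matrix A using a dense
// Cholesky factorization A = L * L^T. Assumption: Cholesky is chosen here for
// illustration; the source does not specify the decomposition it uses.
Vector solveSpd(const Matrix& A, const Vector& f)
{
    const std::size_t n = A.size();
    Matrix L(n, Vector(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) throw std::runtime_error("matrix is not positive definite");
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    // Forward substitution: L * y = f
    Vector y(n);
    for (std::size_t i = 0; i < n; ++i) {
        double s = f[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i][k] * y[k];
        y[i] = s / L[i][i];
    }
    // Back substitution: L^T * x = y
    Vector x(n);
    for (std::size_t i = n; i-- > 0;) {
        double s = y[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= L[k][i] * x[k];
        x[i] = s / L[i][i];
    }
    return x;
}
```

With $x = \ddot{q}$ and $f = F(t, q, \dot{q})$, the returned vector gives the accelerations needed to advance the first-order form of system (2).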
     The CVT chain consists of many separate solid bodies, namely rocker pins and plates (Figure 2);
their total number exceeds 1000, and there are many contact interactions between these bodies. In
addition, it is essential to take local deformations of the bodies into account in the mathematical
model in order to correctly predict the behaviour of the entire system.
     Mathematical expressions for the potential and kinetic energies of the parts of our CVT
model have been published in [7]. In short, we assume that pins, pulleys and shafts have inertia, and
plates do not. The masses of the plates are added to the pins connected by those plates (see Figure 3a
for a cross-sectional view of pins connected by plates). The pins experience extension and bending; we
neglect torsion of the pins. The plates (Figure 3c,d) work mostly in extension, but also in bending and
torsion. The shafts work mostly in torsion and, to a lesser degree, in extension and bending. The two
halves of each pin roll over each other (Figure 3b). Contacts between pin halves and contacts between
pin ends and pulleys are modelled with the Hertz theory of contact [8].

Figure 3. (a) Cross-sectional view of pins connected with plates; (b) two pin halves rolling over each other; (c) a
                   chain plate; (d) bending of a chain plate and independent rotations of pins
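
To illustrate why the contact forces make the right-hand side non-linear, the following minimal sketch evaluates the normal force of a Hertz point contact, which grows as the penetration to the power 3/2 and vanishes when the gap is open. It is a generic textbook form and not the paper's formula: the exponent 3/2 applies to point (sphere-like) contact, the stiffness parameter lumps together material and geometry constants, and the function name `hertzNormalForce` is hypothetical. The expressions actually used in the model are those of [7, 8].

```cpp
#include <cmath>

// Normal force of a Hertz point contact (generic textbook form, assumed here).
//   stiffness   - lumped contact stiffness k (hypothetical parameter)
//   penetration - overlap of the undeformed surfaces; positive when in contact
double hertzNormalForce(double stiffness, double penetration)
{
    if (penetration <= 0.0) {
        return 0.0;  // the gap is open: no contact force
    }
    return stiffness * std::pow(penetration, 1.5);  // F = k * delta^(3/2)
}
```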

4. Numerical implementation
4.1 Time integration scheme

     The well-known four-stage Runge-Kutta time integration scheme was used to integrate ODE
system (2). The scheme is explicit; it is stable in the region of the complex plane $\lambda h$ shown
in Figure 4 (here $\lambda$ is the maximal eigenvalue and $h$ is the time step) [9]. The highest
frequencies in the model correspond to axial tension of the pins; their typical values are about
$10^{7}$-$10^{8}$ rad/s, which requires a time integration step of less than $10^{-7}$-$10^{-8}$ s.




                        Figure 4. Explicit RK4 time integration scheme – stability region
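
A minimal sketch of the classical RK4 step and of its amplification factor is given below to make the step size restriction concrete. It assumes the standard RK4 coefficients and a first-order form $y' = f(t, y)$ of system (2); the names `rk4Step`, `rk4Amplification` and `State` are hypothetical. For the linear test equation $y' = \lambda y$ the scheme multiplies the solution by $R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24$ with $z = \lambda h$, and on the imaginary axis (undamped oscillations) $|R| \le 1$ holds for $|\lambda h| \le 2\sqrt{2} \approx 2.83$. For a frequency of $10^{8}$ rad/s this bound gives $h \lesssim 2.8 \cdot 10^{-8}$ s, consistent with the step $h = 2.1 \cdot 10^{-8}$ s quoted in the Introduction.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using State = std::vector<double>;

// One step of the classical four-stage Runge-Kutta scheme for y' = f(t, y).
// Rhs is any callable taking (double, const State&) and returning a State.
template <class Rhs>
State rk4Step(Rhs f, double t, const State& y, double h)
{
    // Helper: returns a + s * b, element by element.
    auto axpy = [](const State& a, double s, const State& b) -> State {
        State r(a.size());
        for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + s * b[i];
        return r;
    };
    const State k1 = f(t, y);
    const State k2 = f(t + 0.5 * h, axpy(y, 0.5 * h, k1));
    const State k3 = f(t + 0.5 * h, axpy(y, 0.5 * h, k2));
    const State k4 = f(t + h, axpy(y, h, k3));
    State out(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        out[i] = y[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
    }
    return out;
}

// Amplification factor R(z) of RK4 for the test equation y' = lambda * y,
// where z = lambda * h. The scheme is stable when |R(z)| <= 1.
std::complex<double> rk4Amplification(std::complex<double> z)
{
    return 1.0 + z + z * z / 2.0 + z * z * z / 6.0 + z * z * z * z / 24.0;
}
```

Evaluating `std::abs(rk4Amplification({0.0, 2.0}))` gives a value below one (stable), whereas `{0.0, 3.0}` gives a value above one (unstable).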






4.2 Software implementation

    Functionality of our CVT dynamics simulator currently includes:
        • Building various CVT models from a set of standard parts in a graphical pre-processor;
        • The computational core: integrating ODE systems + generalized eigenvalue problem solution;
        • Visualization utilities (2D, 3D, video recording);
        • Building plots and spectrograms;
        • Logging user actions;
        • Help system.
    The code has been implemented in C++ using an object-oriented approach. The graphical user
interface employs the Qt5 library. The application is cross-platform and runs on both Windows and
Linux. Other software technologies used in the code are:
        • V8 (JavaScript implementation)
        • Doxygen (for developers’ and users’ documentation systems)
        • om (in-house object model)
        • cmake (meta-build system)

4.3 Work load distribution in a serial mode

     Both serial and parallel simulations were run on a computational node of the supercomputing
cluster TESLA in the Laboratory of Virtual Reality (St. Petersburg State Polytechnic University). The
node is equipped with a six-core Intel Xeon X5660 CPU at 2.80 GHz, 8 GB of RAM with 3 memory
channels and 500 GB of disk storage. The operating system is Ubuntu Linux; the C++ compiler is GNU
v.4.5.1 with OpenMP v.3.0.
     According to the profiling results for the sequential code, the “heaviest” computational module
(58% of computational time) is the one calculating contact forces between pins and plates
(“ChainForces” module in Figure 5). Calculation of chain-pulley contact forces takes 18% of
computational time. Inertia matrix decomposition and constraint elimination take 7% and 5% of the
time, respectively. Computation of other forces (shaft supports, pulley control forces, etc.) takes
about 2% of computational time. “Rk4Step” in Figure 5 denotes the load of the whole simulation
(100%, naturally). The remaining 9% of the total elapsed time is not attributed to the listed modules;
it is spent on I/O, memory operations and other overheads.
           [Figure 5 chart: "Load distribution in serial code". Vertical axis: portion of computational
           time, % (0 to 100). Bars: Chain Forces, Chain-Pulley Contact Forces, Other Forces, Inertia
           Matrix Decomposition, Eliminate Constraints, Rk4Step]
           Figure 5. Load distribution between the computational modules in the serial mode

4.4 Sources of parallelism

    The logical workflow of the simulation is shown in Figure 6. The modules within the group of
concurrent tasks are completely independent and, in principle, can be run simultaneously. The workflow
steps connected by arrows are logically sequential: each passes its results to the next element in the
computational sequence. After each time integration step, all contact pairs in the model are checked for
gap opening: if the normal reaction in some contact pair becomes negative, the gap is open and the
contact pair will not be active in the next integration step.

    [Figure 6 diagram. A group of concurrent tasks (calculate chain forces; calculate pin-pulley contact
    forces; calculate all other forces; calculate and decompose the inertia matrix) -> eliminate
    constraints -> perform an RK4 stage -> at the last stage of RK4: update the state vector and check
    contact statuses. The diagram is labelled "Time integration loop".]
    Figure 6. Logical workflow of the simulation within one RK4 step: concurrent and sequential modules
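
The group of concurrent tasks in Figure 6 maps naturally onto the OpenMP `sections` construct. The sketch below only illustrates this mapping and is not the parallelization reported in the paper; `StageData`, `computeConcurrentGroup` and the four stand-in functions are hypothetical placeholders for the real modules. When compiled without OpenMP support the pragmas are ignored and the code runs serially with the same result.

```cpp
// Hypothetical container for the results of the four independent modules.
struct StageData {
    double chainForces = 0.0;
    double pulleyForces = 0.0;
    double otherForces = 0.0;
    double inertia = 0.0;
};

// Stand-ins for the real modules; each returns a dummy value.
double calcChainForces()     { return 1.0; }
double calcPulleyForces()    { return 2.0; }
double calcOtherForces()     { return 3.0; }
double calcInertiaMatrix()   { return 4.0; }

void computeConcurrentGroup(StageData& d)
{
    // Each section writes to its own field, so no synchronization is needed
    // apart from the implicit barrier at the end of the construct.
    #pragma omp parallel sections
    {
        #pragma omp section
        { d.chainForces = calcChainForces(); }
        #pragma omp section
        { d.pulleyForces = calcPulleyForces(); }
        #pragma omp section
        { d.otherForces = calcOtherForces(); }
        #pragma omp section
        { d.inertia = calcInertiaMatrix(); }
    }
    // The sequential steps of Figure 6 (constraint elimination, the RK4 stage)
    // would follow here, after all four results are available.
}
```

Since the four tasks have very different costs, this form of task parallelism alone is limited by the slowest task.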

    Besides task concurrency, there is a second, even more important, source of parallelism based on
data locality, which follows naturally from the locality of contact interactions between transmission
parts. Typically, the total number of chain pins is nPins = 80 to 100; each pin interacts with a set of
20-30 plates and with two neighboring pins. Computation of contact forces between a pin and a pulley
is completely independent of neighboring pins. Thus, one pin (with the set of calculations associated
with it) can serve as a grain of parallelism in this application.
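
Pin-level granularity can be expressed as a parallel loop over pins in which every iteration writes only to its own output slot. The following is a minimal sketch under that assumption; `pinContactForce` is a hypothetical stand-in for the real per-pin kernel and `computePinForces` is not a function of the simulator. Static scheduling assigns each thread a contiguous block of pins, which keeps neighboring pins on the same thread.

```cpp
#include <vector>

// Hypothetical per-pin kernel: a stand-in for the real contact force computation.
double pinContactForce(int pin)
{
    return 0.5 * static_cast<double>(pin);
}

// Compute one force value per pin; iterations are independent of each other.
std::vector<double> computePinForces(int nPins)
{
    std::vector<double> force(nPins, 0.0);
    // schedule(static) gives each thread a contiguous block of pins.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nPins; ++i) {
        force[i] = pinContactForce(i);  // each iteration writes its own element
    }
    return force;
}
```

For pin-plate and pin-pin forces, which couple neighboring pins, the results of adjacent blocks would additionally have to be combined at block boundaries; that step is not shown here.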

4.5 OpenMP parallelization algorithm

     The locality of interactions in a CVT chain has been used in the OpenMP parallelization of the
chain forces calculation module. The chain was split into np parts, where np is the number of cores;
np varied from 1 to 6. Globally, the simulation was controlled by a master thread, which invoked a
group of parallel threads each time the “chain forces” module was called, namely four times per time
step in the RK4 method (Figure 7).
             …//initialize simulation
             for(i=0;i