<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Estimating the Performance of Ab Initio Calculation by VASP on OpenPOWER High Performance System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vyacheslav E. Lozhnikov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander V. Mamonov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadim O. Borzilov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marina V. Mamonova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel V. Prudnikov</string-name>
          <email>prudnikovpv@omsu.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleksei A. Sorokin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgy G. Baksheev</string-name>
          <email>g.baksheev@g.nsu.ru</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computing Center of Far-Eastern Branch, Russian Academy of Sciences</institution>
          ,
          <addr-line>Khabarovsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Theoretical Physics, Omsk State University</institution>
          ,
          <addr-line>Omsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Novosibirsk State University</institution>
          ,
          <addr-line>Novosibirsk</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>In this work we compare the performance of Pascal P100 GPUs and POWER8 CPUs on an OpenPOWER HPC system using VASP calculations of the energy and magnetic characteristics of Fe/Cu(111)/Fe and Co/Cu(100)/Co multilayer magnetic nanostructures. We found that the VASP code reaches its maximum performance on the OpenPOWER system when the GPUs are used.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
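      <p>From the variational principle, the Kohn-Sham equations are obtained:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math><![CDATA[\left[ -\nabla^{2} + V_{\mathrm{eff}} + \boldsymbol{\sigma} \cdot \mathbf{B}_{xc}(\mathbf{r}) - \varepsilon_{v} \right] \psi_{v}(\mathbf{r}) = 0, \qquad \mathbf{m}(\mathbf{r}) = \sum_{v=1}^{N} \psi_{v}^{*}(\mathbf{r})\, \boldsymbol{\sigma}\, \psi_{v}(\mathbf{r}),]]></tex-math>
      </disp-formula>
      <disp-formula>
        <label>(2)</label>
        <tex-math><![CDATA[\mathbf{B}_{xc} = \frac{\delta E_{xc}\left[ n(\mathbf{r}), \mathbf{m}(\mathbf{r}) \right]}{\delta \mathbf{m}(\mathbf{r})},]]></tex-math>
      </disp-formula>
      <p>where B<sub>xc</sub> is the effective magnetic field arising from the exchange-correlation energy.</p>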
      <p>The main problem associated with the density functional theory method is that exact analytical expressions for
exchange and correlation functionals are known only for the particular case of a gas of free electrons.
Nevertheless, the existing approximations allow us to calculate a number of physical quantities with sufficient
accuracy.</p>
      <p>
        In this work we used the generalized gradient approximation (GGA) in the Perdew–Burke–Ernzerhof (PBE) form [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]:
      </p>
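      <disp-formula>
        <label>(3)</label>
        <tex-math><![CDATA[E_{xc}^{\mathrm{GGA}}\left[ n_{\uparrow}(\mathbf{r}), n_{\downarrow}(\mathbf{r}) \right] = \int f\left( n_{\uparrow}(\mathbf{r}), n_{\downarrow}(\mathbf{r}), \nabla n_{\uparrow}(\mathbf{r}), \nabla n_{\downarrow}(\mathbf{r}) \right) d\mathbf{r}]]></tex-math>
      </disp-formula>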
      <p>
        The essence of the projector augmented wave (PAW) method is to transform the pseudo-wave functions obtained in the pseudopotential method into all-electron wave functions, thereby restoring the information lost when only pseudo-wave functions are considered. The number of plane-wave components is limited by the cutoff energy. To sample the first Brillouin zone we used the standard Monkhorst–Pack method, in which the K-points parameter defines a regular grid in k-space [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
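      <p>As an illustration of the regular k-space grid mentioned above, the following minimal Python sketch generates Monkhorst–Pack points in fractional reciprocal coordinates from the standard rule u_r = (2r - q - 1)/(2q), r = 1, ..., q. The function name monkhorst_pack and the 2×2×1 mesh are assumptions made for this example only; they are not the mesh used in this work, and VASP may apply additional shifts or symmetry reduction.</p>
      <preformat>
```python
from fractions import Fraction
from itertools import product


def monkhorst_pack(q1, q2, q3):
    """Return Monkhorst-Pack k-points in fractional reciprocal coordinates.

    Along each axis the points are u_r = (2r - q - 1) / (2q), r = 1..q.
    The function name and signature are illustrative, not part of VASP.
    """
    def axis(q):
        return [Fraction(2 * r - q - 1, 2 * q) for r in range(1, q + 1)]

    # Cartesian product of the three one-dimensional point sets.
    return list(product(axis(q1), axis(q2), axis(q3)))


# Hypothetical 2x2x1 mesh, a typical shape for a slab geometry.
grid = monkhorst_pack(2, 2, 1)
for k in grid:
    print(tuple(float(c) for c in k))
```
      </preformat>
      <p>Exact fractions are used so that the symmetry of the grid about the origin can be checked without rounding errors.</p>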
    </sec>
    <sec id="sec-9">
      <title>Compiling the Parallel Version of VASP Code for OpenPOWER and Intel Architectures</title>
      <p>Official GPU support appeared in VASP in version 5.4.1; in this work we used version 5.4.4. The VASP build is controlled by a single configuration file, makefile.include, which contains many parameters. Listing all of them would be redundant, so the main ones are given in Table 1. On the x86_64 architecture we compiled the VASP package with Intel Parallel Studio XE C/C++ and the Intel MKL library under Ubuntu 16.04 with the 4.4.0-137 kernel. The optimization flags -O1 and -O2 were chosen because compilation with more aggressive optimization did not complete successfully. On the IBM Power System S822LC we compiled VASP with IBM XL C/C++ 13.1.5 and XL Fortran 15.1.5 together with the ESSL library under CentOS 7 with the 3.10.0-514 kernel.</p>
      <p>
        The IBM Power System S822LC is a two-socket HPC system with two POWER8 CPUs (20 cores in total) running at 4 GHz, connected to two NVIDIA Pascal P100 GPUs through a high-bandwidth (80 GB/s in and 80 GB/s out) NVLink 1.0 interface (Fig. 1). This bandwidth is important for exchanging data between multiple GPUs and for loading data quickly from the CPU. The main purpose of this system is to use the GPUs efficiently to accelerate calculations. A large part of the HPC resources installed during the last decade are based on Intel CPUs. Recent generations of Intel CPUs offer a wide spectrum of multicore processors [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The Intel Core i7-4770 is a desktop processor, but it is a “tock” model in Intel's microprocessor development strategy, that is, a mature 22 nm architecture. We compared IBM POWER8 with Intel Haswell because both architectures were introduced in 2013 and are manufactured on a 22 nm process.
      </p>
      <p>In this work we present the results of numerical first-principles calculations of the energy and magnetic characteristics of cobalt and iron films on a copper surface, obtained with the VASP software package using the projector augmented wave (PAW) method. We calculated the total energy of the collinear spin configurations, the total magnetic moment and the magnetic moments of the Co and Fe atoms. The system consists of a copper slab with a ferromagnetic film three monoatomic layers thick adsorbed on each side. The multilayer structure was simulated using a periodic 2×2 36-atom supercell with the lattice constant of the copper substrate, a = 3.6367(5) Å, which we obtained from calculations that included optimization of the lattice parameters. The surface orientation is (100) for the Co/Cu system and (111) for the Fe/Cu system.</p>
      <p>For the Fe/Cu system the total energy was calculated for the ferromagnetic and for two different antiferromagnetic spin configurations. The antiferromagnetic spin configurations considered are shown in Fig. 3. The magnetic moments of the atoms are directed along the z axis.</p>
      <p>The VASP INCAR file has several tuning parameters that can increase GPU performance. The main ones are NCORE, NPAR, NSIM and LPLANE.</p>
      <list list-type="bullet">
        <list-item><p>NCORE determines how many cores work on an individual orbital.</p></list-item>
        <list-item><p>NPAR is related to NCORE by NCORE = number of cores / NPAR.</p></list-item>
        <list-item><p>If NPAR is equal to the number of cores, then NCORE = 1, so each orbital is treated by one core.</p></list-item>
        <list-item><p>Only one of NCORE and NPAR needs to be set in the INCAR file, because NPAR takes precedence over NCORE. In relatively recent versions of VASP it is recommended to use NCORE instead of NPAR.</p></list-item>
        <list-item><p>NSIM is an important parameter for speeding up GPU calculations. It changes the number of bands treated simultaneously. It is commonly suggested that on GPUs NSIM should be increased as long as free GPU memory remains.</p></list-item>
        <list-item><p>LPLANE is a useful optimization parameter that can reduce communication time, but according to the VASP documentation it is relevant mainly for massively parallel systems.</p></list-item>
      </list>
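      <p>The relation NCORE = number of cores / NPAR stated above can be sketched in a few lines of Python. This is a minimal illustration, not part of VASP: the helper names ncore_from_npar and incar_fragment are hypothetical, and the sample values (8 cores with NPAR = 8, and NCORE = 4 with NSIM = 32) are the settings quoted in this paper.</p>
      <preformat>
```python
def ncore_from_npar(total_cores, npar):
    """Return NCORE = total_cores / NPAR (hypothetical helper, not a VASP API)."""
    # NPAR has to be a positive divisor of the number of cores.
    if npar not in range(1, total_cores + 1) or total_cores % npar != 0:
        raise ValueError("NPAR must be a positive divisor of the core count")
    return total_cores // npar


def incar_fragment(ncore, nsim=None, lreal=None):
    """Format parallelization tags as INCAR lines; NPAR is deliberately left unset."""
    lines = ["NCORE = %d" % ncore]
    if nsim is not None:
        lines.append("NSIM = %d" % nsim)
    if lreal is not None:
        lines.append("LREAL = %s" % (".TRUE." if lreal else ".FALSE."))
    return "\n".join(lines)


# NPAR equal to the number of cores gives one core per orbital.
print(ncore_from_npar(8, 8))
# Tags reported in the text for the GPU runs of the Fe/Cu system.
print(incar_fragment(4, nsim=32))
```
      </preformat>
      <p>Writing only NCORE into the fragment follows the recommendation above to set one of the two tags and to prefer NCORE in recent VASP versions.</p>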
      <p>For the Co/Cu system we used three configurations to compare VASP calculation times with similar INCAR parameters on the IBM POWER8 CPU and the Intel Haswell CPU. We set LREAL=.TRUE., as described in the official VASP documentation, and used NCORE=1 with one MPI process for the GPU calculations. We did not use the NVIDIA MPS system and did not set the NSIM parameter explicitly, although we know that it is important, especially for large tasks. The Core i7 has only 4 physical cores, so we ran VASP with 4 processes on it. On POWER8 we ran VASP on 8 cores to use most of one CPU.</p>
      <p>The results of the magnetization and free energy calculations (Table 3) agree well. The GPU calculations are slightly less accurate, but the error is not significant. The calculation times differ between the POWER8 system and the Intel Core i7 (Table 4). One POWER8 thread was more efficient than one Intel thread for the VASP calculations. With the GPU and only one MPI process we obtained much better performance (Table 4) than with the Intel or POWER CPUs, even without optimizing the VASP parameters in the INCAR file.</p>
      <p>For the Fe/Cu systems we used NCORE = 4 and NSIM = 32 to obtain better performance in the GPU calculations. We simulated the ferromagnetic Fe/Cu system on ten POWER8 cores to compare execution times with the GPU (Fig. 4). As Table 5 shows, the antiferromagnetic spin configurations need significantly more memory than the ferromagnetic one. The execution time on the GPU also depends strongly on the spin configuration. It is worth noting that GPU utilization is not full: it fluctuates between about 20% and 70% during the calculation for both systems.</p>
      <p>[Figure: execution time (in hours); vertical axis from 0 to 40.]</p>
      <p>VASP is widely used by researchers to obtain the characteristics of solids and multilayer magnetic structures. The NCORE and NSIM parameters can be very useful for maximizing performance on GPUs. The values of the calculated quantities and the accuracy of the GPU calculations are in good agreement with the CPU results. Efficient use of VASP with GPUs requires more memory and calculation time than calculations on CPUs, especially for antiferromagnetic spin configurations.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>
        We would like to thank the IBM experts who helped us to optimize the VASP package for the IBM Power System S822LC. This research was supported by grants 17-02-00279 and 18-42-550003 of the Russian Foundation for Basic Research and by grant MD-6868.2018.2 of the President of the Russian Federation. The simulations were supported by the computational resources of the Shared Facility Center “Data Center of FEB RAS” (Khabarovsk) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Computations were performed with the methods and techniques developed under RFBR scientific project number 18-29-03196.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lejaeghere</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bihlmayer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Björkman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>Reproducibility in density functional theory calculations of solids</article-title>
          ,
          <source>Science</source>
          .
          <volume>351</volume>
          :
          <issue>aad3000</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kondrashov</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mamonova</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Povoroznuk</surname>
            ,
            <given-names>E.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prudnikov</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          :
          <article-title>First-principles investigations of the atomic structure and magnetic properties of Ni and Co films on Cu substrate</article-title>
          ,
          <source>Lobachevskii Journal of Mathematics</source>
          .
          <volume>38</volume>
          :
          <issue>940</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kresse</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furthmüller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set</article-title>
          ,
          <source>Phys. Rev. B</source>
          .
          <volume>54</volume>
          :
          <fpage>11169</fpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kresse</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marsman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furthmüller</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>VASP the Guide</source>
          (
          <year>2015</year>
          ) https://cms.mpi.univie.ac.at/vasp/vasp/vasp.html
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Stegailov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vecher</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Efficiency Analysis of Intel and AMD x86_64 Architectures for Ab Initio Calculations: A Case Study of VASP</article-title>
          . In:
          <string-name>
            <surname>Voevodin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sobolev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Supercomputing. RuSCDays 2017. Communications in Computer and Information Science</source>
          , vol.
          <volume>793</volume>
          . Springer, Cham (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Giannozzi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.:
          <article-title>QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials</article-title>
          ,
          <source>Journal of Physics: Condensed Matter</source>
          .
          <volume>21</volume>
          :
          <issue>395502</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gonze</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amadon</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anglade</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          , et al.:
          <article-title>ABINIT: First-principles approach to material and nanosystem properties</article-title>
          ,
          <source>Comput. Phys. Commun</source>
          .
          <volume>180</volume>
          :
          <issue>2582</issue>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Schwarz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blaha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Solid state calculations using WIEN2k</article-title>
          ,
          <source>Computational Materials Science</source>
          .
          <volume>28</volume>
          :
          <fpage>259</fpage>
          -
          <lpage>273</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Perdew</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ernzerhof</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Generalized Gradient Approximation Made Simple</article-title>
          ,
          <source>Phys. Rev. Lett</source>
          .
          <volume>77</volume>
          :
          <issue>3865</issue>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Monkhorst</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pack</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Special points for Brillouin-zone integrations</article-title>
          ,
          <source>Phys. Rev. B</source>
          .
          <volume>13</volume>
          :
          <issue>5188</issue>
          (
          <year>1976</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sorokin</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makogonov</surname>
            ,
            <given-names>S.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korolev</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          :
          <source>Scientific and Technical Information Processing</source>
          <volume>4</volume>
          :
          <fpage>302</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>