<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Parallel algorithm for solving large-scale dynamic general equilibrium models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>N. B. Melnikov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. P. Gruzdev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. G. Dalton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>B. C. O'Neill</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Center for Atmospheric Research</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Oceanic and Atmospheric Administration</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>84</fpage>
      <lpage>95</lpage>
      <abstract>
        <p>We present a parallel algorithm for computing an equilibrium path in a large-scale economic growth model. We exploit the special block structure of the nonlinear systems of equations common in such models. Our algorithm is based on an iterative method of Gauss-Seidel type with prices of different time periods calculated simultaneously rather than recursively. We have implemented the parallel algorithm in OpenMP and MPI programming environments. The numerical results show that speedup improves almost linearly as number of nodes increases. Different methods for solving an individual block: Newton-type methods, Krylov subspace methods and trust-region methods, give similar results for the speedup.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
2. Block Gauss-Seidel method
i−1
X aij xjk+1 + aiixi +
j=1</p>
      <p>n
X aij xjk = bi,
j=i+1
is solved with respect to xi and the solution is taken as xik+1. We write method in a more
compact form by splitting the matrix as A = D + L + U , where D is diagonal and L and U are
strictly lower and upper triangular parts. Then</p>
      <p>xk+1 = M xk + c,
rk = (D + L)(xk+1 − xk),
where M = −(D + L)−1U is the iteration matrix and c = (D + L)−1b. The usual termination
criterium is ||rk||/||b|| &lt; ε, where rk = b − Axk is the residual. Since
we can use the properly scaled error kxk+1 − xkk to terminate iterations.</p>
      <p>The same idea is generalized to a nonlinear equations system f (x) = 0, where f = (f1, . . . , fn).
To obtain xik+1 the ith equation</p>
      <p>fi(x1k+1, . . . , xik−+11, xi, xik+1, . . . , xkn) = 0
is solved with respect to xi and the solution is taken as xik+1. An advantage of the method is
that one does not need to calculate the Jacobian of the system but only values of the functions
fi(x).</p>
      <p>Now we assume that fi and xi are vectors:
fi = (fi1, . . . , fij, . . . , fil),
xi = (xi1, . . . , xij, . . . , xil),
where the blocks fij and xij are also vectors having the same dimension. Let the jth block of
functions fij depend only on the jth block of variables xij. Then to obtain xik+1 the equations
fij(x1k+1, . . . , xik−+11, x, xik+1 . . . , xkn) = 0,
j = 1, l,
are solved independently with respect to x and the solution is taken as xikj+1.</p>
      <p>
        To solve the equations system for a separate block (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), which we denote F (x) = 0 for
now, it is reasonable to use a method with superlinear convergence. Here, we describe three
methods that are further used for comparison: Newton’s method, Krylov subspace method and
trust-region method. In each of the three methods, at the current iterate xk, the linearized
system
      </p>
      <p>F ′(xk)sk = −F (xk)
is solved with respect to the step sk in order to obtain the next iterate xk+1 = xk + sk.</p>
      <p>
        In Newton’s method, a LU -factorization of the Jacobian F ′(xk) is used (see, e.g., [6]). The
inputs of the algorithm are the initial iterate x0 and a termination tolerance tol.
input : x0, tol
output : the solution xk
1 k ← 0;
2 while ||F (xk)|| &gt; tol do
3 solve (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) with respect to sk using the LU-factorization;
4 xk+1 ← xk + sk;
5 k ← k + 1;
6 end
(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
(
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
(
        <xref ref-type="bibr" rid="ref3">3</xref>
        )
      </p>
      <p>
        In the Newton Iterative Solver (NITSOL) [7, 8], instead of the direct method for solving
the linearized system, an inexact Newton’s method with backtracking is used. The termination
criterium is
where ηk ∈ [0, 1) is the tolerance. To obtain the step sk, the generalized minimal residual
(GMRES) Krylov subspace method is used (see, e.g., [9]). Once an initial sk has been determined,
it is tested and, if necessary, reduced in length through backtracking until an acceptable step is
obtained. The parameter t ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) is used to judge sufficient reduction θ ∈ (θmin, θmax). The
Jacobian is calculated using the difference approximation, just as in Newton’s method.
4
5
6
7
8
9
10 end
end
xk+1 ← xk + sk;
k ← k + 1;
input : x0, tol;
      </p>
      <p>
        ηmax ∈ [0, 1), t ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ), 0 &lt; θmin &lt; θmax &lt; 1
output : the solution xk
1 k ← 0;
2 while ||F (xk)|| &gt; tol do
3 solve (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) with respect to sk using a Krylov subspace method (termination
criterium (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ));
while ||F (xk + sk)|| &gt; [1 − t(1 − ηk)]||F (xk)|| do
ηk ← 1 − θ(1 − ηk);
sk ← θsk, θ ∈ [θmin, θmax];
      </p>
      <p>The third method solves the system of nonlinear equations using the trust-region method
to obtain the current iterate (see, e.g., [10]). In the routine NEQBF, the following problem
||F (xk) + Bksk||2 −→ min,</p>
      <p>
        sk
||sk|| ≤ δk.
(4a)
(4b)
is solved to get the direction sk, where δk is the size of the trust region and Bk is the approximate
Jacobian evaluated at the current point xk. Then, the function at the point xk+1 = xk + sk
is evaluated and used to decide whether the new point xk should be accepted. Problem (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ) is
solved approximately by the double dogleg method. The Jacobian is approximated by Broyden’s
formula
      </p>
      <p>Bk+1 = Bk +
([F (xk+1) − F (xk)] − Bksk)skT .</p>
      <p>skT sk
input : x0, tol;</p>
      <p>
        δ0, θ ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        )
output : the solution xk
1 k ← 0;
2 while ||F (xk)|| &lt; tol do
3 δk ← δ0;
4 while f(xk+1) ≥ f(xk) do
5 solve (
        <xref ref-type="bibr" rid="ref4">4</xref>
        ) with respect to sk using the double dogleg method;
6 δk ← θδk;
7 xk+1 ← xk + sk;
8 end
9 k ← k + 1;
10 end
3. Dynamic general equilibrium model
      </p>
      <p>In this section, we show that a dynamic GE model leads to a block equations system.
To be specific, we consider the one region PET model (see, e.g., [1, 2] for details). PET is a
neoclassical growth models with three types of agents: consumers, producers and government.
Consumers take the prices as given and solve the utility maximization problem over the infinite
time-horizon under their budget constraints (rational expectations). Producers minimize costs
given the levels of production and prices at each moment in time. The government plays a
neutral role redistributing the capital through taxes and transfers. Prices are determined by
the markets clearing conditions. The GE is defined as a solution to the nonlinear equations
system that consists of the first-order optimality conditions for consumers and producers and
supply-equals-demand conditions for markets.
3.1. Optimal economic growth</p>
      <p>
        The consumer side of the model consists of Nc heterogeneous groups of households. Demand
of the ith consumer group is determined by the intertemporal utility function
where t = 0, 1, ... is time, cit is the level of consumption (here and henceforth, small letters
indicate the per capita values), β ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) is the discount rate, ψ ∈ (−∞, 1) is the time substitution
rate and nit is the size of population. The instantaneous utility function ui(cit) is given by the
constant-elasticity of substitution (CES) function
      </p>
      <p>Ui(ci) =
1 X∞ βtnitui(cit),
ψ t=0
 Nc
ui(cit) = X(μijtcijt)ρ
j=1</p>
      <p>
        ψ
 ρ
,
with constant electivity σ = 1/(1−ρ). Here, the index j = 1, Nc labels goods, ρ ∈ (−∞, 1)\ {0} is
the substitution parameter among goods and μijt is the preference coefficient. Capital dynamics
is described as
(1 + νit) ki,t+1 = (1 − δ) kit + xit,
ki0 &gt; 0,
where xit is investment, δ ∈ (
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ) is the capital depreciation coefficient, 1 + νit = ni,t+1/nit is
the population growth coefficient (νit is the growth rate). Budget constraint of the ith consumer
group at time t is
      </p>
      <p>Nc
X pjtcijt + qtxit = (1 − θit) ωtlit + (1 − φit) rtkit + git,
j=1
where pjt is the price of jth consumer good, qt and rt are prices of investments and capital, ωt
is the wage rate, git is the government transfers, lit is the labor supply, θit and φit are the tax
rates on capital and labor incomes.</p>
    </sec>
    <sec id="sec-2">
      <title>Variables kit, cijt and xit are determined by maximizing</title>
      <p>
        under constraints (
        <xref ref-type="bibr" rid="ref5">5</xref>
        ) and (
        <xref ref-type="bibr" rid="ref6">6</xref>
        ). It is convenient to split the solution of problem (
        <xref ref-type="bibr" rid="ref5">5</xref>
        )–(
        <xref ref-type="bibr" rid="ref7">7</xref>
        ) into two
steps. The first one is to maximize the utility ui(cit) at time t given the expenditures mit:
Ui(ci) −→
      </p>
      <p>max ,
cit,kit,xit
−→ max,
cij
 Nc
X(μij cij )ρ
j=1</p>
      <p>1
 ρ</p>
      <p>
        Nc
X pj cij = mi,
j=1
(
        <xref ref-type="bibr" rid="ref5">5</xref>
        )
(
        <xref ref-type="bibr" rid="ref6">6</xref>
        )
(
        <xref ref-type="bibr" rid="ref7">7</xref>
        )
with respect to consumer demand of different types of goods cij , i.e. a static problem (we omit
the index t where it does not lead to confusion). Solving the dual problem
we obtain
      </p>
      <p>Nc
X pj cij −→ min,
j=1 cij
 Nc
X(μij cij )ρ
j=1</p>
      <p>1
 ρ
 Nc 
min X pj cij  = p¯ici,
j=1</p>
      <p> Nc
p¯i = X
j=1
pj
μij
= ci,</p>
      <p>ρ−1
ρ−ρ1  ρ
</p>
      <p>
        The second step is to solve problem (
        <xref ref-type="bibr" rid="ref5">5</xref>
        )–(
        <xref ref-type="bibr" rid="ref7">7</xref>
        ) for savings as a function of time (dynamics).
Rewriting (
        <xref ref-type="bibr" rid="ref5">5</xref>
        )–(
        <xref ref-type="bibr" rid="ref7">7</xref>
        ) in terms of p¯it and cit:
we obtain the first-order optimality condition (Euler equation)
1 X∞ βtnitcitψ −→ max ,
ψ t=0 cit,kit
pitcit + qtxit = (1 − θit) ωtlit + (1 − φit) rtkit + git,
(1 + νit) ki,t+1 = (1 − δ) kit + xit,
qt citψ−1 = β qt+1(1 − δ) + (1 − φi,t+1)rt+1 ci,t+1ψ−1.
      </p>
      <p>pit pi,t+1</p>
    </sec>
    <sec id="sec-3">
      <title>The transversality conditions</title>
      <p>lim λitkit = 0,
t−→∞
where λit is the Lagrange multiplier, ensures the existence of the optimal trajectory (see,
e.g., [11]).
3.2. Duality theory for production functions</p>
      <p>The production side of the PET model consists of competitive firms divided into sectors.
Each sector produces either a final good: NC consumer goods and “investment good”, or an
intermediate good: NE energy types and the rest, which is called materials (NX = NC +1+NE +1
is the total number of production sectors).</p>
      <p>Production of the good X is determined by the the inputs of capital K, labor L, energy
composite E¯ and materials M according to a CES function
where GI is the productivity factor for I = K, L, E¯, M and γX normalizes αI to sum to unity.
The parameters αI and GI can be sector and time dependent. The producer of the good X
minimizes the costs</p>
      <p>PK K + PLL + PE¯ E¯ + (1 + τM )PM M −→</p>
      <p>
        min
K,L,E¯,M
under the given level of production (
        <xref ref-type="bibr" rid="ref10">10</xref>
        ) (τM is the ad-valorem tax on the use of materials). The
solution has the form
min PK K + PLL + PE¯ E¯ + (1 + τM )PM M
      </p>
      <p>
        = PX X,
(
        <xref ref-type="bibr" rid="ref8">8</xref>
        )
(
        <xref ref-type="bibr" rid="ref9">9</xref>
        )
where the price PX is 1
      </p>
      <p>PX =</p>
      <p>GE¯
PE¯ ρXρX−1 + α M1−1ρX
(1 + τM )PM</p>
      <p>GM
ρXρX−1 !
,</p>
      <sec id="sec-3-1">
        <title>Similarly, production of the energy composite E¯ is determined according to</title>
        <p>
          (
          <xref ref-type="bibr" rid="ref11">11</xref>
          )
where Ei, i = 1, NE are different energy types. Solving the cost-minimization problem
under the given the level of production (
          <xref ref-type="bibr" rid="ref11">11</xref>
          ), we derive the prices
        </p>
        <p>PE¯ =
1
γE</p>
        <p>1
X α E1−iρE
i</p>
        <p>ρE ρE−1
(1 + τEi)PEi ρE−1 ! ρE</p>
        <p>GEi
and input-output shares AEi = Ei/E¯ for the energy use:</p>
        <p>AEi =</p>
        <p>1
αEi(γEGEi)ρE</p>
        <p>PE¯</p>
        <p>1
(1 + τEi)PEi ρE−1
,
where τEi is the ad-valorem tax on the energy use.</p>
        <p>Government revenues include capital and labor income taxes on households and specific
taxes on output plus ad-valorem taxes on energy use and materials for each industry,</p>
        <p>Nd NX
GREVt = X ni(φirki + θiωli) + X
i=1 j=1</p>
        <p>NE
τXj Xj + X τESj PESj ESj + τMj Mj
s=1
!
=</p>
        <p>Nd NX
= X(φiPK Ki + θiPLLi) + X
i=1 j=1</p>
        <p>NE
τXj + X τEsj PEsj AEsj AE¯j + τMj AMj
s=1
!</p>
        <p>Xj.</p>
        <p>Government expenditures are the sum of purchases and transfers, GEXPt = GPt + GTt.
Government purchases are assumed to be constant in the current prices: GPt/p¯t = (1 + ξ)tGP0/p¯0,
1By the envelope theorem, the price is identified with the marginal cost of production (see, e.g., [12]).
where p¯t = (1/Nd) PiNd p¯it. It is assumed that GP0 is given by a Cobb-Douglas production
function, so that the input-output coefficients are</p>
        <p>AGKP = αK</p>
        <p>PK
,</p>
        <p>ALGP = αL</p>
        <p>,
PL</p>
        <p>AGMP = αM ,</p>
        <p>PM</p>
        <p>AEGiP = αEi ,</p>
        <p>PEi
i = 1, NE.</p>
        <p>Similarly, government baseline transfers are neutral. There is additional lump-sum adjustment</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>LSAt to balance the government budget:</title>
      <p>Nd
GTt = p¯t (1 + ξ)t X nitgi0 + LSAt.</p>
      <p>p¯0 i=1
3.3. Equilibrium conditions</p>
      <p>Aggregate supply for capital and labor at each point in time are determined by summing
over levels supplied by each consumer group:</p>
      <p>Nd
KAS = X niki,
i=1</p>
      <p>Nd
LAS = X nili.</p>
      <p>i=1</p>
    </sec>
    <sec id="sec-5">
      <title>Aggregate demand for capital and labor are</title>
      <p>NX
KAD = X AjK Xj + AGKP GP,
j=1</p>
      <p>NX
LAD = X AjLXj + ALGP GP.
j=1</p>
    </sec>
    <sec id="sec-6">
      <title>Similarly, for consumer goods and investment.</title>
      <p>Total demand for intermediate goods z = X1AD, . . . , XNAED+1 is the sum of demands
to produce other intermediate goods, and those to produce final goods. Total demand for
intermediate inputs by final goods producers and government is equal to</p>
      <p> NX NX
y =  X Aj1XjAD + A1GP GP, . . . , X</p>
      <p>NE+2 NE+2</p>
      <p>j
ANE+1XjAD + AGNPE+1GP </p>
      <p>= (Y1, . . . , YNE+1)T .</p>
      <p>T
T
Since the production functions are homogeneous functions, equilibrium implies</p>
      <p>NE+1
XiAD = X AijXjAD +
j=1</p>
      <p>NX</p>
      <p>X
j=NE+2</p>
      <p>AijXjAD + AiGP GP,
i = 1, NE + 1.
or in the compact form z = Az + y, where A = (Aij) is the (NE + 1) × (NE + 1) matrix of
input-output coefficients. Hence outputs of the intermediate goods sectors z are determined by
the final goods production y as</p>
      <p>z = (I − A)−1y,
where I is the (NE + 1) × (NE + 1) unity matrix.</p>
      <p>A competitive equilibrium in each time moment is defined by markets clearing for factors
of production, capital and labor, final goods and a balanced government budget:
KAD = KAS,</p>
      <p>LAD = LAS,</p>
      <p>XAD = XAS,</p>
      <p>GREV = GEXP.</p>
      <p>Note that the consumer prices pj and q are the gross (after-output-tax) prices PX of consumer
goods and investment good in industry. The rental rate of capital r and wage rate w are equal
to the prices of capital PK and labor PL in industry.
3.4. Computation of equilibrium</p>
      <p>
        The PET model, as well as other dynamic GE models used in application, evaluates changes
during a transition period, and does not address possible effects on the long-run equilibrium.
Therefore, the infinite time horizon is replaced by a sufficiently large finite time interval [0, T ].
As t ≥ T the system is assumed to follow the balanced growth path, i.e. the quantities xit, kit,
lit, etc. grow exponentially with the growth rate ξ. Proceeding to the limit t −→ ∞ in (
        <xref ref-type="bibr" rid="ref8">8</xref>
        ), we
obtain
(1 − φ)r
      </p>
      <p>q
(1 + ξ)1−ψ = β 1 − δ +
.</p>
      <p>
        (
        <xref ref-type="bibr" rid="ref12">12</xref>
        )
The boundary condition (
        <xref ref-type="bibr" rid="ref12">12</xref>
        ) at t = T is used instead of transversality condition at infinity (
        <xref ref-type="bibr" rid="ref9">9</xref>
        ).
This way, we get a two-point (discrete) boundary-value problem for each consumer group.
      </p>
    </sec>
    <sec id="sec-7">
      <title>The nonlinear equations system describing the GE can be written as</title>
      <p>where K is the capital and P are prices (at all time moment t = 0, T ). For brevity, we do
not write consumption, investment and transfers since they can be obtained from K and P .
The Gauss-Seidel method is applied to this system as follows. Assume that the sth iteration of
capital Ks has been determined. Then solving the system
f (K, P ) = 0,
f (Ks, P ) = 0
with respect to P , we obtain the (s + 1)th iteration of prices, P s+1 = P ; the previous iteration
P s is used as the initial point of the method. Different time blocks of this part of the algorithm,
called the inner loops, can be calculated in parallel.</p>
      <p>Once P s+1 has been determined, the next iteration of capital is obtained by solving the
system</p>
      <p>f (K, P s+1) = 0
with respect to K and setting Ks+1 = K. The part of the algorithm that calculates Ks+1 from</p>
      <sec id="sec-7-1">
        <title>Ks is called the outer loop (see [1, 3] for details of the capital iteration).</title>
        <p>4. Algorithm implementation</p>
        <p>First, we describe the sequential version (Algorithm 4). The input is the initial guess of the
capital K0 as a function of time and the outputs are the capital K and prices P as functions of
time. Here, tol is the tolerance and maxit is the maximum number of iterations in the outer loop;
T is the length of the time interval. Two types of arrays are used to store capital, consumption,
prices, etc. as functions of time. The arrays of the first type, called stor, contain the results of
the it-th iterate of the outer loop over all time moments t. The arrays of the second type, called
dyn, contain the results of the it-th iterate of the outer loop for at the current t. Variable dif f
is the error at the current iterate of the outer loop.</p>
        <p>In the OpenMP version, steps of the loop over time are performed in the parallel region
(Algorithm 5). All dyn and stor arrays are shared by all threads. The parameters arrays
are transferred through the copyin clause. The time steps are distributed into blocks using
schedule(auto).</p>
        <p>In the MPI version, we use only one communicator with number of nodes nsize. Each node
has its own copy of the dyn and stor arrays. Moreover, each node works with its own part
of the array, and the rest entries are set to zero. To synchronize the dyn and stor arrays, we
use the collective communication routine MPI AllReduce with the operation MPI SUM that
sums up different copies of an array and sends the result back to all nodes (lines 20 and 23 of
Algorithm 6).
input : K0
output : K, P
1 marker: if diff &gt; tol and it &lt; numIt then
2 for t ← 0 to T do
3 Calculate prices P (t) (inner loop);
4 Update dyn arrays;
5 end
6 Update stor arrays;
7 it ← (it + 1);
8 diff ← update (K, P );
9 end
10 goto marker
input : K0
output : K, P
1 marker: if diff &gt; tol and it &lt; numIt then
2 omp parallel default(private)
3 omp shared(dyn arrays, stor arrays)
4 omp copyin(parameters)
5 omp for
6 for t ← 0 to T do
7 Calculate prices P (t) (inner loop);
8 Update dyn arrays;
9 end
10 omp end parallel
11 Update stor arrays;
12 it ← (it + 1);
13 diff ← update (K, P );
14 end
15 goto marker</p>
        <p>To calculate diff in the outer loop, we normalize the error using the variable norm. In the
sequential version this variable is calculated in the first step of the loop over time by one of the
functions. The variable normCounter is used to calculate the number of calls of this function.
For better comparison of with the sequential version, we keep the same way of calculating norm
in the MPI version. To this end, at each iteration of the outer loop, all nodes wait till the node
with rank zero is completed. The zero node sends norm and normCounter using MPI SEND
to all other nodes and they resume their work (lines 5 – 7 of Algorithm 6). This approach does
not affect the productivity much since most of the time is spent on collective operations (lines
20 and 23).
5. Results and discussion</p>
        <p>The input data of the PET model consists of projections over time and base year data.
Projections are used for the population, i.e. the number of consumers in different groups, and
for productivity growth coefficients over the interval [0, T ]. The base year data divides into
production data from input-output tables, i.e. production shares, and consumer survey data,
i.e. capital, expenditures, savings, etc. of different consumer groups.</p>
        <p>The calculations were carried out over T = 200 years with the time step of one year. The
tolerance is tol = 10−5 for the outer loop and tol′ = 10−6 for the inner loops. The total number
of production sectors is NX = 10. Then the number of variables is of the order T × NX ∼ 103.
input : K0
output : K, P
1 MPI INIT();
2 nsize ← MPI COMM SIZE();
3 rank ← MPI COMM RANK();
4 marker: if diff &gt; tol and it &lt; numIt then
5 if rank! = 0 then
6 MPI RECV for norm and normCounter from node rank;
7
end
MPI AllReduce(MPI SUM) for all dyn arrays;</p>
        <p>With a more detailed multisector production and several regions the number of variables can be
of order of hundred thousands.</p>
        <p>For the OpenMP runs, we used Intel Visual Fortran Compiler XE 12.0. Calculations were
carried out on Intel Core i5 (4 cores, 2.70 GHz). For comparison, solution of the equations
system in the inner loop was obtained by Newton’s method, NITSOL and NEQBF. For each
method we made runs with optimization (-o3) and without it. The speedup graphs are presented
at Fig. 7. As we see, parallelization with four cores gives a more than three times speedup.</p>
        <p>Calculations with the MPI version were carried out using Fortran compiler IBM XL Fortran
Advanced Edition. Calculations were carried out on Blue Gene/P V 11.1. Solution of the
equations system in the inner loop was obtained by Newton’s method and NITSOL. For each
method we made runs with optimization (-o3) and without it. The speedup graphs are presented
at Fig. 8. Here, parallelization with 200 nodes and one rank per node gives a speedup of up to
45 times.</p>
        <p>The numerical results imply that the parallel version of the algorithm gives a substantial
improvement both with OpenMP and MPI. The efficiency of the parallel method is especially
pronounced with the MPI version, where the number of ranks can be as large as the number
of time steps. The increase of the speedup is almost linear and does not depend much on the
iterative method used in the inner loop and compiler optimization.</p>
        <p>The results allow us to analyze the scalability. We see that the parallelization method
shows good strong scaling, i.e. the speedup increases with the number of processors for the fixed
problem size. This justifies the use of parallelization in the Fair-Taylor method.</p>
        <p>Work is underway to determine the reasons why some of the methods used in the inner loop
perform better than others. Future work could address the effectiveness of the MPI version with
several ranks per node and hybrid realizations, such as OpenMP+MPI, with several processes
per node (see, e.g., [13, 14]).
2</p>
        <p>3</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Threads 4</title>
      <p>1.5
1
1
p
u
d
ee 25
p
S
50
45
40
35
30
20
15
10
5
0</p>
    </sec>
    <sec id="sec-9">
      <title>Newton (no opt.)</title>
    </sec>
    <sec id="sec-10">
      <title>Newton (-o3)</title>
    </sec>
    <sec id="sec-11">
      <title>NITSOL (no opt.)</title>
    </sec>
    <sec id="sec-12">
      <title>NITSOL (-o3)</title>
    </sec>
    <sec id="sec-13">
      <title>NEQBF (no opt.)</title>
    </sec>
    <sec id="sec-14">
      <title>NEQBF (-o3)</title>
    </sec>
    <sec id="sec-15">
      <title>Newton (no opt.)</title>
    </sec>
    <sec id="sec-16">
      <title>Newton (-o3)</title>
    </sec>
    <sec id="sec-17">
      <title>NITSOL (no opt.)</title>
      <p>NITSOL (-o3)
1
40
80
120
160
200</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dalton</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>O</given-names>
            <surname>'Neill</surname>
          </string-name>
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Prskawetz</surname>
          </string-name>
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Jiang</surname>
          </string-name>
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Pitkin</surname>
          </string-name>
          <string-name>
            <surname>J</surname>
          </string-name>
          .
          <article-title>Population aging and future carbon emissions in the United States // Energy economics</article-title>
          .
          <source>2008</source>
          . Vol.
          <volume>30</volume>
          , N. 2. P.
          <volume>642</volume>
          -
          <fpage>675</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Melnikov</surname>
            <given-names>N.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Neill</surname>
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalton</surname>
            <given-names>M.G.</given-names>
          </string-name>
          <article-title>Accounting for the household heterogeneity in dynamic general equilibrium models // Energy economics</article-title>
          .
          <source>2012</source>
          . Vol.
          <volume>34</volume>
          , N. 5. P.
          <volume>1475</volume>
          -
          <fpage>1483</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fair</surname>
            <given-names>R.</given-names>
          </string-name>
          , Taylor J.
          <article-title>Solution and maximum likelihood estimation of dynamic nonlinear rational expectations models</article-title>
          // Econometrica.
          <year>1983</year>
          . Vol.
          <volume>51</volume>
          , N. 4. P.
          <volume>1169</volume>
          -
          <fpage>1185</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. Brayton F.
          <article-title>Two practical algorithms for solving rational expectations models // Finance</article-title>
          and Economics Discussion Series.
          <year>2011</year>
          -44, U.S. Federal Reserve Board.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Faust</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tryon</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>A distributed block approach to solving near-block-diagonal systems with an application to a large macroeconometric model</article-title>
          // Computational Economics.
          <year>1995</year>
          . Vol.
          <volume>8</volume>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <year>4</year>
          . P.
          <volume>303</volume>
          -
          <fpage>316</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ortega</surname>
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rheinboldt W.C.</surname>
          </string-name>
          <article-title>Iterative Solution of Nonlinear Equations in Several Variables</article-title>
          . SIAM,
          <year>2000</year>
          . 572 p.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Eisenstat</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Globally convergent inexact Newton methods //</article-title>
          <source>SIAM Journal on Optimization</source>
          .
          <year>1994</year>
          . Vol.
          <volume>4</volume>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <year>2</year>
          . P.
          <volume>393</volume>
          -
          <fpage>422</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pernice</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Nitsol</surname>
          </string-name>
          :
          <article-title>A Newton Iterative solver for Nonlinear systems //</article-title>
          <source>SIAM Journal on Scientific Computing</source>
          .
          <year>1998</year>
          . Vol.
          <volume>19</volume>
          , N. 1. P.
          <volume>302</volume>
          -
          <fpage>318</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kelley C.T. Iterative</surname>
          </string-name>
          <article-title>Methods for Linear and Nonlinear Equations</article-title>
          . SIAM,
          <year>1995</year>
          . 166 p.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Conn</surname>
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gould</surname>
            <given-names>N.I.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toint</surname>
            <given-names>P.L.</given-names>
          </string-name>
          <article-title>Trust-region methods</article-title>
          .
          <source>SIAM</source>
          ,
          <year>2000</year>
          . 959 p.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Stokey</surname>
            <given-names>N.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucas</surname>
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prescott</surname>
            <given-names>E.C.</given-names>
          </string-name>
          <article-title>Recursive Methods in Economic Dynamics</article-title>
          . Harvard University Press,
          <year>1989</year>
          . 608 p.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Varian</surname>
            <given-names>H.R. Microeconomic</given-names>
          </string-name>
          <string-name>
            <surname>Analysis</surname>
          </string-name>
          .
          <source>Norton</source>
          ,
          <year>1992</year>
          . 563 p.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hasert</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimach</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roller</surname>
            <given-names>S.</given-names>
          </string-name>
          <article-title>CAF versus MPI - Applicability of Coarray Fortran to</article-title>
          a Flow Solver // Springer.
          <year>2011</year>
          . Vol.
          <volume>6960</volume>
          , P.
          <fpage>228</fpage>
          -
          <lpage>236</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hoefler</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinan</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buntinas</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balaji</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barrett</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brightwell</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gropp</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kale</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thakur</surname>
            <given-names>R. MPI</given-names>
          </string-name>
          +
          <article-title>MPI: A New Hybrid Approach to Parallel Programming with MPI Plus Shared Memory /</article-title>
          / Computing.
          <year>2013</year>
          . Vol.
          <volume>95</volume>
          , P.
          <fpage>1121</fpage>
          -
          <lpage>1136</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>