<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Parallel algorithm for solving large-scale dynamic general equilibrium models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">N</forename><forename type="middle">B</forename><surname>Melnikov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lomonosov Moscow State University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Gruzdev</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lomonosov Moscow State University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Dalton</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lomonosov Moscow State University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>O'neill</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lomonosov Moscow State University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Parallel algorithm for solving large-scale dynamic general equilibrium models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">7B025F2FA5E690A08C9097BAADA91494</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T04:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present a parallel algorithm for computing an equilibrium path in a large-scale economic growth model. We exploit the special block structure of the nonlinear systems of equations common in such models. Our algorithm is based on an iterative method of Gauss-Seidel type with prices of different time periods calculated simultaneously rather than recursively. We have implemented the parallel algorithm in OpenMP and MPI programming environments. The numerical results show that speedup improves almost linearly as number of nodes increases. Different methods for solving an individual block: Newton-type methods, Krylov subspace methods and trust-region methods, give similar results for the speedup.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Dynamic general equilibrium (GE) models are used to quantify the effects on economy due to demographic, technological and climate change (see, e.g., <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> and refs. therein). Dynamic GE is described by large-scale systems of nonlinear equations. One of the the most popular early methods for solving such nonlinear systems was the Fair-Taylor method <ref type="bibr" target="#b2">[3]</ref>, which is a variation of the Gauss-Seidel method. Recent contributions have concentrated on Newton-type methods that are now more widely used (see, e.g., <ref type="bibr" target="#b3">[4]</ref>).</p><p>In this paper, we give the Fair-Taylor method a fresh look. We use the special block structure of the equations system common in the dynamic GE models to describe and implement a parallel algorithm. The idea of taking block structure into account was used previously for multi-country models (see, e.g., <ref type="bibr" target="#b4">[5]</ref>). We implement parallel calculation of individual time blocks using OpenMP and MPI environments for systems with shared and distributed memory. For the solution of the time blocks, we compare the performance of different Newton-type methods: LU -factorization, Krylov methods and trust-region methods. The algorithm is demonstrated in the Population-Environmental-Technology (PET) model (see <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>and refs. therein).</head><p>The paper is organized as follows. In Section 2 we present the block Gauss-Seidel method and briefly describe the Newton-type methods that we use for solving the equations system of individual blocks. In section 3, by example of the PET model, we derive the nonlinear system of equations that determines the GE. In section 4 we implement the sequential and two parallel versions (OpenMP and MPI) of the algorithm. Finally, in section 5 we present and discuss the numerical results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Block Gauss-Seidel method</head><p>First, we briefly recall the Gauss-Seidel method for a linear equation system Ax = b, where A is a nondegenerate n × n matrix with nonzero diagonal elements (see, e.g., <ref type="bibr" target="#b5">[6]</ref>). Assume that the kth iterate x k = (x k 1 , . . . , x k n ) T and i − 1 components x k+1 1 , . . . , x k+1 i−1 of the (k + 1)th iterate x k+1 have been determined. Then to obtain x k+1 i the ith equation of the system</p><formula xml:id="formula_0">i−1 j=1 a ij x k+1 j + a ii x i + n j=i+1 a ij x k j = b i ,</formula><p>Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org is solved with respect to x i and the solution is taken as x k+1 i . We write method in a more compact form by splitting the matrix as A = D + L + U , where D is diagonal and L and U are strictly lower and upper triangular parts. Then</p><formula xml:id="formula_1">x k+1 = M x k + c, where M = −(D + L) −1 U is the iteration matrix and c = (D + L) −1 b. The usual termination criterium is ||r k ||/||b|| &lt; ε, where r k = b − Ax k is the residual. Since r k = (D + L)(x k+1 − x k ),</formula><p>we can use the properly scaled error x k+1 − x k to terminate iterations.</p><p>The same idea is generalized to a nonlinear equations system f (x) = 0, where f = (f 1 , . . . , f n ). To obtain x k+1 i the ith equation</p><formula xml:id="formula_2">f i (x k+1 1 , . . . , x k+1 i−1 , x i , x k i+1 , . . . , x k n ) = 0</formula><p>is solved with respect to x i and the solution is taken as x k+1</p><p>i . An advantage of the method is that one does not need to calculate the Jacobian of the system but only values of the functions f i (x). Now we assume that f i and x i are vectors:</p><formula xml:id="formula_3">f i = (f i1 , . . . , f ij , . . . , f il ), x i = (x i1 , . . . , x ij , . . . , x il ),</formula><p>where the blocks f ij and x ij are also vectors having the same dimension. Let the jth block of functions f ij depend only on the jth block of variables x ij . Then to obtain x k+1 i the equations</p><formula xml:id="formula_4">f ij (x k+1 1 , . . . , x k+1 i−1 , x, x k i+1 . . . , x k n ) = 0, j = 1, l,<label>(1)</label></formula><p>are solved independently with respect to x and the solution is taken as x k+1 ij . To solve the equations system for a separate block (1), which we denote F (x) = 0 for now, it is reasonable to use a method with superlinear convergence. Here, we describe three methods that are further used for comparison: Newton's method, Krylov subspace method and trust-region method. In each of the three methods, at the current iterate x k , the linearized system</p><formula xml:id="formula_5">F ′ (x k )s k = −F (x k )<label>(2)</label></formula><p>is solved with respect to the step s k in order to obtain the next iterate x k+1 = x k + s k . In Newton's method, a LU -factorization of the Jacobian F ′ (x k ) is used (see, e.g., <ref type="bibr" target="#b5">[6]</ref>). The inputs of the algorithm are the initial iterate x 0 and a termination tolerance tol. In the Newton Iterative Solver (NITSOL) <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>, instead of the direct method for solving the linearized system, an inexact Newton's method with backtracking is used. The termination criterium is</p><formula xml:id="formula_6">||F ′ (x k )s k + F (x k )|| ≤ η k ||F (x k )||,<label>(3)</label></formula><p>where η k ∈ [0, 1) is the tolerance. To obtain the step s k , the generalized minimal residual (GM-RES) Krylov subspace method is used (see, e.g., <ref type="bibr" target="#b8">[9]</ref>). Once an initial s k has been determined, it is tested and, if necessary, reduced in length through backtracking until an acceptable step is obtained. The parameter t ∈ (0, 1) is used to judge sufficient reduction θ ∈ (θ min , θ max ). The Jacobian is calculated using the difference approximation, just as in Newton's method.</p><p>input : x0, tol; ηmax ∈ [0, 1), t ∈ (0, 1), 0 &lt; θmin &lt; θmax &lt; 1 output : the solution x k</p><formula xml:id="formula_7">1 k ← 0; 2 while ||F (x k )|| &gt; tol do 3 solve<label>(2</label></formula><p>) with respect to s k using a Krylov subspace method (termination criterium (3)); The third method solves the system of nonlinear equations using the trust-region method to obtain the current iterate (see, e.g., <ref type="bibr" target="#b9">[10]</ref>). In the routine NEQBF, the following problem</p><formula xml:id="formula_8">4 while ||F (x k + s k )|| &gt; [1 − t(1 − η k )]||F (x k )|| do 5 η k ← 1 − θ(1 − η k ); 6 s k ← θs k , θ ∈ [θmin, θmax]; 7 end 8 x k+1 ← x k + s k ; 9 k ← k + 1;</formula><formula xml:id="formula_9">||F (x k ) + B k s k || 2 − → min s k ,<label>(4a)</label></formula><formula xml:id="formula_10">||s k || ≤ δ k .<label>(4b)</label></formula><p>is solved to get the direction s k , where δ k is the size of the trust region and B k is the approximate Jacobian evaluated at the current point x k . Then, the function at the point x k+1 = x k + s k is evaluated and used to decide whether the new point x k should be accepted. Problem ( <ref type="formula" target="#formula_9">4</ref>) is solved approximately by the double dogleg method. The Jacobian is approximated by Broyden's formula</p><formula xml:id="formula_11">B k+1 = B k + ([F (x k+1 ) − F (x k )] − B k s k )s T k s T k s k . input : x0, tol; δ0, θ ∈ (0, 1) output : the solution x k 1 k ← 0; 2 while ||F (x k )|| &lt; tol do 3 δ k ← δ0; 4 while f (x k+1 ) ≥ f (x k ) do 5 solve<label>(</label></formula><p>4) with respect to s k using the double dogleg method; Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org</p><formula xml:id="formula_12">6 δ k ← θδ k ; 7 x k+1 ← x k + s k ; 8 end 9 k ← k + 1;</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Dynamic general equilibrium model</head><p>In this section, we show that a dynamic GE model leads to a block equations system. To be specific, we consider the one region PET model (see, e.g., <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> for details). PET is a neoclassical growth models with three types of agents: consumers, producers and government. Consumers take the prices as given and solve the utility maximization problem over the infinite time-horizon under their budget constraints (rational expectations). Producers minimize costs given the levels of production and prices at each moment in time. The government plays a neutral role redistributing the capital through taxes and transfers. Prices are determined by the markets clearing conditions. The GE is defined as a solution to the nonlinear equations system that consists of the first-order optimality conditions for consumers and producers and supply-equals-demand conditions for markets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Optimal economic growth</head><p>The consumer side of the model consists of N c heterogeneous groups of households. Demand of the ith consumer group is determined by the intertemporal utility function</p><formula xml:id="formula_13">U i (c i ) = 1 ψ ∞ t=0 β t n it u i (c it ),</formula><p>where t = 0, 1, ... is time, c it is the level of consumption (here and henceforth, small letters indicate the per capita values), β ∈ (0, 1) is the discount rate, ψ ∈ (−∞, 1) is the time substitution rate and n it is the size of population. The instantaneous utility function u i (c it ) is given by the constant-elasticity of substitution (CES) function</p><formula xml:id="formula_14">u i (c it ) =   Nc j=1 (µ ijt c ijt ) ρ   ψ ρ</formula><p>, with constant electivity σ = 1/(1−ρ). Here, the index j = 1, N c labels goods, ρ ∈ (−∞, 1)\ {0} is the substitution parameter among goods and µ ijt is the preference coefficient. Capital dynamics is described as</p><formula xml:id="formula_15">(1 + ν it ) k i,t+1 = (1 − δ) k it + x it , k i0 &gt; 0,<label>(5)</label></formula><p>where </p><formula xml:id="formula_16">x it is investment, δ ∈ (0, 1) is the capital depreciation coefficient, 1 + ν it = n i,</formula><formula xml:id="formula_17">p jt c ijt + q t x it = (1 − θ it ) ω t l it + (1 − φ it ) r t k it + g it ,<label>(6)</label></formula><p>where p jt is the price of jth consumer good, q t and r t are prices of investments and capital, ω t is the wage rate, g it is the government transfers, l it is the labor supply, θ it and φ it are the tax rates on capital and labor incomes. Variables k it , c ijt and x it are determined by maximizing</p><formula xml:id="formula_18">U i (c i ) − → max c it ,k it ,x it ,<label>(7)</label></formula><p>under constraints ( <ref type="formula" target="#formula_15">5</ref>) and <ref type="bibr" target="#b5">(6)</ref>. It is convenient to split the solution of problem ( <ref type="formula" target="#formula_15">5</ref>)-( <ref type="formula" target="#formula_18">7</ref>) into two steps. The first one is to maximize the utility u i (c it ) at time t given the expenditures m it : </p><formula xml:id="formula_19">  Nc j=1 (µ ij c ij ) ρ   1 ρ − → max c ij , Nc j=1 p j c ij = m i ,</formula><formula xml:id="formula_20">p j c ij − → min c ij ,   Nc j=1 (µ ij c ij ) ρ   1 ρ = c i , we obtain min   Nc j=1 p j c ij   = pi c i , pi =   Nc j=1 p j µ ij ρ ρ−1   ρ−1 ρ</formula><p>.</p><p>The second step is to solve problem ( <ref type="formula" target="#formula_15">5</ref>)-( <ref type="formula" target="#formula_18">7</ref>) for savings as a function of time (dynamics). Rewriting ( <ref type="formula" target="#formula_15">5</ref>)- <ref type="bibr" target="#b6">(7)</ref> in terms of pit and c it :</p><formula xml:id="formula_21">1 ψ ∞ t=0 β t n it c it ψ − → max c it ,k it , p it c it + q t x it = (1 − θ it ) ω t l it + (1 − φ it ) r t k it + g it , (1 + ν it ) k i,t+1 = (1 − δ) k it + x it ,</formula><p>we obtain the first-order optimality condition (Euler equation)</p><formula xml:id="formula_22">q t p it c it ψ−1 = β q t+1 (1 − δ) + (1 − φ i,t+1 )r t+1 p i,t+1 c i,t+1 ψ−1 .<label>(8)</label></formula><p>The transversality conditions</p><formula xml:id="formula_23">lim t− →∞ λ it k it = 0,<label>(9)</label></formula><p>where λ it is the Lagrange multiplier, ensures the existence of the optimal trajectory (see, e.g., <ref type="bibr" target="#b10">[11]</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Duality theory for production functions</head><p>The production side of the PET model consists of competitive firms divided into sectors. Each sector produces either a final good: N C consumer goods and "investment good", or an intermediate good: N E energy types and the rest, which is called materials (N X = N C +1+N E +1 is the total number of production sectors).</p><p>Production of the good X is determined by the the inputs of capital K, labor L, energy composite Ē and materials M according to a CES function</p><formula xml:id="formula_24">X = γ X α K (G K K) ρ X + α L (G L L) ρ X + α Ē (G Ē Ē) ρ X + α M (G M M ) ρ X 1 ρ X ,<label>(10)</label></formula><p>where G I is the productivity factor for I = K, L, Ē, M and γ X normalizes α I to sum to unity. The parameters α I and G I can be sector and time dependent. The producer of the good X minimizes the costs</p><formula xml:id="formula_25">P K K + P L L + P Ē Ē + (1 + τ M )P M M − → min K,L, Ē,M</formula><p>under the given level of production (10) (τ M is the ad-valorem tax on the use of materials). The solution has the form min</p><formula xml:id="formula_26">P K K + P L L + P Ē Ē + (1 + τ M )P M M = P X X,</formula><p>Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org where the price P X is<ref type="foot" target="#foot_1">1</ref> </p><formula xml:id="formula_27">P X = 1 γ X α 1 1−ρ X K P K G K ρ X ρ X −1 + α 1 1−ρ X L P L G L ρ X ρ X −1 + + α 1 1−ρ X Ē P Ē G Ē ρ X ρ X −1 + α 1 1−ρ X M (1 + τ M )P M G M ρ X ρ X −1 .</formula><p>The input-output coefficients A I = I/X for I = K, L, Ē are given by</p><formula xml:id="formula_28">A I = 1 α I (γ X G I ) ρ X P I P X 1 ρ X −1 ,</formula><p>and for I = M by</p><formula xml:id="formula_29">A M = 1 α M (γ X G M ) ρ X (1 + τ M )P M P X 1 ρ X −1 .</formula><p>Similarly, production of the energy composite Ē is determined according to</p><formula xml:id="formula_30">Ē = γ E i α E i (G E i E i ) ρ E 1 ρ E ,<label>(11)</label></formula><p>where</p><formula xml:id="formula_31">E i , i = 1, N E are different energy types. Solving the cost-minimization problem i (1 + τ E i )P E i E i − → min E i</formula><p>under the given the level of production <ref type="bibr" target="#b10">(11)</ref>, we derive the prices</p><formula xml:id="formula_32">P Ē = 1 γ E i α 1 1−ρ E E i (1 + τ E i )P E i G E i ρ E ρ E −1 ρ E −1 ρ E</formula><p>and input-output shares A E i = E i / Ē for the energy use:</p><formula xml:id="formula_33">A E i = 1 α E i (γ E G E i ) ρ E (1 + τ E i )P E i P Ē 1 ρ E −1 ,</formula><p>where τ E i is the ad-valorem tax on the energy use.</p><p>Government revenues include capital and labor income taxes on households and specific taxes on output plus ad-valorem taxes on energy use and materials for each industry,</p><formula xml:id="formula_34">GREV t = N d i=1 n i (φ i rk i + θ i ωl i ) + N X j=1 τ X j X j + N E s=1 τ E S j P E S j E S j + τ M j M j = = N d i=1 (φ i P K K i + θ i P L L i ) + N X j=1 τ X j + N E s=1 τ Es j P Es j A Es j A Ēj + τ M j A M j X j .</formula><p>Government expenditures are the sum of purchases and transfers, GEXP t = GP t + GT t . Government purchases are assumed to be constant in the current prices: GP t /p t = (1 + ξ) t GP 0 /p 0 , where pt = (1/N d ) N d i pit . It is assumed that GP 0 is given by a Cobb-Douglas production function, so that the input-output coefficients are</p><formula xml:id="formula_35">A GP K = α K P K , A GP L = α L P L , A GP M = α M P M , A GP E i = α E i P E i , i = 1, N E .</formula><p>Similarly, government baseline transfers are neutral. There is additional lump-sum adjustment LSA t to balance the government budget:</p><formula xml:id="formula_36">GT t = pt p0 (1 + ξ) t N d i=1 n it g i0 + LSA t .</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Equilibrium conditions</head><p>Aggregate supply for capital and labor at each point in time are determined by summing over levels supplied by each consumer group:</p><formula xml:id="formula_37">K AS = N d i=1 n i k i , L AS = N d i=1 n i l i .</formula><p>Aggregate demand for capital and labor are</p><formula xml:id="formula_38">K AD = N X j=1 A j K X j + A GP K GP, L AD = N X j=1 A j L X j + A GP L GP.</formula><p>Similarly, for consumer goods and investment.</p><p>Total demand for intermediate goods z</p><formula xml:id="formula_39">= X AD 1 , . . . , X AD N E +1</formula><p>T is the sum of demands to produce other intermediate goods, and those to produce final goods. Total demand for intermediate inputs by final goods producers and government is equal to</p><formula xml:id="formula_40">y =   N X N E +2 A j 1 X AD j + A GP 1 GP, . . . , N X N E +2 A j N E +1 X AD j + A GP N E +1 GP   T = (Y 1 , . . . , Y N E +1 ) T .</formula><p>Since the production functions are homogeneous functions, equilibrium implies</p><formula xml:id="formula_41">X AD i = N E +1 j=1 A j i X AD j + N X j=N E +2 A j i X AD j + A GP i GP, i = 1, N E + 1.</formula><p>or in the compact form z = Az + y, where A = (A j i ) is the (N E + 1) × (N E + 1) matrix of input-output coefficients. Hence outputs of the intermediate goods sectors z are determined by the final goods production y as z = (I − A) −1 y, where I is the (N E + 1) × (N E + 1) unity matrix. A competitive equilibrium in each time moment is defined by markets clearing for factors of production, capital and labor, final goods and a balanced government budget:</p><formula xml:id="formula_42">K AD = K AS , L AD = L AS , X AD = X AS , GREV = GEXP.</formula><p>Note that the consumer prices p j and q are the gross (after-output-tax) prices P X of consumer goods and investment good in industry. The rental rate of capital r and wage rate w are equal to the prices of capital P K and labor P L in industry.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Computation of equilibrium</head><p>The PET model, as well as other dynamic GE models used in application, evaluates changes during a transition period, and does not address possible effects on the long-run equilibrium. Therefore, the infinite time horizon is replaced by a sufficiently large finite time interval [0, T ]. As t ≥ T the system is assumed to follow the balanced growth path, i.e. the quantities x it , k it , l it , etc. grow exponentially with the growth rate ξ. Proceeding to the limit t − → ∞ in (8), we obtain</p><formula xml:id="formula_43">(1 + ξ) 1−ψ = β 1 − δ + (1 − φ)r q .<label>(12)</label></formula><p>The boundary condition <ref type="bibr" target="#b11">(12)</ref> at t = T is used instead of transversality condition at infinity <ref type="bibr" target="#b8">(9)</ref>. This way, we get a two-point (discrete) boundary-value problem for each consumer group. The nonlinear equations system describing the GE can be written as</p><formula xml:id="formula_44">f (K, P ) = 0,</formula><p>where K is the capital and P are prices (at all time moment t = 0, T ). For brevity, we do not write consumption, investment and transfers since they can be obtained from K and P . The Gauss-Seidel method is applied to this system as follows. Assume that the sth iteration of capital K s has been determined. Then solving the system f (K s , P ) = 0 with respect to P , we obtain the (s + 1)th iteration of prices, P s+1 = P ; the previous iteration P s is used as the initial point of the method. Different time blocks of this part of the algorithm, called the inner loops, can be calculated in parallel.</p><p>Once P s+1 has been determined, the next iteration of capital is obtained by solving the system f (K, P s+1 ) = 0 with respect to K and setting K s+1 = K. The part of the algorithm that calculates K s+1 from K s is called the outer loop (see <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b2">3]</ref> for details of the capital iteration).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Algorithm implementation</head><p>First, we describe the sequential version (Algorithm 4). The input is the initial guess of the capital K 0 as a function of time and the outputs are the capital K and prices P as functions of time. Here, tol is the tolerance and maxit is the maximum number of iterations in the outer loop; T is the length of the time interval. Two types of arrays are used to store capital, consumption, prices, etc. as functions of time. The arrays of the first type, called stor, contain the results of the it-th iterate of the outer loop over all time moments t. The arrays of the second type, called dyn, contain the results of the it-th iterate of the outer loop for at the current t. Variable dif f is the error at the current iterate of the outer loop.</p><p>In the OpenMP version, steps of the loop over time are performed in the parallel region (Algorithm 5). All dyn and stor arrays are shared by all threads. The parameters arrays are transferred through the copyin clause. The time steps are distributed into blocks using schedule(auto).</p><p>In the MPI version, we use only one communicator with number of nodes nsize. Each node has its own copy of the dyn and stor arrays. Moreover, each node works with its own part of the array, and the rest entries are set to zero. To synchronize the dyn and stor arrays, we use the collective communication routine MPI AllReduce with the operation MPI SUM that sums up different copies of an array and sends the result back to all nodes (lines 20 and 23 of Algorithm 6).  To calculate diff in the outer loop, we normalize the error using the variable norm. In the sequential version this variable is calculated in the first step of the loop over time by one of the functions. The variable normCounter is used to calculate the number of calls of this function. For better comparison of with the sequential version, we keep the same way of calculating norm in the MPI version. To this end, at each iteration of the outer loop, all nodes wait till the node with rank zero is completed. The zero node sends norm and normCounter using MPI SEND to all other nodes and they resume their work (lines 5 -7 of Algorithm 6). This approach does not affect the productivity much since most of the time is spent on collective operations (lines 20 and 23).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Суперкомпьютерные дни в</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and discussion</head><p>The input data of the PET model consists of projections over time and base year data. Projections are used for the population, i.e. the number of consumers in different groups, and for productivity growth coefficients over the interval [0, T ]. The base year data divides into production data from input-output tables, i.e. production shares, and consumer survey data, i.e. capital, expenditures, savings, etc. of different consumer groups.</p><p>The calculations were carried out over T = 200 years with the time step of one year. The tolerance is tol = 10 −5 for the outer loop and tol ′ = 10 −6 for the inner loops. The total number of production sectors is N X = 10. Then the number of variables is of the order T × N X ∼ 10 3 .</p><p>Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org With a more detailed multisector production and several regions the number of variables can be of order of hundred thousands.</p><p>For the OpenMP runs, we used Intel Visual Fortran Compiler XE 12.0. Calculations were carried out on Intel Core i5 (4 cores, 2.70 GHz). For comparison, solution of the equations system in the inner loop was obtained by Newton's method, NITSOL and NEQBF. For each method we made runs with optimization (-o3) and without it. The speedup graphs are presented at Fig. <ref type="figure">7</ref>. As we see, parallelization with four cores gives a more than three times speedup.</p><p>Calculations with the MPI version were carried out using Fortran compiler IBM XL Fortran Advanced Edition. Calculations were carried out on Blue Gene/P V 11.1. Solution of the equations system in the inner loop was obtained by Newton's method and NITSOL. For each method we made runs with optimization (-o3) and without it. The speedup graphs are presented at Fig. <ref type="figure">8</ref>. Here, parallelization with 200 nodes and one rank per node gives a speedup of up to 45 times.</p><p>The numerical results imply that the parallel version of the algorithm gives a substantial improvement both with OpenMP and MPI. The efficiency of the parallel method is especially pronounced with the MPI version, where the number of ranks can be as large as the number of time steps. The increase of the speedup is almost linear and does not depend much on the iterative method used in the inner loop and compiler optimization.</p><p>The results allow us to analyze the scalability. We see that the parallelization method shows good strong scaling, i.e. the speedup increases with the number of processors for the fixed problem size. This justifies the use of parallelization in the Fair-Taylor method.</p><p>Work is underway to determine the reasons why some of the methods used in the inner loop perform better than others. Future work could address the effectiveness of the MPI version with several ranks per node and hybrid realizations, such as OpenMP+MPI, with several processes per node (see, e.g., <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>). </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>input : x0, tol output : the solution x k 1 k 3 solve ( 2 ) 4 x 5 k ← k + 1 ; 6 endFigure 1 .</head><label>13245161</label><figDesc>Figure 1. Newton's method.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>10 endFigure 2 .</head><label>102</label><figDesc>Figure 2. Inexact Newton with backtracking.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>10 endFigure 3 .</head><label>103</label><figDesc>Figure 3. Trust-region method.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 .</head><label>5</label><figDesc>Figure 5. OpenMP version.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 .</head><label>6</label><figDesc>Figure 6. MPI version.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 7 .Figure 8 .</head><label>78</label><figDesc>Figure 7. The speedup for the OpenMP parallel version of the algorithm.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org with respect to consumer demand of different types of goods c ij , i.e. a static problem (we omit the index t where it does not lead to confusion). Solving the dual problem</figDesc><table><row><cell>Nc</cell></row><row><cell>j=1</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org input : K 0 output : K, P 1 marker: if diff &gt; tol and it &lt; numIt then</figDesc><table><row><cell>2</cell><cell>for t ← 0 to T do</cell></row><row><cell>3</cell><cell>Calculate prices P (t) (inner loop);</cell></row><row><cell>4</cell><cell>Update dyn arrays;</cell></row><row><cell>5</cell><cell>end</cell></row><row><cell>6</cell><cell>Update stor arrays;</cell></row><row><cell>7</cell><cell>it ← (it + 1);</cell></row><row><cell>8</cell><cell>diff ← update (K, P );</cell></row><row><cell>9 end</cell><cell></cell></row><row><cell cols="2">10 goto marker</cell></row><row><cell></cell><cell>Figure 4. Sequential version.</cell></row><row><cell cols="2">input : K 0</cell></row><row><cell cols="2">output : K, P</cell></row><row><cell cols="2">1 marker: if diff &gt; tol and it &lt; numIt then</cell></row><row><cell>2</cell><cell>omp parallel default(private)</cell></row><row><cell>3</cell><cell>omp shared(dyn arrays, stor arrays)</cell></row><row><cell>4</cell><cell>omp copyin(parameters)</cell></row><row><cell>5</cell><cell>omp for</cell></row><row><cell>6</cell><cell>for t ← 0 to T do</cell></row><row><cell>7</cell><cell>Calculate prices P (t) (inner loop);</cell></row><row><cell>8</cell><cell>Update dyn arrays;</cell></row><row><cell>9</cell><cell>end</cell></row><row><cell>10</cell><cell>omp end parallel</cell></row><row><cell>11</cell><cell>Update stor arrays;</cell></row><row><cell>12</cell><cell>it ← (it + 1);</cell></row><row><cell>13</cell><cell>diff ← update (K, P );</cell></row><row><cell>14 end</cell><cell></cell></row><row><cell cols="2">15 goto marker</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015 // RussianSCDays.org</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">By the envelope theorem, the price is identified with the marginal cost of production (see, e.g.,<ref type="bibr" target="#b11">[12]</ref>).Суперкомпьютерные дни в России</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2015" xml:id="foot_2">// Russian Supercomputing Days 2015 // RussianSCDays.org</note>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>National Oceanic and Atmospheric Administration, USA 2 National Center for Atmospheric Research, USA <ref type="bibr" target="#b2">3</ref> </p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>input : K 0 output : K, P </p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Population aging and future carbon emissions in the United States</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dalton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>O'neill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Prskawetz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pitkin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">/ Energy economics</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="642" to="675" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Accounting for the household heterogeneity in dynamic general equilibrium models</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">B</forename><surname>Melnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C</forename><surname>O'neill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Dalton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Energy economics</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1475" to="1483" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Solution and maximum likelihood estimation of dynamic nonlinear rational expectations models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Fair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Taylor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Econometrica</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1169" to="1185" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Two practical algorithms for solving rational expectations models // Finance and Economics Discussion Series</title>
		<author>
			<persName><forename type="first">F</forename><surname>Brayton</surname></persName>
		</author>
		<imprint>
			<publisher>U.S. Federal Reserve Board</publisher>
			<biblScope unit="page" from="2011" to="2044" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A distributed block approach to solving near-block-diagonal systems with an application to a large macroeconometric model</title>
		<author>
			<persName><forename type="first">J</forename><surname>Faust</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tryon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Economics</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="303" to="316" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Iterative Solution of Nonlinear Equations in Several Variables</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Ortega</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">C</forename><surname>Rheinboldt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM</title>
		<imprint>
			<biblScope unit="page">572</biblScope>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Globally convergent inexact Newton methods</title>
		<author>
			<persName><forename type="first">S</forename><surname>Eisenstat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Walker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Optimization</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="393" to="422" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Nitsol: A Newton Iterative solver for Nonlinear systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pernice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Walker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Scientific Computing</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="302" to="318" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Iterative Methods for Linear and Nonlinear Equations</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">T</forename><surname>Kelley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM</title>
		<imprint>
			<biblScope unit="page">166</biblScope>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Trust-region methods</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Conn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">I M</forename><surname>Gould</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">L</forename><surname>Toint</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM</title>
		<imprint>
			<biblScope unit="page">959</biblScope>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Recursive Methods in Economic Dynamics</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">L</forename><surname>Stokey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Lucas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Prescott</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Harvard University Press</publisher>
			<biblScope unit="page">608</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Microeconomic Analysis</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Varian</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1992">1992</date>
			<publisher>Norton</publisher>
			<biblScope unit="page">563</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">CAF versus MPI -Applicability of Coarray Fortran to a Flow Solver</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hasert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klimach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roller</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>Springer</publisher>
			<biblScope unit="volume">6960</biblScope>
			<biblScope unit="page" from="228" to="236" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">MPI+MPI: A New Hybrid Approach to Parallel Programming with MPI Plus Shared Memory // Computing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hoefler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dinan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Buntinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Balaji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Barrett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Brightwell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Gropp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Thakur</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">95</biblScope>
			<biblScope unit="page" from="1121" to="1136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<ptr target="//RussianSCDays.org" />
		<title level="m">Суперкомпьютерные дни в России 2015 // Russian Supercomputing Days 2015</title>
				<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
