1. Introduction

The ensemble algorithm for estimation of model parameters in the problem of assessing greenhouse gases fluxes

Ekaterina G. Klimova

0 0 Federal Research Center for Information and Computational Technologies , Novosibirsk , Russia

288 295

The problem of assessing the greenhouse gases fluxes from the Earth's surface based on observations is currently very urgent. To solve it, it is customary to use data assimilation systems (or a more general concept - inverse modeling), which include the observations on the concentration of greenhouse gases and models of the transport and difusion. Since such problems involve large volumes of satellite data and the global model of transport and difusion, it has a huge dimension. For this reason, the development of efective algorithms to enable the practical implementation of the task is required. The paper discusses data assimilation algorithms based on the ensemble Kalman filter and ensemble Kalman smoothing, which can be used to solve the problem of estimating greenhouse gases fluxes. Economical algorithms for estimating a parameter that is constant over a given time interval are proposed.

eol>Data assimilation greenhouse gases fluxes ensemble Kalman smoother

1. Introduction

Assessment of the state of the environment based on observational data is one of the most urgent tasks at the present time. This assessment is performed using forecast models based on data assimilation systems. The data assimilation problem is the problem of the joint accounting of data and a mathematical model for the most accurate assessment of the spatial-temporal distribution of the used variables. The study of the changes in space and time of greenhouse gases, such as CO2 and CH4, as well as the assessment of fluxes from the Earth’s surface of these gases is one of the urgent tasks of monitoring the state of the environment. To solve this problem, it is customary to use data assimilation systems that include observational data and a mathematical model of the transport of gases in the atmosphere. One of the approaches to assessing greenhouse gases fluxes is an approach based on the assumption that the fluxes are constant in a given subdomain and on a given time interval (on the order of a week). This is due to both the need for an eficient implementation of the algorithm and the properties of the observational data used in such problems.

This paper discusses an algorithm for estimating a constant in time and space greenhouse gases fluxes by the observations from a given time interval. The proposed algorithm is a variant of the ensemble Kalman smoothing [ 1 ]. An approach based on the use of eficient local algorithms based on previously developed ensemble filtering and smoothing algorithms is considered [ 2, 3, 4, 5 ]. The results of model numerical experiments with a one-dimensional transport and difusion model are presented.

2. Ensemble algorithms for estimating greenhouse gases fluxes

There is a lot of papers devoted to estimating greenhouse gases fluxes using data assimilation procedures. All these works use an approach called “inverse modeling” [ 6 ].

Let’s consider a series of works carried out by a team of authors in which the ensemble Kalman filter is used [ 7, 8, 9, 10 ]. In these works, the estimated variable is the greenhouse gases flux from the Earth’s surface. The Earth’s surface is divided into squares of equal area (1000× 1000 km) and it is assumed that it is required to estimate the mean flux value over the subdomain. In addition, the average value over a given time period is estimated. The estimation of the values of the averaged fluxes over the subdomains from the observational data 0 for a given time interval and forecast is carried out according to the standard formulas of the Kalman filter [ 11 ]:

= + [︁0 − ( )]︁ , = (︁ + ︁) − 1 where , — vectors of analysis and forecast values at grid nodes; 0 — vector of observations; — operator that translates forecast values into observations; , — covariance matrices of forecast (preliminary estimate) and observations errors.

It is assumed that the forecast step of the algorithm has the form

+1 = , where is the time step number.

To implement the ensemble Kalman filter, an ensemble of perturbations of the estimated parameter is specified [ 7, 8, 9, 10 ]:

Then the matrix is estimated by the ensemble ∆ = √1

[∆ 1, . . . , ∆ ] , = ∆ (∆ ) , = ∆ (∆ ) [︀ ∆ (∆ ) + ]︀ − 1 ,

∆ = ( − ∆ ) − ( ), where operator includes model forecast at the time of observation, interpolation from grid nodes to observation points, and, in the case of satellite data, vertical averaging with known coeficients (“average kernel”), is the number of ensemble elements.

The period of time over which observations are used is called the assimilation window. In [ 7 ], the assimilation window, consisting of 12 cycles of 8 days, is considered, and the average values of the fluxes over 8 days are estimated.

From the point of view of the mathematical formulation of the problem, we highlight the following points: 1) the fluxes are assessed without specifying the concentrations; the solution of the problem in this formulation will not be optimal; 2) since the operator includes a mathematical model of impurity transport and difusion, this problem is a smoothing problem, not filtering [ 1 ].

In [ 7 ], an approach to the implementation of the algorithm based on the use of a transformation matrix is considered. In this case, computational dificulties arise associated with the large dimension of the matrices under consideration. From the point of view of practical implementation, to solve this problem, it is required to specify a matrix — the covariance matrix of the observation errors and the initial values of the fluxes.

3. Statistical optimization

Let’s consider the problem of estimating a parameter which is constant in time, from observations on a given time interval. Let the concentration value 0 in a given region at the time 0 and a preliminary estimate of the parameter at a given time interval are known. There are observations on the time interval [0, , ]. It is required to estimate the values according to the observations.

If we consider the finite-diference analogue of the transport-difusion equation in operator form +1 = + , where is the time step number, 0 is the concentration value, and are the greenhouse gases fluxes from the Earth’s surface, then

− 1 +1 = ∏︁ 0 + ∑︁ ∏︁ = 10 + 2 = ˜ , =1

=1 =1 where ˜ = (1, 2). Let the observations at the time be related to the “true” value using the operator :

0 = + 0.

Observations can be represented as

0 = (︀ 10 + 2 )︀ + 0 = 0˜ + 0, where 0 — interpolation to the observation point and averaging along the vertical (in the case of satellite data, this is the “average kernel”), 0 is the random observation error with zero mean value and covariance matrix .

Let = [0, . . . , ] — observations over the entire time interval, ˜ = [0, . . . , ] — “generalized” observation operator: = ˜ + 0, 0 = [︀ 00, . . . , 0 ]︀ — random observation errors. As in [ 1 ], we will seek an estimate based on the minimum of the functional [ ] = ( − ) 1( − ) + ( − ˜ ) 2( − ˜ ), where — preliminary estimate (forecast), −1 1 and −2 1 are the covariance matrixes of forecast and observation errors.

Let’s consider an ensemble approach to solving this problem. We will assume that an ensemble of forecasts { , = 1, . . . , } is given. Then the ensemble of analysis values { , = 1, . . . , } has the form [ 1 ]

1 = +− 1 ︁( ˜ ︁) [︂

1 − 1 ˜ ︁( ˜ ︁) +˜︂] − 1(︁ ˜ +0 − ˜ )︁ , (1) where = {︁ 1 , . . . , }︁, = − ¯ , ¯ = ∑︀ /, {0 , = 1, . . . , } — =1 ensemble of random perturbations of observations corresponding to the covariance matrix ˜. The following relation is used for the estimation ˜ :

˜ ( ) = ˜ ( + ) − ˜ ( ).

The implementation of the vector estimation algorithm according to formula (1) requires the calculation of the inverse matrix of high dimension. For more eficient computations, an algorithm for the implementation of the stochastic ensemble Kalman filter (ensemble algorithm) [ 2, 3 ] can be applied. The condition for applying the ensemble -algorithm is the fulfillment of the relation between the matrices of the Kalman filter = ( − ˜ ) , where and — covariance matrices of forecast (first guess) and analysis (estimate) [ 11 ]. From formula (1), one can obtain the ratio for deviations from the mean value of the ensemble of analyzes: = + (0 − ˜ ). Hence, we can conclude that the required relation between the matrices is satisfied if the errors of observation and the forecast do not correlate. In this case, you can use the relation = ˜ ˜− 1 [ 11 ] and the formulas of the -algorithm can be applied [ 2, 3 ].

The algorithm can be used to estimate only the parameter (if it is assumed that 0 is given). In the case when 0 is known, we get

( ) = [︁˜ (0, + ) − ˜ (0, )]︁ .

The ensemble -algorithm is a stochastic Kalman filter [ 6, 12 ] in which the analysis step is performed only for the ensemble mean: ¯ = ¯ +

1 − 1 ︁( ˜ ︁) ˜− 1 [︁ − ˜ ¯ ]︁ . (2)

The ensemble of analysis errors — matrix of dimension ( × ), columns of which are vectors of dimension { , = 1, . . . , }, is obtained by transforming the ensemble of forecast errors — a matrix with columns { , = 1, . . . , } : = ( + Π )− 1 while = ˜ ˜− 1(˜ + ) = 1 + 2. where — matrix whose columns are equal to the vector 0 — the ensemble of observation errors, — the identity matrix. More detailed calculations are given in [ 2, 3 ].

The elements of matrix Π are calculated for given matrices ˜ and ˜ over an ensemble of forecast errors and do not depend on the grid node. This makes it possible to implement the algorithm locally. An eficient method for implementing the ensemble -algorithm was proposed in [ 3 ].

As can be seen from the formulas of the ensemble -algorithm, in order to use it in the case of the implementation of the above method, the following steps must be taken. 1. Forecast by values = (0, ) from time 0 to a time . 2. Calculation of residuals ( − ˜ ). 3. Calculation of ˜ = {︁1 , . . . , }︁. 4. Calculation of matrix of -algorithm. Calculation can be carried out sequentially in time: ˜ ˜− 1˜ = ∑︀ ( ()) − 1 (), = { 1 , . . . , }.

=1 5. Calculation of ˜ .

6. Estimation of = (0, ) by the formula (2).

4. Numerical experiments with a one-dimensional transport and difusion model

With the proposed algorithm based on the method of statistical optimization, numerical experiments with a 1-dimensional model of the transport and difusion of a passive impurity were carried out. The following equation was considered: ˜ + ˜ = 2 2˜ 2 + ˜(, ), where ˜ — the predicted variable, ˜(, ) is an unknown source of passive impurity. To solve the equation, the semi-Lagrangian method was used, with an implicit scheme in time and a scheme of central diferences in space. To solve the finite-diference analogue of the difusion equation, the cyclic sweep method was used. The equation was solved on the space interval (0, 1), while the periodic boundary conditions were considered. 240 grid points were set, = 1, 2 = 0.6 × 10− 3.

Let consider a finite-diference analogue of this equation in the form

+1 = + , where — linear operator, — the time step number.

The following numerical experiments were carried out with model data. The given initial values 0 , 0 were considered “true”. To obtain the initial data 0, 0 for forecasting by the model, a disturbance was added to the “true” initial data 0 = 0 + , ∼ (0, 0), 0 = 0 + , ∼ (0, 0). (, ) denotes a random variable distributed according to the normal law with a mathematical expectation equal to and variance equal to .

To organize numerical experiments, the following were set: an ensemble of initial fields: 0 = 0 + , ∼ (0, 0), = 1, . . . , ; 0 = 0 + , ∼ (0, 0), = 1, . . . , ; observations 0 = 0 + 0; 0 ∼ (0, 0); ensemble of observations with perturbations 0 = 0 + 0, 0 ∼ (0, 0), = 1, . . . , . Through denotes the number of elements of the ensemble. The observations were considered to be known throughout the integration area. In all numerical experiments = 20 was considered. In the analysis at grid node , observational data from the interval ( − , + ) were taken. In this case, in the analysis at the grid node , instead of the matrix , we took the matrix ′ = ∘ − 0.5(/)2 , where — distance between grid node and observation, “∘ ” — element-wise multiplication sign. In the experiments, we took the values = 5, = 5∆ (∆ — the grid step). This algorithm is called -localization. It is commonly used in ensemble methods to suppress spurious covariances at large distances due to the small size of the sample (ensemble) [ 12 ].

Numerical experiments were carried out to estimate the average fluxes over a given time interval. For this, the “true” values of the fluxes were set, as well as the initial estimate of the lfuxes (both equal to zero and nonzero). The forecast ensembles were modeled for a given time period with perturbed flux values. In this case, the ensemble -algorithm was implemented. The application of the ensemble -algorithm in the problem of transport and difusion of a passive impurity is described in detail in [ 5 ].

The following numerical experiments were carried out. In numerical experiments, the “assimilation window” = 10 time steps was used. To estimate the parameter values on a time interval {, . . . , +}, we used the observations of this time interval. Numerical experiments were carried out for the values: 0 = 0 = 0.01, 0 = 0.1, = 40.

In the formulation of the parameter estimation problem, it is assumed that the parameter does not change at the forecasting step. In the experiments, the “true” value of the parameter was taken constant in time, namely, it was set in the form of a discrete analogue of the function 0(), where 0() = ︂{ 0.1, if 0.375 ≤ ≤ 0.625,

0, else.

Two series of numerical experiments were carried out. In the first, 0 was considered given, the ensemble { } was modeled. In the second series, an ensemble of initial values was set {0 }. a b In this case, the ensemble of deviations from the mean corresponds to the deviation of the estimate from “true”. It should be noted that this did not afect the result, since the residual values in (2) are kept the same in both cases. In both cases, the root-mean-square error of the estimate (the diference between “true” value and estimated value) was 0.009. The results of the first series of experiments are shown in Figure 1. Figure 1, a shows the “true” value of the time-averaged emission, Figure 1, b shows the restoration of the value after applying the procedure described in Section 3. In this case, at the initial moment of time, the value of the estimated parameter was set equal to zero. As can be seen from the figures, the proposed algorithm makes it possible to estimate the flux of a passive impurity based on observations and forecast using the transport-difusion model even if the information about it at the initial moment is absent.

5. Conclusion

The task of assessing the fluxes of greenhouse gases from the Earth’s surface is currently being solved using data assimilation systems. In this case, models of the transport and difusion of a passive impurity in the atmosphere and meteorological fields of wind speed, temperature, etc. are used. The ensemble Kalman filter and ensemble Kalman smoother are increasingly used as a mathematical formulation of the problem. The article discusses an algorithm for estimating time-averaged values of greenhouse gases fluxes based on the ensemble approach and the theory of statistical optimization. The algorithm is economical and can be implemented locally in each given subdomain.

[1] Evensen

Data assimilation. The ensemble Kalman filter . Berlin Heideberg: SprigerVerlag, 2009 .

[2] Klimova

A suboptimal data assimilation algorithm based on the ensemble Kalman filter //

Quarterly Journal of the Royal Meteorological Society . 2012 . Vol. 138 . P. 2079 - 2085 .

[3] Klimova

E.G.

The Kalman stochastic ensemble filter with transformation of perturbation ensemble // Numerical Analysis and Applications . 2019 . Vol. 12 . No. 1. P. 26 - 36 .

[4] Klimova

E.G.

Bayesian approach to data assimilation based on ensembles of forecasts and observations // IOP Conf. Series: Earth and

Environmental

Science . 2019 . DOI: 10 .1088/ 1755 - 1315/386/1/012038.

[5] Klimova

E.G.

An eficient algorithm for stochastic ensemble smoothing // Siberian J . Num. Math. 2020 . Vol. 23 . No. 4. P. 381 - 393 .

[6] Nakamura

, Potthast

. Inverse modeling // IOP Publishing. 2015 . DOI: 10 .1088/978-0- 7503 -1218-9.

[7] Feng

, Palmer

P.I.

, Bosch

, Dance

Estimating surface CO2 fluxes from space-borne CO2 dry air mole fraction observations using an ensemble Kalman filter // Atmospheric Chemistry and Physics. 2009 . Vol. 9 . P. 2619 - 2633 .

[8] Feng

et al. Evaluating a 3-D transport model of atmospheric CO2 using ground-based, aircraft, and space-borne data // Atmospheric Chemistry and Physics. 2011 . Vol. 11 . P. 2789 - 2803 .

[9] Feng

et al. Estimates of European uptake of CO2 inferred from GOSAT X2 retrievals: Sensitivity to measurement bias inside and outside Europe // Atmospheric Chemistry and Physics. 2016 . Vol. 16 . P. 1289 - 1302 .

[10] Feng

et al. Consistent regional fluxes of CH 4 and CO2 inferred from GOSAT proxy XCH4:XCO2 retrievals, 2010 -2014 // Atmospheric Chemistry and Physics. 2017 . Vol. 17 . P. 4781 - 4797 .

[11] Jazwinski

A.H.

Stochastic processes and filtering theory . N.Y.: Academic Press, 1970 .

[12] Houtekamer

H.L.

, Zhang F. Review of the ensemble Kalman filter for atmospheric data assimilation // Monthly Weather Review . 2016 . Vol. 144 . P. 4489 - 4532 .