Neural Network Emulator Optimizer: A Preliminary Study on Korean Microphysics Parameterization Model⋆

Sojung An[0000−0002−0170−1031], Inchae Na, Tae-Jin Oh⋆⋆, and Junghan Kim

Korea Institute of Atmospheric Prediction Systems, 35, Boramae-ro, Dongjak-gu, Seoul, Korea
{sojungan, icna, oht, jhkim}@kiaps.org

Abstract. Recent years have witnessed great progress in emulators based on neural networks (NN). Current state-of-the-art emulation methods often apply shallow NNs to attain high performance on physics systems, which yields faster processing in resource-constrained environments. Although several works have focused on improving the accuracy of physics emulators, an effective and efficient method for tackling the computational cost of existing systems at high resolution is still lacking. In this paper, we propose an optimized NN emulator of a microphysics (MPS) parameterization scheme to address this problem in a numerical weather prediction (NWP) model, the Korea Integrated Model (KIM) in particular. Specifically, we adopt a shallow NN to build an intelligent emulator that learns the feature map and estimates the vertical MPS forcing increment profiles. This study rests on two technical contributions: (1) Optimization: reviewing and improving models for simulating non-linear parameters; (2) Feasibility: efficient computation with minimal loss of physical information. We validate the proposed model on four seasons of KIM output (10-day forecasts at 200-second intervals). Results indicate that the proposed single-layer network shows the best performance for emulating MPS in KIM. Our analyses provide a guideline for optimal modeling of physical parameterizations.

Keywords: Emulator · Microphysics · Physical Parameterization · Neural Network · Feature Extraction

1 Introduction

In recent decades, there have been considerable improvements in the predictability of numerical weather prediction (NWP) models. Advancement in computer hardware technology is considered one of the major drivers of improvement in this area. However, increasing the spatio-temporal resolution of NWP models, which is critical for better predictability, requires an exponential increase in computational power. Thus, utilizing ever more computational resources and/or achieving better model code optimization is continually needed. Parameterization of subgrid-scale physical processes has been an active research area ever since the birth of NWP. There are extensive research activities on physics parameterization [3, 5, 6, 9, 13] utilizing the Korea Integrated Model (KIM), currently the operational NWP model in Korea, which is built upon non-hydrostatic governing equations and discretized with a spectral element method in the horizontal dimension and a finite difference method in the vertical dimension [4]. The physics module parameterizes subgrid-scale physical processes which cannot be resolved at the grid scale of the NWP model.

⋆ Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: N. D. Vo, O.-J. Lee, K.-H. N. Bui, H. G. Lim, H.-J. Jeon, P.-M. Nguyen, B. Q. Tuyen, J.-T. Kim, J. J. Jung, T. A. Vo (eds.): Proceedings of the 2nd International Conference on Human-centered Artificial Intelligence (Computing4Human 2021), Da Nang, Viet Nam, 28-October-2021, published at http://ceur-ws.org
⋆⋆ Corresponding author.
Subgrid-scale physics parameterization development in most cases relies on observational data, with parameters tuned to fit the observations. Also, physics parameterization is computed in vertical columns of the three-dimensional data grid structure and is agnostic to the neighboring horizontal grids. These properties make machine learning an ideal tool for developing physics parameterizations, since complex nonlinear mappings can be approximated effectively with machine learning, e.g., neural networks.

Previous studies of parameterization based on machine learning can be categorized into developing emulation-based methods for accelerated calculation [7, 8, 10–12, 16, 17] and developing new empirical parameterizations based on observations or high-resolution model data [1]. Recently, a shallow neural network emulator which covers the entire suite of physical parameterizations has been developed [1], based on a single hidden layer. Their model showed satisfactory accuracy with much faster execution in simulating nonlinear physics modules compared to the original code. By exploiting the faster computation of the NN-based emulator approach, NWP models can be run effectively at a higher resolution. There are competent studies using NNs as emulators (e.g., Krasnopolsky et al., Nadiga et al.), but studies regarding the optimization of the network architecture are yet to be found. When artificial intelligence recognizes patterns in physics, the structure of neurons, the depth of layers, and the choice of activation functions are important factors. For example, the Rectified Linear Unit (ReLU), given by max(0, x), is the most often used activation function in the deep learning community. However, when it comes to physics-informed neural networks, ReLU is reported to produce spurious wiggles in the computed derivatives representing fluid flows [15]. Thus, understanding the underlying physics of the target system in detail is critical for effective network design.

In this study, we explore various types of NN structures that represent the physics of the atmosphere. Specifically, we compare a series of NN models to test their effectiveness with combinations of the following options: (i) the number of layers, (ii) the neuron structure, and (iii) the activation function.

2 Related Work

This section describes several studies based on emulation similar to our proposed model. We classify these studies into two groups: (i) single-physics and (ii) unified-physics emulation.

2.1 Single-physics emulations

State-of-the-art models for physics emulation include a step that optimizes hidden layers based on machine learning, or approximates these weights based on various activation functions. Many previous studies demonstrated that the NN emulator approach can be applied successfully to speed up the computation of a single physics module. Machine learning was first used to emulate longwave radiation for the European Centre for Medium-Range Weather Forecasts models [2]. Krasnopolsky reduced the computation time of decadal climate simulations by one to two orders of magnitude [7, 8]. Also, the authors verified that the speed of NN-based longwave radiation emulators can be increased by 50 to 80 times [11]. Aside from these, multilayer perceptrons have been proposed for predicting nonlinear phenomena in the physics of the atmosphere influenced by a number of factors.
To accelerate expensive radiative transfer computations, deep NNs have been applied to predict vertical profiles of longwave and shortwave radiative fluxes in weather and climate models [12]. Veerman et al. developed an emulator of a radiation parametrization based on a multilayer perceptron with Leaky ReLU activation [17]. Roh and Song evaluated the forecast performance of radiation emulators with 56 to 300 neurons and sigmoid activation for cloud-resolving simulation [16]. These emulators surpassed the speed of the existing model for a single physics process. However, when integrated with the main NWP model, the speed-up effect was limited [10].

2.2 Unified-physics emulation

Traditional emulation-based methods parameterized each physical process separately with an NN. Errors arising from these models had a significant effect on the accuracy of the overall NWP model: since all physical processes closely interact with one another, one error source in a particular physics module can cause larger errors in other sub-physics. Belochitski and Krasnopolsky proposed a shallow NN-based emulator of a complete suite of atmospheric physics parameterizations [1]. The paper aims to learn all physics domains, which are intimately connected, at once. Their method learns an encoder to extract the physical features and maintains representation consistency by minimizing the reconstruction error across all physics. Their implementation can handle long-term numerical integration while providing 3 times faster computation than the original physics module. However, these advantages come with a large training cost, and it is hard to train on high-resolution data. Fortunately, the paper achieved good results at a high resolution of 25 km, despite the model being trained on 100 km-resolution source data.

3 A Network for Emulation-based Parameterizations

This paper studies the physics problem from the perspective of an NN in order to propose an emulation-based algorithm.

3.1 Defining the Weight Matrix

A physics emulator learns input features on a network so as to map the variables onto the next input. Let X(t) and Y(t) represent the input and output at time t, respectively. Denote X = (x_1, x_2, \cdots, x_n)^T \in \mathbb{R}^n as the input information related to physics observed on the network and Y \in \mathbb{R}^m as the output information, where n and m are the input and output dimensions, respectively. The purpose of training is to find \theta that maximizes the conditional probability of the input-output pairs in the training sets. At each time step t, the decoder receives the features of the previous input. If the model is trained to predict the output from the constantly updated input presented in the previous phase, we are training the function \Phi successively as:

    \Phi = \left[ x_1^{(t)}, \cdots, x_n^{(t)}; \theta \right] \rightarrow \left[ x_1^{(t+1)}, \cdots, x_m^{(t+1)}; \theta \right],    (1)

where t denotes the time step. Inputs and outputs consist of the same attributes at each step, and the output attributes are connected in a recursive manner. Each input x_i is scaled according to the following formula:

    x' = \begin{cases} 1, & \text{if } x > \max \\ \dfrac{x - \min}{\max - \min}, & \text{if } \min \le x < \max \\ 0, & \text{otherwise} \end{cases}    (2)

The min and max ranges of each attribute are set manually as physically allowable values in KIM.
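To make the scaling in Eq. (2) concrete, the following is a minimal sketch of the clipped min-max normalization in Python/NumPy. The bounds `attr_min` and `attr_max` are hypothetical placeholders for the physically allowable per-attribute ranges set in KIM, which are not listed in this paper.

```python
import numpy as np

def minmax_clip(x: np.ndarray, attr_min: float, attr_max: float) -> np.ndarray:
    """Clipped min-max scaling of one attribute, following Eq. (2).

    Values above the physical maximum map to 1, values below the
    minimum map to 0, and everything in between is scaled linearly.
    """
    x_scaled = (x - attr_min) / (attr_max - attr_min)
    return np.clip(x_scaled, 0.0, 1.0)

# Example with hypothetical temperature bounds (K); the real KIM bounds
# are set manually per attribute and are not given in the paper.
profile = np.array([180.0, 250.0, 320.0])
print(minmax_clip(profile, attr_min=150.0, attr_max=350.0))  # [0.15 0.5 0.85]
```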
We define the emulator network through a weight matrix W representing the connections over the entire set of attributes. W \in \mathbb{R}^{n \times m} is the weight matrix:

    W = \begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,m} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n,1} & w_{n,2} & \cdots & w_{n,m} \end{pmatrix},    (3)

where w_{i,j} is the weight between input x_i and output y_j of the physics.

3.2 Parameterization Network of the Physical Feature Extraction

Since the network is a matrix, for any weight w_{i,j} in the matrix, the output y_j is defined as:

    y_j = \sum_{i=1}^{n} x'_i w_{i,j} + b_j.    (4)

Using the definition of layers, we can obtain the output matrix of size n × m. Assume that there are L network layers, where the 0th layer is the input layer and the lth (1 ≤ l ≤ L) layers are fully connected. For any fully connected layer l \in [1, \cdots, L], its output is calculated using the following equation:

    H_r^{(l)} = \sigma\left( \sum_{s=1}^{s_{l-1}} f_{\theta_l} H_s^{(l-1)} \right),    (5)

where s_{l-1} denotes the number of parameters in the (l−1)th layer, H_s^{(l)} \in \mathbb{R}^{s_{l-1}} is the sth physics hidden parameter in the lth layer, and \sigma(\cdot) is the activation function. The neuron structure can be divided into the following four structures according to the change in the number of neurons per layer:

    \alpha : s_L < \{s_2, \cdots, s_{L-1}\} \le s_1
    \beta  : s_L < s_1 < \{s_2, \cdots, s_{L-1}\}
    \gamma : \{s_2, \cdots, s_{L-1}\} < s_L < s_1    (6)
    \delta : s_L \le \{s_2, \cdots, s_{L-1}\} < s_1,

where {s_2, · · · , s_{L−1}} denotes the set of hidden-neuron counts connected to the input layer H_s^{(1)}, s_1 is the number of input neurons, and s_L is the number of output neurons. The activation functions tested in this study are described in Table 1.

Table 1: The mechanisms of the tested activation approaches

    Type                 Structure of activation function σ(θ)*          References
    Hyperbolic tangent   (e^θ − e^{−θ}) / (e^θ + e^{−θ})                 [1, 7, 8, 12]
    Leaky ReLU           θ, if θ > 0; 0.01θ, otherwise                   [17]
    Sigmoid              1 / (1 + e^{−θ})                                [16]
    Swish                θ · sigmoid(θ)                                  −

    * Throughout the table, σ(θ) is the activation function of parameter θ.

The values s_{l−1}, the number of layers L, and σ(·) defined in this section are chosen as the optimal configuration by the experiments in the next section. Finally, the errors between the actual and predicted values are minimized with the L1 norm.
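As an illustration of the fully connected emulator in Eqs. (4)-(5), the sketch below builds the shallow architecture in PyTorch, which the paper's implementation uses [14]. The input/output dimensions (822/548) follow Table 2, and the Adam optimizer, L1 loss, dropout 0.2, and hidden width 1096 (the β structure) come from Sections 4.1-4.2; the class name `MPSEmulator`, the exact placement of dropout, and leaving the output layer linear in the single-layer case are assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

N_IN, N_OUT = 822, 548  # input/output dimensions from Table 2

class MPSEmulator(nn.Module):
    """Shallow fully connected emulator, a sketch of Eqs. (4)-(5).

    hidden=() gives the single-layer network (direct 822 -> 548 map);
    hidden=(1096,) gives a 2-layer beta structure, whose hidden width
    exceeds both the input and output sizes.
    """
    def __init__(self, hidden=(), activation=nn.Tanh, dropout=0.2):
        super().__init__()
        sizes = [N_IN, *hidden, N_OUT]
        layers = []
        for i in range(len(sizes) - 1):
            layers.append(nn.Linear(sizes[i], sizes[i + 1]))
            if i < len(sizes) - 2:  # activation + dropout on hidden layers only (assumed)
                layers.extend([activation(), nn.Dropout(dropout)])
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = MPSEmulator(hidden=(1096,))                    # beta structure, L = 2
optim = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, as in Section 4.1
loss_fn = nn.L1Loss()                                  # L1-norm objective
```

A usage note: with `hidden=()` the module reduces to a single affine map, which matches the best-performing single-layer case reported in Section 4.2 under the stated assumptions.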
4 Evaluation

In this section, we evaluate the NN models for emulating physics among those proposed in Section 3.

4.1 Datasets and Implementation Details

To verify the efficacy of the proposed methods, we conducted experiments with an MPS dataset generated every 200 seconds from ERA5 reanalysis by the latest KIM version 3.6. MPS was selected for this first study because its nonlinearity is hard to predict and it is one of the most time-consuming parts of the atmospheric physical processes. The input and output attributes are shown in Table 2. The dataset consists of four sets, one per season, to account for temporal features. The dynamical core of the NWP model is a spectral-element cubed-sphere nonhydrostatic model with a horizontally quasi-uniform resolution of 100 km and 91 vertical layers (top at ~0.01 hPa) on a hybrid sigma-pressure vertical coordinate. Data are extracted randomly from the KIM forecast data, composing 6,998,688 training samples; another 1,749,672 samples are used for evaluation. Our methods are implemented in PyTorch [14] and optimized using Adam (λ = 10⁻³). For fairness, we train all networks with a batch size of 128 for 500 epochs, on random sets.

Table 2: Summary of the dataset used for MPS. The data consist of an 822-dimension input and a 548-dimension output. The output, a part of the inputs, is updated every time step in MPS.

                          Input layers                 Output layers
    Attribute             # of index    # of profile   # of index    # of profile
    Grid distance         1             1              −             −
    Precipitation         2             1              1             1
    Snow                  3             1              2             1
    Pressure              4:94          1:91           −             −
    Depth of layer        95:184        1:91           −             −
    Virtual temperature   186:276       1:91           −             −
    Temperature           277:367       1:91           3:93          1:91
    Specific humidity     368:458       1:91           94:184        1:91
    Mixing ratio*         459:822       1:364          185:548       1:364

    * Mixing ratios of cloud, rain, ice, and snow

We evaluate our emulation both quantitatively and qualitatively to estimate the optimal emulator. The results of the experiments are evaluated with the three metrics shown in Eq. (7): mean absolute error (MAE), root mean square error (RMSE), and peak signal-to-noise ratio (PSNR). Denoting the NN output by e_i and the KIM output by \bar{e}_i, the evaluation functions can be written as:

    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i - \bar{e}_i|

    \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (e_i - \bar{e}_i)^2 }    (7)

    \mathrm{PSNR} = 10 \cdot \log_{10}\!\left( \frac{\max^2}{ \frac{1}{n} \sum_{i=1}^{n} (e_i - \bar{e}_i)^2 } \right)
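For reference, the three metrics in Eq. (7) can be computed as in the minimal NumPy sketch below. The peak value `max` is assumed here to be the maximum absolute value of the reference output, and a base-10 logarithm is assumed for the PSNR; the paper does not state either choice explicitly.

```python
import numpy as np

def evaluate(e_nn: np.ndarray, e_kim: np.ndarray) -> dict:
    """MAE, RMSE, and PSNR between NN output e_nn and KIM output e_kim, as in Eq. (7)."""
    diff = e_nn - e_kim
    mae = np.mean(np.abs(diff))
    mse = np.mean(diff ** 2)
    rmse = np.sqrt(mse)
    peak = np.max(np.abs(e_kim))  # assumed peak value for the PSNR numerator
    psnr = 10.0 * np.log10(peak ** 2 / mse)
    return {"MAE": mae, "RMSE": rmse, "PSNR": psnr}
```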
4.2 Case Analysis

For comparison, we set a benchmark case consisting of a 2-layer NN with hyperbolic tangent activation and the γ neuron structure. Table 3 reports the average accuracy of the emulations with the different approaches.

Table 3: MPS output errors between KIM and the NN emulator*

                   Network structure (S)       Depth of layers (L)        Activation function (σ)
                   α     β     γ     δ         1     2     3     4        tanh   relu   sigmoid  swish
    MAE (E-04)     2.46  1.44  3.49  2.08      0.74  2.08  5.22  7.45     2.08   3.67   8.68     2.14
    RMSE (E-04)    9.96  0.67  12.27 7.71      1.72  7.71  17.52 24.81    7.71   11.74  25.53    7.79
    PSNR           30.56 32.61 29.55 31.83     38.39 31.83 27.96 26.41    31.83  29.66  26.14    31.72

    * α, β, γ, and δ are the neuron structures described in Section 3.

In our first experiment, we evaluate the impact of each neuron structure in order to estimate the optimal depth of NN layers. As L is 2 in the benchmark case, we set the hidden neurons to s_2 = 822 for α, s_2 = 1096 for β, s_2 = 400 for γ, and s_2 = 548 for δ. The PSNR results indicate that applying the β structure raises the average emulation accuracy from 29.55 (for the γ structure commonly used in emulation) to 32.61. The β structure also shows the overall best performance compared to the other network structures, as reflected in the evaluation metric scores. When the number of hidden neurons decreases relative to the input, as in the γ structure, loss of latent features is inevitable. These losses can introduce physical uncertainties, so the number of hidden neurons should be greater than or equal to the number of output neurons.

Next, we compare the impact of learning according to layer depth and activation function. All networks deeper than a single layer applied fully connected layers, each followed by the activation function and dropout (0.2). The approximation errors presented above show that a shallow NN is capable of providing emulations with small error. In particular, the single-layer network yields the best performance of all cases. As for the activation, the Leaky ReLU, which has a bend in its slope, performs poorly, consistent with previous studies. The results demonstrate that the hyperbolic tangent activation is effective for emulating MPS, providing accurate and visually promising results. The swish activation function performed about as well as the hyperbolic tangent, with an error of 2.14(·10⁻⁴). In general, good results are obtained from smooth functions with high gradients.

Finally, we compare the original KIM MPS output with the NN emulator output. Fig. 1 illustrates the correlations between the outputs of the NN-based emulation and the outputs of the MPS. The single-layer emulator, the best result in Table 3, was used to emulate the output of the MPS. The figure shows outputs for 5-day input data integrated by KIM, allowing for spin-up.

Fig. 1: Scatter plot (physics output vs. NN output) for the output attributes. The scatter plot compares the first profile of the attributes shown in Table 2. The x-axis is the physics value and the y-axis is the NN value. The average of the correlation coefficients is 0.998 (precipitation: 0.989, snow: 0.989, temperature: 0.999, specific humidity: 0.999, and mixing ratio: 0.997).

Other shallow-NN emulators showed similar results. The KIM MPS output, the NN emulation, and their difference in precipitation distribution are shown in Fig. 2. The precipitation simulations of KIM and the NN are shown in (a) and (b); (c) is the difference between the simulations, using the single-layer emulator. The precipitation distributions of the KIM and NN emulator runs are very similar, showing little difference, as seen in (c) of Fig. 2.

Fig. 2: The precipitation output and the difference between the NN output and the MPS precipitation output. The range of the color bar on the left is 0 to 5(·10⁻²), and that of the color bar on the right is −1.8(·10⁻⁴) to 1.8(·10⁻⁴).

5 Conclusion and Future Work

As information technology evolves, the tasks of accelerating physics parameterization and increasing the resolution of NWP have been gaining importance. Traditional emulator methods tend to consider only the overall composition, such as the kind of input data. This paper focused on building an optimal NN for physics emulation by tuning the details of the network settings. Various experiments were carried out to design the detailed structure of the network, and we aimed for a network that can capture the characteristics of the physics. Specifically, to overcome the drawbacks of existing models, we explored methods for optimizing NN details (i.e., neuron structure, number of layers, and activation function). In our experiments, parameterization of the MPS using a single layer showed the best performance. For MPS, using deep layers does not bring the best result, but deep layers may work well in other physics parameterizations. The depth of the layers can be changed according to the physics patterns and other elements (e.g., the number of profiles, the resolution, the parameterization type, and so on) used by the NWP model. We therefore aim for an emulation that is feasible not only for a single physics module but across the whole physics suite. Ideally, we will achieve unified-physics emulation by considering different physical patterns, leveraging dynamic NN modeling.

Acknowledgement

This work was carried out through the R&D project "Development of the Next-generation Operational System of Korea Institute of Atmospheric Prediction Systems (KIAPS)", funded by the Korea Meteorological Administration (KMA2020-02213).

References

1. Belochitski, A., and Krasnopolsky, V.: Stable emulation of an entire suite of model physics in a state-of-the-art GCM using a neural network. arXiv preprint arXiv:2103.07028 (2021).
2. Chevallier, F., Chéruy, F., Scott, N. A., and Chédin, A.: A neural network approach for a fast and accurate computation of a longwave radiative budget. Journal of Applied Meteorology, 37(11), 1385-1397 (1998).
3. Han, J. Y., Hong, S. Y., Sunny Lim, K. S., and Han, J.: Sensitivity of a cumulus parameterization scheme to precipitation production representation and its impact on a heavy rain event over Korea. Monthly Weather Review, 144(6), 2125-2135 (2016).
4. Hong, S. Y., et al.: The Korean Integrated Model (KIM) system for global weather forecasting. Asia-Pacific Journal of Atmospheric Sciences, 54(1), 267-292 (2018).
5. Kim, E. J., and Hong, S. Y.: Impact of air-sea interaction on East Asian summer monsoon climate in WRF. Journal of Geophysical Research: Atmospheres, 115(D19) (2010).
6. Koo, M. S., Choi, H. J., and Han, J. Y.: A parameterization of turbulent-scale and mesoscale orographic drag in a global atmospheric model. Journal of Geophysical Research: Atmospheres, 123(16), 8400-8417 (2018).
7. Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Chalikov, D. V.: New approach to calculation of atmospheric model physics: accurate and fast neural network emulation of longwave radiation in a climate model. Monthly Weather Review, 133(5), 1370-1383 (2005).
8. Krasnopolsky, V. M., Fox-Rabinovitz, M. S., and Belochitski, A. A.: Decadal climate simulations using accurate and fast neural network emulation of full, longwave and shortwave, radiation. Monthly Weather Review, 136(10), 3683-3695 (2008).
9. Lee, E. H., Lee, E., Park, R., Kwon, Y. C., and Hong, S. Y.: Impact of turbulent mixing in the stratocumulus-topped boundary layer on numerical weather prediction. Asia-Pacific Journal of Atmospheric Sciences, 54(1), 371-384 (2018).
10. Morcrette, J. J., Mozdzynski, G., and Leutbecher, M.: A reduced radiation grid for the ECMWF Integrated Forecasting System. Monthly Weather Review, 136(12), 4760-4772 (2008).
11. Nadiga, S., Krasnopolsky, V., Bayler, E. J., Mehra, A., Kim, H. C., and Behringer, D.: Neural network technique for: (a) gap-filling of satellite ocean color observations, and (b) bridging multiple satellite ocean color missions. In AGU Fall Meeting Abstracts, 2015, IN43C-1755 (2015).
12. Pal, A., Mahajan, S., and Norman, M. R.: Using deep neural networks as cost-effective surrogate models for super-parameterized E3SM radiative transfer. Geophysical Research Letters, 46(11), 6069-6079 (2019).
13. Park, R. S., Chae, J. H., and Hong, S. Y.: A revised prognostic cloud fraction scheme in a global forecasting system. Monthly Weather Review, 144(3), 1219-1229 (2016).
14. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., and Chintala, S.: PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 8026-8037 (2019).
15. Raissi, M., Yazdani, A., and Karniadakis, G. E.: Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science, 367(6481), 1026-1030 (2020).
16. Roh, S., and Song, H. J.: Evaluation of neural network emulations for radiation parameterization in cloud resolving model. Geophysical Research Letters, 47(21), e2020GL089444 (2020).
17. Veerman, M. A., Pincus, R., Stoffer, R., Van Leeuwen, C. M., Podareanu, D., and Van Heerwaarden, C. C.: Predicting atmospheric optical properties for radiative transfer computations using neural networks. Philosophical Transactions of the Royal Society A, 379(2194), 20200095 (2021).