Extrapolation Problems in Kinetic Modelling of Catalytic Reactions with Neural Networks

Aleksandr Fedorov∗,†, David Linke†
Leibniz-Institut für Katalyse e. V., Albert-Einstein-Straße 29a, 18059 Rostock, Germany

ITAT'24: Information technologies - Applications and Theory, September 20-24, 2024, Drienica, Slovakia
∗ Corresponding author. † These authors contributed equally.
Aleksandr.Fedorov@catalysis.de (A. Fedorov); David.Linke@catalysis.de (D. Linke)
ORCID: 0000-0001-6434-6623 (A. Fedorov); 0000-0002-5898-1820 (D. Linke)

Abstract
In the present study, we investigated the behaviour of different types of neural networks in data extrapolation. The application is the modelling of a tubular chemical reactor in which a heterogeneous catalytic reaction (CO2 hydrogenation to methanol) is performed. Since such data are slow and expensive to measure, we focused on small data sets for training. The different models (feed-forward neural network (NN), physics-informed NN, neural ordinary differential equation (ODE), and kinetics-constrained neural ODE) were trained so as to achieve approximately the same value of the loss function in cross-validation. Although the obtained models have the same generalization ability, their extrapolation capability varies significantly. A neural network model that is additionally constrained by general chemical and chemical engineering knowledge demonstrated much better extrapolation ability than the unconstrained models. Methods for validating the generalization of neural network kinetic models without using additional experimental data are suggested and discussed.

Keywords
neural ODE, physics-informed neural networks, kinetics-constrained neural ODE, kinetic modelling, CO2 hydrogenation

1. Introduction

The development of kinetic models of catalytic reactions is a difficult but important part of chemical engineering for process and industrial plant simulation [1]. Kinetic models of chemical reactions are ideally based on knowledge of the reaction mechanisms. Because the mechanisms are complex and contain many parameters that have to be estimated from data, several assumptions (postulating rate-determining steps, the quasi-equilibrium approximation, etc.) are usually used to convert a mechanism into a set of reaction rate expressions describing the dynamics of the process [2]. The difficulty of this traditional approach lies in the deep knowledge of the reaction mechanism it requires and in the effort of screening the different possible assumptions. Notably, solving the inverse kinetic task (the estimation of kinetic model parameters from data) is still a nontrivial part of the modelling process [3]. For these reasons, machine learning is an attractive alternative for kinetic model development, because only the data are needed to develop the models.

Neural networks are among the most popular machine learning methods due to their flexibility, which has led to the development of a wide range of neural network architectures [4, 5, 6, 7, 8] used in various fields. However, the overfitting problem limits their wide usage, especially for the small data sets that are typical in kinetic model development. Strategies for significantly improving neural network models have been proposed, for example, by using knowledge about the process to modify the loss function [5, 9, 10] or by constraining the architecture of the models [11, 12, 13]. In our recently published work [13], we suggested a new approach for kinetic modelling with NNs, based on constraining neural ODE models with general chemical and chemical engineering knowledge. This enabled us to obtain reliable NN models describing the complex process of CO2 hydrogenation using only small data. During the method development, we found that different neural network models demonstrating similar generalization ability in cross-validation differed significantly in their data prediction and extrapolation capabilities. Here, it is worth noting the difference between the generalization and extrapolation ability of NN models: generalization refers to the ability to fit test data that lie within the bounds of the training data set, whereas by extrapolation we mean the ability to predict data at residence times that lie outside the bounds of the training data.

To investigate the observed phenomena in more detail, in the present study we examined the ability of different types of neural networks to extrapolate kinetic catalytic data. For this, we generated small and large data sets for model training and validation, as well as for testing the ability of the models to extrapolate. The following types of NN model were used: feed-forward neural networks [4], physics-informed neural networks [5], neural ordinary differential equations (neural ODEs) [6], and kinetics-constrained neural ODEs [13]. To compare the neural-network-based modelling approach with the traditional one, we also developed a simple power-law kinetic model assuming first order with respect to the reagents in the reaction rate expressions.

2. Experimental part

2.1. Data generation

For the data generation, the kinetic model of CO2 hydrogenation to methanol from the work [14] was used. This kinetic model is based on the following chemical reactions:

    CO2 + 3 H2 ⇆ CH3OH + H2O,   (1)

    CO2 + H2 ⇆ CO + H2O,   (2)

    CO2 + 4 H2 ⇆ CH4 + 2 H2O.   (3)

A dual-site Langmuir-Hinshelwood kinetic model was used to describe the dynamics of these reactions:

    r1 = k1 (P_CO2 P_H2^3 − P_CH3OH P_H2O / K_eq,1) / [P_H2^2 (1 + K_CO2 P_CO2)(1 + √(K_H2 P_H2))],   (4)

    r2 = k2 (P_CO2 P_H2 − P_CO P_H2O / K_eq,2) / [√P_H2 (1 + K_CO2 P_CO2)(1 + √(K_H2 P_H2))],   (5)

    r3 = k3 √P_CO2 P_H2 / [(1 + K_CO2 P_CO2)(1 + √(K_H2 P_H2))],   (6)

where r_i is the rate of chemical reaction i; P_i is the partial pressure of compound i; k_i is the rate constant of reaction i; K_eq,i is the equilibrium constant of reaction i; K_CO2 is the equilibrium constant of CO2 adsorption; and K_H2 is the equilibrium constant of dissociative adsorption of H2. It is worth mentioning that the thermodynamic part of reaction 3 was removed from the equation in the present study because the equilibrium constant of this reaction is very high (Δ_r G° < 0 and |Δ_r G°|/RT ≈ 19.6 ≫ 1, K_eq,3 = exp(−Δ_r G°/RT) ≈ 3.3·10^8). For comparison, the equilibrium constants of reactions 1 and 2 are 2.6·10^−2 and 5.7·10^−6, respectively. All values are given for 573.15 K. The following equations were used to estimate the equilibrium constants of reactions 1 and 2:

    log10 K_eq,1 = −2073/T + 2.029,   (7)

    log10 K_eq,2 = 3066/T − 10.592.   (8)

The rate constants were presented in the form of the re-parameterized Arrhenius equation:

    k_i(T) = k_i,0 exp[(E_i/R)(1/T0 − 1/T)],   (9)

where k_i,0 is the rate constant of reaction i at the reference temperature T0, which was set to 573.15 K; E_i is the activation energy of reaction i; R is the universal gas constant; and T is the temperature. The adsorption equilibrium constants were presented in the form of the Van't Hoff equation:

    K_i(T) = K_i,0 exp[(ΔH_i/R)(1/T0 − 1/T)],   (10)

where K_i,0 is the adsorption equilibrium constant at the reference temperature T0 and ΔH_i is the molar enthalpy change. The parameters of the model were taken from the original work [14].

To generate the data for the present study, the following system of ODEs was integrated:

    dF_i/dτ = Σ_{j=1..N} ν_ij r_j,   (11)

where F_i is the molar flow of compound i; τ is the residence time; N is the number of chemical reactions; and ν_ij is the stoichiometric coefficient of compound i in reaction j (positive for products and negative for reagents). The system of ODEs represents the mathematical model of a 1D plug-flow reactor for a catalytic reaction, assuming the absence of heat and mass transfer limitations. Integrating system (11) allows one to estimate the molar flows of the compounds along the reactor (at different residence times).

Table 1 shows the reaction conditions used for generating the training data. For each reaction condition, 5 values of the residence time (0.05, 0.1, 0.2, 0.4, 1.0) were used for the data generation. Thus, the total number of different reaction conditions (including the residence time) was 13·5 = 65. The total inlet molar flow was set to a constant value of 1 for each reaction condition (only CO2 and H2 were used in the inlet flow). The data set thus represents the dependencies between the outlet molar flows of each compound (obtained by integrating the system of ODEs (11)) and the reaction conditions (inlet molar flows of CO2 and H2, temperature, total pressure, and residence time).

Table 1
The reaction conditions used for the training data generation.

     #   Temperature, °C   Pressure, bar   H2:CO2 ratio
     1   200               30              3
     2   250               30              3
     3   300               30              3
     4   350               30              3
     5   400               30              3
     6   300               20              3
     7   300               25              3
     8   300               35              3
     9   300               40              3
    10   300               30              1.5
    11   300               30              2
    12   300               30              4
    13   300               30              6
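The data-generation step can be sketched end-to-end: integrate the plug-flow reactor system (11) with the Langmuir-Hinshelwood rates (4)-(6) and read off the molar flows at the chosen residence times. The snippet below is a minimal sketch, assuming illustrative placeholder values for all kinetic constants (k_i,0, E_i, adsorption constants) — the fitted values come from [14] and are not reproduced here.

```python
# Sketch of the data generation: PFR model (Eq. 11) with
# Langmuir-Hinshelwood rates (Eqs. 4-6). All kinetic parameters
# below are ILLUSTRATIVE placeholders, not the values of [14].
import numpy as np
from scipy.integrate import solve_ivp

R = 8.314    # universal gas constant, J/(mol K)
T0 = 573.15  # reference temperature, K

# species order: CO2, H2, CH3OH, H2O, CO, CH4
NU = np.array([
    [-1, -3,  1,  1,  0,  0],   # CO2 + 3H2 <-> CH3OH + H2O
    [-1, -1,  0,  1,  1,  0],   # CO2 +  H2 <-> CO + H2O
    [-1, -4,  0,  2,  0,  1],   # CO2 + 4H2  -> CH4 + 2H2O
])

def rates(P, T, k0=(1e-4, 1e-3, 1e-5), E=(6e4, 1e5, 8e4),
          K_co2=0.1, K_h2=0.05):
    """Rates r1-r3 (Eqs. 4-6) with placeholder parameters."""
    p_co2, p_h2, p_meoh, p_h2o, p_co, p_ch4 = P
    # re-parameterized Arrhenius rate constants (Eq. 9)
    k = [k0i * np.exp(Ei / R * (1 / T0 - 1 / T)) for k0i, Ei in zip(k0, E)]
    K_eq1 = 10 ** (-2073 / T + 2.029)     # Eq. 7
    K_eq2 = 10 ** (3066 / T - 10.592)     # Eq. 8
    site = (1 + K_co2 * p_co2) * (1 + np.sqrt(K_h2 * p_h2))
    r1 = k[0] * (p_co2 * p_h2**3 - p_meoh * p_h2o / K_eq1) / (p_h2**2 * site)
    r2 = k[1] * (p_co2 * p_h2 - p_co * p_h2o / K_eq2) / (np.sqrt(p_h2) * site)
    r3 = k[2] * np.sqrt(p_co2) * p_h2 / site   # no thermodynamic part (see text)
    return np.array([r1, r2, r3])

def pfr_rhs(tau, F, T, P_total):
    P = P_total * F / F.sum()    # partial pressures, ideal gas law (Eq. 12)
    return NU.T @ rates(P, T)    # Eq. 11

# one training condition: 300 C, 30 bar, H2:CO2 = 3, total inlet flow = 1
F0 = np.array([0.25, 0.75, 0.0, 0.0, 0.0, 0.0])
taus = [0.05, 0.1, 0.2, 0.4, 1.0]
sol = solve_ivp(pfr_rhs, (0.0, 1.0), F0, t_eval=taus, args=(573.15, 30.0),
                method="LSODA", rtol=1e-8, atol=1e-10)
print(sol.y[0])   # CO2 molar flow at the five residence times
```

Looping this over the 13 conditions of Table 1 and the five residence times would yield the 65-point training set described above.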
The partial pressures of the compounds (required for estimating the reaction rates (4), (5), and (6)) were calculated by the following equation, assuming the ideal gas law:

    P_i = P · F_i / Σ_j F_j,   (12)

where P is the total pressure; P_i is the partial pressure of compound i; and F_i is the molar flow of compound i.

The CO2 conversion X_CO2 and the selectivity S_i of compound i, which are commonly used measures of reactor and catalyst performance, were estimated by the following equations relating the state at the reactor inlet to the state at the reactor outlet:

    X_CO2 = (F_CO2^in − F_CO2^out) / F_CO2^in,   (13)

    S_i = F_i^out / (F_CO2^in − F_CO2^out).   (14)

To validate and investigate the extrapolation ability of the kinetic models, 4 different test data sets were generated. They differed in the range from which the residence time was selected (0-1, 1-2, 2-5, and 5-10). Each data set had 1000 points, generated by randomly sampling values from the ranges of the reaction conditions (temperature in 200-400 °C, pressure in 20-40 bar, H2:CO2 ratio in 1.5-6.0) using a uniform distribution. To train the physics-informed neural network model, an additional data set (labelled the zero set) was created. The zero set was generated by the same procedure as the test data, except that the residence time was set to 0. The zero data set also had 1000 points.

2.2. Model architecture and training

Different variants of artificial neural networks were used to fit the CO2 hydrogenation data. It is important to note that the architecture and parameters of the NN models (the number of layers and neurons, the type of activation function, etc.) were chosen to achieve approximately the same value of the loss function (Equation (17)) in 10-fold cross-validation during model training.

The first model was a simple feed-forward neural network, denoted NN. This model has 4 inputs (temperature, pressure, the inlet fraction of CO2, and the residence time). The temperature was transformed in the following form before being used as an input of the neural network models:

    T_in = exp[(1.1·10^4 / R)(1/T0 − 1/T)].   (15)

By presenting the temperature in such a way, T_in lies in a range of ≈ 0.61-1.41. The inlet molar fraction of carbon dioxide x_CO2^in was calculated by the following equation:

    x_CO2^in = 1 / (1 + m),   (16)

where m is the H2:CO2 ratio. The pressure was normalized by dividing by the maximum value of 40 bar. The NN model has 4 outputs for the molar flows of CO2, CO, CH4, and CH3OH. It has 1 hidden layer with 20 nodes and hyperbolic tangent activation as well as an output layer with sigmoid activation. To train the NN model, the following loss function was used:

    loss_1 = Σ_{i ∈ training set} ((F_i^pred − F_i^exp) / F_i,scale)^2,   (17)

where F_i^pred is the outlet molar flow of compound i predicted by the model; F_i^exp is the outlet molar flow of compound i simulated by the kinetic model; and F_i,scale is a characteristic scale for compound i, calculated from the maximum value of the molar flow of compound i in the training data set. This scaling was also helpful for mitigating the stiffness of the neural ODE models [15] during training.

The second model was a physics-informed neural network (PINN). Its architecture is the same as that of the NN model, except for a modification of the outputs:

    out_i^PINN = out_i^NN · x_CO2^in / Σ_j out_j^NN,   (18)

where out_i^PINN is the i-th output of the PINN model and out_i^NN is the i-th output of the NN model. For training the PINN model, we used the knowledge that there is no change in the molar flows of the compounds when the residence time is equal to 0. For this, the additional zero data set was used during training of the PINN model, and the corresponding loss function was:

    loss_2 = loss_1 + Σ_{i ∈ zero set} ((F_i^pred − F_i^exp) / F_i,scale)^2.   (19)
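The PINN output constraint and the zero-set penalty can be illustrated with a small NumPy sketch. The network weights below are random placeholders and the per-compound flow scales are hypothetical; the point is that the rescaling of Equation (18) forces the four carbon-containing outlet flows to sum to the inlet CO2 fraction for any weights, which closes the carbon balance by construction.

```python
# Minimal NumPy sketch of the PINN output constraint (Eq. 18) and the
# scaled squared loss used in Eqs. 17/19. The tiny network mirrors the
# NN architecture from the text (1 hidden layer, 20 tanh nodes, sigmoid
# outputs); its weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 4)) * 0.5, np.zeros(20)
W2, b2 = rng.normal(size=(4, 20)) * 0.5, np.zeros(4)

def nn_out(x):
    """Feed-forward NN: inputs (T_in, P_norm, x_CO2_in, tau) -> 4 sigmoids."""
    h = np.tanh(W1 @ x + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # flows of CO2, CO, CH4, CH3OH

def pinn_out(x):
    """Eq. 18: rescale NN outputs so carbon-containing flows sum to x_CO2_in."""
    out = nn_out(x)
    x_co2_in = x[2]
    return out * x_co2_in / out.sum()

def scaled_sq_loss(F_pred, F_exp, F_scale):
    """One contribution to Eqs. 17/19: sum of squared scaled flow errors."""
    return np.sum(((F_pred - F_exp) / F_scale) ** 2)

# a zero-set point: at tau = 0 the outlet must equal the inlet
x_zero = np.array([1.0, 0.75, 0.25, 0.0])      # T_in, P/40, x_CO2_in, tau
F_exp_zero = np.array([0.25, 0.0, 0.0, 0.0])   # inlet carbon: only CO2
F_scale = np.array([0.25, 0.05, 0.01, 0.05])   # hypothetical per-compound scales

out = pinn_out(x_zero)
print(out.sum())                                # equals x_CO2_in by construction
print(scaled_sq_loss(out, F_exp_zero, F_scale))  # zero-set penalty term
```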
It is important to conventional kinetic model that is based on the assump- 𝑝𝑟𝑒𝑑 𝑒𝑥𝑝 2 tion that first order kinetics can sufficiently describe the 𝐹𝑖 − 𝐹𝑖 𝑙𝑜𝑠𝑠2 = 𝑙𝑜𝑠𝑠1 + ∑ ( ) . (19) reactions. The PL model was selected as baseline model 𝑖∈zero set 𝐹𝑖,𝑠𝑐𝑎𝑙𝑒 and used for the comparison with the NN models. The The next model (NODE) was based on the neural ODE loss function 17 was used for training KCNODE and PL [6] that uses neural networks for approximating the right models. part of ODEs: 𝑑𝐹𝑖 2.3. Hardware and software specifications = ANN(𝑃𝑖 , 𝑇 ), (20) 𝑑𝜏 Dormand-Prince-Shampine method (DOPRI5) method where ANN is the feed-forwards neural network. The was used for the integration of the neural ODE [16]. inputs of ANN were the partial pressure of compounds Training neural network models was carried out by min- and the re-parameterized temperature (Equation 15). The imizing the loss functions using ADAM [17] optimizer NODE model has 7 inputs (partial pressure of compounds with a learning rate of 0.005. L2 -regularization was used. - CO2 , CO, CH4 , CH3 OH, H2 , H2 O as well as the re- The parameter of the regularization was set to 10−6 . All parameterized temperature (Equation 15)). The NODE calculation were implemented in the Python program- model consists of two hidden layers with hyperbolic tan- ming language (version 3.9.12) [18]. The scientific li- gent and exponential activation functions. The number braries NumPy [19] (version 1.23.0), SciPy [20] (version of nodes was 3 for both layers. The output of the ANN 1.8.1), Pandas [21] (version 1.23.0), Scikit-learn [22] (ver- was the linear function. The loss function 17 was used sion 1.1.1) were used for data analysis and evaluation. for training NODE model. Pytorch [23] (version 1.12.0) and Torchdyn [24] (version The kinetics-constrained neural ODE approach was 1.0.3) were used for building and training neural net- used for developing the fourth model (KCNODE). The works models. 
Matplotlib [25] (version 3.7.2) was used idea of the approach is to use the general knowledge to visualize the results. about the process and approximate the rates of chemical reactions by the following equation: 3. Results and Discussion 𝑄 The first step of the present study was to obtain the 𝑗 𝑟𝑖 = 𝑘𝑖 (𝑇 ) ⋅ ∏ (𝑃𝑖 ) ⋅ (1 − ) ⋅ ANN(𝑃𝑖 , 𝑇 ), (21) trained neural network models with the similar gener- 𝐾𝑒𝑞,𝑖 𝑗 alization ability. In our work, we have chosen the value of the loss function in the 10-folds cross validation (CV) where 𝑘𝑖 (𝑇 ) is the rate constant of the reaction 𝑖 defined 𝑗 as a metric of the generalization. Firstly, we estimated by Equation 9; 𝑃𝑖 is the partial pressure of the reagent 𝑗 the CV value for the PL model which was 1.7 ⋅10−3 . To in the reaction 𝑖. 𝑄 - the reaction quotient. For example, achieve a similar CV value of the loss function for all in the case of reaction 1 the corresponding quotient is: neural network models, we trained our models by mini- 𝑃𝐶𝐻3 𝑂𝐻 𝑃𝐻2 𝑂 mizing the corresponding loss function. Figure 1 shows 𝑄1 = 3 . (22) the dependency between the value of 10-folds CV of the 𝑃𝐶𝑂2 𝑃𝐻2 loss function and the number of epochs. It can be seen that the CV loss function decreases with increasing num- The KCNODE model consists of 1 hidden layer with ber of epochs for all the neural network models except 3 nodes (hyperbolic tangent activation function) and the for the NODE one where we observe a significant rise in outlet layer which has sigmoid activation function. The the value of the loss function after around 2000 epochs. sigmoid function is chosen for the output layer because From the obtained data we have found the number of it ensures positive value, and, thus, Equation 21 aligns iterations which is needed for training to achieve the cor- with thermodynamic. So, if (𝑄 < 𝐾𝑒𝑞 ), the rate is positive responding value of loss function (1.7 ⋅10−3 in our case). 
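The structure of the KCNODE rate expression can be sketched in a few lines. This is a minimal sketch, not the trained model: the reagent-pressure product is taken first order for illustration, and the kinetic constants and network weights are placeholders. The key property shown is that the (1 − Q/K_eq) factor alone fixes the sign of the rate, since the sigmoid-bounded ANN correction is always positive.

```python
# Sketch of the KCNODE rate parameterization (Eq. 21): Arrhenius
# prefactor (Eq. 9) x reagent-pressure product x thermodynamic
# driving force (1 - Q/K_eq) x bounded ANN correction. All constants
# and weights are illustrative placeholders.
import numpy as np

R, T0 = 8.314, 573.15

def k_arrhenius(T, k0=1e-3, E=8e4):
    """Eq. 9: re-parameterized Arrhenius rate constant."""
    return k0 * np.exp(E / R * (1 / T0 - 1 / T))

def ann_correction(P, T):
    """1 hidden tanh layer (3 nodes) + sigmoid output, as for KCNODE.
    Weights are fixed random placeholders, not trained values."""
    rng = np.random.default_rng(1)
    x = np.append(P, np.exp(1.1e4 / R * (1 / T0 - 1 / T)))  # Eq. 15 feature
    W1, b1 = rng.normal(size=(3, x.size)) * 0.1, np.zeros(3)
    W2, b2 = rng.normal(size=3) * 0.1, 0.0
    return 1.0 / (1.0 + np.exp(-(W2 @ np.tanh(W1 @ x + b1) + b2)))

def rate_kcnode(P, T):
    """Eq. 21 for reaction 1 (CO2 + 3H2 <-> CH3OH + H2O),
    first-order reagent pressures assumed for illustration."""
    p_co2, p_h2, p_meoh, p_h2o = P
    K_eq1 = 10 ** (-2073 / T + 2.029)          # Eq. 7
    Q1 = p_meoh * p_h2o / (p_co2 * p_h2**3)    # Eq. 22
    return (k_arrhenius(T) * p_co2 * p_h2
            * (1 - Q1 / K_eq1) * ann_correction(P, T))

P = np.array([7.5, 22.5, 0.1, 0.1])   # bar; far below equilibrium
print(rate_kcnode(P, 573.15) > 0)     # forward direction (Q < K_eq)
```

The same sign test with a product-rich mixture (Q > K_eq) yields a negative rate, which is the thermodynamic consistency property discussed in the text.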
3. Results and Discussion

The first step of the present study was to obtain trained neural network models with similar generalization ability. In our work, we have chosen the value of the loss function in 10-fold cross-validation (CV) as the metric of generalization. Firstly, we estimated the CV value for the PL model, which was 1.7·10^−3. To achieve a similar CV value of the loss function for all neural network models, we trained our models by minimizing the corresponding loss function. Figure 1 shows the dependency between the 10-fold CV value of the loss function and the number of epochs. It can be seen that the CV loss function decreases with increasing number of epochs for all the neural network models except for the NODE one, where we observe a significant rise in the value of the loss function after around 2000 epochs. From the obtained data, we determined the number of iterations needed in training to achieve the corresponding value of the loss function (1.7·10^−3 in our case). It is worth noting that we did not manage to achieve this target value for the NODE model. For this model, the minimum value of the loss function (2.1·10^−3), which is still close to the targeted value, was therefore chosen. Thus, we managed to obtain a set of different neural network models with similar generalization ability.

Figure 1: The dependencies between the value of 10-fold CV and the number of epochs during training for different models. In the case of the neural ODE-based models (NODE and KCNODE), the number of epochs was multiplied by 10 for better visualization. The training process was repeated 5 times to estimate the standard deviation that is shown as error bars in the figure.

To validate the generalization ability of machine learning models, an additional test data set is typically used. However, for kinetic model development, chemical and engineering knowledge can be utilized for validation. We simulated virtual data using the obtained models and compared them in Figure 2. The analysis revealed prediction issues with the NN and NODE models, particularly with the carbon balance and molar flow predictions. The NN model predicted a non-zero CH3OH flow at zero residence time, and the NODE model predicted negative molar flows. These issues are illustrated by plotting the CO2 conversion and product selectivities, which should range from 0 to 1 but did not for the NN and NODE models. In contrast, the knowledge-integrating models (PINN and KCNODE) did not exhibit these problems.
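The knowledge-based checks used here (non-negative flows, a closed carbon balance, conversion and selectivities within 0-1) can be scripted so they apply to any trained model without extra experimental data. In the sketch below, `predict` is a hypothetical stand-in for a trained model (an assumed exponential conversion profile), used only to demonstrate the checks themselves.

```python
# Scripted physics sanity checks for a kinetic model's predictions.
# `predict` is a PLACEHOLDER for a trained model, not a real one.
import numpy as np

def predict(F_in_co2, tau):
    """Hypothetical outlet flows [CO2, CO, CH4, CH3OH] for a given
    inlet CO2 flow and residence time (illustrative profile only)."""
    x = 0.3 * (1 - np.exp(-tau))      # assumed conversion profile
    dF = x * F_in_co2
    return np.array([F_in_co2 - dF, 0.6 * dF, 0.1 * dF, 0.3 * dF])

def sanity_checks(F_in_co2, tau):
    F = predict(F_in_co2, tau)
    X = (F_in_co2 - F[0]) / F_in_co2                   # Eq. 13
    S = F[1:] / (F_in_co2 - F[0]) if X > 0 else None   # Eq. 14
    return {
        "non_negative_flows": bool(np.all(F >= 0)),
        # carbon balance: all four outputs are carbon-containing species
        "carbon_balance": bool(np.isclose(F.sum(), F_in_co2)),
        "conversion_in_0_1": bool(0.0 <= X <= 1.0),
        "selectivities_sum_to_1": S is None or bool(np.isclose(S.sum(), 1.0)),
    }

print(sanity_checks(0.25, 0.5))
```

A model that predicts negative flows or a non-zero methanol flow at τ = 0, as observed for the NODE and NN models, would fail these checks immediately.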
To assess the approximation ability of the obtained models, a series of different test data sets was generated. The data sets differed in the residence time range (from 0-1 to 5-10). The values of the loss function estimated for the test data sets are presented in Table 2. One can see that the values of the loss function for the data set with the residence time range of 0-1 are similar for all the models. However, when we try to predict data outside of the training data set range, we observe that the values of the loss function increase with increasing residence time range for all the models. This increase varies significantly from model to model. The highest errors in the data prediction for the residence time range of 5-10 were found for the NN and PINN models. This is due to the fact that the models were trained on the data set where the residence time varied in the range of 0-1. Another limitation of the NN and PINN models is that both models only represent the solution of the reactor model. It means that these models describe the solution of the system of ODEs but not the kinetic model represented in the form of rate equations. This imposes serious restrictions on using the resulting models for up-scaling (modelling another type of reactor or extending the models by adding diffusion/heat transfer).

The neural ODE approach does not have such a limitation, since the neural ODE models represent the approximation of the right-hand side of the ODEs. From Table 2 one can see that better extrapolation ability is observed for the neural ODE models (NODE and KCNODE) compared to the NN and PINN ones. The loss function for the data set generated with a residence time range of 5-10 was 0.66 for the KCNODE model, around 5 times lower than the one for the NODE model (3.4). Thus, the neural ODE model additionally constrained by general chemical and chemical engineering knowledge demonstrates much better extrapolation ability. To compare the KCNODE model with the traditional approach, Table 2 also shows the values of the loss function for the PL model. One can see that both the KCNODE and PL models show similar predicting ability in fitting the test data. In addition, another kinetics-constrained neural ODE model (KCNODE*) was obtained by training for 10000 epochs to achieve the minimum value of the CV loss function for comparison with the other models. The values of the loss function for the different test data are also presented in Table 2. One can see that the loss function for the data set generated with the residence time range of 5-10 is around 0.2, reduced compared to the KCNODE model.

4. Conclusions

In the present work, we investigated the behaviour of different neural network models (feed-forward NN, physics-informed NN, neural ODE, kinetics-constrained neural ODE) when applied to modelling catalytic data describing the process of CO2 hydrogenation to methanol. To compare the models under the same conditions, we trained them so as to achieve a similar value of the loss function in 10-fold cross-validation. Although the obtained models had similar generalization capability, we showed that only the neural ODE model additionally constrained by general chemical and chemical engineering knowledge demonstrates a good fitting of the data with respect to the residence time. Moreover, its capability to extrapolate was comparable to the traditional modelling approach. Due to this extrapolation ability, the application of neural ODE models for accelerated kinetic model development can be expected to grow.
Figure 2: The observed (dots) and fitted (lines) dependencies between: the normalized molar flows of the compounds F_i/F_i,scale and the residence time τ (top); the CO2 conversion and product selectivity (bottom) estimated by Equations (13) and (14), respectively. The carbon balance was defined as the sum of the molar flows of all carbon-containing compounds. The following reaction condition was used for the simulation: temperature 375 °C, total pressure 32 bar, H2:CO2 ratio 2.5.

Table 2
The comparison of the obtained models in predicting the test data sets differing in the range of residence time selection. The standard deviation (relative value) is given in brackets.

                                              loss function
    model      CV                  0-1                 1-2                 2-5                5-10
    NN         1.7·10^−3 (±38%)    2.6·10^−3 (±18%)    0.15 (±65%)         11 (±89%)          75 (±82%)
    PINN       1.7·10^−3 (±44%)    1.7·10^−3 (±18%)    9.6·10^−2 (±25%)    11 (±81%)          186 (±85%)
    NODE       2.1·10^−3 (±4.1%)   2.8·10^−3 (±17%)    2.0·10^−2 (±7.9%)   0.48 (±21%)        3.4 (±27%)
    KCNODE     1.7·10^−3 (±9.8%)   4.7·10^−3 (±14%)    2.5·10^−2 (±11%)    0.13 (±8.0%)       0.66 (±7.2%)
    PL         1.7·10^−3 (±0.1%)   5.3·10^−3 (±0.09%)  2.5·10^−2 (±0.07%)  0.13 (±0.4%)       0.67 (±0.5%)
    KCNODE*    5.0·10^−5 (±23%)    4.4·10^−4 (±7.7%)   3.9·10^−3 (±5.2%)   3.5·10^−2 (±4.7%)  0.20 (±4.7%)

5. Acknowledgments

Financial support from the German Research Foundation (DFG) through the project NFDI4Cat (DFG no. 441926934) is gratefully acknowledged.

References

[1] G. S. Yablonskii, V. I. Bykov, V. I. Elokhin, A. N. Gorban, Kinetic Models of Catalytic Reactions, Comprehensive Chemical Kinetics, Elsevier Science & Technology, 2014.
[2] L. Brübach, D. Hodonj, P. Pfeifer, Kinetic analysis of CO2 hydrogenation to long-chain hydrocarbons on a supported iron catalyst, Industrial & Engineering Chemistry Research 61 (2022) 1644-1654. doi:10.1021/acs.iecr.1c04018.
[3] S. Matera, W. F. Schneider, A. Heyden, A. Savara, Progress in accurate chemical kinetic modeling, simulations, and parameter estimation for heterogeneous catalysis, ACS Catalysis 9 (2019) 6624-6647. doi:10.1021/acscatal.9b01234.
[4] G. Bebis, M. Georgiopoulos, Feed-forward neural networks, IEEE Potentials 13 (1994) 27-31. doi:10.1109/45.329294.
[5] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686-707. doi:10.1016/j.jcp.2018.10.045.
[6] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, D. K. Duvenaud, Neural ordinary differential equations, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018. URL: https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
[7] G. Corso, H. Stark, S. Jegelka, T. Jaakkola, R. Barzilay, Graph neural networks, Nat. Rev. Methods Primers 4 (2024).
[8] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 (2019) 1235-1270. doi:10.1162/neco_a_01199.
[9] G. S. Gusmão, A. P. Retnanto, S. C. da Cunha, A. J. Medford, Kinetics-informed neural networks, Catalysis Today 417 (2023) 113701. doi:10.1016/j.cattod.2022.04.002.
[10] Y. Weng, D. Zhou, Multiscale physics-informed neural networks for stiff chemical kinetics, The Journal of Physical Chemistry A 126 (2022) 8534-8543. doi:10.1021/acs.jpca.2c06513.
[11] F. Sorourifar, Y. Peng, I. Castillo, L. Bui, J. Venegas, J. A. Paulson, Physics-enhanced neural ordinary differential equations: Application to industrial chemical reaction systems, Industrial & Engineering Chemistry Research 62 (2023) 15563-15577. doi:10.1021/acs.iecr.3c01471.
[12] T. Kircher, F. A. Döppel, M. Votsmeier, Global reaction neural networks with embedded stoichiometry and thermodynamics for learning kinetics from reactor data, Chemical Engineering Journal 485 (2024) 149863. doi:10.1016/j.cej.2024.149863.
[13] A. Fedorov, A. Perechodjuk, D. Linke, Kinetics-constrained neural ordinary differential equations: Artificial neural network models tailored for small data to boost kinetic model development, Chemical Engineering Journal 477 (2023) 146869. doi:10.1016/j.cej.2023.146869.
[14] S. Ghosh, J. Sebastian, L. Olsson, D. Creaser, Experimental and kinetic modeling studies of methanol synthesis from CO2 hydrogenation using In2O3 catalyst, Chemical Engineering Journal 416 (2021) 129120. doi:10.1016/j.cej.2021.129120.
[15] S. Kim, W. Ji, S. Deng, Y. Ma, C. Rackauckas, Stiff neural ordinary differential equations, Chaos: An Interdisciplinary Journal of Nonlinear Science 31 (2021) 093122. doi:10.1063/5.0060697.
[16] J. Dormand, P. Prince, A family of embedded Runge-Kutta formulae, Journal of Computational and Applied Mathematics 6 (1980) 19-26. doi:10.1016/0771-050X(80)90013-3.
[17] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.
[18] G. Van Rossum, F. L. Drake, Python 3 Reference Manual, CreateSpace, Scotts Valley, CA, 2009.
[19] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. Del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357-362.
[20] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy 1.0 Contributors, SciPy 1.0: fundamental algorithms for scientific computing in Python, Nat. Methods 17 (2020) 261-272.
[21] W. McKinney, et al., Data structures for statistical computing in Python, in: Proceedings of the 9th Python in Science Conference, volume 445, Austin, TX, 2010, pp. 51-56.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825-2830.
[23] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035.
[24] M. Poli, S. Massaroli, A. Yamashita, H. Asama, J. Park, S. Ermon, Torchdyn: Implicit models and neural numerical methods in PyTorch.
[25] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering 9 (2007) 90-95.