Extrapolation Problems in Kinetic Modelling of Catalytic Reactions with Neural Networks

Aleksandr Fedorov∗,†, David Linke†
Leibniz-Institut für Katalyse e. V., Albert-Einstein-Straße 29a, 18059 Rostock, Germany

ITAT'24: Information technologies - Applications and Theory, September 20-24, 2024, Drienica, Slovakia
∗ Corresponding author. † These authors contributed equally.
Aleksandr.Fedorov@catalysis.de (A. Fedorov); David.Linke@catalysis.de (D. Linke)
ORCID: 0000-0001-6434-6623 (A. Fedorov); 0000-0002-5898-1820 (D. Linke)

Abstract
In the present study, we investigated the behaviour of different types of neural networks in data extrapolation. The application is the modelling of a tubular chemical reactor in which a heterogeneous catalytic reaction (CO2 hydrogenation to methanol) is performed. Since such data are slow and expensive to measure, we focused on small data sets for training. The different models (feed-forward neural network (NN), physics-informed NN, neural ordinary differential equation (ODE), and kinetics-constrained neural ODE) were trained so as to achieve approximately the same value of the loss function in cross-validation. Although the obtained models have the same generalization ability, their extrapolation capability varies significantly. A neural network model that is additionally constrained by general chemical and chemical engineering knowledge demonstrated much better extrapolation ability than the unconstrained models. Methods for validating the generalization of neural network kinetic models without using additional experimental data are suggested and discussed.

Keywords
neural ODE, physics-informed neural networks, kinetics-constrained neural ODE, kinetic modelling, CO2 hydrogenation

1. Introduction

The development of kinetic models of catalytic reactions is a difficult but important part of chemical engineering for process and industrial plant simulation [1]. Kinetic models of chemical reactions are ideally based on knowledge of the reaction mechanisms. Because the mechanisms are complex and contain many parameters that have to be estimated from data, several assumptions (postulating rate-determining steps, the quasi-equilibrium approximation, etc.) are usually used to convert a mechanism into a set of reaction rate expressions describing the dynamics of the process [2]. The difficulty of this traditional approach lies in the deep knowledge of the reaction mechanism it requires and in the effort of screening the different possible assumptions. Notably, solving the inverse kinetic task (the estimation of kinetic model parameters from data) is still a nontrivial part of the modelling process [3]. For these reasons, machine learning is an attractive alternative for kinetic model development, because only the data are needed to develop the models.

Neural networks are among the most popular machine learning methods due to their flexibility, which has led to the development of a wide range of neural network architectures [4, 5, 6, 7, 8] used in various fields. However, the overfitting problem limits their wide usage, especially for the small data sets that are typical in kinetic model development. Strategies for significantly improving neural network models have been proposed, for example, by using knowledge about the process to modify the loss function [5, 9, 10] or by constraining the architecture of the models [11, 12, 13]. In our recently published work [13], we suggested a new approach for kinetic modelling with NNs, based on constraining neural ODE models with general chemical and chemical engineering knowledge. This enabled us to obtain reliable NN models describing the complex process of CO2 hydrogenation using only small data. During the method development, we found that different neural network models demonstrating similar generalization ability in cross-validation differed significantly in their data prediction and extrapolation capabilities. Here, it is worth noting the difference between the generalization and extrapolation ability of NN models: generalization refers to the ability to fit test data that lie within the bounds of the training data set, whereas by extrapolation we mean the ability to predict data at residence times that lie outside the bounds of the training data.

To investigate the observed phenomena in more detail, in the present study we examined the ability of different types of neural networks to extrapolate kinetic catalytic data. For this, we generated small and large data sets for model training and validation, as well as for testing the ability of the models to extrapolate. The following types of NN model were used: feed-forward neural networks [4], physics-informed neural networks [5], neural ordinary differential equations (neural ODEs) [6], and kinetics-constrained neural ODEs [13]. To compare the neural-network-based modelling approach with the traditional one, we also developed a simple power-law kinetic model assuming first order with respect to the reagents in the reaction rate expressions.

2. Experimental part

2.1. Data generation

For the data generation, the kinetic model of CO2 hydrogenation to methanol from the work [14] was used. This kinetic model is based on the following chemical reactions:

    CO2 + 3 H2 ⇆ CH3OH + H2O,   (1)

    CO2 + H2 ⇆ CO + H2O,   (2)

    CO2 + 4 H2 ⇆ CH4 + 2 H2O.   (3)

A dual-site Langmuir-Hinshelwood kinetic model was used to describe the dynamics of these reactions:

    r1 = k1 (P_CO2 P_H2^3 − P_CH3OH P_H2O / K_eq,1) / [P_H2^2 (1 + K_CO2 P_CO2)(1 + √(K_H2 P_H2))],   (4)

    r2 = k2 (P_CO2 P_H2 − P_CO P_H2O / K_eq,2) / [√P_H2 (1 + K_CO2 P_CO2)(1 + √(K_H2 P_H2))],   (5)

    r3 = k3 √P_CO2 P_H2 / [(1 + K_CO2 P_CO2)(1 + √(K_H2 P_H2))],   (6)

where r_i is the rate of chemical reaction i; P_i is the partial pressure of compound i; k_i is the rate constant of reaction i; K_eq,i is the equilibrium constant of reaction i; K_CO2 is the equilibrium constant of CO2 adsorption; and K_H2 is the equilibrium constant of dissociative adsorption of H2. It is worth mentioning that the thermodynamic part of reaction 3 was removed from the equation in the present study because the equilibrium constant of this reaction is very high (Δ_r G° < 0 and |Δ_r G°|/RT ≈ 19.6 ≫ 1, K_eq,3 = exp(−Δ_r G°/RT) ≈ 3.3·10^8). For comparison, the equilibrium constants of reactions 1 and 2 are 2.6·10^−2 and 5.7·10^−6, respectively. All values are given for 573.15 K. The following equations were used to estimate the equilibrium constants of reactions 1 and 2:

    log10 K_eq,1 = −2073/T + 2.029,   (7)

    log10 K_eq,2 = 3066/T − 10.592.   (8)

The rate constants were presented in the form of the re-parameterized Arrhenius equation:

    k_i(T) = k_i,0 exp[(E_i/R)(1/T0 − 1/T)],   (9)

where k_i,0 is the rate constant of reaction i at the reference temperature T0, which was set to 573.15 K; E_i is the activation energy of reaction i; R is the universal gas constant; and T is the temperature. The adsorption equilibrium constants were presented in the form of the Van't Hoff equation:

    K_i(T) = K_i,0 exp[(ΔH_i/R)(1/T0 − 1/T)],   (10)

where K_i,0 is the adsorption equilibrium constant at the reference temperature T0 and ΔH_i is the molar enthalpy change. The parameters of the model were taken from the original work [14].

To generate the data for the present study, the following system of ODEs was integrated:

    dF_i/dτ = Σ_{j=1..N} ν_ij r_j,   (11)

where F_i is the molar flow of compound i; τ is the residence time; N is the number of chemical reactions; and ν_ij is the stoichiometric coefficient of compound i in reaction j (positive for products and negative for reagents). The system of ODEs represents the mathematical model of a 1D plug-flow reactor for a catalytic reaction, assuming the absence of heat and mass transfer limitations. Integrating system (11) allows one to estimate the molar flows of the compounds along the reactor (at different residence times).

Table 1 shows the reaction conditions used for generating the training data. For each reaction condition, 5 values of the residence time (0.05, 0.1, 0.2, 0.4, 1.0) were used for the data generation. Thus, the total number of different reaction conditions (including the residence time) was 13·5 = 65. The total inlet molar flow was set to a constant value of 1 for each reaction condition (only CO2 and H2 were used in the inlet flow). The data set thus represents the dependencies between the outlet molar flows of each compound (obtained by integrating the system of ODEs (11)) and the reaction conditions (inlet molar flows of CO2 and H2, temperature, total pressure, and residence time).

Table 1
The reaction conditions used for the training data generation.

     #   Temperature, °C   Pressure, bar   H2:CO2 ratio
     1   200               30              3
     2   250               30              3
     3   300               30              3
     4   350               30              3
     5   400               30              3
     6   300               20              3
     7   300               25              3
     8   300               35              3
     9   300               40              3
    10   300               30              1.5
    11   300               30              2
    12   300               30              4
    13   300               30              6
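The data-generation step can be sketched end-to-end: integrate the plug-flow reactor system (11) with the Langmuir-Hinshelwood rates (4)-(6) and read off the molar flows at the chosen residence times. The snippet below is a minimal sketch, assuming illustrative placeholder values for all kinetic constants (k_i,0, E_i, adsorption constants) — the fitted values come from [14] and are not reproduced here.

```python
# Sketch of the data generation: PFR model (Eq. 11) with
# Langmuir-Hinshelwood rates (Eqs. 4-6). All kinetic parameters
# below are ILLUSTRATIVE placeholders, not the values of [14].
import numpy as np
from scipy.integrate import solve_ivp

R = 8.314    # universal gas constant, J/(mol K)
T0 = 573.15  # reference temperature, K

# species order: CO2, H2, CH3OH, H2O, CO, CH4
NU = np.array([
    [-1, -3,  1,  1,  0,  0],   # CO2 + 3H2 <-> CH3OH + H2O
    [-1, -1,  0,  1,  1,  0],   # CO2 +  H2 <-> CO + H2O
    [-1, -4,  0,  2,  0,  1],   # CO2 + 4H2  -> CH4 + 2H2O
])

def rates(P, T, k0=(1e-4, 1e-3, 1e-5), E=(6e4, 1e5, 8e4),
          K_co2=0.1, K_h2=0.05):
    """Rates r1-r3 (Eqs. 4-6) with placeholder parameters."""
    p_co2, p_h2, p_meoh, p_h2o, p_co, p_ch4 = P
    # re-parameterized Arrhenius rate constants (Eq. 9)
    k = [k0i * np.exp(Ei / R * (1 / T0 - 1 / T)) for k0i, Ei in zip(k0, E)]
    K_eq1 = 10 ** (-2073 / T + 2.029)     # Eq. 7
    K_eq2 = 10 ** (3066 / T - 10.592)     # Eq. 8
    site = (1 + K_co2 * p_co2) * (1 + np.sqrt(K_h2 * p_h2))
    r1 = k[0] * (p_co2 * p_h2**3 - p_meoh * p_h2o / K_eq1) / (p_h2**2 * site)
    r2 = k[1] * (p_co2 * p_h2 - p_co * p_h2o / K_eq2) / (np.sqrt(p_h2) * site)
    r3 = k[2] * np.sqrt(p_co2) * p_h2 / site   # no thermodynamic part (see text)
    return np.array([r1, r2, r3])

def pfr_rhs(tau, F, T, P_total):
    P = P_total * F / F.sum()    # partial pressures, ideal gas law (Eq. 12)
    return NU.T @ rates(P, T)    # Eq. 11

# one training condition: 300 C, 30 bar, H2:CO2 = 3, total inlet flow = 1
F0 = np.array([0.25, 0.75, 0.0, 0.0, 0.0, 0.0])
taus = [0.05, 0.1, 0.2, 0.4, 1.0]
sol = solve_ivp(pfr_rhs, (0.0, 1.0), F0, t_eval=taus, args=(573.15, 30.0),
                method="LSODA", rtol=1e-8, atol=1e-10)
print(sol.y[0])   # CO2 molar flow at the five residence times
```

Looping this over the 13 conditions of Table 1 and the five residence times would yield the 65-point training set described above.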
The partial pressures of the compounds (required for estimating the reaction rates (4), (5), and (6)) were calculated by the following equation, assuming the ideal gas law:

    P_i = P · F_i / Σ_j F_j,   (12)

where P is the total pressure; P_i is the partial pressure of compound i; and F_i is the molar flow of compound i.

The CO2 conversion X_CO2 and the selectivity S_i of compound i, which are commonly used measures of reactor and catalyst performance, were estimated by the following equations relating the state at the reactor inlet to the state at the reactor outlet:

    X_CO2 = (F_CO2^in − F_CO2^out) / F_CO2^in,   (13)

    S_i = F_i^out / (F_CO2^in − F_CO2^out).   (14)

To validate and investigate the extrapolation ability of the kinetic models, 4 different test data sets were generated. They differed in the range from which the residence time was selected (0-1, 1-2, 2-5, and 5-10). Each data set had 1000 points, generated by randomly sampling values from the ranges of the reaction conditions (temperature in 200-400 °C, pressure in 20-40 bar, H2:CO2 ratio in 1.5-6.0) using a uniform distribution. To train the physics-informed neural network model, an additional data set (labelled the zero set) was created. The zero set was generated by the same procedure as the test data, except that the residence time was set to 0. The zero data set also had 1000 points.

2.2. Model architecture and training

Different variants of artificial neural networks were used to fit the CO2 hydrogenation data. It is important to note that the architecture and parameters of the NN models (the number of layers and neurons, the type of activation function, etc.) were chosen to achieve approximately the same value of the loss function (Equation (17)) in 10-fold cross-validation during model training.

The first model was a simple feed-forward neural network, denoted NN. This model has 4 inputs (temperature, pressure, the inlet fraction of CO2, and the residence time). The temperature was transformed in the following form before being used as an input of the neural network models:

    T_in = exp[(1.1·10^4 / R)(1/T0 − 1/T)].   (15)

By presenting the temperature in such a way, T_in lies in a range of ≈ 0.61-1.41. The inlet molar fraction of carbon dioxide x_CO2^in was calculated by the following equation:

    x_CO2^in = 1 / (1 + m),   (16)

where m is the H2:CO2 ratio. The pressure was normalized by dividing by the maximum value of 40 bar. The NN model has 4 outputs for the molar flows of CO2, CO, CH4, and CH3OH. It has 1 hidden layer with 20 nodes and hyperbolic tangent activation as well as an output layer with sigmoid activation. To train the NN model, the following loss function was used:

    loss_1 = Σ_{i ∈ training set} ((F_i^pred − F_i^exp) / F_i,scale)^2,   (17)

where F_i^pred is the outlet molar flow of compound i predicted by the model; F_i^exp is the outlet molar flow of compound i simulated by the kinetic model; and F_i,scale is a characteristic scale for compound i, calculated from the maximum value of the molar flow of compound i in the training data set. This scaling was also helpful for mitigating the stiffness of the neural ODE models [15] during training.

The second model was a physics-informed neural network (PINN). Its architecture is the same as that of the NN model, except for a modification of the outputs:

    out_i^PINN = out_i^NN · x_CO2^in / Σ_j out_j^NN,   (18)

where out_i^PINN is the i-th output of the PINN model and out_i^NN is the i-th output of the NN model. For training the PINN model, we used the knowledge that there is no change in the molar flows of the compounds when the residence time is equal to 0. For this, the additional zero data set was used during training of the PINN model, and the corresponding loss function was:

    loss_2 = loss_1 + Σ_{i ∈ zero set} ((F_i^pred − F_i^exp) / F_i,scale)^2.   (19)
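The PINN output constraint and the zero-set penalty can be illustrated with a small NumPy sketch. The network weights below are random placeholders and the per-compound flow scales are hypothetical; the point is that the rescaling of Equation (18) forces the four carbon-containing outlet flows to sum to the inlet CO2 fraction for any weights, which closes the carbon balance by construction.

```python
# Minimal NumPy sketch of the PINN output constraint (Eq. 18) and the
# scaled squared loss used in Eqs. 17/19. The tiny network mirrors the
# NN architecture from the text (1 hidden layer, 20 tanh nodes, sigmoid
# outputs); its weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(20, 4)) * 0.5, np.zeros(20)
W2, b2 = rng.normal(size=(4, 20)) * 0.5, np.zeros(4)

def nn_out(x):
    """Feed-forward NN: inputs (T_in, P_norm, x_CO2_in, tau) -> 4 sigmoids."""
    h = np.tanh(W1 @ x + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # flows of CO2, CO, CH4, CH3OH

def pinn_out(x):
    """Eq. 18: rescale NN outputs so carbon-containing flows sum to x_CO2_in."""
    out = nn_out(x)
    x_co2_in = x[2]
    return out * x_co2_in / out.sum()

def scaled_sq_loss(F_pred, F_exp, F_scale):
    """One contribution to Eqs. 17/19: sum of squared scaled flow errors."""
    return np.sum(((F_pred - F_exp) / F_scale) ** 2)

# a zero-set point: at tau = 0 the outlet must equal the inlet
x_zero = np.array([1.0, 0.75, 0.25, 0.0])      # T_in, P/40, x_CO2_in, tau
F_exp_zero = np.array([0.25, 0.0, 0.0, 0.0])   # inlet carbon: only CO2
F_scale = np.array([0.25, 0.05, 0.01, 0.05])   # hypothetical per-compound scales

out = pinn_out(x_zero)
print(out.sum())                                # equals x_CO2_in by construction
print(scaled_sq_loss(out, F_exp_zero, F_scale))  # zero-set penalty term
```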
It is important to conventional kinetic model that is based on the assump- 𝑝𝑟𝑒𝑑 𝑒𝑥𝑝 2 tion that first order kinetics can sufficiently describe the 𝐹𝑖 − 𝐹𝑖 𝑙𝑜𝑠𝑠2 = 𝑙𝑜𝑠𝑠1 + ∑ ( ) . (19) reactions. The PL model was selected as baseline model 𝑖∈zero set 𝐹𝑖,𝑠𝑐𝑎𝑙𝑒 and used for the comparison with the NN models. The The next model (NODE) was based on the neural ODE loss function 17 was used for training KCNODE and PL [6] that uses neural networks for approximating the right models. part of ODEs: 𝑑𝐹𝑖 2.3. Hardware and software specifications = ANN(𝑃𝑖 , 𝑇 ), (20) 𝑑𝜏 Dormand-Prince-Shampine method (DOPRI5) method where ANN is the feed-forwards neural network. The was used for the integration of the neural ODE [16]. inputs of ANN were the partial pressure of compounds Training neural network models was carried out by min- and the re-parameterized temperature (Equation 15). The imizing the loss functions using ADAM [17] optimizer NODE model has 7 inputs (partial pressure of compounds with a learning rate of 0.005. L2 -regularization was used. - CO2 , CO, CH4 , CH3 OH, H2 , H2 O as well as the re- The parameter of the regularization was set to 10−6 . All parameterized temperature (Equation 15)). The NODE calculation were implemented in the Python program- model consists of two hidden layers with hyperbolic tan- ming language (version 3.9.12) [18]. The scientific li- gent and exponential activation functions. The number braries NumPy [19] (version 1.23.0), SciPy [20] (version of nodes was 3 for both layers. The output of the ANN 1.8.1), Pandas [21] (version 1.23.0), Scikit-learn [22] (ver- was the linear function. The loss function 17 was used sion 1.1.1) were used for data analysis and evaluation. for training NODE model. Pytorch [23] (version 1.12.0) and Torchdyn [24] (version The kinetics-constrained neural ODE approach was 1.0.3) were used for building and training neural net- used for developing the fourth model (KCNODE). The works models. 
Matplotlib [25] (version 3.7.2) was used idea of the approach is to use the general knowledge to visualize the results. about the process and approximate the rates of chemical reactions by the following equation: 3. Results and Discussion 𝑄 The first step of the present study was to obtain the 𝑗 𝑟𝑖 = 𝑘𝑖 (𝑇 ) ⋅ ∏ (𝑃𝑖 ) ⋅ (1 − ) ⋅ ANN(𝑃𝑖 , 𝑇 ), (21) trained neural network models with the similar gener- 𝐾𝑒𝑞,𝑖 𝑗 alization ability. In our work, we have chosen the value of the loss function in the 10-folds cross validation (CV) where 𝑘𝑖 (𝑇 ) is the rate constant of the reaction 𝑖 defined 𝑗 as a metric of the generalization. Firstly, we estimated by Equation 9; 𝑃𝑖 is the partial pressure of the reagent 𝑗 the CV value for the PL model which was 1.7 ⋅10−3 . To in the reaction 𝑖. 𝑄 - the reaction quotient. For example, achieve a similar CV value of the loss function for all in the case of reaction 1 the corresponding quotient is: neural network models, we trained our models by mini- 𝑃𝐶𝐻3 𝑂𝐻 𝑃𝐻2 𝑂 mizing the corresponding loss function. Figure 1 shows 𝑄1 = 3 . (22) the dependency between the value of 10-folds CV of the 𝑃𝐶𝑂2 𝑃𝐻2 loss function and the number of epochs. It can be seen that the CV loss function decreases with increasing num- The KCNODE model consists of 1 hidden layer with ber of epochs for all the neural network models except 3 nodes (hyperbolic tangent activation function) and the for the NODE one where we observe a significant rise in outlet layer which has sigmoid activation function. The the value of the loss function after around 2000 epochs. sigmoid function is chosen for the output layer because From the obtained data we have found the number of it ensures positive value, and, thus, Equation 21 aligns iterations which is needed for training to achieve the cor- with thermodynamic. So, if (𝑄 < 𝐾𝑒𝑞 ), the rate is positive responding value of loss function (1.7 ⋅10−3 in our case). 
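The structure of the KCNODE rate expression can be sketched in a few lines. This is a minimal sketch, not the trained model: the reagent-pressure product is taken first order for illustration, and the kinetic constants and network weights are placeholders. The key property shown is that the (1 − Q/K_eq) factor alone fixes the sign of the rate, since the sigmoid-bounded ANN correction is always positive.

```python
# Sketch of the KCNODE rate parameterization (Eq. 21): Arrhenius
# prefactor (Eq. 9) x reagent-pressure product x thermodynamic
# driving force (1 - Q/K_eq) x bounded ANN correction. All constants
# and weights are illustrative placeholders.
import numpy as np

R, T0 = 8.314, 573.15

def k_arrhenius(T, k0=1e-3, E=8e4):
    """Eq. 9: re-parameterized Arrhenius rate constant."""
    return k0 * np.exp(E / R * (1 / T0 - 1 / T))

def ann_correction(P, T):
    """1 hidden tanh layer (3 nodes) + sigmoid output, as for KCNODE.
    Weights are fixed random placeholders, not trained values."""
    rng = np.random.default_rng(1)
    x = np.append(P, np.exp(1.1e4 / R * (1 / T0 - 1 / T)))  # Eq. 15 feature
    W1, b1 = rng.normal(size=(3, x.size)) * 0.1, np.zeros(3)
    W2, b2 = rng.normal(size=3) * 0.1, 0.0
    return 1.0 / (1.0 + np.exp(-(W2 @ np.tanh(W1 @ x + b1) + b2)))

def rate_kcnode(P, T):
    """Eq. 21 for reaction 1 (CO2 + 3H2 <-> CH3OH + H2O),
    first-order reagent pressures assumed for illustration."""
    p_co2, p_h2, p_meoh, p_h2o = P
    K_eq1 = 10 ** (-2073 / T + 2.029)          # Eq. 7
    Q1 = p_meoh * p_h2o / (p_co2 * p_h2**3)    # Eq. 22
    return (k_arrhenius(T) * p_co2 * p_h2
            * (1 - Q1 / K_eq1) * ann_correction(P, T))

P = np.array([7.5, 22.5, 0.1, 0.1])   # bar; far below equilibrium
print(rate_kcnode(P, 573.15) > 0)     # forward direction (Q < K_eq)
```

The same sign test with a product-rich mixture (Q > K_eq) yields a negative rate, which is the thermodynamic consistency property discussed in the text.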
3. Results and Discussion

The first step of the present study was to obtain trained neural network models with similar generalization ability. In our work, we have chosen the value of the loss function in 10-fold cross-validation (CV) as the metric of generalization. Firstly, we estimated the CV value for the PL model, which was 1.7·10^−3. To achieve a similar CV value of the loss function for all neural network models, we trained our models by minimizing the corresponding loss function. Figure 1 shows the dependency between the 10-fold CV value of the loss function and the number of epochs. It can be seen that the CV loss function decreases with increasing number of epochs for all the neural network models except for the NODE one, where we observe a significant rise in the value of the loss function after around 2000 epochs. From the obtained data, we determined the number of iterations needed in training to achieve the corresponding value of the loss function (1.7·10^−3 in our case). It is worth noting that we did not manage to achieve this target value for the NODE model. For this model, the minimum value of the loss function (2.1·10^−3), which is still close to the targeted value, was therefore chosen. Thus, we managed to obtain a set of different neural network models with similar generalization ability.

Figure 1: The dependencies between the value of 10-fold CV and the number of epochs during training for different models. In the case of the neural ODE-based models (NODE and KCNODE), the number of epochs was multiplied by 10 for better visualization. The training process was repeated 5 times to estimate the standard deviation that is shown as error bars in the figure.

To validate the generalization ability of machine learning models, an additional test data set is typically used. However, for kinetic model development, chemical and engineering knowledge can be utilized for validation. We simulated virtual data using the obtained models and compared them in Figure 2. The analysis revealed prediction issues with the NN and NODE models, particularly with the carbon balance and molar flow predictions. The NN model predicted a non-zero CH3OH flow at zero residence time, and the NODE model predicted negative molar flows. These issues are illustrated by plotting the CO2 conversion and product selectivities, which should range from 0 to 1 but did not for the NN and NODE models. In contrast, the knowledge-integrating models (PINN and KCNODE) did not exhibit these problems.
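The knowledge-based checks used here (non-negative flows, a closed carbon balance, conversion and selectivities within 0-1) can be scripted so they apply to any trained model without extra experimental data. In the sketch below, `predict` is a hypothetical stand-in for a trained model (an assumed exponential conversion profile), used only to demonstrate the checks themselves.

```python
# Scripted physics sanity checks for a kinetic model's predictions.
# `predict` is a PLACEHOLDER for a trained model, not a real one.
import numpy as np

def predict(F_in_co2, tau):
    """Hypothetical outlet flows [CO2, CO, CH4, CH3OH] for a given
    inlet CO2 flow and residence time (illustrative profile only)."""
    x = 0.3 * (1 - np.exp(-tau))      # assumed conversion profile
    dF = x * F_in_co2
    return np.array([F_in_co2 - dF, 0.6 * dF, 0.1 * dF, 0.3 * dF])

def sanity_checks(F_in_co2, tau):
    F = predict(F_in_co2, tau)
    X = (F_in_co2 - F[0]) / F_in_co2                   # Eq. 13
    S = F[1:] / (F_in_co2 - F[0]) if X > 0 else None   # Eq. 14
    return {
        "non_negative_flows": bool(np.all(F >= 0)),
        # carbon balance: all four outputs are carbon-containing species
        "carbon_balance": bool(np.isclose(F.sum(), F_in_co2)),
        "conversion_in_0_1": bool(0.0 <= X <= 1.0),
        "selectivities_sum_to_1": S is None or bool(np.isclose(S.sum(), 1.0)),
    }

print(sanity_checks(0.25, 0.5))
```

A model that predicts negative flows or a non-zero methanol flow at τ = 0, as observed for the NODE and NN models, would fail these checks immediately.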
To assess the approximation ability of the obtained models, a series of different test data sets was generated. The data sets differed in the residence time range (from 0-1 to 5-10). The values of the loss function estimated for the test data sets are presented in Table 2. One can see that the values of the loss function for the data set with the residence time range of 0-1 are similar for all the models. However, when we try to predict data outside of the training data set range, we observe that the values of the loss function increase with increasing residence time range for all the models. This increase varies significantly from model to model. The highest errors in the data prediction for the residence time range of 5-10 were found for the NN and PINN models. This is due to the fact that the models were trained on the data set where the residence time varied in the range of 0-1. Another limitation of the NN and PINN models is that both models only represent the solution of the reactor model. It means that these models describe the solution of the system of ODEs but not the kinetic model represented in the form of rate equations. This imposes serious restrictions on using the resulting models for up-scaling (modelling another type of reactor or extending the models by adding diffusion/heat transfer).

The neural ODE approach does not have such a limitation, since the neural ODE models represent the approximation of the right-hand side of the ODEs. From Table 2 one can see that better extrapolation ability is observed for the neural ODE models (NODE and KCNODE) compared to the NN and PINN ones. The loss function for the data set generated with a residence time range of 5-10 was 0.66 for the KCNODE model, around 5 times lower than the one for the NODE model (3.4). Thus, the neural ODE model additionally constrained by general chemical and chemical engineering knowledge demonstrates much better extrapolation ability. To compare the KCNODE model with the traditional approach, Table 2 also shows the values of the loss function for the PL model. One can see that both the KCNODE and PL models show similar predicting ability in fitting the test data. In addition, another kinetics-constrained neural ODE model (KCNODE*) was obtained by training for 10000 epochs to achieve the minimum value of the CV loss function for comparison with the other models. The values of the loss function for the different test data are also presented in Table 2. One can see that the loss function for the data set generated with the residence time range of 5-10 is around 0.2, reduced compared to the KCNODE model.

4. Conclusions

In the present work, we investigated the behaviour of different neural network models (feed-forward NN, physics-informed NN, neural ODE, kinetics-constrained neural ODE) when applied to modelling catalytic data describing the process of CO2 hydrogenation to methanol. To compare the models under the same conditions, we trained them so as to achieve a similar value of the loss function in 10-fold cross-validation. Although the obtained models had similar generalization capability, we showed that only the neural ODE model additionally constrained by general chemical and chemical engineering knowledge demonstrates a good fitting of the data with respect to the residence time. Moreover, its capability to extrapolate was comparable to the traditional modelling approach. Due to this extrapolation ability, the application of neural ODE models for accelerated kinetic model development can be expected to grow.
Figure 2: The observed (dots) and fitted (lines) dependencies between: the normalized molar flows of the compounds F_i/F_i,scale and the residence time τ (top); the CO2 conversion and product selectivity (bottom) estimated by Equations (13) and (14), respectively. The carbon balance was defined as the sum of the molar flows of all carbon-containing compounds. The following reaction condition was used for the simulation: temperature 375 °C, total pressure 32 bar, H2:CO2 ratio 2.5.

Table 2
The comparison of the obtained models in predicting the test data sets differing in the range of residence time selection. The standard deviation (relative value) is given in brackets.

                                              loss function
    model      CV                  0-1                 1-2                 2-5                5-10
    NN         1.7·10^−3 (±38%)    2.6·10^−3 (±18%)    0.15 (±65%)         11 (±89%)          75 (±82%)
    PINN       1.7·10^−3 (±44%)    1.7·10^−3 (±18%)    9.6·10^−2 (±25%)    11 (±81%)          186 (±85%)
    NODE       2.1·10^−3 (±4.1%)   2.8·10^−3 (±17%)    2.0·10^−2 (±7.9%)   0.48 (±21%)        3.4 (±27%)
    KCNODE     1.7·10^−3 (±9.8%)   4.7·10^−3 (±14%)    2.5·10^−2 (±11%)    0.13 (±8.0%)       0.66 (±7.2%)
    PL         1.7·10^−3 (±0.1%)   5.3·10^−3 (±0.09%)  2.5·10^−2 (±0.07%)  0.13 (±0.4%)       0.67 (±0.5%)
    KCNODE*    5.0·10^−5 (±23%)    4.4·10^−4 (±7.7%)   3.9·10^−3 (±5.2%)   3.5·10^−2 (±4.7%)  0.20 (±4.7%)

5. Acknowledgments

Financial support from the German Research Foundation (DFG) through the project NFDI4Cat (DFG no. 441926934) is gratefully acknowledged.

References

[1] G. S. Yablonskii, V. I. Bykov, V. I. Elokhin, A. N. Gorban, Kinetic Models of Catalytic Reactions, Comprehensive Chemical Kinetics, Elsevier Science & Technology, 2014.
[2] L. Brübach, D. Hodonj, P. Pfeifer, Kinetic analysis of CO2 hydrogenation to long-chain hydrocarbons on a supported iron catalyst, Industrial & Engineering Chemistry Research 61 (2022) 1644-1654. doi:10.1021/acs.iecr.1c04018.
[3] S. Matera, W. F. Schneider, A. Heyden, A. Savara, Progress in accurate chemical kinetic modeling, simulations, and parameter estimation for heterogeneous catalysis, ACS Catalysis 9 (2019) 6624-6647. doi:10.1021/acscatal.9b01234.
[4] G. Bebis, M. Georgiopoulos, Feed-forward neural networks, IEEE Potentials 13 (1994) 27-31. doi:10.1109/45.329294.
[5] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686-707. doi:10.1016/j.jcp.2018.10.045.
[6] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, D. K. Duvenaud, Neural ordinary differential equations, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018. URL: https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
[7] G. Corso, H. Stark, S. Jegelka, T. Jaakkola, R. Barzilay, Graph neural networks, Nat. Rev. Methods Primers 4 (2024).
[8] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 (2019) 1235-1270. doi:10.1162/neco_a_01199.
[9] G. S. Gusmão, A. P. Retnanto, S. C. da Cunha, A. J. Medford, Kinetics-informed neural networks, Catalysis Today 417 (2023) 113701. doi:10.1016/j.cattod.2022.04.002.
[10] Y. Weng, D. Zhou, Multiscale physics-informed neural networks for stiff chemical kinetics, The Journal of Physical Chemistry A 126 (2022) 8534-8543. doi:10.1021/acs.jpca.2c06513.
[11] F. Sorourifar, Y. Peng, I. Castillo, L. Bui, J. Venegas, J. A. Paulson, Physics-enhanced neural ordinary differential equations: Application to industrial chemical reaction systems, Industrial & Engineering Chemistry Research 62 (2023) 15563-15577. doi:10.1021/acs.iecr.3c01471.
[12] T. Kircher, F. A. Döppel, M. Votsmeier, Global reaction neural networks with embedded stoichiometry and thermodynamics for learning kinetics from reactor data, Chemical Engineering Journal 485 (2024) 149863. doi:10.1016/j.cej.2024.149863.
[13] A. Fedorov, A. Perechodjuk, D. Linke, Kinetics-constrained neural ordinary differential equations: Artificial neural network models tailored for small data to boost kinetic model development, Chemical Engineering Journal 477 (2023) 146869. doi:10.1016/j.cej.2023.146869.
[14] S. Ghosh, J. Sebastian, L. Olsson, D. Creaser, Experimental and kinetic modeling studies of methanol synthesis from CO2 hydrogenation using In2O3 catalyst, Chemical Engineering Journal 416 (2021) 129120. doi:10.1016/j.cej.2021.129120.
[15] S. Kim, W. Ji, S. Deng, Y. Ma, C. Rackauckas, Stiff neural ordinary differential equations, Chaos: An Interdisciplinary Journal of Nonlinear Science 31 (2021) 093122. doi:10.1063/5.0060697.
[16] J. Dormand, P. Prince, A family of embedded Runge-Kutta formulae, Journal of Computational and Applied Mathematics 6 (1980) 19-26. doi:10.1016/0771-050X(80)90013-3.
[17] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.
[18] G. Van Rossum, F. L. Drake, Python 3 Reference Manual, CreateSpace, Scotts Valley, CA, 2009.
[19] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. Del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357-362.
[20] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy 1.0 Contributors, SciPy 1.0: fundamental algorithms for scientific computing in Python, Nat. Methods 17 (2020) 261-272.
[21] W. McKinney, et al., Data structures for statistical computing in Python, in: Proceedings of the 9th Python in Science Conference, volume 445, Austin, TX, 2010, pp. 51-56.
[22] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825-2830.
[23] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035.
[24] M. Poli, S. Massaroli, A. Yamashita, H. Asama, J. Park, S. Ermon, Torchdyn: Implicit models and neural numerical methods in PyTorch.
[25] J. D. Hunter, Matplotlib: A 2D graphics environment, Computing in Science & Engineering 9 (2007) 90-95.