Mathematical model library for recombinant E. coli cultivation process

Mantas Butkus, Vytautas Galvanauskas
Department of Automation, Kaunas University of Technology, Kaunas, Lithuania
mantas.butkus@ktu.edu, vytautas.galvanauskas@ktu.lt

© 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Abstract—Biotechnological processes are among the most complicated control objects and require deep knowledge of the process. These systems have nonlinear relationships between process variables, and their properties vary over time. Such processes are usually hard to model and require exceptional knowledge and experience in the field. This review article analyzes studies conducted within the last five years in the biotechnology field that used various model types (mechanistic models, neural networks, fuzzy models) to model cultivation processes. Recommendations are provided on what type of model should be used, taking into account the available process knowledge and experimental data. Mechanistic models are best suited if there is a lack of experience in this field. Advanced models such as neural networks, fuzzy logic or hybrid models should be used if there is enough experimental data and process knowledge, since these models tend to model the process more precisely and take into account parameters or phenomena that cannot be described by mechanistic models.

Keywords—biotechnological processes, neural networks, fuzzy logic, cell growth modeling.

I. INTRODUCTION

Biotechnological processes are among the most complicated control objects and are characterized by all the properties that complicate control: nonlinear relationships between the process variables, dynamic properties that change significantly with time, and a lack of reliable sensors for state monitoring [13]. Therefore, the development of effective control systems is a relevant bioengineering task. Most control systems today rely on mathematical models that are well known but do not always describe the process well, or that oversimplify it. E. coli is widely used in biotechnology, since it is well known and well researched [13]. However, there are no clear recommendations on what kind of models should be used in different cases. In order to enrich the understanding of biotechnological modeling and to support the selection of the best suited model, the authors compiled a review of the methods used to model E. coli cultivation.

The aim of this article is to present various kinetic models for recombinant cultivation processes and recommendations on what kind of models to use depending on the process and the gathered data. Section II presents how the studies were selected and analyzed. Section III explains how biotechnological processes are modeled and presents the models that have been used in the selected studies. Section IV provides recommendations on which models should be used depending on the process knowledge and the availability of experimental data.

II. METHODS

This review was conducted using the Google Scholar database. Google Scholar is an open access scholarly search engine that covers full-text journal articles, books, and other scholarly documents. Even though this database has been criticized by many scholars because of its shortcomings for bibliometric purposes [15, 16], it is still one of the most used databases because of its broader coverage. Relevant articles were filtered and processed according to the following rules and criteria:

• "Modeling" is mentioned in the topic of the article.
• The article was published after 2014.
• Only biotechnological cultivation processes are analyzed.
• E. coli cultivation processes are preferred.
• The article is an open access resource.
• The article is within the first 30 pages of the Google Scholar search results.

Figure 1 describes the structure of the article selection.

Initial search: 203000+ results
After applying rules: 26000+
After removing articles written till 2014: 13000
After applying criteria: 300
Open access and title reading: 30
Final reading: 12 articles

Fig. 1. Flow diagram of literature search

After the selection of the relevant papers, twelve articles [1-12] were analyzed to determine what kind of models have been used to model recombinant E. coli cultivation processes within the last five years.

III. BIOTECHNOLOGICAL PROCESS MODELLING

In order to model biotechnological processes, mass and energy balance equations for the modeled process should be created [13]. The balance equations are created in accordance with the mass conservation law. This means that the mass change in the bioreactor occurs due to:

• chemical reactions that occur in the bioreactor, thus creating new products;
• the quantity of material supplied by external material flows;
• the amount of culture medium containing the material in question that is removed from the bioreactor.

The equation for the mass balance of a material is:

$$\frac{d(C_1 V)}{dt} = q_1 C_2 V + C_{in1} F_{in} - C_1 F_{out}, \qquad (1)$$

where $C_1$ is the concentration of the material in the reactor and $V$ is the volume of the medium. The amount of material in the medium is equal to the product of these two variables. $q_1$ is the specific reaction rate relative to the concentration of material $C_2$; in other words, this value indicates the amount of material $C_1$ formed per unit of mass of $C_2$ per unit of time. $F_{in}$ and $F_{out}$ are the input and output flows, and $C_{in1}$ is the concentration of the material in the input flow. The volume of the medium can only change due to the flows into and out of the reactor. This is described by the following differential equation:

$$\frac{dV}{dt} = F_{in} - F_{out} \qquad (2)$$

After expanding the derivative in equation (1) and substituting equation (2), one gets the final differential equation for the mass balance:

$$\frac{dC_1}{dt} = q_1 C_2 + \frac{F_{in}}{V}\left(C_{in1} - C_1\right) \qquad (3)$$

The change in concentration does not depend directly on the outflow: after taking a small sample, the concentration of the substance $C_1$ will not change drastically. However, the outflow determines the change in the volume of the medium, and the volume is already included in the equation.

The specific reaction rates in the mass balance equations discussed above can be modeled by different types of models. The mechanistic models of these reaction rates are covered first. The main growth indicator for microorganisms is the growth rate. For example, a new E. coli cell, using substrate, is generated in about 40 minutes if the temperature is 37 degrees Celsius, and some other types of bacteria divide even faster [13]. Naturally, the question is how to measure the number of cells that are formed. It is possible to estimate their number, but since the cells grow and divide, the accepted way to characterize the number of cells is to determine their total mass, i.e. to calculate the biomass amount. Growing biomass creates new cells that utilize nutrients and release vital products. Therefore, it is common to express the specific rates relative to biomass. In the 1930s, Monod described the growth of biomass through the specific biomass growth rate, which is expressed by [13]:

$$\mu = \frac{1}{xV}\frac{d(xV)}{dt} = \frac{1}{X}\frac{dX}{dt}, \qquad (4)$$

where $X = xV$ is the biomass amount, $x$ is the biomass concentration, and $\mu$ is defined as the relative increase in biomass per unit time. This quantity is not constant during the process and depends on various parameters:

• physiological state of the microorganism culture,
• biomass concentration in the medium,
• concentration of substrates,
• pH of the medium,
• temperature,
• pressure, etc.

Equation (4) can be used to determine $\mu$ from experimental biomass measurement data, but in the biomass balance equation $\mu$ is usually modeled as a function of certain process variables. The most often used kinetic models are presented below.

A. Monod kinetics

Monod kinetics is the most commonly used $\mu$ relationship in biotechnological process modelling. The specific growth rate depends on the concentration of the main substrate and is described by the formula:

$$\mu = \mu_{max}\frac{s}{K_s + s} \qquad (5)$$

where $\mu$ is the specific growth rate of the microorganisms, $\mu_{max}$ is the maximum specific growth rate of the microorganisms, $s$ is the concentration of the limiting substrate for growth, and $K_s$ is the "half-velocity constant", i.e. the value of $s$ at which $\mu/\mu_{max} = 0.5$. $\mu_{max}$ and $K_s$ are empirical coefficients of the Monod equation. They differ between species and depend on the ambient environmental conditions [1]. This kinetic model is usually used if the kinetics of the process is not well known. In a study conducted by Papić et al. [2], Monod kinetics was used since the relationship between the produced dsRNA and biomass was unknown. The results showed a 37% increase in process productivity. Similarly, Monod kinetics was used by Limoes et al. [3] when modelling recombinant cellulase cultivation. In [7], several Monod kinetic models were used to model a multi-substrate environment. In all presented studies the model was sufficient and fit the experimental data.

B. Moser kinetics

Another well-known modification of Monod kinetics is the model proposed by Hermann Moser [4]. Moser added an additional parameter $n$, which accounts for microorganism mutation:

$$\mu = \mu_{max}\frac{s^n}{K_s + s^n} \qquad (8)$$

In the analyzed studies [5, 6], the Moser model was used to study the kinetic behavior of the culture since the microorganism was not well known. The results showed that the Moser model is inferior compared with other classical kinetic models.

C. Powell kinetics

The original Monod equation was modified by Powell by introducing a maintenance rate term $m$, which addresses some of the limitations of the Monod model. The Powell kinetic model is described by the equation:

$$\mu = (\mu_{max} + m)\frac{s}{K_s + s} - m \qquad (10)$$

All of the models described above are mostly used where no additional data from the process is gathered. They are considered "classical" models that should be used if the processes are not well known and not much experience has been gathered.

D. Black-box and hybrid kinetics

Hybrid modelling techniques have emerged as an alternative to classical modelling techniques. Recently, these models have been particularly widely used in the field of biotechnological process optimization [10, 14]. Hybrid models combine mechanistic models, artificial neural networks, fuzzy systems, and expert knowledge-based models into a single system, based on fundamental process management rules and new information. Mechanistic models are based on the application of fundamental principles and the use of certain simplifying assumptions to model phenomena in the process. Using engineering correlations, one can create different types of empirical models that describe the nonlinear process properties well. Artificial neural networks can successfully model functional relationships when a lot of measurement data is available to identify a data-driven model and the fundamental functional relationships between the individual modelled state variables are not completely clear. In hybrid models, different parts of biotechnological processes are modelled in different ways. The main goal of modelling is to improve both process management and quality. Therefore, the aim is to model each process parameter as well as possible. Because process parameters are described by a variety of relationships, one way to model nonlinear relationships is to use artificial neural networks. An artificial neural network can be understood as a set of certain nonlinear mathematical relationships such as hyperbolic tangent, logarithmic or sigmoidal functions.

Another widely used method is the ensemble method [8]. It consists of building an ensemble of alternative models that comply with experimental observations. In particular, models of different complexity are generated and compared with respect to their ability to reproduce key features of the data. To overcome data scarcity and inaccuracies (noise), sampling-based approaches have become popular to yield surrogates for missing knowledge in parameter values [8]. In one of the studies [9], the researchers used random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations. The applicability of these two machine learning methods for the prediction of cell dry mass and recombinant protein, based on online available process parameters and two-dimensional multi-wavelength fluorescence spectroscopy, was investigated. The models based solely on routinely measured process variables gave a satisfactory prediction accuracy of about ±4% for the cell dry mass, while additional spectroscopic information allowed an estimation of the protein concentration within ±12% [9]. These studies showed that hybrid models are capable of modeling complex biotechnological systems. According to [10], hybrid models have the following advantages over classical models:

• potentially fewer experiments are required for process development and optimization;
• they allow the impact of certain variables to be studied without executing experiments, e.g., for the initial biomass concentration;
• they may provide good extrapolation and interpolation properties.

E. Fuzzy logic models

An important feature of fuzzy logic is that information can be divided into vague regions using fuzzy sets [12]. In contrast to classical set theory, where an element is strictly assigned to one of the sets according to a defined feature, a fuzzy set provides an opportunity to define a gradual transition from one set to another using membership functions. A fuzzy model usually associates input and output variables by compiling if-rules such as:

IF the substrate concentration is low
AND the specific rate of biomass growth is medium
AND the concentration of dissolved oxygen is low
THEN the speed of product production is medium.

These kinds of models can also be used to model the specific cell growth rate or for model identification. In a study conducted by Ilkova and Petrov [11], fuzzy logic was used for the structural and parametric identification of an E. coli fed-batch laboratory process. The authors applied InterCriteria Analysis, an approach for multicriteria decision-making, to the mathematical modelling of a fermentation process. It is based on the apparatus of index matrices and intuitionistic fuzzy sets. The approach makes it possible to compare certain criteria, or the objects evaluated by them. Basic relationships between different criteria in fed-batch fermentation (biomass, substrate, oxygen and carbon dioxide) were explored. This made it possible to create an adequate model that was able to predict the experimental data.

In a study conducted by Liu et al. [12], fuzzy stochastic Petri nets for modeling biological systems with uncertain kinetic parameters were analyzed. The authors combined the strength of stochastic Petri nets in modeling stochastic systems with the strength of fuzzy sets in dealing with uncertain information, taking into account the fact that in biological systems some kinetic parameters may be uncertain due to incomplete, vague or missing kinetic data, or may vary naturally, e.g., between different individuals or experimental conditions. An application of fuzzy stochastic Petri nets was demonstrated. In summary, their approach is useful to integrate qualitative experimental findings into a quantitative model and to explore the system under study from the quantitative point of view. Fuzzy stochastic Petri nets provide a good means to consider parameter uncertainties in a model and to efficiently analyze how uncertain parameters affect the outputs of a model.

IV. CONCLUSIONS

Based on the analysis, the following recommendations can be taken into account when modeling biotechnological processes. It can be concluded that the best suited model depends on the experience of the researcher and the available measurement data:

1. If there is little experience and a lack of knowledge about the process, then mechanistic models should be used to model the process and its dynamics. Monod kinetics is usually used to model biomass growth in biotechnological processes.
2. If there is sufficient experimental data, hybrid models that combine machine learning methods such as neural networks with classical mechanistic models can be used, since these models consider process parameters or dynamics that are not described or are left out in mechanistic models. This type of model requires large sets of experimental data.
3. If there are experts with very high process knowledge, fuzzy models can also be used, since they consider atypical process behavior. By assessing the verbal knowledge of the experts, complicated systems can be modeled and researched.

These recommendations can be used when deciding what kind of methods to use in creating a biotechnological process model.

ACKNOWLEDGMENT

This research was funded by the European Regional Development Fund according to the supported activity "Research Projects Implemented by World-class Researcher Groups" under Measure No. 01.2.2-LMT-K-718.

REFERENCES

[1] C. P. Jr. Grady, L. J. Harlow and R. R. Riesing, "Effects of the Growth Rate and Influent Substrate Concentration on Effluent Quality from Chemostats Containing Bacteria in Pure and Mixed Culture", Biotechnol. Bioeng., vol. 14, no. 3, pp. 391–410, 1972.
[2] L. Papić, J. Rivas, S. Toledo and J. Romero, "Double-stranded RNA production and the kinetics of recombinant Escherichia coli HT115 in fed-batch culture", Biotechnol. Rep., e00292, 2018.
[3] S. Limoes, S. F. Rahman, S. Setyahadi and M. Gozan, "Kinetic study of Escherichia coli BPPTCC-EgRK2 to produce recombinant cellulase for ethanol production from oil palm empty fruit bunch", IOP Conf. Ser. Earth Environ. Sci., vol. 141, no. 1, 2018.
[4] Y. Pomerleau and M. Perrier, "Estimation of multiple specific growth rates in bioprocesses", AIChE J., vol. 36, no. 2, pp. 207–215, 1990.
[5] F. Ardestani, F. Rezvani and G. D. Najafpour, "Fermentative lactic acid production by lactobacilli: Moser and Gompertz kinetic models", Journal of Food Biosciences and Technology, vol. 7, no. 2, pp. 67–74, 2017.
[6] J. C. Leyva-Díaz, L. M. Poyatos, P. Barghini, S. Gorrasi and M. Fenice, "Kinetic modeling of Shewanella baltica KB30 growth on different substrates through respirometry", Microb. Cell Fact., vol. 16, no. 1, 189, 2017.
[7] M. E. Poccia, A. J. Beccaria and R. G. Dondo, "Modeling the microbial growth of two Escherichia coli strains in a multi-substrate environment", Braz. J. Chem. Eng., vol. 31, no. 2, pp. 347–354, 2014.
[8] E. Vasilakou, D. Machado, A. Theorell, I. Rocha, K. Nöh, et al., "Current state and challenges for dynamic metabolic modeling", Curr. Opin. Microbiol., vol. 33, pp. 97–104, 2016.
[9] M. Melcher, T. Scharl, B. Spangl, M. Luchner, M. Cserjan, et al., "The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations", Biotechnol. J., vol. 10, no. 11, pp. 1770–1782, 2015.
[10] M. von Stosch, S. Davy, K. Francois, V. Galvanauskas, J. Hamelink, et al., "Hybrid modeling for quality by design and PAT-benefits and challenges of applications in biopharmaceutical industry", Biotechnol. J., vol. 9, no. 6, pp. 719–726, 2014.
[11] T. S. Ilkova and M. M. Petrov, "Intercriteria analysis for identification of Escherichia coli fed-batch mathematical model", J. Int. Sci. Publ.: Mater., Meth. Technol., vol. 9, pp. 598–608, 2015.
[12] F. Liu, M. Heiner and M. Yang, "Fuzzy stochastic Petri nets for modeling biological systems with uncertain kinetic parameters", PLoS One, vol. 11, no. 2, e0149674, 2016.
[13] V. Galvanauskas and D. Levišauskas, "Biotechnologinių procesų modeliavimas, optimizavimas ir valdymas", Vilniaus pedagoginio universiteto leidykla, ISBN 978-9955-20-261-5, pp. 1–112, 2008.
[14] V. Galvanauskas, R. Simutis and A. Lübbert, "Hybrid process models for process optimisation, monitoring and control", Bioprocess Biosyst. Eng., vol. 26, no. 6, pp. 393–400, 2004.
[15] C. Napoli, G. Pappalardo and E. Tramontana, "A mathematical model for file fragment diffusion and a neural predictor to manage priority queues over BitTorrent", Int. J. Appl. Math. Comput. Sci., vol. 26, no. 1, pp. 147–160, 2016.
[16] P. Jacsó, "Google Scholar duped and deduped – the aura of 'robometrics'", Online Information Review, vol. 35, no. 1, pp. 154–160, 2011.
[17] I. F. Aguillo, "Is Google Scholar useful for bibliometrics? A webometric analysis", Scientometrics, vol. 91, no. 2, pp. 343–351, 2012.
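As an illustration of the classical kinetic models in Section III (Monod, equation (5); Moser, equation (8); Powell, equation (10)), the three relationships can be written as plain functions. This is only a minimal sketch: the function names and the numerical values of `mu_max`, `k_s`, `n` and `m` are hypothetical and are not taken from any of the reviewed studies.

```python
def monod(s, mu_max, k_s):
    """Monod kinetics, equation (5): mu = mu_max * s / (K_s + s)."""
    return mu_max * s / (k_s + s)


def moser(s, mu_max, k_s, n):
    """Moser kinetics, equation (8): mu = mu_max * s**n / (K_s + s**n)."""
    s_n = s ** n
    return mu_max * s_n / (k_s + s_n)


def powell(s, mu_max, k_s, m):
    """Powell kinetics, equation (10): mu = (mu_max + m) * s / (K_s + s) - m."""
    return (mu_max + m) * s / (k_s + s) - m


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    MU_MAX, K_S = 0.7, 0.05  # 1/h, g/L
    for s in (0.01, 0.05, 0.5, 5.0):
        print(
            f"s={s:5.2f} g/L  "
            f"Monod={monod(s, MU_MAX, K_S):.3f}  "
            f"Moser(n=2)={moser(s, MU_MAX, K_S, 2):.3f}  "
            f"Powell(m=0.02)={powell(s, MU_MAX, K_S, 0.02):.3f}"
        )
```

With `n = 1` the Moser function reduces to Monod, and with `m = 0` the Powell function does the same, which is a quick way to check an implementation against equation (5).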
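Equation (4) in Section III defines the specific growth rate as the relative increase in biomass per unit time, which is equivalent to the time derivative of ln X. One possible way to estimate it from sampled biomass data is sketched below. The function name and the sample data are hypothetical, and the finite-difference scheme is an assumption of this sketch, not a method prescribed by the reviewed studies.

```python
import math


def specific_growth_rate(times, biomass):
    """Estimate mu between consecutive samples from equation (4).

    Since (1/X) dX/dt = d(ln X)/dt, mu on each interval is approximated by
    (ln X[k+1] - ln X[k]) / (t[k+1] - t[k]). This is exact if mu is constant
    over the interval.
    """
    if len(times) != len(biomass) or len(times) < 2:
        raise ValueError("need two or more samples of equal length")
    rates = []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        if dt <= 0 or biomass[k] <= 0 or biomass[k + 1] <= 0:
            raise ValueError("times must increase and biomass must be positive")
        rates.append(math.log(biomass[k + 1] / biomass[k]) / dt)
    return rates


if __name__ == "__main__":
    # Hypothetical data: biomass amount growing exponentially with mu = 0.3 1/h.
    t = [0.0, 1.0, 2.0, 3.0]
    x = [0.5 * math.exp(0.3 * ti) for ti in t]
    print(specific_growth_rate(t, x))
```

Real measurement data are noisy, so in practice the biomass signal would usually be smoothed before differentiation.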
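The mass balance equations (2) and (3) of Section III, combined with Monod kinetics (5), can be integrated numerically to simulate a fed-batch cultivation. The sketch below applies equation (3) to biomass (sterile feed) and to the limiting substrate. The yield coefficient `y_xs`, all parameter values and the fixed-step fourth-order Runge-Kutta integrator are assumptions made for this illustration and do not appear in the text above.

```python
def monod(s, mu_max, k_s):
    """Monod kinetics, equation (5)."""
    return mu_max * s / (k_s + s)


def derivatives(state, p):
    """Right-hand sides for biomass x (g/L), substrate s (g/L) and volume v (L)."""
    x, s, v = state
    mu = monod(max(s, 0.0), p["mu_max"], p["k_s"])
    dilution = p["f_in"] / v
    dx = mu * x + dilution * (0.0 - x)                       # eq. (3), no biomass in feed
    ds = -(mu / p["y_xs"]) * x + dilution * (p["s_in"] - s)  # eq. (3), uptake = mu / Y_xs
    dv = p["f_in"] - p["f_out"]                              # eq. (2)
    return (dx, ds, dv)


def rk4_step(state, dt, p):
    """One classical fourth-order Runge-Kutta step."""
    k1 = derivatives(state, p)
    k2 = derivatives(tuple(y + 0.5 * dt * k for y, k in zip(state, k1)), p)
    k3 = derivatives(tuple(y + 0.5 * dt * k for y, k in zip(state, k2)), p)
    k4 = derivatives(tuple(y + dt * k for y, k in zip(state, k3)), p)
    return tuple(
        y + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for y, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(x0, s0, v0, p, t_end, dt=0.002):
    """Integrate from t = 0 to t_end (hours) and return the final (x, s, v)."""
    state = (x0, s0, v0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, p)
    return state


# Hypothetical parameters for illustration only.
PARAMS = {
    "mu_max": 0.5,  # 1/h
    "k_s": 0.1,     # g/L
    "y_xs": 0.5,    # g biomass per g substrate (assumed yield coefficient)
    "s_in": 100.0,  # g/L substrate in the feed
    "f_in": 0.05,   # L/h
    "f_out": 0.0,   # L/h, fed-batch: no outflow
}

if __name__ == "__main__":
    x, s, v = simulate(1.0, 5.0, 1.0, PARAMS, t_end=10.0)
    print(f"x = {x:.2f} g/L, s = {s:.3f} g/L, V = {v:.2f} L")
```

As noted for equation (3), the outflow enters only through the volume equation, so `f_out` appears in `dv` and not in the concentration balances.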
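The example if-rule in the fuzzy logic subsection (substrate low AND growth rate medium AND dissolved oxygen low) can be evaluated with membership functions as sketched below. The shapes and breakpoints of the membership functions are hypothetical, and using the minimum as the AND operator is one common choice assumed here. The reviewed studies do not specify either.

```python
def left_shoulder(value, full, zero):
    """Membership of a 'low' set: 1 up to `full`, falling linearly to 0 at `zero`."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


def triangular(value, left, peak, right):
    """Membership of a 'medium' set: 0 outside (left, right), 1 at `peak`."""
    if value <= left or value >= right:
        return 0.0
    if value <= peak:
        return (value - left) / (peak - left)
    return (right - value) / (right - peak)


def rule_strength(substrate, growth_rate, oxygen):
    """Firing strength of the rule
    IF substrate is low AND growth rate is medium AND dissolved oxygen is low
    THEN product formation rate is medium,
    with AND taken as the minimum of the memberships (an assumption).
    """
    return min(
        left_shoulder(substrate, 0.1, 1.0),     # substrate low, g/L (hypothetical)
        triangular(growth_rate, 0.1, 0.3, 0.5),  # growth rate medium, 1/h (hypothetical)
        left_shoulder(oxygen, 10.0, 30.0),      # dissolved oxygen low, % (hypothetical)
    )


if __name__ == "__main__":
    print(rule_strength(substrate=0.55, growth_rate=0.3, oxygen=5.0))
```

The returned value is the degree to which the rule applies. A complete fuzzy model would combine the strengths of several such rules and defuzzify the result, which is outside the scope of this sketch.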