<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>bioreactor control</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Petrovskyi</string-name>
          <email>oleksandr.s.petrovskyi@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yevgen Martyn</string-name>
          <email>evmartyn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandery 12, 79013, Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv State University of Life Safety</institution>
          ,
          <addr-line>Kleparivska 35, 79007, Lviv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Bioreactors play a crucial role in making industrial biotechnological processes possible by providing a tool for creating and maintaining an optimal environment for proteins, cells, and cell cultures to grow and function in. In most cases, the overall technical and economic performance of an industrial biotechnological process heavily depends on the efficiency of the utilized bioreactors; thus, optimizing their operation is of great theoretical and practical interest. However, due to the highly complex and stochastic nature of bioprocesses, many issues arise during the development and implementation of their autonomous control systems. This article analyzes the main challenges associated with the development of autonomous bioreactor controllers, reviews the most prominent ways of tackling them with reinforcement learning, and presents an implementation of an offline-to-online memory-based RL controller applied to a custom simulation of a baker’s yeast fed-batch bioreactor with a partially observable state. Results show that pretraining on approximate simulations can be successfully applied to increase the generalization capabilities and convergence speed of a memory-based RL agent in the context of partially observable bioprocess control, reducing the time required to reach high rewards. However, such questions as how to avoid harmful overfitting during pretraining and how to implement an efficient memory mechanism for the agent remain open.</p>
      </abstract>
      <kwd-group>
        <kwd>bioprocess control</kwd>
        <kwd>fed-batch bioreactor</kwd>
        <kwd>reinforcement learning</kwd>
        <kwd>offline-to-online</kwd>
        <kwd>RSAC</kwd>
        <kwd>LSTM</kwd>
        <kwd>POMDP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Bioreactors are indispensable in industrial biotechnology: they are used to produce
vaccines, antibiotics, biofuels, food, and beverages, synthesize complex proteins and
enzymes, process waste, grow tissues, organs, and more [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. In the vast majority of
cases, it is the efficiency of the bioreactor that has a decisive impact on the overall
technical and economic performance of the production process [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. That is why creating
flexible and effective bioreactor control systems is a pressing problem whose solution
can affect many areas of human life.
      </p>
      <p>
        However, in practice, a functioning bioreactor is an extremely complex stochastic
dynamic system with a high degree of nonlinearity. Many factors can influence the course
of a bioprocess both at the macroscopic level (substrate absorption, oxygen saturation,
accumulation of growth inhibitors, etc.) and at the microscopic level of individual cells [
        <xref ref-type="bibr" rid="ref5 ref6">5,
6</xref>
        ]; there are significant limitations in the ability to measure the actual state of the system
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; each bioprocess requires its specific parameters to be taken into account, and each
bioreactor is different from the others, as it is specially designed to solve particular tasks
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In addition, we never know the exact model of the underlying bioprocess, and finding
an approximate model that aligns with the limited experimental data can be quite
challenging [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. That is why such modern process control methods as
Proportional-Integral-Derivative (PID) control, Fuzzy Logic Control (FLC), and Model Predictive Control (MPC),
despite their widespread use, are often unable to provide optimal control of bioprocesses
due to their inherent limitations, or they require an accurate mathematical model, which in most cases is
extremely difficult to build [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        This has led to growing scientific interest in reinforcement learning (RL) in the context
of bioreactor control [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ], thanks to which the optimal control problem can be
considered as a Markov Decision Process (MDP) in which an agent tries to learn to
maximize the cumulative reward it receives when interacting with a complex, uncertain
environment [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Extensive experience in applying RL in various industrial environments
has shown that RL controllers are able to outperform traditional methods in control
accuracy and speed, and have an outstanding ability to generalize and to learn without prior
knowledge or expert-built models [13].
      </p>
      <p>Therefore, this article aims to review the modern experience of using reinforcement
learning in the context of autonomous bioreactor control, as well as common related
problems and ways to solve them, synthesize a practical approach to implementing an
RL controller, and test it on a simulation of a fed-batch baker’s yeast bioreactor with a
partially observable state.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Treloar et al. [14] investigated the possibility of utilizing DQN to effectively control
bioprocesses where several microbial cultures coexist simultaneously. The authors
empirically proved the algorithm's robustness to different initial conditions, demonstrated
the possibility of obtaining a sufficiently good policy in 24 hours with the help of the
parallel use of 5 bioreactors, and even showed the ability of an RL agent to perform control
with higher efficiency than a traditional PID controller as measurement frequency
decreases.</p>
      <p>Authors of [15] have successfully applied Deep Deterministic Policy Gradient (DDPG)
to control the temperature of an ethanol fermenter (simulation), demonstrating faster
convergence and higher control precision in comparison with DQN, as well as its ability to
effectively react to random disturbances in the force and temperature of the incoming flow,
quickly returning the system to the desired state. Several sources describe the successful
application of an improved version of DDPG, TD3 (Twin Delayed Deep Deterministic
Policy Gradient), for controlling bioreactors and wastewater treatment systems
(simulations) [16, 17, 18] and its ability to converge to better policies.</p>
      <p>Soft Actor-Critic (SAC) is a widely used RL algorithm that often demonstrates state-of-the-art
performance [19]. SAC works effectively with continuous observation and action spaces,
utilizes a stochastic policy, which increases robustness to uncertainties, and the
maximum entropy framework with automatic temperature adjustment, which provides a
more efficient environment exploration strategy. In addition, it uses all the successful
architectural decisions of the algorithms discussed above: off-policy learning, a replay buffer,
the Actor-Critic design, double Q-functions, target networks, soft updates, etc.</p>
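      <p>As a brief illustration (not taken from the paper or from [19]), the automatic temperature adjustment mentioned above can be sketched as a single gradient step on SAC’s temperature loss; the snippet below assumes a PyTorch setup in which log_alpha is a learnable scalar and log_prob_batch holds the log-probabilities of actions sampled from the current policy:</p>
      <preformat>
# Illustrative sketch of SAC's automatic entropy-temperature adjustment
# (assumes PyTorch; update_temperature and the values below are placeholders).
import torch

action_dim = 1                       # e.g., substrate concentration in the feed
target_entropy = -float(action_dim)  # common heuristic: minus the action dimension
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_prob_batch):
    """One gradient step on the temperature loss J(alpha)."""
    alpha_loss = -(log_alpha * (log_prob_batch + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()    # current temperature alpha
      </preformat>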
      <p>Performance comparison of PG, DQN, A2C, DDPG, and SAC algorithms applied to such
bioreactor control tasks as valuable product maximization and maintaining the biomass
concentration at a fixed level demonstrated that the Actor-Critic architecture is
significantly more effective than value-based and policy-based methods separately, and
among all the algorithms implementing it, SAC had the best performance in terms of the
convergence speed and control efficiency [20]. In another study, SAC outperformed TRPO,
PPO, and TD3 in the task of HVAC control [21].</p>
      <p>
        In practice, despite the availability of powerful algorithms, there is often a lack of
available data for offline training of RL controllers, while the cost of online training is too
high. The situation is further complicated by the complexity of bioprocesses, which makes
it challenging to create high-quality mechanistic models and simulations of them.
Therefore, hybrid approaches that can provide a compromise by combining the best of
mechanistic and data-driven approaches are of particular interest, despite the fact that, as
pointed out by Monteiro and Kontoravdi, “yet it is unclear how best to integrate these two
components and how to account for plant-model mismatch that characterizes
bioprocesses” [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>One promising way to solve this problem is offline-to-online reinforcement learning,
which combines a stage of offline pretraining on available data or an approximate
simulation with online adaptation in the real system. While pretraining is widely used
in other areas of ML, offline-to-online reinforcement learning is a relatively new area with
many unique problems, which are explored in detail in [22].</p>
      <p>In the context of this paradigm, an interesting approach is proposed by Pandian et al.
[23]. The authors combine inverse neural networks (INN) and RL, but unlike previous
works on similar topics, the separately trained inverse network is not used as a direct
policy-function but as a tool to initialize the agent's Q-table. This hybrid approach
simultaneously reduces the requirements for the amount of existing data (a disadvantage
of INN) and speeds up the convergence of the agent. The authors demonstrated the
effectiveness of the described approach in a real-world laboratory setup. However, the
disadvantages of the proposed algorithm are the requirements to use tabular Q-Learning
and discretize continuous variables, which makes it hard to apply for solving more
complex continuous control problems due to the so-called curse of dimensionality.</p>
      <p>Petsagkourakis et al. [24] also propose the use of the staged integration of the RL
controller, in which, in the first stage, a simple mechanistic approximation of the target
process model is used for offline pretraining of the Policy-Gradient agent; in the second,
some network parameters are frozen and the network is additionally trained on data
obtained from the real system; and in the third, the pretrained RL controller is used to
control a real system in online mode.</p>
      <p>
        Another critical problem that arises in the development of autonomous bioprocess
controllers is the inability to fully measure the actual state of the system [
        <xref ref-type="bibr" rid="ref6">6, 25</xref>
        ]. This
transforms the control task from an MDP into a Partially Observable Markov Decision Process
(POMDP), formulated as a 6-tuple (S, A, T, R, O, Z), where S, A, T, and R are defined as in the MDP,
O is a set of possible observations, and Z is an emission function that determines which
observations are available to the agent, given the current state and the chosen action. Thus, the
RL agent is faced with finding an optimal policy that maximizes the expected sum of rewards
under conditions of incomplete information.
      </p>
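      <p>For reference, a standard way to write the resulting objective (a sketch, not reproduced from the paper) is to condition the policy on the observation-action history and maximize the expected, optionally discounted, return:</p>
      <preformat>
% LaTeX sketch of the control objective under partial observability
\[
  \pi^{*} = \arg\max_{\pi}\;
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t} r_t \;\middle|\; a_t \sim \pi(\cdot \mid h_t)\right],
  \qquad h_t = (o_0, a_0, o_1, a_1, \dots, o_t).
\]
      </preformat>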
      <p>The dominant way to solve this problem is memory-based reinforcement
learning: augmenting the RL agent with a sequence model, mainly in the form of a
recurrent neural network [26] or a transformer [27], which provides the agent with
“memory”, thanks to which it can approximate the system’s dynamics and the current
state based on the history of observations. Some of the algorithms developed with this
principle in mind are LSTM-TD3 [28], BSAC [29], and an open-sourced family of recurrent
versions of the most popular RL algorithms, which are described and compared in [30].
Out of the latter, RSAC with LSTM layers demonstrated the best performance.</p>
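      <p>As a minimal sketch (assuming PyTorch; the module below is illustrative and is not the implementation from [28], [29], or [30]), such a recurrent policy can be expressed as an LSTM that folds the observation-action history into a hidden state used in place of the unobserved full state:</p>
      <preformat>
# Minimal sketch of a memory-based (recurrent) Gaussian actor, assuming PyTorch.
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, prev_act_seq, hc=None):
        # obs_seq: (batch, time, obs_dim); prev_act_seq: (batch, time, act_dim)
        x = torch.cat([obs_seq, prev_act_seq], dim=-1)
        out, hc = self.lstm(x, hc)            # hc carries the "memory" across steps
        mu = self.mu(out)
        log_std = self.log_std(out).clamp(-5, 2)
        return mu, log_std, hc                # parameters of a Gaussian policy
      </preformat>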
      <p>Therefore, considering the effectiveness of the offline-to-online training approach and
the ability of memory-based RL algorithms to capture the system’s latent dynamics, as well
as the advantages of Soft Actor-Critic (and the frequent emergence of new modifications of
this algorithm), the problem of developing a method for the staged integration of RSAC for
autonomous bioreactor control becomes relevant.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Combining Offline-to-online and POMDP</title>
      <sec id="sec-3-1">
        <title>3.1. Staged Integration</title>
        <p>The staged integration of the Recurrent Soft Actor-Critic bioreactor controller is proposed
to be conducted in the following way:
1. Convert the approximate mechanistic model of the target bioprocess to the form of a
POMDP and construct an RL environment based on it. Many industrial bioreactors
are already operated by MPC controllers that rely on empirically derived
mathematical models, but if an approximate model is missing, it must be created.
2. Create the RSAC agent and tune its hyperparameters, e.g., the number and size of
hidden layers, learning rates, trajectory size, rate of weight updates, etc.
3. Pretrain the agent on several deterministic simulations with simpler dynamics or
slightly different parameters to promote a better generalization of the laws governing the
biomass concentration change and to obtain a more flexible initial policy.
4. Pretrain the agent on the original simulation with added random noise and
disturbances.
5. If real data is available, pretrain the agent on it by loading it into the agent’s
replay buffer in the form of 4-tuples (s, a, r, s′).
6. Integrate the RL controller into the real system and fine-tune it online (a schematic
sketch of steps 3-6 is given below).</p>
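        <p>In this sketch, the environment and agent objects, the real_transitions dataset, and the episode counts are hypothetical placeholders rather than the actual implementation:</p>
        <preformat>
# Hedged sketch of the staged offline-to-online integration (steps 3-6 above).
def staged_integration(agent, aux_envs, noisy_env, real_transitions, real_env,
                       pretrain_episodes=50, online_episodes=50):
    # Step 3: pretrain on simpler / slightly perturbed deterministic simulations.
    for env in aux_envs:
        agent.train(env, episodes=pretrain_episodes)

    # Step 4: pretrain on the original simulation with noise and disturbances.
    agent.train(noisy_env, episodes=pretrain_episodes)

    # Step 5: if historical plant data exists, load it as (s, a, r, s') tuples.
    for (s, a, r, s_next) in real_transitions:
        agent.replay_buffer.add(s, a, r, s_next)

    # Step 6: deploy on the real system and keep fine-tuning online.
    agent.train(real_env, episodes=online_episodes)
    return agent
        </preformat>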
        <p>This approach allows for taking maximum advantage of the available knowledge about
the real system. Thanks to the pretraining, a good initial approximation of an ideal control
policy is obtained, which can speed up the algorithm’s convergence, improve its
robustness to random noise and disturbances, and lower both the need for costly exploration of the
environment and the probability of bringing the bioprocess to irreversible critical states,
because the agent, thanks to prior knowledge, will avoid completely unpromising actions
even during exploration.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Custom POMDP Environment</title>
        <p>The effectiveness of the proposed method in maintaining the biomass
concentration at a fixed level was evaluated using a fed-batch baker’s
yeast bioreactor simulation based on the mathematical model described and used by Pandian
and Noel in [23].</p>
        <p>This model was chosen because, despite the small number of its parameters and
controlled variables, it still preserves all the characteristic features of a complex nonlinear
system, is easy to modify, and is based on the Monod equation, which describes
cellular growth dynamics and is widely used in environmental engineering.</p>
        <p>The model’s dynamics are defined by a system of two coupled differential equations (1), the
first of which describes the rate of change of the biomass concentration x1, and the second
that of the substrate concentration x2. Table 1 contains the description and values of the
model parameters used during the method’s evaluation.</p>
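        <p>As a purely illustrative sketch (the exact equations (1) and the parameter values of Table 1 are not reproduced here), a common Monod-type fed-batch formulation can be integrated with scipy’s odeint as follows; all numeric values below are placeholders:</p>
        <preformat>
# Illustrative Monod-type fed-batch dynamics; parameters are made up, not Table 1 values.
import numpy as np
from scipy.integrate import odeint

MU_MAX, K_S, Y_XS, D = 0.4, 0.2, 0.5, 0.1     # hypothetical kinetic/feeding parameters

def bioreactor_rhs(x, t, feed_conc):
    x1, x2 = x                                # biomass and substrate concentrations
    mu = MU_MAX * x2 / (K_S + x2)             # Monod specific growth rate
    dx1 = mu * x1 - D * x1                    # growth minus dilution
    dx2 = -mu * x1 / Y_XS + D * (feed_conc - x2)  # consumption plus feeding
    return [dx1, dx2]

# Integrate the dynamics over one 1-hour control interval with action u
u = 5.0                                       # substrate concentration in the feed
t_grid = np.linspace(0.0, 1.0, 11)
x_next = odeint(bioreactor_rhs, [1.0, 0.5], t_grid, args=(u,))[-1]
        </preformat>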
        <p>Considering the problem as a Markov Decision Process, the system’s state consists of
the biomass and substrate concentrations x1 and x2 in the vessel, while the action is the
substrate concentration in the feed. As in the original work, the reward
function R is defined through the absolute error between the actual and desired values of the biomass
concentration (2).</p>
        <p>However, unlike the original implementation of the environment, we remove the
assumption that the current substrate concentration in the bioreactor is an observed
variable, turning the control task from an MDP into a POMDP. This way, the set of possible
observations is reduced to noisy measurements of the biomass concentration, and the emission
function (3) adds random Gaussian noise ϵ with mean 0 and standard deviation σZ to the
biomass concentration x1.</p>
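        <p>A minimal sketch of this observation model and of the tracking reward follows; the noise level, the setpoint value, and the sign convention of the reward are assumptions for illustration only:</p>
        <preformat>
# Sketch of the partially observable emission and the tracking reward.
import numpy as np

SIGMA_Z = 0.05          # hypothetical observation-noise standard deviation
X1_DESIRED = 1.5        # hypothetical biomass-concentration setpoint

def observe(state, rng):
    x1, _x2 = state                     # the substrate x2 is hidden from the agent
    return np.array([x1 + rng.normal(0.0, SIGMA_Z)])

def reward(state):
    x1, _x2 = state
    return -abs(x1 - X1_DESIRED)        # larger reward = smaller tracking error
        </preformat>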
        <p>Also, in contrast with the original work, during training the environment is initialized
with random state values at the beginning of every episode, with each value drawn from a
continuous uniform distribution bounded by a fixed interval. Such an
approach allows the agent to learn how to quickly restore the system's desired state in case
of any random disturbances.</p>
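        <p>Such a randomized reset could look as follows (the interval bounds here are placeholders, not the values used in the experiments):</p>
        <preformat>
# Sketch of randomized initial conditions drawn from bounded uniform distributions.
import numpy as np

rng = np.random.default_rng()
X1_RANGE = (0.5, 2.5)   # hypothetical biomass-concentration bounds
X2_RANGE = (0.1, 1.0)   # hypothetical substrate-concentration bounds

def sample_initial_state():
    # x1(0), x2(0) ~ U(a, b): continuous uniform over each bounded interval
    return np.array([rng.uniform(*X1_RANGE), rng.uniform(*X2_RANGE)])
        </preformat>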
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Bioprocess Simulations</title>
        <p>We used three simulations to test our method: one stochastic (TRUE), which imitates the
real bioprocess, and two auxiliary deterministic ones (AUX1, AUX2), whose parameters were
slightly changed with respect to TRUE. The simulations were implemented with the help of
Gymnasium and the odeint function from the scipy Python package as the differential equation
solver. The values of the parameters of each simulation are listed in Table 2.</p>
        <p>At first, to assess the general ability to maintain the biomass concentration at a fixed level
under conditions of limited observability, the performance of the SAC and RSAC algorithms
was compared when applied to the fully and partially observable versions of the TRUE simulation,
respectively. In both cases, training lasted 50 episodes with an environment rollout length
of 160 and a batch/trajectory size of 16. Both environments were randomly initialized at the
beginning of each episode to improve the flexibility of the policy and its robustness to
disturbances and untypical states of the system.</p>
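        <p>The comparison setup can be sketched as a standard training loop; the agent and environment interfaces below are assumed placeholders, not the actual implementation:</p>
        <preformat>
# Hedged sketch of the SAC / RSAC comparison: 50 episodes, rollout length 160,
# batch/trajectory size 16, randomized reset at the start of every episode.
EPISODES, ROLLOUT_LEN, BATCH_SIZE = 50, 160, 16

def train(agent, env):
    for _ in range(EPISODES):
        obs, _ = env.reset()                     # random initial state each episode
        agent.reset_memory()                     # clear the recurrent hidden state (RSAC)
        for _ in range(ROLLOUT_LEN):
            action = agent.act(obs)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            agent.store(obs, action, reward, next_obs)
            agent.update(batch_size=BATCH_SIZE)  # gradient step on sampled (sub)trajectories
            obs = next_obs
            if terminated or truncated:
                break
    return agent
        </preformat>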
        <p>To evaluate how quickly deviations from the desired state can be eliminated in both cases
after sufficient training, 1000 test environment rollouts were performed using the frozen
latest policies. The mean absolute error (MAE) between the actual and desired biomass
concentration was recorded during each rollout. The mean and standard deviation of the MAE
over the 1000 tests are listed in Table 3.</p>
        <p>To check the convergence speed of the algorithm with the proposed approach, sequential
pretraining of the RSAC agent was performed with an update period of 4 steps (hours) on the
deterministic environments AUX1 and AUX2 for seven episodes (a week equivalent) of 24
steps each (a day equivalent), followed by its application to the TRUE environment for 48 steps.</p>
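        <p>The evaluation protocol described above can be sketched as follows; the agent and environment interfaces, the info field, and the setpoint value are assumptions for illustration:</p>
        <preformat>
# Sketch of the MAE evaluation: 1000 rollouts with the frozen latest policy.
import numpy as np

def evaluate(agent, env, n_rollouts=1000, rollout_len=160, x1_desired=1.5):
    maes = []
    for _ in range(n_rollouts):
        obs, _ = env.reset()
        errors = []
        for _ in range(rollout_len):
            action = agent.act(obs, deterministic=True)   # frozen policy, no exploration
            obs, _reward, terminated, truncated, info = env.step(action)
            errors.append(abs(info["x1"] - x1_desired))   # hypothetical info field
            if terminated or truncated:
                break
        maes.append(np.mean(errors))
    return np.mean(maes), np.std(maes)                    # values reported in Table 3
        </preformat>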
        <p>The 4-hour weight update period was chosen because it corresponds to the smallest trajectory size
with which the algorithm could converge. A higher update frequency decreases the agent’s
reaction time, which determines how quickly it can adapt to changes in the environment.
However, if the update interval is too small, the trajectories in the replay buffer become too
short for the sequence model to accurately approximate the environment dynamics.</p>
        <p>For comparison, RSAC without pretraining was also applied to the TRUE environment.
Figure 2 illustrates the reward dynamics of each algorithm during the online training (left)
and the test 200-step rollout of their frozen latest policies (right).</p>
        <p>It can be seen that the agent pretrained on the AUX1 and AUX2 simulations reached the
reward plateau after 6 hours of operation of the simulated bioreactor, whereas the agent
without pretraining needed 14 hours. The performance indicators of the algorithms with and
without pretraining are given in Table 4.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This article reviews key challenges in developing autonomous bioreactor control
systems, including high complexity, non-linearity, stochasticity, limited system
observability, measurement errors, and insufficient data, all of which complicate accurate
modeling and control. The novelty of the study lies in the introduction of a hybrid
approach to RL-based smart controller development that can help resolve the
abovementioned problems via the combination of memory-based RL, staged
offline-to-online integration, and the (Recurrent) Soft Actor-Critic algorithm. The
effectiveness of the proposed method was validated by applying an RSAC agent,
pretrained on two approximate deterministic simulations, to control a simulation of a
fed-batch baker’s yeast bioreactor with a partially observable state. Experiments demonstrated
the ability of an agent created this way to adapt to the target environment and bring the
system to the desired state much faster (6 hours vs. 14 hours without pretraining).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>References [13–30]</title>
      <p>[13] N. Nievas, A. Pagès-Bernaus, F. Bonada, L. Echeverria, X. Domingo Albin, Reinforcement learning for autonomous process control in industry 4.0: advantages and challenges, Appl. Artif. Intell. 38 (2024). doi:10.1080/08839514.2024.2383101.</p>
      <p>[14] N. Treloar, A. Fedorec, B. Ingalls, C. Barnes, Deep reinforcement learning for the control of microbial co-cultures in bioreactors, PLOS Comput. Biol. 16 (2020) e1007783. doi:10.1371/journal.pcbi.1007783.</p>
      <p>[15] R. Sekhar, T. Radhakrishnan, S. Naina Mohamed, Deep deterministic policy gradient reinforcement learning based temperature control of a fermentation bioreactor for ethanol production, J. Indian Chem. Soc. 102 (2025) 101575. doi:10.1016/j.jics.2025.101575.</p>
      <p>[16] R. Sekhar, T. Radhakrishnan, S. Naina Mohamed, Reinforcement learning based temperature control of a fermentation bioreactor for ethanol production, Biotechnol. Bioeng. 121 (2024) 3114–3127. doi:10.1002/bit.28784.</p>
      <p>[17] Z. Klawikowska, M. Grochowski, Optimizing control of wastewater treatment plant with reinforcement learning: technical evaluation of twin-delayed deep deterministic policy gradient agent, IEEE Access PP (2024) 1–1. doi:10.1109/ACCESS.2024.3458186.</p>
      <p>[18] H. Croll, K. Ikuma, S. Ong, S. Sarkar, Systematic performance evaluation of reinforcement learning algorithms applied to wastewater treatment control optimization, Environ. Sci. Technol. 57 (2023). doi:10.1021/acs.est.3c00353.</p>
      <p>[19] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, et al., Soft actor-critic algorithms and applications, 2018. doi:10.48550/arXiv.1812.05905.</p>
      <p>[20] W. Zhu, I. Castillo, Z. Wang, R. Rendall, L. Chiang, P. Hayot, J. Romagnoli, Benchmark study of reinforcement learning in controlling and optimizing batch processes, J. Adv. Manuf. Process. 4 (2022). doi:10.1002/amp2.10113.</p>
      <p>[21] M. Biemann, F. Scheller, X. Liu, L. Huang, Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control, Appl. Energy 298 (2021) 117164. doi:10.1016/j.apenergy.2021.117164.</p>
      <p>[22] Z. Xie, Z. Lin, J. Li, S. Li, D. Ye, Pretraining in deep reinforcement learning: A survey, 2022. doi:10.48550/arXiv.2211.03959.</p>
      <p>[23] J. Pandian, M. Noel, Control of a bioreactor using a new partially supervised reinforcement learning algorithm, J. Process Control 69 (2018) 16–29. doi:10.1016/j.jprocont.2018.07.013.</p>
      <p>[24] P. Petsagkourakis, E. Bradford, D. Zhang, E. del Rio-Chanona, Reinforcement learning for batch bioprocess optimization, Comput. Chem. Eng. 133 (2019) 106649. doi:10.1016/j.compchemeng.2019.106649.</p>
      <p>[25] H. Li, T. Qiu, F. You, AI-based optimal control of fed-batch biopharmaceutical process leveraging deep reinforcement learning, Chem. Eng. Sci. 292 (2024) 119990. doi:10.1016/j.ces.2024.119990.</p>
      <p>[26] C. Luis, A. Bottero, J. Vinogradska, F. Berkenkamp, J. Peters, Uncertainty representations in state-space layers for deep reinforcement learning under partial observability, 2024. doi:10.48550/arXiv.2409.16824.</p>
      <p>[27] M. Weissenbacher, A. Borovykh, G. Rigas, Reinforcement learning of chaotic systems control in partially observable environments, Flow, Turbul. Combust. (2025) 1–22. doi:10.1007/s10494-024-00632-5.</p>
      <p>[28] L. Meng, R. Gorbet, D. Kulic, Memory-based deep reinforcement learning for POMDPs, 2021. doi:10.1109/IROS51168.2021.9636140.</p>
      <p>[29] Y. Yang, Y. Jiang, J. Chen, S. Li, Z. Gu, Y. Yin, Q. Zhang, K. Yu, Belief state actor-critic algorithm from separation principle for POMDP, 2023. doi:10.23919/ACC55779.2023.10155792.</p>
      <p>[30] Z. Yang, H. Nguyen, Recurrent off-policy baselines for memory-based continuous control, 2021. doi:10.48550/arXiv.2110.12628.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Spier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
             
            <surname>Vandenberghe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Medeiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
             
            <surname>Soccol</surname>
          </string-name>
          ,
          <article-title>Application of different types of bioreactors in bioprocesses</article-title>
          ,
          <source>Bioreactors</source>
          (
          <year>2011</year>
          )
          <fpage>53</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>G.</surname>
          </string-name>
           
          <article-title>Regonesi, Bioreactors: A complete review</article-title>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .13140/RG.2.2.11630.79685.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
             
            <surname>Palladino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Schlogl</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
           Jose, R. Rodrigues,
          <string-name>
            <given-names>D.</given-names>
             
            <surname>Fabrino</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
           Santos,
          <string-name>
            <given-names>C.</given-names>
             
            <surname>Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
             
            <surname>Mario</surname>
          </string-name>
          ,
          <article-title>Bioreactors: applications and innovations for a sustainable and healthy future-a critical review</article-title>
          ,
          <source>Appl. Sci</source>
          .
          <volume>14</volume>
          (
          <year>2024</year>
          )
          <article-title>9346</article-title>
          . doi:
          <volume>10</volume>
          .3390/app14209346.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
             
            <surname>Raganati</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
           
          <article-title>Procentese, Special issue on “bioreactor system: design, modeling and continuous production process”</article-title>
          ,
          <source>Processes</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <year>1936</year>
          . doi:
          <volume>10</volume>
          .3390/pr10101936.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Soni</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
           Parker,
          <article-title>Closed-Loop control of fed-batch bioreactors: A shrinking-horizon approach</article-title>
          ,
          <source>Ind. Eng. Chem. Res. - IND ENG CHEM RES 43</source>
          (
          <year>2004</year>
          ). doi:
          <volume>10</volume>
          .1021/ie030535b.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
             
            <surname>López-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
             
            <surname>Aguilar-López</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
           Femat,
          <article-title>Control in bioengineering and bioprocessing: modeling, estimation and the use of soft sensors</article-title>
          .,
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .1002/9781119296317.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
             
            <surname>Ma</surname>
          </string-name>
          , D. 
          <string-name>
            <surname>Noreña-Caro</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
           
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
           
          <string-name>
            <surname>Brentzel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
           Romagnoli,
          <string-name>
            <surname>M.</surname>
          </string-name>
           Benton,
          <article-title>Machine-learning-based simulation and fed-batch control of cyanobacterial-phycocyanin production in Plectonema by artificial neural network and deep reinforcement learning</article-title>
          ,
          <source>Comput. Chem. Eng</source>
          .
          <volume>142</volume>
          (
          <year>2020</year>
          )
          <article-title>107016</article-title>
          . doi:
          <volume>10</volume>
          .1016/j.compchemeng.
          <year>2020</year>
          .
          <volume>107016</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
             
            <surname>Bolmanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
             
            <surname>Dubencovs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
             
            <surname>Suleiko</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
           Vanags,
          <article-title>Model predictive control-a stand out among competitors for fed-batch fermentation improvement</article-title>
          ,
          <source>Fermentation</source>
          <volume>9</volume>
          (
          <year>2023</year>
          )
          <article-title>206</article-title>
          . doi:
          <volume>10</volume>
          .3390/fermentation9030206.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Monteiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
             
            <surname>Kontoravdi</surname>
          </string-name>
          ,
          <article-title>Bioprocess control: A shift in methodology towards reinforcement learning</article-title>
          ,
          <year>2024</year>
          , pp. 
          <fpage>2851</fpage>
          -
          <lpage>2856</lpage>
          . doi:
          <volume>10</volume>
          .1016/B978-0
          <source>-443-28824-1</source>
          .
          <fpage>50476</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>T.</surname>
          </string-name>
           
          <article-title>Oh, Quantitative comparison of reinforcement learning and data-driven model predictive control for chemical and biological processes</article-title>
          ,
          <source>Comput. Chem. Eng</source>
          .
          <volume>181</volume>
          (
          <year>2023</year>
          )
          <article-title>108558</article-title>
          . doi:
          <volume>10</volume>
          .1016/j.compchemeng.
          <year>2023</year>
          .
          <volume>108558</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
             
            <surname>Haeun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
             
            <surname>Byun</surname>
          </string-name>
          , D. Han,
          <string-name>
            <given-names>J</given-names>
            . 
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Reinforcement learning for batch process control: Review and perspectives</article-title>
          ,
          <source>Annu. Rev. Control</source>
          <volume>52</volume>
          (
          <year>2021</year>
          ). doi:
          <volume>10</volume>
          .1016/j.arcontrol.
          <year>2021</year>
          .
          <volume>10</volume>
          .006.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Barto</surname>
          </string-name>
          ,
          <article-title>Reinforcement Learning, second edition: An Introduction</article-title>
          . MIT Press,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>