<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Addressing Catastrophic Forgetting and Beyond: Key Challenges in Continual Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rui Teng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aihui Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hengyi Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jinkang Dong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yao Yao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xueying Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The School of Automation and Electrical Engineering, Zhongyuan University of Technology</institution>
          ,
          <addr-line>450007 Zhengzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Current artificial intelligence relies on a one-time training process over a predefined data set, after which the model remains static during the subsequent inference and operation stages. However, a true artificial intelligence system needs to demonstrate the ability of continual learning, that is, to dynamically adapt to changing environments and new information and to continuously evolve. In the continual learning scenario, catastrophic forgetting is the core problem encountered. Therefore, this paper first systematically surveys the various methods for dealing with catastrophic forgetting; second, it classifies these methods and analyzes in depth their theoretical basis, representative cases, advantages and disadvantages; finally, it identifies the key challenges and future development directions currently facing continual learning, laying a solid foundation for building artificial intelligence systems with adaptive and self-improvement capabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Continual learning</kwd>
        <kwd>Catastrophic forgetting</kwd>
        <kwd>Stability-Plasticity</kwd>
        <kwd>Experience replay</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial intelligence enables machines to simulate human intelligent behavior to perceive the
environment, recognize information and make reasoning decisions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As an important branch
of artificial intelligence, deep learning enables automatic extraction of multi-level features
directly from raw inputs by building and training multi-layer neural networks to achieve
intelligent tasks such as pattern recognition, prediction and decision-making [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Deep learning
has found extensive applications in natural language processing, image recognition, autonomous
driving and other fields [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, current deep learning usually performs one-time training
in a static environment, which means that the model parameters are no longer updated and
are unable to adapt to constantly changing dynamic scenarios [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, model training
demands extensive labeling of data samples, which makes its generalization ability for a small
number of samples weaker [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To address these shortcomings, intelligent systems need to
continuously acquire, update, accumulate and utilize knowledge during their life cycle. This
ability is called continual learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The primary objective of continual learning is to design algorithms that are able to learn and
adapt to continuous data streams [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, when a model sequentially learns new tasks, it
usually overwrites the parameters of previous tasks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], leading to impaired performance on
tasks learned earlier, a phenomenon often referred to as "catastrophic forgetting" [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This is
mainly because in a multi-task environment, the same set of parameters needs to serve both new
and previous tasks, resulting in conflicts between the optimal solutions for new and previous
tasks when updating parameters [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        To suppress catastrophic forgetting within the continual learning framework, researchers
have explored different tactics [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], mainly including dynamic architecture-based methods,
regularization-based methods, and replay-based methods [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
          Dynamic architecture-based methods separate the parameters of different tasks by
expanding the model structure when faced with new tasks, to avoid parameter conflicts between
the new and previous tasks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. As an illustration, Iman et al. proposed a continuous and
progressive learning system for deep transfer learning - EXPANSE [
        <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The regularization-based
method prevents drastic parameter changes by adding penalty terms for important parameters
of previous tasks to the loss function. For example, Wakelin et al. presented an analysis of
current continual learning algorithms in the context of image classification [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
Generative replay reconstructs the data of previous tasks by training generative models
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. For example, Shin et al. proposed deep generative replay [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>This paper systematically reviews methods for preventing catastrophic forgetting in
continual learning, summarizes the theoretical basis and specific cases of the various methods,
analyzes their advantages and disadvantages, and finally discusses future research directions for
continual learning, providing practical references for promoting the stable application of artificial
intelligence in dynamic environments.</p>
      <p>The following sections are arranged as follows: Section II classifies the typical methods for
solving the problem of catastrophic forgetting; Section III delivers an extensive analysis of
the principal challenges and prospective research directions in continual learning; Section IV
summarizes the main results of this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Taxonomy of Methods</title>
      <p>To counteract catastrophic forgetting within continual learning frameworks, this section
systematically reviews the primary approaches proposed in the relevant literature, analyzes their
theoretical foundations along with representative cases (Table 1), and evaluates the strengths
and weaknesses of each approach.</p>
      <sec id="sec-2-1">
        <title>2.1. Dynamic Architecture-Based Methods</title>
        <p>
          The dynamic architecture–based method modifies the neural network’s structure to enable the
model to adaptively learn new knowledge while ensuring that previously acquired knowledge
remains intact, thereby alleviating the phenomenon of catastrophic forgetting [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. This
approach achieves its goal by designing networks on demand, assigning independent parameters
to each task, introducing adaptive submodules, and dividing the model into shared and dedicated
components [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          Progressive Neural Network (PNN) is the most common method based on dynamic
architecture expansion. PNN has a multi-column architecture, and each new task corresponds to
a separate network branch. When learning a new task, each layer within the newly added
column reuses features extracted by the previous columns through adapter lateral connections
to achieve knowledge transfer [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. PNN starts with one base column: assume a deep neural
network with $L$ layers and hidden activations $h_i^{(1)} \in \mathbb{R}^{n_i}$, where $n_i$ denotes the neuron count of layer
$i \le L$. Its parameters $\Theta^{(1)}$ are trained to convergence. At the initiation of a new task, the
previous-task parameters $\Theta^{(1)}$ are frozen and the new column’s parameters $\Theta^{(2)}$ are randomly
initialized. The activation $h_i^{(2)}$ in layer $i$ then takes input both from its own previous layer in the
column, $h_{i-1}^{(2)}$, and from the corresponding layer in the preceding column, $h_{i-1}^{(1)}$, via lateral connections
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. More generally, for the $k$-th task, the activation in layer $i$ is given by,
$$ h_i^{(k)} = f\Big( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j=1}^{k-1} U_i^{(k:j)} h_{i-1}^{(j)} \Big) \tag{1} $$
where $W_i^{(k)} \in \mathbb{R}^{n_i \times n_{i-1}}$ is the weight matrix of layer $i$ of column $k$, $U_i^{(k:j)} \in \mathbb{R}^{n_i \times n_{i-1}}$
denotes the lateral links connecting layer $i-1$ of column $j$ with layer $i$ of column $k$, and $h_0$
is the input of the network. Figure 1 is a schematic diagram of a three-column PNN. The two
columns on the left represent the training of tasks 1 and 2. The third column is dedicated to the
final task and can receive the features of all previously learned task layers.
        </p>
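        <p>To make Eq. (1) concrete, the following minimal PyTorch sketch shows how a new column combines its own hidden state with lateral inputs from the frozen earlier columns. It is an illustration under assumed class and parameter names, not the original implementation of PNN.</p>
        <preformat>
import torch
import torch.nn as nn

class PNNColumn(nn.Module):
    """One column of a Progressive Neural Network (illustrative sketch)."""
    def __init__(self, layer_sizes, n_prev_columns):
        super().__init__()
        # W_i^(k): the column's own layer-to-layer weights
        self.layers = nn.ModuleList(
            nn.Linear(layer_sizes[i], layer_sizes[i + 1])
            for i in range(len(layer_sizes) - 1))
        # U_i^(k:j): lateral adapters from each earlier (frozen) column j
        self.laterals = nn.ModuleList(
            nn.ModuleList(nn.Linear(layer_sizes[i], layer_sizes[i + 1])
                          for _ in range(n_prev_columns))
            for i in range(len(layer_sizes) - 1))

    def forward(self, x, prev_activations):
        # prev_activations[j][i] holds h_i^(j) of frozen column j
        h = x
        for i, layer in enumerate(self.layers):
            z = layer(h)
            if i:  # the first layer sees only the shared input h_0
                for j, lat in enumerate(self.laterals[i]):
                    z = z + lat(prev_activations[j][i])
            h = torch.relu(z)
        return h
        </preformat>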
        <p>PNN has achieved continual learning capability with “zero forgetting” in multiple
reinforcement learning tasks. For example, in the Atari game experiments, every time PNN learns a new
game, it adds a network column and reuses the convolutional features and strategies of the
existing pathways through lateral connections. This design not only helps to achieve cross-task
knowledge transfer and sharing, but also effectively prevents information interference between
different tasks, thereby maintaining a clear separation between tasks.</p>
        <p>
          The advantage of PNN is that it is able to learn in an orderly manner during multi-task training,
and it has flexible knowledge transfer capabilities to avoid forgetting previous knowledge [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
This approach suffers from the drawback that parameter size expands proportionally with the
growth in task number. This results in a significant increase in computing resources and storage
requirements, which poses challenges to practical applications in environments with numerous
tasks or limited resources.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Regularization-Based Methods</title>
        <p>
          Through the incorporation of regularization terms into the loss function, regularization-based
methods constrain alterations to key parameters, thus reducing forgetting [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. This approach
is classified into weight regularization and knowledge distillation.
        </p>
        <p>
          Weight regularization aims to regularize the model parameters associated with the previous
task differently according to their significance [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Parameters deemed highly important are
constrained during new task training to avoid significant changes, thereby preventing the
forgetting of knowledge from earlier tasks. The methods for estimating parameter importance
include Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI), and Memory Aware
Synapses (MAS) [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>EWC uses the Fisher Information Matrix to quantify the relevance of network parameters to
earlier tasks. The Matrix is defined as follows,
$$ F_i = \mathbb{E}_{x \sim D}\left[ \left( \frac{\partial \log p(y \mid x, \theta)}{\partial \theta_i} \right)^2 \right] \Bigg|_{\theta = \theta^*} \tag{2} $$
Here, $\theta^*$ denotes the parameter values obtained following training on the prior task, and
$\mathbb{E}_{x \sim D}$ denotes the expectation over the data distribution. $p(y \mid x, \theta)$ represents the model’s output
probability distribution.</p>
        <p>Once the parameter importance is estimated, a weighted regularization term is embedded
within the original loss function during new task training to restrict changes in crucial
parameters. The EWC loss function is defined as follows,
$$ \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{new}}(\theta) + \sum_i \frac{\lambda}{2} F_i \left( \theta_i - \theta_i^* \right)^2 \tag{3} $$
where $\mathcal{L}_{\mathrm{new}}(\theta)$ denotes the conventional loss used for training the new task, $\lambda$ denotes the
regularization strength hyperparameter, $F_i$ indicates the significance of parameter $\theta_i$ estimated via the
Fisher Information Matrix, $\theta_i^*$ signifies the $i$-th parameter value after the prior task’s training phase, and
$\theta_i$ denotes the current value of the $i$-th parameter.</p>
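        <p>As a concrete illustration of Eqs. (2) and (3), the sketch below estimates a diagonal Fisher matrix from the previous task’s data and adds the quadratic penalty to the new-task loss. It is a minimal PyTorch sketch assuming a generic model, data loader and loss function, not a reference implementation of EWC.</p>
        <preformat>
import torch

def fisher_diagonal(model, data_loader, loss_fn):
    """Diagonal Fisher estimate: average squared gradient of the
    log-likelihood over the previous task's data (sketch)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()  # negative log-likelihood loss
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2
    for n in fisher:
        fisher[n] /= len(data_loader)
    return fisher

def ewc_penalty(model, fisher, star_params, lam):
    """Penalty of Eq. (3): (lam / 2) * sum_i F_i (theta_i - theta*_i)^2."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - star_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During new-task training: loss = loss_new + ewc_penalty(model, fisher,
# star_params, lam), with star_params copied after the previous task.
        </preformat>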
        <p>Throughout the training phase, SI dynamically evaluates the importance of parameters by
measuring the marginal contribution of each parameter update to the loss reduction and integrating
it along the training trajectory. It then protects these important parameters through weighted
regularization terms to reduce the forgetting of previous knowledge. The path integral is
expressed by the following formula,
$$ \Omega_i = \sum_{t=1}^{T-1} \frac{\omega_{i,t}}{(\Delta \theta_{i,t})^2 + \xi} \tag{4} $$
where $\omega_{i,t}$ reflects each parameter’s significance to the loss function, calculated as the product
$\omega_{i,t} = -\frac{\partial \mathcal{L}(\theta)}{\partial \theta_i}\,\Delta \theta_{i,t}$. $\Delta \theta_{i,t} = \theta_i(T) - \theta_i(0)$ indicates the extent of the parameter shift after $T$ iterations
of training on the $t$-th task. $\omega_{i,t}$ is updated in each iteration, while $\Delta \theta_{i,t}$ is updated only after $T$
iterations. $\xi$ represents a numerical stability term, which is used to prevent the denominator
from being too small and causing a numerical explosion. It is generally set to $\xi = 0.01$.</p>
        <p>The loss function for SI is given as follows,
$$ \mathcal{L}_{\mathrm{SI}} = \mathcal{L}_{\mathrm{new}}(\theta) + \lambda \sum_i \Omega_i \left( \theta_i - \theta_i^* \right)^2 \tag{5} $$
where $\theta_i^*$ denotes the parameter values after training on the previous task, $\Omega_i$ is the importance
weight of parameter $\theta_i$, and $\lambda$ is a hyperparameter.</p>
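        <p>The online bookkeeping behind Eqs. (4) and (5) can be sketched as follows; the class and method names are assumptions for exposition, and the two hooks would be called from an ordinary training loop.</p>
        <preformat>
import torch

class SynapticIntelligence:
    """Online importance accumulation for SI (illustrative sketch)."""
    def __init__(self, model, xi=0.01):
        self.xi = xi
        self.w = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.prev = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.start = {n: p.detach().clone() for n, p in model.named_parameters()}

    def accumulate(self, model):
        # Call after every optimizer step: w_i += -grad_i * delta(theta_i)
        for n, p in model.named_parameters():
            if p.grad is not None:
                delta = p.detach() - self.prev[n]
                self.w[n] -= p.grad.detach() * delta
            self.prev[n] = p.detach().clone()

    def consolidate(self, model):
        # Call at the end of a task: Omega_i += w_i / (Delta_i^2 + xi)
        for n, p in model.named_parameters():
            delta_task = p.detach() - self.start[n]
            self.omega[n] += self.w[n] / (delta_task ** 2 + self.xi)
            self.w[n].zero_()
            self.start[n] = p.detach().clone()
        </preformat>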
        <p>
          MAS evaluates parameter importance by measuring the sensitivity of the learned function’s
output to changes in each parameter [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. The gradient of the squared output norm serves as
an indicator of a parameter’s importance, with larger values reflecting stronger influence on
the model’s output. The calculation formula is as follows,
$$ \Lambda_i = \mathbb{E}_{x \sim D}\left[ \left| \frac{\partial \, \| F(x; \theta^*) \|_2^2}{\partial \theta_i} \right| \right] \tag{6} $$
where $\mathbb{E}_{x \sim D}$ represents the expectation over the preceding task’s dataset, $\| F(x; \theta^*) \|_2$ refers to the norm of the output
vector of the network for input $x$ in the final layer, and $\theta^*$ represents the parameter values after
training.</p>
        <p>The loss function for MAS is given as follows,
$$ \mathcal{L}_{\mathrm{MAS}} = \mathcal{L}_{\mathrm{new}}(\theta) + \lambda \sum_i \Lambda_i \left( \theta_i - \theta_i^* \right)^2 \tag{7} $$
where $\theta_i^*$ denotes the parameter values after training on the previous task, $\Lambda_i$ is the importance
weight of parameter $\theta_i$, and $\lambda$ is a hyperparameter.</p>
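        <p>A minimal sketch of the importance estimate in Eq. (6) follows; it assumes an unlabeled loader over the previous task’s inputs and a model returning a real-valued output vector.</p>
        <preformat>
import torch

def mas_importance(model, data_loader):
    """MAS importance (Eq. 6): mean absolute gradient of the squared
    L2 norm of the network output w.r.t. each parameter (sketch)."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x in data_loader:  # labels are not needed
        model.zero_grad()
        out = model(x)
        out.pow(2).sum().backward()  # gradient of the squared output norm
        for n, p in model.named_parameters():
            importance[n] += p.grad.detach().abs()
        n_batches += 1
    for n in importance:
        importance[n] /= n_batches
    return importance
        </preformat>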
        <p>
          In addition to preventing previous knowledge from being overwritten by adding regularization
terms that limit changes in important parameters, there is another commonly used method:
knowledge distillation. Knowledge distillation regularization differs from weight
regularization in that it moves the constraint from the parameter space to the output space, paying
more attention to whether the model preserves the output behavior of earlier tasks
as it learns new tasks. In general terms, knowledge distillation condenses and refines the knowledge
of a large neural network (teacher model) into a small neural network (student model), that is,
it carries out a migration of knowledge. According to the transfer mechanism, knowledge
distillation is divided into two paradigms: target distillation and feature distillation
[
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
        <p>
          Target distillation refers to directly letting the student model imitate the teacher model’s
predictions at the final output layer. The commonly used losses are mainly the Kullback-Leibler
(KL) divergence [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] or Cross Entropy [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
        </p>
        <p>When the KL divergence is employed to quantify the discrepancy between the output
distributions of the teacher and student models, the loss can be written as,
$$ \mathcal{L}_{\mathrm{KD}} = \mathrm{KL}\big( p_T(x) \,\|\, p_S(x) \big) \tag{8} $$
where $p_T(x)$ and $p_S(x)$ represent the output probability distributions of the teacher model and the
student model for input $x$, respectively. In addition, the cross entropy can also serve directly
as the distillation loss. The calculation formula is as follows,
$$ \mathcal{L}_{\mathrm{CE}} = - \sum_{c=1}^{C} p_T^c(x) \log p_S^c(x) \tag{9} $$
where $p_T^c(x)$ and $p_S^c(x)$ refer to the prediction probability of the $c$-th category after softmax
of the input $x$ by the teacher and the student model respectively, and $C$ is the total number of
categories.</p>
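        <p>Both target-distillation losses of Eqs. (8) and (9) can be computed in a few lines; the sketch below assumes raw logits as inputs. The temperature parameter is an added convention from common distillation practice, not part of the formulas above.</p>
        <preformat>
import torch
import torch.nn.functional as F

def target_distillation_losses(teacher_logits, student_logits, temperature=2.0):
    """KL-divergence and cross-entropy distillation losses (sketch)."""
    p_t = F.softmax(teacher_logits / temperature, dim=-1)
    log_p_s = F.log_softmax(student_logits / temperature, dim=-1)
    # Eq. (8): KL(p_T || p_S), averaged over the batch
    kl = F.kl_div(log_p_s, p_t, reduction="batchmean")
    # Eq. (9): -sum_c p_T^c(x) log p_S^c(x)
    ce = -(p_t * log_p_s).sum(dim=-1).mean()
    return kl, ce
        </preformat>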
        <p>
          Feature distillation differs from target distillation in where the alignment takes place. It focuses
more on the consistency of internal representations rather than aligning only at the output layer
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Its loss is usually based on a Euclidean distance rather than the KL divergence.
The Euclidean distance loss formula is as follows,
$$ \mathcal{L} = \| \hat{z} - z \|_2^2 \tag{10} $$
where $\hat{z}$ denotes the logits produced by the prior model and $z$ indicates the new model’s logit
outputs.</p>
        <p>In practical applications, feature distillation is often combined with target distillation, which
simultaneously optimizes output consistency and internal feature similarity. This approach is
very suitable for model compression, network acceleration, and transfer learning scenarios.</p>
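        <p>A combined objective of this kind can be sketched as below; the weighting coefficient alpha is an assumed hyperparameter rather than a value prescribed by the methods above.</p>
        <preformat>
import torch
import torch.nn.functional as F

def combined_distillation_loss(z_hat, z, teacher_logits, student_logits,
                               alpha=0.5):
    """Feature distillation (cf. Eq. 10) plus target distillation (sketch)."""
    feature_loss = F.mse_loss(z, z_hat)  # mean squared error between logits
    p_t = F.softmax(teacher_logits, dim=-1)
    target_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                           p_t, reduction="batchmean")
    return alpha * feature_loss + (1.0 - alpha) * target_loss
        </preformat>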
        <p>Compared with dynamic architecture-based methods, regularization-based methods are not
required to add new network columns when learning new tasks. They only need to impose
constraints on important parameters of previous tasks in the loss function. Therefore, they
have low computational overhead and simple implementation, but it is difficult to achieve "zero
forgetting".</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Replay-Based Methods</title>
        <p>
          The replay-based method is another typical approach to solving the "catastrophic forgetting"
problem. It saves a set of input-output sample pairs into a memory module, and then
incorporates these samples with data from the current task for model training [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Implementations of this
approach include experience replay and generative replay [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ].
        </p>
        <p>
          Experience replay stores a subset of past task samples in a replay buffer and interleaves these
with new-task data during training, ensuring simultaneous learning of current and previous
tasks to mitigate forgetting. This method uses real data and offers high model stability [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ].
Incremental Classifier and Representation Learning (iCaRL) is an incremental learning method
based on experience replay. The core process of iCaRL includes three steps. First,
classification is performed using the nearest-mean-of-exemplars (NME) rule. Second, exemplars
are selected and prioritized with the herding algorithm. Third, representation learning integrates
knowledge distillation with prototype rehearsal. This approach enables continual learning
without access to all historical data.
        </p>
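        <p>As an illustration of the bookkeeping involved, the sketch below maintains a fixed-capacity buffer with reservoir sampling; note that iCaRL itself selects exemplars by herding rather than at random, so this is a simplified stand-in.</p>
        <preformat>
import random

class ReplayBuffer:
    """Fixed-capacity replay buffer with reservoir sampling (sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) &lt; self.capacity:
            self.data.append(sample)
        else:
            # replace a stored sample with probability capacity / seen
            idx = random.randrange(self.seen)
            if idx &lt; self.capacity:
                self.data[idx] = sample

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

# Training on a new task would interleave buffer.sample(batch_size)
# with the current task's mini-batches in each update step.
        </preformat>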
        <p>For generative replay, the initial stage consists of training a generative model to
approximate the data distribution of the previous task; the generated samples are then
incorporated into the training set of the current task to maintain the memory of the previous
distribution and alleviate forgetting. Since there is no need to store the original data directly,
this method is well suited to scenarios where privacy must be guaranteed, but the
stability of the model is heavily influenced by the effectiveness of the generative model. If the
generative component fails to reliably represent the essential characteristics of previous tasks, it
may lead to memory degradation or even incorrect transfer, thus affecting the learning stability
and performance of the entire system.</p>
        <p>
          At present, research on replay-based methods mainly focuses on three aspects:
improving sample storage efficiency through core sample selection strategies
[
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], enhancing the quality of generated samples by improving generative model architectures
such as diffusion models and Transformer-based generators, and improving the robustness of
models by integrating other techniques such as meta-learning and self-supervision [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ].
        </p>
        <p>In comparison with the first two categories, the replay-based method has a stronger ability to
resist forgetting, but its effectiveness, particularly in the generative variant, depends largely on how
well the generative model performs. If the generated samples are very different from the real data,
the effect of alleviating catastrophic forgetting will also decrease.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Key Challenges and Future Prospects</title>
      <p>Although there are many methods that are capable of easing the problem of catastrophic
forgetting to a certain extent, a range of key problems remains to be overcome when facing
complex situations in actual application scenarios.</p>
      <p>
        The stability-plasticity dilemma remains an essential challenge [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. Traditional single
strategies often fail to maintain an optimal balance between stability and plasticity in complex
scenarios. Therefore, recent research has gradually shifted to hybrid strategies to address this
issue. For example, methods combining replay and regularization (such as DER++) not only
replay historical samples through a buffer but also use knowledge distillation to constrain the
output distribution of the current model to be consistent with that of the historical model on
the same input. This allows parameter updates to leverage both hard and soft label information,
reducing representation drift and enhancing stability without significantly compromising
plasticity. Strategies combining architecture expansion with meta-learning ensure model capacity
through dynamic network expansion or sub-network allocation, while leveraging initialization
priors or adaptive optimizers derived from meta-learning to rapidly adapt to new tasks while
minimizing interference with existing tasks. Future research should continue to explore new
hybrid strategies that, through multi-dimensional synergy, enable the system to better balance
memory retention and new knowledge learning in complex task flows.
      </p>
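      <p>As a sketch of how such a hybrid objective combines the two signals, the loss below mixes the new-task loss with a logit-matching (soft label) term and a replay (hard label) term on buffered samples; alpha and beta are assumed weighting hyperparameters.</p>
      <preformat>
import torch
import torch.nn.functional as F

def derpp_style_loss(model, x, y, buf_x, buf_y, buf_logits,
                     alpha=0.5, beta=0.5):
    """DER++-style objective (sketch): current-task loss, plus
    distillation toward stored logits, plus replay on stored labels."""
    loss = F.cross_entropy(model(x), y)                          # new task
    loss = loss + alpha * F.mse_loss(model(buf_x), buf_logits)   # soft labels
    loss = loss + beta * F.cross_entropy(model(buf_x), buf_y)    # hard labels
    return loss
      </preformat>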
      <p>In practical applications, some advanced algorithms impose heavy training requirements and
are difficult to deploy on edge devices, which makes the models run inefficiently. For instance, an
IoT-enabled smart doorbell tasked with real-time pedestrian detection and anomalous behavior
recognition must operate under limited computational and memory resources, which restricts
the use of large-scale models. It is therefore necessary to investigate
efficient algorithms and models capable of running effectively under constrained computational
power and storage capacity, so as to meet the actual needs of edge computing environments.</p>
      <p>
        In the medical and industrial fields, systems are required to maintain strong
performance using limited labeled data and to be able to protect user privacy. However,
too little labeled data results in few supervisory signals being available for incremental tasks,
which limits effective updates during the incremental learning process. Therefore, in the future,
breakthroughs are needed in few-shot class-incremental learning
[
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] and unsupervised continual learning [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. In addition, combining privacy protection
mechanisms such as federated learning will also be an important research direction for achieving
scalable, secure, and efficient learning systems.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This review focuses on catastrophic forgetting, a fundamental challenge in continual learning,
and provides a systematic analysis of recent advances in addressing this issue. The methods
under review are grouped into three distinct classes, namely dynamic architecture-based
methods, regularization-based methods, and replay-based methods. For each category, the paper
examines their theoretical foundations, representative techniques, and respective advantages
and limitations. Looking ahead, future research on continual learning should address pressing
challenges, including the stability-plasticity dilemma, computational and storage overhead,
limited labeled data, and privacy concerns. To this end, integrating multiple strategies,
developing efficient algorithms and model architectures, and exploring incremental and unsupervised
continual learning, particularly in low-data regimes, are crucial steps toward realizing truly
lifelong learning in artificial intelligence systems.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The present work was funded by the Henan Province Key Research and Development Program
(Grants No. 241111312000), the Henan Province Key International Science and Technology
Cooperation Project (Grants No. 251111520400, 252102521009), the Henan Province Key Technologies
Research and Development Project (Grants No. 252102211106, 252102320281, 252102221054),
the Young Backbone Teacher Program of Zhongyuan University of Technology (Grants No.
2023XQG15).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of continual learning: Theory, method and application</article-title>
          ,
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Yolo-msd: a robust industrial surface defect detection model via multi-scale feature fusion</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>55</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Forget to learn (f2l): Circumventing plasticity-stability trade-off in continuous unsupervised domain adaptation</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>159</volume>
          (
          <year>2025</year>
          )
          <fpage>111139</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Hardware-aware approach to deep neural network optimization</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>559</volume>
          (
          <year>2023</year>
          )
          <fpage>126808</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Dataset purification-driven lightweight deep learning model construction for empty-dish recycling robot</article-title>
          ,
          <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Douillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramé</surname>
          </string-name>
          , G. Couairon,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <article-title>Dytox: Transformers for continual learning with dynamic token expansion</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>9285</fpage>
          -
          <lpage>9295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ashfahani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pratama</surname>
          </string-name>
          ,
          <article-title>Autonomous deep learning: Continual learning approach for dynamic environments</article-title>
          ,
          <source>in: Proceedings of the 2019 SIAM international conference on data mining, SIAM</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>666</fpage>
          -
          <lpage>674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Roy-Chowdhury</surname>
          </string-name>
          ,
          <article-title>A continuous learning framework for activity recognition using deep hybrid feature models</article-title>
          ,
          <source>IEEE Transactions on Multimedia</source>
          <volume>17</volume>
          (
          <year>2015</year>
          )
          <fpage>1909</fpage>
          -
          <lpage>1922</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>van de Ven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Soures</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kudithipudi</surname>
          </string-name>
          ,
          <article-title>Continual learning and catastrophic forgetting</article-title>
          ,
          <source>arXiv preprint arXiv:2403.05175</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Ha</surname>
          </string-name>
          , B.-T. Zhang,
          <article-title>Overcoming catastrophic forgetting by incremental moment matching</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] D. Cheng, Y. Hu,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Achieving plasticity-stability trade-off in continual learning through adaptive orthogonal projection</article-title>
          ,
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Wiewel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Continual learning through one-class classification using vae</article-title>
          ,
          <source>in: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>3307</fpage>
          -
          <lpage>3311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dohare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Hernandez-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <article-title>Loss of plasticity in deep continual learning</article-title>
          ,
          <source>Nature</source>
          <volume>632</volume>
          (
          <year>2024</year>
          )
          <fpage>768</fpage>
          -
          <lpage>774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Iman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Arabnia</surname>
          </string-name>
          ,
          <article-title>Expanse, a continual deep learning system; research proposal</article-title>
          , in: 2021
          <source>International Conference on Computational Science and Computational Intelligence (CSCI)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wakelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Mohammedali</surname>
          </string-name>
          ,
          <article-title>An analysis of current continual learning algorithms in an image classification context</article-title>
          ,
          <source>in: 2022 6th International Symposium on Computer Science and Intelligent Control (ISCSIC)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Noci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orvieto</surname>
          </string-name>
          , T. Hofmann,
          <article-title>Achieving a better stability-plasticity trade-off via auxiliary networks in continual learning</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>11930</fpage>
          -
          <lpage>11939</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Continual learning with deep generative replay</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>An ultralightweight object detection network for empty-dish recycling robots</article-title>
          ,
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          <volume>72</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          , G. Cheng,
          <article-title>Class incremental website fingerprinting attack based on dynamic expansion architecture</article-title>
          ,
          <source>IEEE Transactions on Network and Service Management</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Rusu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Rabinowitz</surname>
          </string-name>
          , G. Desjardins,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kirkpatrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pascanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <article-title>Progressive neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:1606.04671</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Moriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Masumura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Asami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shinohara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Delcroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aono</surname>
          </string-name>
          ,
          <article-title>Progressive neural network-based knowledge transfer in acoustic models</article-title>
          , in: 2018
          <source>AsiaPacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>998</fpage>
          -
          <lpage>1002</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Progressive neural network for multi-horizon time series forecasting</article-title>
          ,
          <source>Information Sciences</source>
          <volume>661</volume>
          (
          <year>2024</year>
          )
          <fpage>120112</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fritsche</surname>
          </string-name>
          ,
          <article-title>Regularization-based efficient continual learning in deep state-space models</article-title>
          ,
          <source>in: 2024 27th International Conference on Information Fusion (FUSION)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nokhwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>RTRA: Rapid training of regularization-based approaches in continual learning</article-title>
          ,
          <source>in: 2023 10th International Conference on Soft Computing &amp; Machine Intelligence (ISCMI)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>A statistical theory of regularization-based continual learning</article-title>
          ,
          <source>arXiv preprint arXiv:2406.06213</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tercan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deibert</surname>
          </string-name>
          , T. Meisen,
          <article-title>Continual learning of neural networks for quality prediction in production using memory aware synapses and weight transfer</article-title>
          ,
          <source>Journal of Intelligent Manufacturing</source>
          <volume>33</volume>
          (
          <year>2022</year>
          )
          <fpage>283</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Qureshi</surname>
          </string-name>
          ,
          <article-title>A new regularization-based continual learning framework</article-title>
          ,
          <source>in: 2024 Horizons of Information Technology and Engineering (HITE)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramamohanarao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Collaborative knowledge distillation via multiknowledge transfer</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <article-title>Few shot network compression via cross distillation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>3203</fpage>
          -
          <lpage>3210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>Distilling a powerful student model via online knowledge distillation</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>8743</fpage>
          -
          <lpage>8752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Prototype-guided memory replay for continual learning</article-title>
          ,
          <source>IEEE transactions on neural networks and learning systems</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Memory enhanced replay for continual learning</article-title>
          ,
          <source>in: 2022 16th IEEE International Conference on Signal Processing (ICSP)</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Cahill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kanan</surname>
          </string-name>
          ,
          <article-title>Memory efficient experience replay for streaming learning</article-title>
          ,
          <source>in: 2019 International Conference on Robotics and Automation (ICRA)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>9769</fpage>
          -
          <lpage>9776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nikan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shami</surname>
          </string-name>
          ,
          <article-title>Improved exploration-exploitation trade-off through adaptive prioritized experience replay</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>614</volume>
          (
          <year>2025</year>
          )
          <fpage>128836</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Antoniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Micaelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storkey</surname>
          </string-name>
          ,
          <article-title>Meta-learning in neural networks: A survey</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>5149</fpage>
          -
          <lpage>5169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36] D. Cheng, Y. Hu,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Achieving plasticity-stability trade-off in continual learning through adaptive orthogonal projection</article-title>
          ,
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>O.</given-names>
            <surname>Silvén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pietikäinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Few-shot class-incremental learning for classification and object detection: A survey</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ma'sum</surname>
          </string-name>
          , M. Pratama,
          <string-name>
            <given-names>R.</given-names>
            <surname>Savitha</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kowalczyk</surname>
          </string-name>
          , et al.,
          <article-title>Unsupervised few-shot continual learning for remote sensing image scene classification</article-title>
          ,
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>