<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Addressing Catastrophic Forgetting and Beyond: Key Challenges in Continual Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rui Teng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aihui Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hengyi Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jinkang Dong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yao Yao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xueying Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The School of Automation and Electrical Engineering, Zhongyuan University of Technology</institution>
          ,
          <addr-line>450007 Zhengzhou</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Current artificial intelligence relies on a one-time training process over a predefined data set, after which the model remains static during the subsequent inference and operation stages. However, a true artificial intelligence system needs to demonstrate the ability of continual learning, that is, to dynamically adapt to changing environments and new information and to continuously evolve. In the continual learning scenario, catastrophic forgetting is the core problem encountered. Therefore, this paper first systematically surveys the various methods for dealing with catastrophic forgetting; second, it classifies these methods and analyzes in depth their theoretical basis, representative cases, advantages and disadvantages; finally, it identifies the key challenges and future development directions currently facing continual learning, laying a solid foundation for building artificial intelligence systems with adaptive and self-improvement capabilities.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Continual learning</kwd>
        <kwd>Catastrophic forgetting</kwd>
        <kwd>Stability-Plasticity</kwd>
        <kwd>Experience replay</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Artificial intelligence enables machines to simulate human intelligent behavior to perceive the
environment, recognize information and make reasoning decisions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As an important branch
of artificial intelligence, deep learning enables automatic extraction of multi-level features
directly from raw inputs by building and training multi-layer neural networks to achieve
intelligent tasks such as pattern recognition, prediction and decision-making [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Deep learning
has found extensive applications in natural language processing, image recognition, autonomous
driving and other fields [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, current deep learning usually performs one-time training
in a static environment, which means that the model parameters are no longer updated and
are unable to adapt to constantly changing dynamic scenarios [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, model training
demands extensive labeling of data samples, which makes its generalization ability for a small
number of samples weaker [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To address these shortcomings, intelligent systems need to
continuously acquire, update, accumulate and utilize knowledge during their life cycle. This
ability is called continual learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The primary objective of continual learning is to design algorithms that are able to learn and
adapt to continuous data streams [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, when a model sequentially learns new tasks, it
usually overwrites the parameters of previous tasks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], leading to impaired performance on
tasks learned earlier, a phenomenon often referred to as "catastrophic forgetting" [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This is
mainly because in a multi-task environment, the same set of parameters needs to serve both new
and previous tasks, resulting in conflicts between the optimal solutions for new and previous
tasks when updating parameters [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        To suppress catastrophic forgetting within the continual learning framework, researchers
have explored different tactics [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], mainly including dynamic architecture-based methods,
regularization-based methods, and replay-based methods [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
          Dynamic architecture-based methods separate the parameters of different tasks by
expanding the model structure when faced with new tasks, to avoid parameter conflicts between
the new and previous tasks [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. As an illustration, Iman et al. proposed a continuous and
progressive learning system for deep transfer learning - EXPANSE [
        <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The regularization-based
method prevents drastic parameter changes by adding penalty terms for important parameters
of previous tasks to the loss function. For example, Wakelin et al. presented an analysis of
current continual learning algorithms in the context of image classification [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
Generative replay reconstructs the data of previous tasks by training generative models
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. For example, Shin et al. proposed deep generative replay [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>This paper systematically reviews methods for preventing catastrophic forgetting in
continual learning, summarizes the theoretical basis and specific cases of the various methods,
analyzes their advantages and disadvantages, and finally discusses future research directions for
continual learning, providing practical references for promoting the stable application of artificial
intelligence in dynamic environments.</p>
      <p>The following sections are arranged as follows: Section II classifies the typical methods for
solving the problem of catastrophic forgetting; Section III delivers an extensive analysis of
the principal challenges and prospective research directions in continual learning; Section IV
summarizes the main results of this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Taxonomy of Methods</title>
      <p>To counteract catastrophic forgetting within continual learning frameworks, this section
systematically reviews the primary approaches proposed in the relevant literature, analyzes their
theoretical foundations along with representative cases (Table 1), and evaluates the strengths
and weaknesses of each approach.</p>
      <sec id="sec-2-1">
        <title>2.1. Dynamic Architecture-Based Methods</title>
        <p>
          The dynamic architecture–based method modifies the neural network’s structure to enable the
model to adaptively learn new knowledge while ensuring that previously acquired knowledge
remains intact, thereby alleviating the phenomenon of catastrophic forgetting [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. This
approach achieves its goal by designing networks on demand, assigning independent parameters
to each task, introducing adaptive submodules, and dividing the model into shared and dedicated
components [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          Progressive Neural Network (PNN) is the most common method based on dynamic
architecture expansion. PNN has a multi-column architecture, and each new task corresponds to
a separate network branch. When learning a new task, each layer within the newly added
column reuses features extracted by the previous columns through adapter lateral connections
to achieve knowledge transfer [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. PNN starts with one base column: assume a deep neural
network with $L$ layers and hidden activations $h_i^{(1)} \in \mathbb{R}^{n_i}$, where $n_i$ denotes the neuron count of layer
$i \le L$. Its parameters $\Theta^{(1)}$ are trained to convergence. At the initiation of a new task, the
previous-task parameters $\Theta^{(1)}$ are frozen and the new column’s parameters $\Theta^{(2)}$ are randomly
initialized. The activation $h_i^{(2)}$ in layer $i$ then takes input both from its own previous layer in the
column, $h_{i-1}^{(2)}$, and from the corresponding layer in the preceding column, $h_{i-1}^{(1)}$, via lateral connections
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. More generally, for the $k$-th task, the activation in layer $i$ is given by,
$$ h_i^{(k)} = f\Big( W_i^{(k)} h_{i-1}^{(k)} + \sum_{j=1}^{k-1} U_i^{(k:j)} h_{i-1}^{(j)} \Big) \tag{1} $$
where $W_i^{(k)} \in \mathbb{R}^{n_i \times n_{i-1}}$ is the weight matrix of layer $i$ of column $k$, $U_i^{(k:j)} \in \mathbb{R}^{n_i \times n_{i-1}}$
denotes the lateral links connecting layer $i-1$ of column $j$ with layer $i$ of column $k$, and $h_0$
is the input of the network. Figure 1 is a schematic diagram of a three-column PNN. The two
columns on the left represent the training of tasks 1 and 2. The third column is dedicated to the
final task and can receive the features of all previously learned task layers.
        </p>
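        <p>To make Eq. (1) concrete, the following minimal PyTorch sketch shows how a new column combines its own hidden state with lateral inputs from the frozen earlier columns. It is an illustration under assumed class and parameter names, not the original implementation of PNN.</p>
        <preformat>
import torch
import torch.nn as nn

class PNNColumn(nn.Module):
    """One column of a Progressive Neural Network (illustrative sketch)."""
    def __init__(self, layer_sizes, n_prev_columns):
        super().__init__()
        # W_i^(k): the column's own layer-to-layer weights
        self.layers = nn.ModuleList(
            nn.Linear(layer_sizes[i], layer_sizes[i + 1])
            for i in range(len(layer_sizes) - 1))
        # U_i^(k:j): lateral adapters from each earlier (frozen) column j
        self.laterals = nn.ModuleList(
            nn.ModuleList(nn.Linear(layer_sizes[i], layer_sizes[i + 1])
                          for _ in range(n_prev_columns))
            for i in range(len(layer_sizes) - 1))

    def forward(self, x, prev_activations):
        # prev_activations[j][i] holds h_i^(j) of frozen column j
        h = x
        for i, layer in enumerate(self.layers):
            z = layer(h)
            if i:  # the first layer sees only the shared input h_0
                for j, lat in enumerate(self.laterals[i]):
                    z = z + lat(prev_activations[j][i])
            h = torch.relu(z)
        return h
        </preformat>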
        <p>PNN has achieved continual learning capability with “zero forgetting” in multiple
reinforcement learning tasks. For example, in the Atari game experiments, every time PNN learns a new
game, it adds a network column and reuses the convolutional features and strategies of the
existing pathways through lateral connections. This design not only helps to achieve cross-task
knowledge transfer and sharing, but also effectively prevents information interference between
different tasks, thereby maintaining a clear separation between tasks.</p>
        <p>
          The advantage of PNN is that it is able to learn in an orderly manner during multi-task training,
and it has flexible knowledge transfer capabilities to avoid forgetting previous knowledge [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
This approach suffers from the drawback that parameter size expands proportionally with the
growth in task number. This results in a significant increase in computing resources and storage
requirements, which poses challenges to practical applications in environments with numerous
tasks or limited resources.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Regularization-Based Methods</title>
        <p>
          Through the incorporation of regularization terms into the loss function, regularization-based
methods constrain alterations to key parameters, thus reducing forgetting [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. This approach
is classified into weight regularization and knowledge distillation.
        </p>
        <p>
          Weight regularization aims to regularize the model parameters associated with the previous
task differently according to their significance [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Parameters deemed highly important are
constrained during new task training to avoid significant changes, thereby preventing the
forgetting of knowledge from earlier tasks. The methods for estimating parameter importance
include Elastic Weight Consolidation (EWC), Synaptic Intelligence (SI), and Memory Aware
Synapses (MAS) [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>EWC uses the Fisher Information Matrix to quantify the relevance of network parameters to
earlier tasks. The Matrix is defined as follows,
$$ F_i = \mathbb{E}_{x \sim D}\left[ \left( \frac{\partial \log p(y \mid x, \theta)}{\partial \theta_i} \right)^2 \right] \Bigg|_{\theta = \theta^*} \tag{2} $$
Here, $\theta^*$ denotes the parameter values obtained following training on the prior task, and
$\mathbb{E}_{x \sim D}$ denotes the expectation over the data distribution. $p(y \mid x, \theta)$ represents the model’s output
probability distribution.</p>
        <p>Once the parameter importance is estimated, a weighted regularization term is embedded
within the original loss function during new task training to restrict changes in crucial
parameters. The EWC loss function is defined as follows,
$$ \mathcal{L}(\theta) = \mathcal{L}_{\mathrm{new}}(\theta) + \sum_i \frac{\lambda}{2} F_i \left( \theta_i - \theta_i^* \right)^2 \tag{3} $$
where $\mathcal{L}_{\mathrm{new}}(\theta)$ denotes the conventional loss used for training the new task, $\lambda$ denotes the
regularization strength hyperparameter, $F_i$ indicates the significance of parameter $\theta_i$ estimated via the
Fisher Information Matrix, $\theta_i^*$ signifies the $i$-th parameter value after the prior task’s training phase, and
$\theta_i$ denotes the current value of the $i$-th parameter.</p>
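        <p>As a concrete illustration of Eqs. (2) and (3), the sketch below estimates a diagonal Fisher matrix from the previous task’s data and adds the quadratic penalty to the new-task loss. It is a minimal PyTorch sketch assuming a generic model, data loader and loss function, not a reference implementation of EWC.</p>
        <preformat>
import torch

def fisher_diagonal(model, data_loader, loss_fn):
    """Diagonal Fisher estimate: average squared gradient of the
    log-likelihood over the previous task's data (sketch)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()  # negative log-likelihood loss
        for n, p in model.named_parameters():
            fisher[n] += p.grad.detach() ** 2
    for n in fisher:
        fisher[n] /= len(data_loader)
    return fisher

def ewc_penalty(model, fisher, star_params, lam):
    """Penalty of Eq. (3): (lam / 2) * sum_i F_i (theta_i - theta*_i)^2."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - star_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During new-task training: loss = loss_new + ewc_penalty(model, fisher,
# star_params, lam), with star_params copied after the previous task.
        </preformat>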
        <p>Throughout the training phase, SI dynamically evaluates the importance of parameters by
measuring the marginal contribution of each parameter update to the loss reduction and integrating
it along the training trajectory. It then protects these important parameters through weighted
regularization terms to reduce the forgetting of previous knowledge. The path integral is
expressed by the following formula,
$$ \Omega_i = \sum_{t=1}^{T-1} \frac{\omega_{i,t}}{(\Delta \theta_{i,t})^2 + \xi} \tag{4} $$
where $\omega_{i,t}$ reflects each parameter’s significance to the loss function, calculated as the product
$\omega_{i,t} = -\frac{\partial \mathcal{L}(\theta)}{\partial \theta_i}\,\Delta \theta_{i,t}$. $\Delta \theta_{i,t} = \theta_i(T) - \theta_i(0)$ indicates the extent of the parameter shift after $T$ iterations
of training on the $t$-th task. $\omega_{i,t}$ is updated in each iteration, while $\Delta \theta_{i,t}$ is updated only after $T$
iterations. $\xi$ represents a numerical stability term, which is used to prevent the denominator
from being too small and causing a numerical explosion. It is generally set to $\xi = 0.01$.</p>
        <p>The loss function for SI is given as follows,
$$ \mathcal{L}_{\mathrm{SI}} = \mathcal{L}_{\mathrm{new}}(\theta) + \lambda \sum_i \Omega_i \left( \theta_i - \theta_i^* \right)^2 \tag{5} $$
where $\theta_i^*$ denotes the parameter values after training on the previous task, $\Omega_i$ is the importance
weight of parameter $\theta_i$, and $\lambda$ is a hyperparameter.</p>
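        <p>The online bookkeeping behind Eqs. (4) and (5) can be sketched as follows; the class and method names are assumptions for exposition, and the two hooks would be called from an ordinary training loop.</p>
        <preformat>
import torch

class SynapticIntelligence:
    """Online importance accumulation for SI (illustrative sketch)."""
    def __init__(self, model, xi=0.01):
        self.xi = xi
        self.w = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        self.prev = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.start = {n: p.detach().clone() for n, p in model.named_parameters()}

    def accumulate(self, model):
        # Call after every optimizer step: w_i += -grad_i * delta(theta_i)
        for n, p in model.named_parameters():
            if p.grad is not None:
                delta = p.detach() - self.prev[n]
                self.w[n] -= p.grad.detach() * delta
            self.prev[n] = p.detach().clone()

    def consolidate(self, model):
        # Call at the end of a task: Omega_i += w_i / (Delta_i^2 + xi)
        for n, p in model.named_parameters():
            delta_task = p.detach() - self.start[n]
            self.omega[n] += self.w[n] / (delta_task ** 2 + self.xi)
            self.w[n].zero_()
            self.start[n] = p.detach().clone()
        </preformat>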
        <p>
          MAS evaluates parameter importance by measuring the sensitivity of the learned function’s
output to changes in each parameter [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. The gradient of the squared output norm serves as
an indicator of a parameter’s importance, with larger values reflecting stronger influence on
the model’s output. The calculation formula is as follows,
$$ \Lambda_i = \mathbb{E}_{x \sim D}\left[ \left| \frac{\partial \, \| F(x; \theta^*) \|_2^2}{\partial \theta_i} \right| \right] \tag{6} $$
where $\mathbb{E}_{x \sim D}$ represents the expectation over the preceding task’s dataset, $\| F(x; \theta^*) \|_2$ refers to the norm of the output
vector of the network for input $x$ in the final layer, and $\theta^*$ represents the parameter values after
training.</p>
        <p>The loss function for MAS is given as follows,
$$ \mathcal{L}_{\mathrm{MAS}} = \mathcal{L}_{\mathrm{new}}(\theta) + \lambda \sum_i \Lambda_i \left( \theta_i - \theta_i^* \right)^2 \tag{7} $$
where $\theta_i^*$ denotes the parameter values after training on the previous task, $\Lambda_i$ is the importance
weight of parameter $\theta_i$, and $\lambda$ is a hyperparameter.</p>
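        <p>A minimal sketch of the importance estimate in Eq. (6) follows; it assumes an unlabeled loader over the previous task’s inputs and a model returning a real-valued output vector.</p>
        <preformat>
import torch

def mas_importance(model, data_loader):
    """MAS importance (Eq. 6): mean absolute gradient of the squared
    L2 norm of the network output w.r.t. each parameter (sketch)."""
    importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    n_batches = 0
    for x in data_loader:  # labels are not needed
        model.zero_grad()
        out = model(x)
        out.pow(2).sum().backward()  # gradient of the squared output norm
        for n, p in model.named_parameters():
            importance[n] += p.grad.detach().abs()
        n_batches += 1
    for n in importance:
        importance[n] /= n_batches
    return importance
        </preformat>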
        <p>
          In addition to preventing previous knowledge from being overwritten by adding regularization
terms that limit changes in important parameters, there is another commonly used method:
knowledge distillation. Knowledge distillation regularization differs from weight
regularization in that it moves the constraint from the parameter space to the output space, paying
more attention to whether the model preserves the output behavior of earlier tasks
as it learns new tasks. In general terms, knowledge distillation condenses and refines the knowledge
of a large neural network (teacher model) into a small neural network (student model), that is,
it carries out a migration of knowledge. According to the transfer mechanism, knowledge
distillation is divided into two paradigms: target distillation and feature distillation
[
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
        <p>
          Target distillation refers to directly letting the student model imitate the teacher model’s
predictions at the final output layer. The commonly used losses are mainly the Kullback-Leibler
(KL) divergence [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] or Cross Entropy [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ].
        </p>
        <p>When the KL divergence is employed to quantify the discrepancy between the output
distributions of the teacher and student models, the loss can be written as,
$$ \mathcal{L}_{\mathrm{KD}} = \mathrm{KL}\big( p_T(x) \,\|\, p_S(x) \big) \tag{8} $$
where $p_T(x)$ and $p_S(x)$ represent the output probability distributions of the teacher model and the
student model for input $x$, respectively. In addition, the cross entropy can also serve directly
as the distillation loss. The calculation formula is as follows,
$$ \mathcal{L}_{\mathrm{CE}} = - \sum_{c=1}^{C} p_T^c(x) \log p_S^c(x) \tag{9} $$
where $p_T^c(x)$ and $p_S^c(x)$ refer to the prediction probability of the $c$-th category after softmax
of the input $x$ by the teacher and the student model respectively, and $C$ is the total number of
categories.</p>
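        <p>Both target-distillation losses of Eqs. (8) and (9) can be computed in a few lines; the sketch below assumes raw logits as inputs. The temperature parameter is an added convention from common distillation practice, not part of the formulas above.</p>
        <preformat>
import torch
import torch.nn.functional as F

def target_distillation_losses(teacher_logits, student_logits, temperature=2.0):
    """KL-divergence and cross-entropy distillation losses (sketch)."""
    p_t = F.softmax(teacher_logits / temperature, dim=-1)
    log_p_s = F.log_softmax(student_logits / temperature, dim=-1)
    # Eq. (8): KL(p_T || p_S), averaged over the batch
    kl = F.kl_div(log_p_s, p_t, reduction="batchmean")
    # Eq. (9): -sum_c p_T^c(x) log p_S^c(x)
    ce = -(p_t * log_p_s).sum(dim=-1).mean()
    return kl, ce
        </preformat>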
        <p>
          Feature distillation differs from target distillation in where the alignment takes place. It focuses
more on the consistency of internal representations rather than aligning only at the output layer
[
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Its loss is usually based on a Euclidean distance rather than the KL divergence.
The Euclidean distance loss formula is as follows,
$$ \mathcal{L} = \| \hat{z} - z \|_2^2 \tag{10} $$
where $\hat{z}$ denotes the logits produced by the prior model and $z$ indicates the new model’s logit
outputs.</p>
        <p>In practical applications, feature distillation is often combined with target distillation, which
simultaneously optimizes output consistency and internal feature similarity. This approach is
very suitable for model compression, network acceleration, and transfer learning scenarios.</p>
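        <p>A combined objective of this kind can be sketched as below; the weighting coefficient alpha is an assumed hyperparameter rather than a value prescribed by the methods above.</p>
        <preformat>
import torch
import torch.nn.functional as F

def combined_distillation_loss(z_hat, z, teacher_logits, student_logits,
                               alpha=0.5):
    """Feature distillation (cf. Eq. 10) plus target distillation (sketch)."""
    feature_loss = F.mse_loss(z, z_hat)  # mean squared error between logits
    p_t = F.softmax(teacher_logits, dim=-1)
    target_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                           p_t, reduction="batchmean")
    return alpha * feature_loss + (1.0 - alpha) * target_loss
        </preformat>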
        <p>Compared with dynamic architecture-based methods, regularization-based methods are not
required to add new network columns when learning new tasks. They only need to impose
constraints on important parameters of previous tasks in the loss function. Therefore, they
have low computational overhead and simple implementation, but it is difficult to achieve "zero
forgetting".</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Replay-Based Methods</title>
        <p>
          The replay-based method is another typical approach to solving the "catastrophic forgetting"
problem. It saves a set of input-output sample pairs into a memory module, and then
incorporates these samples with data from the current task for model training [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. Implementations of this
approach include experience replay and generative replay [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ].
        </p>
        <p>
          Experience replay stores a subset of past task samples in a replay buffer and interleaves these
with new-task data during training, ensuring simultaneous learning of current and previous
tasks to mitigate forgetting. This method uses real data and offers high model stability [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ].
Incremental Classifier and Representation Learning (iCaRL) is an incremental learning method
based on experience replay. The core process of iCaRL includes three steps. First,
classification is performed using the nearest-mean-of-exemplars (NME) rule. Second, exemplars
are selected and prioritized with the herding algorithm. Third, representation learning integrates
knowledge distillation with prototype rehearsal. This approach enables continual learning
without access to all historical data.
        </p>
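        <p>As an illustration of the bookkeeping involved, the sketch below maintains a fixed-capacity buffer with reservoir sampling; note that iCaRL itself selects exemplars by herding rather than at random, so this is a simplified stand-in.</p>
        <preformat>
import random

class ReplayBuffer:
    """Fixed-capacity replay buffer with reservoir sampling (sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) &lt; self.capacity:
            self.data.append(sample)
        else:
            # replace a stored sample with probability capacity / seen
            idx = random.randrange(self.seen)
            if idx &lt; self.capacity:
                self.data[idx] = sample

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

# Training on a new task would interleave buffer.sample(batch_size)
# with the current task's mini-batches in each update step.
        </preformat>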
        <p>For generative replay, the initial stage consists of training a generative model to
approximate the data distribution of the previous task; the generated samples are then
incorporated into the training set of the current task to maintain the memory of the previous
distribution and alleviate forgetting. Since there is no need to store the original data directly,
this method is well suited to scenarios where privacy must be guaranteed, but the
stability of the model is heavily influenced by the effectiveness of the generative model. If the
generative component fails to reliably represent the essential characteristics of previous tasks, it
may lead to memory degradation or even incorrect transfer, thus affecting the learning stability
and performance of the entire system.</p>
        <p>
          At present, research on replay-based methods mainly focuses on three aspects:
improving sample storage efficiency through core sample selection strategies
[
          <xref ref-type="bibr" rid="ref34">34</xref>
          ], enhancing the quality of generated samples by improving generative model architectures
such as diffusion models and Transformer-based generators, and improving the robustness of
models by integrating other techniques such as meta-learning and self-supervision [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ].
        </p>
        <p>In comparison with the first two categories, the replay-based method has a stronger ability to
resist forgetting, but its effectiveness, particularly in the generative variant, depends largely on how
well the generative model performs. If the generated samples are very different from the real data,
the effect of alleviating catastrophic forgetting will also decrease.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Key Challenges and Future Prospects</title>
      <p>Although there are many methods that are capable of easing the problem of catastrophic
forgetting to a certain extent, a range of key problems remains to be overcome when facing
complex situations in actual application scenarios.</p>
      <p>
        The stability-plasticity dilemma remains an essential challenge [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ]. Traditional single
strategies often fail to maintain an optimal balance between stability and plasticity in complex
scenarios. Therefore, recent research has gradually shifted to hybrid strategies to address this
issue. For example, methods combining replay and regularization (such as DER++) not only
replay historical samples through a buffer but also use knowledge distillation to constrain the
output distribution of the current model to be consistent with that of the historical model on
the same input. This allows parameter updates to leverage both hard and soft label information,
reducing representation drift and enhancing stability without significantly compromising
plasticity. Strategies combining architecture expansion with meta-learning ensure model capacity
through dynamic network expansion or sub-network allocation, while leveraging initialization
priors or adaptive optimizers derived from meta-learning to rapidly adapt to new tasks while
minimizing interference with existing tasks. Future research should continue to explore new
hybrid strategies that, through multi-dimensional synergy, enable the system to better balance
memory retention and new knowledge learning in complex task flows.
      </p>
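      <p>As a sketch of how such a hybrid objective combines the two signals, the loss below mixes the new-task loss with a logit-matching (soft label) term and a replay (hard label) term on buffered samples; alpha and beta are assumed weighting hyperparameters.</p>
      <preformat>
import torch
import torch.nn.functional as F

def derpp_style_loss(model, x, y, buf_x, buf_y, buf_logits,
                     alpha=0.5, beta=0.5):
    """DER++-style objective (sketch): current-task loss, plus
    distillation toward stored logits, plus replay on stored labels."""
    loss = F.cross_entropy(model(x), y)                          # new task
    loss = loss + alpha * F.mse_loss(model(buf_x), buf_logits)   # soft labels
    loss = loss + beta * F.cross_entropy(model(buf_x), buf_y)    # hard labels
    return loss
      </preformat>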
      <p>In practical applications, some advanced algorithms impose heavy training requirements and
are difficult to deploy on edge devices, which makes the models run inefficiently. For instance, an
IoT-enabled smart doorbell tasked with real-time pedestrian detection and anomalous behavior
recognition must operate under limited computational and memory resources, which restricts
the use of large-scale models. It is therefore necessary to investigate
efficient algorithms and models capable of running effectively under constrained computational
power and storage capacity, so as to meet the actual needs of edge computing environments.</p>
      <p>
        In the medical and industrial fields, systems are required to maintain strong
performance using limited labeled data and to be able to protect user privacy. However,
too little labeled data results in few supervisory signals being available for incremental tasks,
which limits effective updates during the incremental learning process. Therefore, in the future,
breakthroughs are needed in few-shot class-incremental learning
[
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] and unsupervised continual learning [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. In addition, combining privacy protection
mechanisms such as federated learning will also be an important research direction for achieving
scalable, secure, and efficient learning systems.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This review focuses on catastrophic forgetting, a fundamental challenge in continual learning,
and provides a systematic analysis of recent advances in addressing this issue. The methods
under review are grouped into three distinct classes, namely dynamic architecture-based
methods, regularization-based methods, and replay-based methods. For each category, the paper
examines their theoretical foundations, representative techniques, and respective advantages
and limitations. Looking ahead, future research on continual learning should address pressing
challenges, including the stability-plasticity dilemma, computational and storage overhead,
limited labeled data, and privacy concerns. To this end, integrating multiple strategies,
developing efficient algorithms and model architectures, and exploring incremental and unsupervised
continual learning, particularly in low-data regimes, are crucial steps toward realizing truly
lifelong learning in artificial intelligence systems.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The present work was funded by the Henan Province Key Research and Development Program
(Grants No. 241111312000), the Henan Province Key International Science and Technology
Cooperation Project (Grants No. 251111520400, 252102521009), the Henan Province Key Technologies
Research and Development Project (Grants No. 252102211106, 252102320281, 252102221054),
the Young Backbone Teacher Program of Zhongyuan University of Technology (Grants No.
2023XQG15).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of continual learning: Theory, method and application</article-title>
          ,
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Yolo-msd: a robust industrial surface defect detection model via multi-scale feature fusion</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>55</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Forget to learn (f2l): Circumventing plasticity-stability trade-off in continuous unsupervised domain adaptation</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>159</volume>
          (
          <year>2025</year>
          )
          <fpage>111139</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Hardware-aware approach to deep neural network optimization</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>559</volume>
          (
          <year>2023</year>
          )
          <fpage>126808</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Dataset purification-driven lightweight deep learning model construction for empty-dish recycling robot</article-title>
          ,
          <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Douillard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramé</surname>
          </string-name>
          , G. Couairon,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <article-title>Dytox: Transformers for continual learning with dynamic token expansion</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>9285</fpage>
          -
          <lpage>9295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ashfahani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pratama</surname>
          </string-name>
          ,
          <article-title>Autonomous deep learning: Continual learning approach for dynamic environments</article-title>
          ,
          <source>in: Proceedings of the 2019 SIAM international conference on data mining, SIAM</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>666</fpage>
          -
          <lpage>674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Roy-Chowdhury</surname>
          </string-name>
          ,
          <article-title>A continuous learning framework for activity recognition using deep hybrid feature models</article-title>
          ,
          <source>IEEE Transactions on Multimedia</source>
          <volume>17</volume>
          (
          <year>2015</year>
          )
          <fpage>1909</fpage>
          -
          <lpage>1922</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G. M.</given-names>
            <surname>van de Ven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Soures</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kudithipudi</surname>
          </string-name>
          ,
          <article-title>Continual learning and catastrophic forgetting</article-title>
          ,
          <source>arXiv preprint arXiv:2403.05175</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Ha</surname>
          </string-name>
          , B.-T. Zhang,
          <article-title>Overcoming catastrophic forgetting by incremental moment matching</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] D. Cheng, Y. Hu,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Achieving plasticity-stability trade-off in continual learning through adaptive orthogonal projection</article-title>
          ,
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Wiewel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Continual learning through one-class classification using vae</article-title>
          ,
          <source>in: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>3307</fpage>
          -
          <lpage>3311</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dohare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Hernandez-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <article-title>Loss of plasticity in deep continual learning</article-title>
          ,
          <source>Nature</source>
          <volume>632</volume>
          (
          <year>2024</year>
          )
          <fpage>768</fpage>
          -
          <lpage>774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Iman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Arabnia</surname>
          </string-name>
          ,
          <article-title>Expanse, a continual deep learning system; research proposal</article-title>
          , in: 2021
          <source>International Conference on Computational Science and Computational Intelligence (CSCI)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>190</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wakelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Mohammedali</surname>
          </string-name>
          ,
          <article-title>An analysis of current continual learning algorithms in an image classification context</article-title>
          ,
          <source>in: 2022 6th International Symposium on Computer Science and Intelligent Control (ISCSIC)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>34</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Noci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orvieto</surname>
          </string-name>
          , T. Hofmann,
          <article-title>Achieving a better stability-plasticity trade-off via auxiliary networks in continual learning</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>11930</fpage>
          -
          <lpage>11939</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Continual learning with deep generative replay</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>An ultralightweight object detection network for empty-dish recycling robots</article-title>
          ,
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          <volume>72</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          , G. Cheng,
          <article-title>Class incremental website fingerprinting attack based on dynamic expansion architecture</article-title>
          ,
          <source>IEEE Transactions on Network and Service Management</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Rusu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Rabinowitz</surname>
          </string-name>
          , G. Desjardins,
          <string-name>
            <given-names>H.</given-names>
            <surname>Soyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kirkpatrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pascanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hadsell</surname>
          </string-name>
          ,
          <article-title>Progressive neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:1606.04671</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Moriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Masumura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Asami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shinohara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Delcroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aono</surname>
          </string-name>
          ,
          <article-title>Progressive neural network-based knowledge transfer in acoustic models</article-title>
          , in: 2018
          <source>AsiaPacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>998</fpage>
          -
          <lpage>1002</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Progressive neural network for multi-horizon time series forecasting</article-title>
          ,
          <source>Information Sciences</source>
          <volume>661</volume>
          (
          <year>2024</year>
          )
          <fpage>120112</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fritsche</surname>
          </string-name>
          ,
          <article-title>Regularization-based efficient continual learning in deep state-space models</article-title>
          ,
          <source>in: 2024 27th International Conference on Information Fusion (FUSION)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nokhwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>RTRA: Rapid training of regularization-based approaches in continual learning</article-title>
          ,
          <source>in: 2023 10th International Conference on Soft Computing &amp; Machine Intelligence (ISCMI)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>188</fpage>
          -
          <lpage>192</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>A statistical theory of regularization-based continual learning</article-title>
          ,
          <source>arXiv preprint arXiv:2406.06213</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tercan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deibert</surname>
          </string-name>
          , T. Meisen,
          <article-title>Continual learning of neural networks for quality prediction in production using memory aware synapses and weight transfer</article-title>
          ,
          <source>Journal of Intelligent Manufacturing</source>
          <volume>33</volume>
          (
          <year>2022</year>
          )
          <fpage>283</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rasheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Qureshi</surname>
          </string-name>
          ,
          <article-title>A new regularization-based continual learning framework</article-title>
          ,
          <source>in: 2024 Horizons of Information Technology and Engineering (HITE)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramamohanarao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Collaborative knowledge distillation via multiknowledge transfer</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <article-title>Few shot network compression via cross distillation</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>3203</fpage>
          -
          <lpage>3210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>Distilling a powerful student model via online knowledge distillation</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>8743</fpage>
          -
          <lpage>8752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Prototype-guided memory replay for continual learning</article-title>
          ,
          <source>IEEE transactions on neural networks and learning systems</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Memory enhanced replay for continual learning</article-title>
          ,
          <source>in: 2022 16th IEEE International Conference on Signal Processing (ICSP)</source>
          , volume
          <volume>1</volume>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Cahill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kanan</surname>
          </string-name>
          ,
          <article-title>Memory efficient experience replay for streaming learning</article-title>
          ,
          <source>in: 2019 International Conference on Robotics and Automation (ICRA)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>9769</fpage>
          -
          <lpage>9776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hassani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nikan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shami</surname>
          </string-name>
          ,
          <article-title>Improved exploration-exploitation trade-off through adaptive prioritized experience replay</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>614</volume>
          (
          <year>2025</year>
          )
          <fpage>128836</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Antoniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Micaelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Storkey</surname>
          </string-name>
          ,
          <article-title>Meta-learning in neural networks: A survey</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>5149</fpage>
          -
          <lpage>5169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36] D. Cheng, Y. Hu,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Achieving plasticity-stability trade-off in continual learning through adaptive orthogonal projection</article-title>
          ,
          <source>IEEE Transactions on Circuits and Systems for Video Technology</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>O.</given-names>
            <surname>Silvén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pietikäinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Few-shot class-incremental learning for classification and object detection: A survey</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ma'sum</surname>
          </string-name>
          , M. Pratama,
          <string-name>
            <given-names>R.</given-names>
            <surname>Savitha</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kowalczyk</surname>
          </string-name>
          , et al.,
          <article-title>Unsupervised few-shot continual learning for remote sensing image scene classification</article-title>
          ,
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>