A novel Deep-Learning model for Human Activity Recognition
based on Continuous Wavelet Transform
Olena Pavliuka,b, Myroslav Mishchukb
a Silesian University of Technology, ul. Akademicka 2A, 44-100 Gliwice, Poland
b Lviv Polytechnic National University, Stepana Bandery St, 12, Lviv, 79000, Ukraine


                 Abstract
                 Human Activity Recognition (HAR) has recently come into the spotlight of scientific
                 research due to the development and proliferation of wearable sensors. HAR has found
                 applications in areas such as digital health, mobile medicine, sports, abnormal activity
                 detection and fall prevention. Neural Networks have recently become a widespread method
                 for dealing with HAR problems due to their ability to automatically extract and select
                 features from raw sensor data. However, this approach requires extensive training
                 datasets to perform sufficiently under diverse circumstances. This study proposes a novel
                 Deep Learning-based model, pre-trained on the KU-HAR dataset. The raw, six-channel sensor
                 data was preliminarily processed using the Continuous Wavelet Transform (CWT) for better
                 performance. Nine popular Convolutional Neural Network (CNN) architectures, as well as
                 different wavelets and scale values, were tested to choose the best-performing combination.
                 The proposed model was tested on the whole UCI-HAPT dataset and its subset to assess how
                 it performs on new activities and different amounts of training data. The results show that
                 using the pre-trained model, especially with frozen layers, leads to improved performance,
                 smoother gradient descent and faster training on small datasets. Additionally, the model
                 achieved a classification accuracy of 97.48% and an F1-score of 97.52% on the KU-HAR
                 dataset, which is competitive with other state-of-the-art HAR models.

                 Keywords
                 Human activity recognition, biomedical signal processing, transfer learning, continuous
                 wavelet transform, convolutional neural network

1. Introduction
   Human Activity Recognition (HAR) is of particular interest due to its growing role in such areas
as health care (especially for the elderly and patients with limited mobility), sports, abnormal activity
recognition, fall prevention, epileptic seizure detection and military training [1, 2, 3]. HAR has
become especially relevant with the spread of intelligent clothing items, such as smartwatches, fitness
bracelets and smartphones, which usually contain diverse built-in sensors and auxiliary devices
(accelerometers, gyroscopes, magnetometers, GPS sensors, cameras, microphones) [4]. By combining
the computing power of these devices and their ability to interact with the outside world, HAR opens
up a vast field of applications, for example, real-time human activity monitoring, physical training
evaluation, calorie-burn counting, fall detection and anomalous activity recognition for people with
neurological disorders.
   Usually, the construction of HAR models based on wearable sensors includes the following steps:
signal pre-processing and noise removal, feature extraction and feature selection to obtain
representative characteristics, and further use of sophisticated Machine Learning (ML) algorithms for
activity classification [5, 6]. In many publications, the approach with manual feature extraction and

selection has shown promising results [7, 8, 9, 10], although this approach has certain drawbacks. The
major limitation is that statistical signal characteristics (i.e., shallow features) are often insufficient to
recognize complex, multi-step activities and transitory states, which makes models based on the
classical approach rather complex. In addition, this approach requires a highly qualified researcher and
an individual treatment of each dataset.
   A promising approach to solving HAR problems is Convolutional Neural Networks (CNNs). Due
to the possibility of automatic construction and selection of features, CNNs can obtain high-level
characteristics and often show better results than classical methods [11, 12, 13, 14]. Despite the
proven performance of CNNs, they require extensive training datasets to produce adequate results on
new data. Otherwise, this method is prone to underfitting and overfitting problems. There are several
techniques to mitigate this problem, such as regularization and data augmentation [15, 16]. Another
widely-used approach is Transfer Learning (TL), where a model trained on one (source) dataset is
fine-tuned on another (target) dataset [17].
   While there are many models pre-trained on data for visual object recognition [18], this work
focuses on developing a pre-trained model specifically for the HAR classification problem. The
resulting model will make it possible to train deep CNN on relatively small HAR target datasets,
transferring the knowledge obtained from the more extensive and general source dataset.

2. Employed datasets and Prior works
   In this work, we used two time-domain HAR datasets, one of which is relatively new. The Khulna
University Human Activity Recognition dataset (KU-HAR) [19][30] was chosen for pre-training, and
the University of California Irvine Human Activities and Postural Transitions dataset (UCI-HAPT)
[8][31] for fine-tuning and testing. This section provides brief information on these datasets and the
main works where they were used.

2.1.    KU-HAR Dataset
    For model pre-training, the KU-HAR dataset was used [19]. It was published in 2021 and contains
20,750 non-overlapping time-domain subsamples belonging to 18 classes (stand, sit, talk-sit, talk-
stand, stand-sit, lay, lay-stand, pick, jump, push-up, sit-up, walk, walk-backward, walk-circle, run,
stair-up, stair-down and table-tennis).
    Each subsample has a duration of 3 seconds and consists of six channels. The data was collected
from 90 participants aged 18 to 34 using a smartphone attached to the waist. The recorded signals
contain raw data from a triaxial accelerometer and a triaxial gyroscope with a sampling frequency of
100 Hz. Gravitational acceleration was discarded during data acquisition, and no denoising and
filtering operations were performed. KU-HAR can therefore be considered a realistic (representative)
dataset, as it is unbalanced, no denoising was performed, and there is no overlap between samples.
    In [19] and [20], the authors used manual feature extraction and feature selection methods and the
Random Forest classifier, which resulted in classification accuracies of 89.67% and 89.5% on the
KU-HAR dataset, respectively. Authors of [7] used the Wavelet Packet Transform and a Genetic
Algorithm, with the subsequent use of tree-based classifiers, which resulted in maximal accuracy of
94.76%. In [13], using a sequential CNN model and transforming raw signals into circulant matrices,
the authors achieved a classification accuracy of 96.67% for this dataset.

2.2.    UCI-HAPT Dataset
   The UCI-HAPT dataset [8] and its subset were used as a benchmark for the proposed pre-trained
model. This dataset was published in 2014 and is an extended version of the University of California
Irvine Human Activity Recognition dataset (UCI-HAR) [21], supplemented with postural transitions.
The UCI-HAPT contains tri-axial accelerometer and gyroscope signals collected with a waist-
mounted smartphone with a sampling frequency of 50 Hz. The data was collected from thirty
volunteers aged from 19 to 48 years. It contains sensor data for 12 classes (walking, walking upstairs,
walking downstairs, sitting, standing, laying, stand-to-sit, sit-to-stand, sit-to-lie, lie-to-sit, stand-to-lie,
and lie-to-stand), 6 of which do not belong to the KU-HAR dataset. The signals were pre-processed
for noise removal using median and low-pass Butterworth filters.
   Authors of [22, 23, 24] achieved promising classification results for the UCI-HAPT dataset.
However, in this work, we did not use the frequency-domain variables proposed there but instead
extracted samples from the raw sensor readings, because Transfer Learning requires target and source
samples to have the same shape.

3. Methodology
    In this work, we kept to the following workflow: first, the raw signal samples were pre-processed
using the CWT with different wavelet and scale values. After that, nine popular CNN models were
trained on the generated scalograms, and the best-performing combination of the model architecture
and CWT parameters was selected. The chosen, pre-trained model was then fine-tuned on the target
datasets, and the results were compared with the non-pre-trained one. Detailed descriptions of the
carried-out steps are provided in the subsections below.

3.1.    Continuous Wavelet Transform
   The Continuous Wavelet Transform (CWT) of a function $x(t)$ is a mathematical operation defined
by the following expression:

$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, \overline{\psi\!\left(\frac{t - b}{a}\right)}\, dt, \qquad (1)$$

where $\psi(t)$ is a continuous function in the frequency and time domain called a mother wavelet,
$a$ is the scale value, $a \in \mathbb{R}^{+}$, and $b$ is the translational value, $b \in \mathbb{R}$.
The overline represents the operation of the complex conjugate.
   Results of the CWT can be represented as a heat map (i.e. scalogram) with the $a$-values set along
the $y$-axis, the $b$-values set along the $x$-axis, and the intensity of each point determined by (1).
An example of a transformed accelerometer X-axis signal from the KU-HAR dataset is illustrated in
Figure 1.

Figure 1: A transformed accelerometer X-axis signal (scalogram), generated using the Morlet wavelet.
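   For illustration, the following is a minimal sketch of how such a scalogram can be generated with
the PyWavelets library; the synthetic test signal, the scale range and the plotting details are our
assumptions, since the paper's own implementation is not shown.

    import numpy as np
    import pywt
    import matplotlib.pyplot as plt

    fs = 100                                  # KU-HAR sampling frequency, Hz
    t = np.arange(0, 3, 1 / fs)               # one 3-second subsample
    x = np.sin(2 * np.pi * 2 * t) + 0.5 * np.random.randn(t.size)  # stand-in channel

    scales = np.arange(1, 257)                # scale values a up to 256
    coeffs, freqs = pywt.cwt(x, scales, "morl", sampling_period=1 / fs)

    plt.imshow(np.abs(coeffs), aspect="auto", cmap="jet",
               extent=[t[0], t[-1], scales[-1], scales[0]])
    plt.xlabel("Translation b (s)")           # b-values along the x-axis
    plt.ylabel("Scale a")                     # a-values along the y-axis
    plt.show()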
   The Wavelet Transform (WT) has certain advantages over the commonly-used Fourier transform.
First, it provides a better representation of functions with sharp breaks and peaks, which are often
important signal characteristics in HAR problems. Second, it can obtain both temporal and local
spectral information, overcoming the non-stationary nature of the signals. Therefore, it is more
efficient to replace Fourier-related transforms with the WT, which is a powerful tool for frequency-
and time-domain feature extraction.
   Authors of [25, 26, 27, 28] achieved promising performance using CWT together with CNNs for
time-series classification problems. Additionally, it is considered that CNNs with 2-D convolutional
layers, in general, produce better results than the same neural networks with 1-D convolutional layers
when classifying signals from wearable sensors.
   In this paper, we have used CWT-generated scalograms to improve the accuracy of the models and
mitigate the overfitting and underfitting problems that often occur during fine-tuning of pre-trained
models. The scalograms were generated using the Mexican Hat and Morlet wavelets with scale values
ranging from 0 to 32, 64, 128 and 256. Thus, the models' performance was tested on eight different
CWT configurations.
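   As a sketch of how these eight configurations can be enumerated (the helper function and the
assumed 300-reading, six-channel sample shape are ours; "morl" and "mexh" are the PyWavelets
names for the two wavelets):

    import numpy as np
    import pywt

    WAVELETS = ["morl", "mexh"]            # Morlet and Mexican Hat
    MAX_SCALES = [32, 64, 128, 256]        # upper bounds of the scale range

    def sample_to_scalograms(sample, wavelet, max_scale):
        # turn one six-channel sample of shape (300, 6) into six scalograms
        scales = np.arange(1, max_scale + 1)
        channels = []
        for ch in range(sample.shape[1]):
            coeffs, _ = pywt.cwt(sample[:, ch], scales, wavelet)
            channels.append(np.abs(coeffs))        # (max_scale, 300)
        return np.stack(channels, axis=-1)         # (max_scale, 300, 6)

    for wavelet in WAVELETS:                       # 2 x 4 = 8 configurations
        for max_scale in MAX_SCALES:
            print(f"CWT configuration: {wavelet}, scales 1..{max_scale}")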

3.2.     Model selection
   In order to select a CNN model, nine popular architectures were tested, namely ResNet50,
ResNet101, ResNet152, Xception, InceptionV3, InceptionResNetV2, DenseNet121, DenseNet169
and DenseNet201. All mentioned models were trained on all scalogram configurations, which results
in 72 possible combinations. Each input sample consisted of 6 scalograms (one scalogram per signal
channel). It is important to note that the Xception, InceptionV3 and InceptionResNetV2 architectures
have restrictions on the input data shape, so the scalograms with scale values less than 128 could not
be used with them. Hence, the total number of tested combinations is 60.
   Each combination was attempted five times to avoid sub-optimal local minima, which resulted in
300 models being trained. The criterion for model selection was classification accuracy.
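   A minimal sketch of such a sweep using the tf.keras.applications model zoo follows; the
train-from-scratch settings and the input shape are our assumptions, as the paper does not publish
its code.

    import tensorflow as tf

    # the nine tested architectures from tf.keras.applications
    ARCHS = [
        tf.keras.applications.ResNet50,
        tf.keras.applications.ResNet101,
        tf.keras.applications.ResNet152,
        tf.keras.applications.Xception,           # requires inputs >= 71x71
        tf.keras.applications.InceptionV3,        # requires inputs >= 75x75
        tf.keras.applications.InceptionResNetV2,  # requires inputs >= 75x75
        tf.keras.applications.DenseNet121,
        tf.keras.applications.DenseNet169,
        tf.keras.applications.DenseNet201,
    ]

    def build_model(arch_fn, input_shape, n_classes=18):
        # trained from scratch (weights=None) on six-channel scalogram stacks
        base = arch_fn(weights=None, include_top=False,
                       input_shape=input_shape, pooling="avg")
        out = tf.keras.layers.Dense(n_classes, activation="softmax")(base.output)
        return tf.keras.Model(base.input, out)

    model = build_model(tf.keras.applications.DenseNet121, (256, 300, 6))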

3.3.     Model testing
   The selected pre-trained model was tested on scalograms generated from the pre-processed UCI-
HAPT dataset. This dataset contains sensor readings for 12 activities, 6 of which are entirely new for
the pre-trained model (i.e. the source dataset does not include instances of these classes). The CWT
parameters for the target dataset were chosen to be the same as for the selected model.

3.3.1. Target dataset pre-processing
   To perform fine-tuning, the target dataset was pre-processed to have the same sample shape as the
source dataset. Firstly, the sampling frequency of the raw sensor readings was doubled from 50 Hz to
100 Hz by inserting the average value between each pair of adjacent readings. After that, samples
were extracted from the pre-processed data using the non-overlapping windowing technique with a
3-second sample duration, resulting in 4847 six-channel samples in the target dataset.
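   A minimal NumPy sketch of these two steps, assuming the raw readings are stored as an array of
shape (n_readings, 6); the function names are ours:

    import numpy as np

    def upsample_2x(signal):
        # double the sampling rate by inserting the average of each adjacent pair
        midpoints = (signal[:-1] + signal[1:]) / 2.0
        out = np.empty((signal.shape[0] + midpoints.shape[0],) + signal.shape[1:])
        out[0::2] = signal                 # original 50 Hz readings
        out[1::2] = midpoints              # interpolated values -> ~100 Hz
        return out

    def extract_windows(signal, window=300):
        # cut non-overlapping 3-second windows (300 readings at 100 Hz)
        n = signal.shape[0] // window
        return signal[: n * window].reshape(n, window, *signal.shape[1:])

    raw = np.random.randn(7500, 6)            # stand-in for 50 Hz six-channel data
    samples = extract_windows(upsample_2x(raw))  # shape (n_samples, 300, 6)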
   We used the whole UCI-HAPT dataset and its subset to determine how the pre-trained model
performs on different target dataset sizes. The subset contains 30% of randomly selected samples
from the original dataset, which stands for 1652 samples. Figure 2 illustrates the datasets’
distributions.
   As can be noticed from Figure 2, both the UCI-HAPT dataset and its subset are imbalanced.
Moreover, considering that the signals in the UCI-HAPT were pre-processed using median and low-
pass Butterworth filters and that the frequency of signals was artificially adjusted, it can be claimed
that source and target datasets have substantial distinctions in signal representation. It implies that if
the selected model performs a Positive Transfer on the UCI-HAPT dataset, it would be reasonable to
use the proposed pre-trained model for other signal data with similar distinctions, which would be
helpful, for example, for Cross-Position Activity Recognition (CPAR).
Figure 2: Class sample ratios of the pre-processed UCI-HAPT dataset (outer circle) and its subset
(inner circle).

3.3.2. Transfer learning
    To perform Transfer learning and assess the selected model, the top fully connected layer of the
pre-trained model was removed and replaced with a new one with weights set using the Xavier
uniform initialization. The number of neurons in the new fully connected layer corresponds to the
number of classes in the target dataset (in our case, 12). The model performance was tested with
different numbers of frozen layers.
    Layer freezing (i.e. making some layers non-trainable) is a technique widely used in Transfer
Learning. The number of layers to freeze is usually chosen according to the similarity between the
source and target datasets. If the datasets are similar, it may be sufficient to freeze all layers except the
fully connected top layer of the network. The more diverse the datasets are, the more layers of the pre-
trained network need to be trainable during fine-tuning.
    The numbers of frozen layers were chosen to correspond to the architecture of the selected model,
which is DenseNet121 (discussed in a later section). The DenseNet121 comprises a conv block, four
dense blocks and a fully connected layer. Hence, the following configurations were considered: only
the top fully connected layer is trainable, the first 308 layers are frozen (conv, dense 1, 2, 3 blocks
frozen), first 136 layers are frozen (conv, dense 1, 2 blocks frozen), and all layers are trainable. The
described approach is illustrated in Figure 3.
Figure 3: Transfer learning using the selected pre-trained model.
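   A sketch of this setup in Keras is given below; here `pretrained` stands for the DenseNet121 model
pre-trained on KU-HAR, the assumption is that its penultimate layer outputs the pooled features, and
glorot_uniform is the Keras name for Xavier uniform initialization.

    import tensorflow as tf

    def prepare_for_transfer(pretrained, n_target_classes=12, n_frozen=136):
        # replace the top fully connected layer and freeze the first n_frozen layers
        features = pretrained.layers[-2].output       # drop the old top layer
        new_top = tf.keras.layers.Dense(
            n_target_classes, activation="softmax",
            kernel_initializer="glorot_uniform")(features)  # Xavier uniform
        model = tf.keras.Model(pretrained.input, new_top)
        for layer in model.layers[:n_frozen]:         # e.g. conv + dense blocks 1-2
            layer.trainable = False
        return model

    # configurations tested: all layers frozen except the top, n_frozen=308,
    # n_frozen=136, and n_frozen=0 (all layers trainable)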
4. Results and discussion
   In this section, the experimental results obtained using the methods described in the previous
section are presented.
   First, we describe the selected combination of CNN architecture and CWT parameters that
performed the best on the KU-HAR source dataset. Second, the selected pre-trained model is assessed
on the scalograms created from the pre-processed UCI-HAPT dataset and its subset. The model
performance with different numbers of frozen (i.e. non-trainable) layers was tested to estimate how
their presence affects the results for target datasets of various sizes.
   We randomly selected 70% of each dataset for training and used the remaining 30% for testing. For
validation, 10% of the training sets was used. It was experimentally determined that 100 epochs are
sufficient for training most models, although this number was increased to 120 when underfitting was
observed. Adam was chosen as the optimizer, with the learning rate set to 0.001; categorical
cross-entropy was used as the loss function.
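   A sketch of this training configuration follows; the arrays `X`, `y` and the `model` are placeholders
carried over from the previous steps, and the fixed random seed is our assumption.

    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42)     # 70% train / 30% test

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    history = model.fit(X_train, y_train,
                        validation_split=0.10,     # 10% of the training set
                        epochs=100)                # raised to 120 on underfitting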

4.1.    Selected pre-trained model
    As described in Section 3.2, nine popular architectures were tested across 60 valid
architecture-CWT combinations, each attempted five times to avoid sub-optimal local minima, with
classification accuracy as the selection criterion.
    It was found that the best results were produced by the model with the DenseNet121 architecture
and the following CWT parameters: the Morlet wavelet with scales from 0 to 256. This combination
resulted in a classification accuracy of 97.48% and an F1-score of 97.52% on the KU-HAR source
dataset. The confusion matrix of the conducted classification using the selected model is illustrated in
Figure 4.
Figure 4: Confusion matrix of the classification of the KU-HAR dataset using the selected model.
    As can be noticed from Figure 4, classification errors are concentrated among the classes stand, sit
and talk-sit, which are all static activities. These activities are challenging to differentiate, and they
could be treated separately to improve performance, which is a promising direction for further
research.
    The F1-score value, which is not affected by the dataset imbalance, is reasonably close to the
classification accuracy value, implying the reliability of the classification performance of the selected
model.
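    The two reported metrics can be computed as in the sketch below; `y_true` and `y_pred` are
placeholders for the test labels and predictions, and the choice of the weighted F1 average for this
multi-class, imbalanced setting is our assumption.

    from sklearn.metrics import accuracy_score, f1_score

    accuracy = accuracy_score(y_true, y_pred)             # e.g. 0.9748
    f1 = f1_score(y_true, y_pred, average="weighted")     # e.g. 0.9752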
    Considering the fact that the KU-HAR is an unbalanced dataset with no denoising operation
performed (i.e. a realistic dataset), we find the performance of the proposed pre-trained model rather
promising. Table 1 provides a comparison of the results achieved on the KU-HAR dataset in recent
works.

Table 1
Comparison of the results achieved on the KU-HAR dataset in recent works
            Study                          Accuracy (%)                          F1-score (%)
             [20]                              89.5                                 80.67
             [19]                             89.67                                 87.59
             [29]                               -                                   94.25
              [7]                             94.76                                 94.73
             [13]                             96.67                                 96.41
          Proposed                            97.48                                 97.52

   As can be seen from Table 1, the proposed model outperforms most state-of-the-art works that used
the KU-HAR dataset, which indicates the effectiveness of the selected model, as well as the potential
of combining the CWT and CNNs for HAR classification problems.

4.2.    Performance on the target dataset
   We used the whole UCI-HAPT dataset and its subset (30% of randomly selected samples from the
original dataset) to determine how the pre-trained model performs on different target dataset sizes.
The model performance was tested with different numbers of frozen (i.e. non-trainable) layers: only
the top layer trainable, the first 308 layers frozen, the first 136 layers frozen, and all layers trainable.
Table 2 compares the best results achieved by the pre-trained and non-pre-trained models.

Table 2
Comparison of performance of pre-trained and non-pre-trained models on the target datasets
                                                          UCI-HAPT            UCI-HAPT subset
 Model                                              Accuracy (%)   F1 (%)   Accuracy (%)   F1 (%)
 Not pre-trained DenseNet121                            92.23      92.19        86.29       86.38
 Pre-trained DenseNet121, only top layer trainable      80.00      77.99        75.60       64.08
 Pre-trained DenseNet121, 308 layers frozen             92.44      92.52        86.90       87.11
 Pre-trained DenseNet121, 136 layers frozen             92.23      92.24        89.11       89.27
 Pre-trained DenseNet121, all layers trainable          91.89      91.92        88.31       88.26
    As can be seen from Table 2, the use of pre-trained models led to better results both on the whole
dataset and on its subset.
    Concerning the whole UCI-HAPT dataset, the pre-trained model with 308 frozen layers showed
the best results, with an increase of 0.21% in accuracy and 0.33% in F1-score compared to the
non-pre-trained one. However, for the other configurations, pre-training resulted in a performance
decrease. This indicates that the pre-processed UCI-HAPT dataset is large enough that pre-training
may degrade the model's performance, i.e. result in a Negative Transfer.
    As for the UCI-HAPT subset, all pre-trained models except the one with only the top layer
trainable performed better than the non-pre-trained model. The best results were achieved by the
model with 136 frozen layers, increasing accuracy by 2.82% and F1-score by 2.89%. Moreover, as
shown in Figure 5, pre-training the models resulted in smoother gradient descent and faster learning.
Figure 5: Accuracy and loss values of the non-pre-trained DenseNet121 model and the pre-trained
DenseNet121 model with 136 layers frozen during the training on the UCI-HAPT subset.

   Summarizing the information obtained, it can be stated that using a pre-trained model, especially
with frozen layers, leads to improved performance, smoother gradient descent and faster training on
small datasets. However, using a pre-trained model on medium- and large-sized datasets may result in
Negative Transfer and degraded performance.

5. Conclusion
   In this study, we propose a novel deep-learning model pre-trained on the scalograms generated
from the KU-HAR dataset. Nine popular deep-learning architectures and eight CWT configurations
were tested, which resulted in 60 valid combinations and 300 models being trained.
   It was established that the best results were produced by the model with the DenseNet121
architecture, the Morlet wavelet and scale values from 0 to 256, resulting in a classification accuracy
of 97.48% and an F1-score of 97.52% on the KU-HAR dataset, outperforming most state-of-the-art
works that used this dataset.
   The proposed pre-trained model was then tested on the pre-processed UCI-HAPT dataset and its
subset to determine how it performs on target datasets of different sizes and with significant
distinctions from the source dataset. Usage of the proposed model led to a maximum increase of
0.21% in accuracy and 0.33% in F1-score on the whole UCI-HAPT dataset, and of 2.82% in accuracy
and 2.89% in F1-score on its subset, compared to the non-pre-trained models.
   It was concluded that using the pre-trained model, especially with frozen layers, leads to improved
performance, smoother gradient descent and faster training on small datasets. However, using the
proposed model on medium- and large-sized datasets may result in Negative Transfer and degrade the
performance.
   In subsequent studies, we plan to assess more combinations of neural network architectures and
CWT parameters, with further analysis of the influence of the scale values (i.e. scalogram sizes) on
model performance. Another promising direction is the design and analysis of heterogeneous
pre-trained models, for example, incorporating Long Short-Term Memory (LSTM) or Gated
Recurrent Unit (GRU) layers, as well as the development of pre-trained models on combined datasets
using Inter-Domain Activities Analysis.

6. References
[1] A. Subasi, M. Radhwan, R. Kurdi, K. Khateeb, IOT based mobile healthcare system for human
     activity recognition, 2018 15th Learning and Technology Conference (L&T) (2018) 29-34.
     doi:10.1109/lt.2018.8368507.
[2] K.-Y. Chen, M. Harniss, S. Patel, K. Johnson, Implementing technology-based embedded
     assessment in the home and community life of individuals aging with disabilities: A participatory
     research and development study, Disability and Rehabilitation: Assistive Technology 9 (2013)
     112–120. doi:10.3109/17483107.2013.805824.
[3] O.D. Lara, M.A. Labrador, A survey on human activity recognition using wearable sensors,
     IEEE Communications Surveys & Tutorials 15 (2013) 1192–1209.
     doi:10.1109/surv.2012.110112.00192.
[4] Pew Research Center, About one-in-five Americans use a smart watch or fitness tracker, 2020.
     URL:       https://www.pewresearch.org/fact-tank/2020/01/09/about-one-in-five-americans-use-a-
     smart-watch-or-fitness-tracker/.
[5] F. Li, K. Shirahama, M. Nisar, L. Köping, M. Grzegorzek, Comparison of feature learning
     methods for human activity recognition using wearable sensors, Sensors 18 (2018) 679.
     doi:10.3390/s18020679.
[6] M. Shoaib, S. Bosch, O. Incel, H. Scholten, P. Havinga, A survey of online activity recognition
     using mobile phones, Sensors 15 (2015) 2059–2085. doi:10.3390/s150102059.
[7] M.H. Abid, A.-A. Nahid, M.R. Islam, M.A. Parvez Mahmud, Human activity recognition based
     on wavelet-based features along with feature prioritization, 2021 IEEE 6th International
     Conference on Computing, Communication and Automation (ICCCA) (2021) 933-939.
     doi:10.1109/iccca52192.2021.9666294.
[8] J.-L. Reyes-Ortiz, L. Oneto, A. Ghio, A. Samá, D. Anguita, X. Parra, Human activity recognition
     on smartphones with awareness of basic activities and postural transitions, Artificial Neural
     Networks and Machine Learning – ICANN 2014 (2014) 177–184. doi:10.1007/978-3-319-
     11179-7_23.
[9] Y.-L. Hsu, S.-L. Lin, P.-H. Chou, H.-C. Lai, H.-C. Chang, S.-C. Yang, Application of
     nonparametric weighted feature extraction for an inertial-signal-based Human Activity
     Recognition System, 2017 International Conference on Applied System Innovation (ICASI)
     (2017) 1718-1720. doi:10.1109/icasi.2017.7988270.
[10] H. Nematallah, S. Rajan, A.-M. Cretu, Logistic model tree for human activity recognition using
     smartphone-based inertial sensors, 2019 IEEE SENSORS (2019) 1-4.
     doi:10.1109/sensors43011.2019.8956951.
[11] F. Moya Rueda, R. Grzeszick, G. Fink, S. Feldhorst, M. ten Hompel, Convolutional Neural
     Networks for human activity recognition using body-worn sensors, Informatics 5 (2018) 26.
     doi:10.3390/informatics5020026.
[12] F. Demrozi, G. Pravadelli, A. Bihorac, P. Rashidi, Human activity recognition using inertial,
     physiological and Environmental Sensors: A comprehensive survey, IEEE Access 8 (2020)
     210816–210836. doi:10.1109/access.2020.3037715.
[13] N. Sikder, M.A. Ahad, A.-A. Nahid, Human action recognition based on a sequential deep
     learning model, 2021 Joint 10th International Conference on Informatics, Electronics & Vision
     (ICIEV) and 2021 5th International Conference on Imaging, Vision & Pattern Recognition
     (IcIVPR) (2021) 1-7. doi:10.1109/icievicivpr52578.2021.9564234.
[14] T. Mahmud, A.Q. Sazzad Sayyed, S.A. Fattah, S.-Y. Kung, A novel multi-stage training
     approach for human activity recognition from Multimodal Wearable Sensor Data Using Deep
     Neural Network, IEEE Sensors Journal 21 (2021) 1715–1726. doi:10.1109/jsen.2020.3015781.
[15] R. Moradi, R. Berangi, B. Minaei, A survey of regularization strategies for deep models,
     artificial intelligence review 53 (2019) 3947–3986. doi:10.1007/s10462-019-09784-7.
[16] C. Shorten, T.M. Khoshgoftaar, A survey on image data augmentation for Deep Learning,
     Journal of Big Data 6 (2019). doi:10.1186/s40537-019-0197-0.
[17] R. Ribani, M. Marengoni, A survey of transfer learning for Convolutional Neural Networks,
     2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T)
     (2019) 47-57. doi:10.1109/sibgrapi-t.2019.00010.
[18] K. He, R. Girshick, P. Dollar, Rethinking ImageNet pre-training, 2019 IEEE/CVF International
     Conference on Computer Vision (ICCV) (2019) 4917-4926. doi:10.1109/iccv.2019.00502.
[19] N. Sikder, A.-A. Nahid, KU-HAR: An open dataset for heterogeneous human activity recognition,
     Pattern Recognition Letters 146 (2021) 46–54. doi:10.1016/j.patrec.2021.02.024.
[20] M.H. Abid, A.-A. Nahid, Two unorthodox aspects in handcrafted-feature extraction for human
     activity recognition datasets, 2021 International Conference on Electronics, Communications and
     Information Technology (ICECIT) (2021) 1-4. doi:10.1109/icecit54077.2021.9641197.
[21] D. Anguita, A. Ghio, L. Oneto, X. Parra, J.-L. Reyes-Ortiz, A public domain dataset for human
     activity recognition using smartphones, 21st European Symposium on Artificial Neural
     Networks, Computational Intelligence and Machine Learning (ESANN) (2013).
[22] A.O. Jimale, M.H. Mohd Noor, Subject variability in sensor-based activity recognition, Journal
     of Ambient Intelligence and Humanized Computing (2021). doi:10.1007/s12652-021-03465-6.
[23] N.T. Hoai Thu, D.S. Han, Hihar: A hierarchical hybrid deep learning architecture for wearable
     sensor-based human activity recognition, IEEE Access 9 (2021) 145271–145281.
     doi:10.1109/access.2021.3122298.
[24] M.T. Uddin, M.M. Billah, M.F. Hossain, Random forests based recognition of human activities
     and postural transitions on smartphone, 2016 5th International Conference on Informatics,
     Electronics and Vision (ICIEV) (2016) 250-255. doi:10.1109/iciev.2016.7760005.
[25] F.S. Butt, L. La Blunda, M.F. Wagner, J. Schäfer, I. Medina-Bulo, D. Gómez-Ullate, Fall
     detection from electrocardiogram (ECG) signals and classification by Deep Transfer Learning,
     Information 12 (2021) 63. doi:10.3390/info12020063.
[26] G.Q. Ali, H. Al-Libawy, Time-series deep-learning classifier for human activity recognition
     based on smartphone built-in sensors, Journal of Physics: Conference Series 1973 (2021)
     012127. doi:10.1088/1742-6596/1973/1/012127.
[27] Y.-H. Byeon, S.-B. Pan, K.-C. Kwak, Intelligent deep models based on scalograms of
     electrocardiogram signals for Biometrics, Sensors 19 (2019) 935. doi:10.3390/s19040935.
[28] L. Gou, H. Li, H. Zheng, H. Li, X. Pei, Aeroengine Control System Sensor Fault diagnosis based
     on CWT and CNN, Mathematical Problems in Engineering 2020 (2020) 1–12.
     doi:10.1155/2020/5357146.
[29] P. Kumar, S. Suresh, DeepTransHHAR: Inter-subjects heterogeneous activity recognition
     approach in the non-identical environment using wearable sensors, National Academy Science
     Letters 45 (2022) 317–323. doi:10.1007/s40009-022-01126-6.
[30] A.-A. Nahid, N. Sikder, I. Rafi, KU-HAR: An open dataset for human activity recognition,
     Mendeley Data (2021). URL: https://data.mendeley.com/datasets/45f952y38r/5. doi:
     10.17632/45f952y38r.5.
[31] UCI Machine Learning Repository, Smartphone-based recognition of human activities and
     postural transitions data set, 2015. URL: http://archive.ics.uci.edu/ml/datasets/smartphone-
     based+recognition+of+human+activities+and+postural+transitions.