Biosignal and Image Processing System for Emotion Recognition Applications

Vitaliy Yakovyna, Viktor Khavalko, Viktor Sherega, Andrii Boichuk, Andrii Barna
Lviv Polytechnic National University, Lviv, 79013, Ukraine

IT&AS'2021: Symposium on Information Technologies & Applied Sciences, March 5, 2021, Bratislava, Slovakia
EMAIL: yakovyna@matman.uwm.edu.pl (V. Yakovyna); viktor.m.khavalko@lpnu.ua (V. Khavalko); viktor.shereha.mknm.2019@lpnu.ua (V. Sherega); barbek.ua@gmail.com (A. Boichuk); andrii.o.barna@lpnu.ua (A. Barna)
ORCID: 0000-0003-0133-8591 (V. Yakovyna); 0000-0002-9585-3078 (V. Khavalko); 0000-0002-0563-5748 (A. Boichuk); 0000-0003-3192-5439 (A. Barna)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
An analysis of human emotion recognition methods and algorithms has been conducted. A new approach to emotion recognition based on a combined set of biosignal and visual features is proposed in the paper. To solve the emotion recognition task, an architecture of a system is proposed that automatically reads data, processes them, builds and trains models, monitors experiments, and provides the user with a Web service based on the most accurate trained model. It was decided to conduct the planned research on the DEAP dataset, since it was the most suitable for the described idea in terms of content and data completeness. Since the experiment includes two fundamentally different data sources, biosignals and facial videos, two corresponding data pipelines are designed. After analyzing the prediction distributions of all models, it was found that GRU architectures cover the range of label values best, whereas CNN and FNN tend to learn the mean. To determine which filtering coefficients are best suited for this work, an automatic search algorithm is suggested. The proposed algorithm is based on deep reinforcement learning, which has been successfully demonstrated on similar tasks such as neural architecture search.

Keywords
emotion recognition, convolutional neural network, dataset, video pipeline, biosignal pipeline, neural network, CNN-based architecture

1. Purpose and motivation

Not so long ago, the idea of a machine understanding human thoughts would have been regarded as total nonsense. Computer-brain interfaces, BrainNet, and deep interactive gaming are fields that are often thought of as fictional, but at the same time they could benefit human society and move it to the next level of development. Digital advertising, marketing, and one-on-one interviews are technologies we already use, though they could be greatly improved by applying the described idea. A thorough review of computer-brain interfaces, their variants, and the laboratories and scientists developing them at the time of writing shows that glimpses of new technologies such as EEG-to-speech and EEG-to-devices (mental typing) are already on the horizon [1-5]. By leveraging a couple of brain-wave detectors and complex algorithms, it is gradually becoming possible to analyze brain signals and extract meaningful brain patterns. Brain activity, i.e., the cooperation of neurons and synapses, can be recorded by a non-invasive device, so that no surgical intervention is needed. In fact, most of the developed prototypes and mainstream BCIs are non-invasive. Generally, they are housed in wearable headbands and earbuds.
Regarding the invasive approach, a specific type of BCI has gained attention in recent years: a model that utilizes a grid of electrodes implanted directly into the motor cortex and neighboring areas. In this context, motor imagery is used as an intuitive and natural strategy to elicit brain activity changes and subsequently to control the movements of a robotic arm in real time. With technologies on the horizon that allow much more accurate brain-data extraction, research can move to the next stage: emotions and their understanding. Emotions are considered vital states of a human being and play a major role in human life, commonly being emphasized in theoretical research as a mechanism of consciousness. Questions about cognition, consciousness, philosophy, and human nature arise ever more frequently, making the study of emotions a first step toward answering them. Besides contributing to these global topics, the study of emotions can influence people's lives at a basic level. As is known, emotions affect the organism not only on the psychological level but on the physiological one as well. An abundance of positive emotions improves a person's health and work efficiency. On the other hand, negative emotions are one of the main causes of depression, which, if neglected, is a widespread cause of suicide.
For emotion recognition, emotions should be defined and evaluated quantitatively. The first definitions of basic emotions were proposed decades ago. However, no precise definition has ever been widely acknowledged by psychologists. They tend to assess emotions with two different approaches. One is to split emotions into separate groups or classes. Another is to use multi-dimensional labels. For emotion elicitation, subjects are presented with a series of emotionally evocative materials to induce a certain emotion. Over the past few years, entertainment stimuli have been the most common choice. In addition, new methods known as situational stimulation have been emerging in recent years.

2. Main principles of emotion recognition

Today, more and more categories of signs and data can be used to teach a machine to recognize a person's emotional state. The following categories should be noted:
• visual - images or videos of the observed person;
• biometric signals - EMG, EOG, SCL, and EEG;
• textual - the semantic context;
• sound - the intonation of speech.
Particular attention should be paid to combined data types, as they significantly broaden the information from which the machine learning model learns. It has been repeatedly confirmed that, for the task of recognizing emotions, visual data can be forged, so it cannot serve as the only category to rely on. On the other hand, most of the information a person perceives comes through the eyes. Thus, visual data is the most familiar to us, which allows us to quickly assess the emotional state of an interlocutor [6]. As for biosignals, and especially EEG, it is assumed that they are almost impossible to falsify, as some of the processes that generate a brain signal are subconscious and different for each individual. Unlike visual cues, however, a person cannot determine the emotions of another with the help of a biosignal. After analyzing both types of data, we can conclude that an effective solution is to combine them.
Based on the visual component, the process of data acquisition is accelerated, bringing into the system the characteristics that a living organism naturally exhibits. Biosignals complement it by filtering out incorrect operation and improving the overall accuracy. This is the system considered in this scientific work.

3. Problem statement

To solve the emotion recognition task, it is necessary to build a system that automatically reads data, processes them, builds and trains models, monitors experiments, and provides the user with a Web service based on the most accurate trained model. A combined set of biosignal and visual features is taken as the basis. Developing the system and conducting the research will not only yield a product for determining the emotional state, but the result will also be adaptable to related goals, most of which involve a computer-brain interface. It is a system and a set of studies that cover the fields of medicine, education, and entertainment. This is especially true for the diagnosis and treatment of diseases of the brain and nervous system: patients suffering from paralysis, epilepsy, or Alzheimer's disease. Depression and anxiety are other diagnoses that fall into this group. They can be explained by a persistently low mood that prevents a person from enjoying most activities. This symptom is also accompanied by others, such as irritability, anxiety, and an inability to cope with everyday problems. Over the last decade, much discussion has arisen around cognitive performance enhancers. Coffee and tea are the most commonly mentioned stimulants. However, the debate about cognitive enhancement has broadened to include prescription drugs such as Modafinil and Ritalin, used by both professional workers and students. The invention of BCIs is also considered an approach to boost the cognitive functions of healthy users. Neurofeedback training (brain activity alteration through operant conditioning), for instance to improve attention, short- and long-term working memory, and critical thinking, is common among average healthy users. Another application area is optimized content learning. Although there is a lack of solid research data on its effects, the effect size is probably small and limited to specific cognitive tasks. Generally speaking, there may be a thin line between non-medical and medical use of neurofeedback.

4. Methods and materials

4.1. Dataset choosing

To address the described challenge, a proper dataset is needed, one with enough data for complex models to be built on it empirically. After a broad exploration of such datasets as DECAF, MAHNOB, etc., it was decided to conduct the planned research on the DEAP dataset, since it was the most suitable for the described idea and is properly populated with data. The DEAP dataset is publicly available as a state-of-the-art dataset for visual and biosignal emotion recognition. The dataset was assembled, processed, and published by an R&D team at Queen Mary University of London. The DEAP dataset consists of multiple physiological signal types and face video recordings for the evaluation of emotions. 32-channel EEG data were collected from 32 volunteers. The face recordings and biosignals were captured while showing 40 preselected music videos, each 60 seconds long, whose topics varied to boost emotional engagement (Table 1). The signals were downsampled to 128 Hz and denoised with bandpass and lowpass frequency filters.

Table 1
Filters for each of the dataset channels

Channel index   Channel description                                        Low pass (Hz)   High pass (Hz)
1-32            EEG (Fp1, AF3, F3, F7, FC5, FC1, C3, T7, CP5, CP1, P3,     8.0             13.0
                P7, PO3, O1, Oz, Pz, Fp2, AF4, Fz, F4, F8, FC6, FC2, Cz,
                C4, T8, CP6, CP2, P4, P8, PO4, O2)
33              hEOG (horizontal EOG, hEOG1 - hEOG2)                       0.5             3.25
34              vEOG (vertical EOG, vEOG1 - vEOG2)                         0.35            3.5
35              zEMG (Zygomaticus Major EMG, zEMG1 - zEMG2)                0.5             1.5
36              tEMG (Trapezius EMG, tEMG1 - tEMG2)                        0.5             5.5
37              GSR (values from Twente converted to Geneva format, Ohm)   0.25            4.5
38              Respiration belt                                           1.0             3.5
39              Plethysmograph                                             0.2             0.8
40              Temperature                                                0.5             3.0
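To make the preprocessing step concrete, the following minimal sketch shows how the per-channel bands of Table 1 could be applied in Python. It assumes the preprocessed Python release of DEAP (pickled .dat files whose 'data' array has the shape 40 trials x 40 channels x 8064 samples) and a zero-phase Butterworth filter; both are assumptions for illustration, not necessarily the exact tooling used in this work.

```python
# A minimal sketch of the biosignal loading and bandpass-filtering step.
# Assumptions (not specified in the paper): the preprocessed Python version of
# DEAP is used, i.e. pickled .dat files with a 'data' array of shape
# (40 trials, 40 channels, 8064 samples) and a 'labels' array of shape (40, 4);
# a zero-phase Butterworth filter is used for the bandpass step.
import pickle
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # sampling rate after downsampling, Hz

# Per-channel (low, high) cutoffs following Table 1: alpha band for the 32 EEG
# channels, individual bands for the peripheral channels 33-40.
BANDS = [(8.0, 13.0)] * 32 + [
    (0.5, 3.25),   # hEOG
    (0.35, 3.5),   # vEOG
    (0.5, 1.5),    # zEMG
    (0.5, 5.5),    # tEMG
    (0.25, 4.5),   # GSR
    (1.0, 3.5),    # respiration belt
    (0.2, 0.8),    # plethysmograph
    (0.5, 3.0),    # temperature
]

def load_trials(path):
    """Load one participant's file (e.g. 's01.dat') from the preprocessed set."""
    with open(path, "rb") as f:
        record = pickle.load(f, encoding="latin1")
    return record["data"], record["labels"]  # (40, 40, 8064), (40, 4)

def bandpass(signal, low, high, fs=FS, order=4):
    """Zero-phase Butterworth bandpass filter for a single 1-D channel."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def filter_trial(trial):
    """Apply the per-channel bands of Table 1 to one (40, 8064) trial."""
    return np.stack([bandpass(ch, lo, hi) for ch, (lo, hi) in zip(trial, BANDS)])

# Usage sketch:
# data, labels = load_trials("data_preprocessed_python/s01.dat")
# filtered = np.stack([filter_trial(t) for t in data])
```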
4.2. Proposed Approach

To move toward modeling an AI system, the data must be digestible by the backpropagation algorithm, since it is vulnerable to unstandardized, missing, and sparse values. The data processing pipeline serves as a tool to fulfill these requirements and ensures the correct data format for the training session. Since the experiment includes two fundamentally different data sources, biosignals and facial videos, two corresponding data pipelines have to be designed. The video pipeline consists of the following steps: key frame extraction, frame standardization, and transition to the model's input format. The biosignal pipeline, on the other hand, is composed of bandpass filtering with empirically defined parameters, outlier handling, and standardization. Both of the mentioned pipelines can also be integrated into a third one, which synchronizes the video and signal time-series data frames into a single multi-input model format.
The CNN-based signal model is an approach based on stacking all signal vectors into a single frame. This is possible since the length of the signal vector is constant. For feature extraction from this frame, 2D CNNs were used [7-10]. The next proposed modification is to add a bottleneck to the model architecture in order to prevent the kernel from forgetting the differences between distinct channels. The bottleneck is implemented as a CNN layer whose kernel covers a subset of channels and moves with a fixed stride, simulating the image-like inputs that CNNs specialize in. Another benefit of using a CNN is its parameter efficiency. Despite being many times smaller in complexity than feedforward models, the CNN produces better results with less time needed for inference and training.

4.3. GRU-based signal model

The main concept was to preserve long-memory values without resetting them. This is the reason a GRU was suggested, since the GRU layer does not have separate forget gates. It also trains faster than a traditional bidirectional LSTM, which left more computational time for parameter tuning [11].
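As a reference point, the GRU-based signal model can be sketched with the Keras API as follows. The two GRU layers of 256 and 128 units follow the sizes reported in the Results section; the input shape (one full preprocessed trial of 8064 samples at 128 Hz across 40 channels) and the 9-class softmax output over discretized ratings are assumptions made for illustration.

```python
# A minimal sketch of the GRU-based signal model. Only the layer sizes of 256
# and 128 units come from the Results section; the input shape and the 9-class
# output (discretized 1-9 rating) are assumptions, since the paper does not
# spell out the exact configuration.
from tensorflow.keras import layers, models

def build_signal_gru(timesteps=8064, channels=40, num_classes=9):
    model = models.Sequential([
        # first recurrent layer keeps the full sequence for the next GRU
        layers.GRU(256, return_sequences=True, input_shape=(timesteps, channels)),
        # second layer summarizes the sequence into a single vector
        layers.GRU(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["categorical_accuracy", "mae", "mse"],
    )
    return model

# Usage sketch:
# model = build_signal_gru()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30)
```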
4.4. Video-based model

Emotion recognition from images has reached new benchmarks over the last decade since the rise of the convolutional neural network. It is especially robust for examples that represent the far extremes of the emotional space, as they are more definite and lack ambiguity. Examples located near the center of the 3D or 4D emotional space, however, are prone to be classified as neutral. Another scenario where this model type fails is the recognition of concealed emotions. To deal with such challenges, a video-based model is proposed. It not only derives information from the image data but also captures the context over a defined timeframe. Compared to recognition on single images, it also helps to make smoother predictions over time, omitting accidentally captured frames that can harm performance [12].

4.5. Fusion model

The initial goal was to fuse the biosignal channels with video frames while preserving their variance. The video frames can theoretically add more context to the data to learn from. This could help overcome the obstacles the previous deep learning models faced, such as concealed emotions and emotionless expressions. By combining those two methods, the system potentially gains more saturated features to process and more parameters to be shared between the two parallel flows. After completing the parameter optimization, the video and signal flows can be separated and used as pre-trained models. The benefit of applying such a technique is that the visual model shares weights with the signal one, producing a knowledge base that cannot be obtained from visual data alone. The same holds, vice versa, for the signal data [13-15]. The pipeline for building an architecture with such capabilities is: multichannel extraction, video processing, fusion module, multiple deep layers.

5. Results

To obtain reliable data that fully covers the performance and blind spots of the system, two consecutive evaluation approaches were designed. For testing the initial AI part, the following metrics are assessed:
• Categorical Accuracy;
• MAE;
• MSE;
• F-score.
Based on the defined metrics, we concluded that the CNN-based architecture worked best for signals. This is due to the low complexity of the CNN, which has fewer parameters than its competitors and fits better on datasets with a small number of characteristics (features). As for the video, the pre-trained model showed better results than the one that started learning from random parameters. This can be attributed to the homogeneity of the frames in the video: the descriptors of the pre-trained model were optimized on 1,281,167 unique images, which allowed the CNN (Inception) extractors to converge better and to highlight the main characteristics on the basis of which predictions can be made. The combined model of GRU signals and the pre-trained video model performed best on the Categorical Accuracy metric, but lost to the signal CNN on MAE.
Another suggested approach is based on analyzing the distribution of predicted labels. It allows detecting situations when the constructed model is not able to fully optimize its parameters for a robust workflow. In the worst case, the empirically derived model predicts only the mean value of the whole test dataset, which is a major symptom of underfitting [16-19]. To detect such situations, the produced models were tested with this method, as sketched below.
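A minimal version of this prediction-distribution check is sketched below; it assumes softmax outputs over discretized 1-9 ratings, and the helper names are illustrative rather than taken from the actual codebase.

```python
# A minimal sketch of the predicted-label distribution check described above.
# Assumptions: predictions come from a Keras-style model with a softmax output
# over discretized 1-9 ratings; the helper and variable names are illustrative.
import numpy as np

def label_distribution(probabilities, num_classes=9):
    """Count how often each class is predicted (argmax over the softmax output)."""
    predicted = probabilities.argmax(axis=1)
    return np.bincount(predicted, minlength=num_classes)

def looks_underfit(probabilities, threshold=0.8):
    """Flag a model whose predictions collapse onto a single class (mean-like behavior)."""
    counts = label_distribution(probabilities)
    return counts.max() / counts.sum() >= threshold

# Usage sketch:
# probs = model.predict(x_test)
# print(label_distribution(probs))          # a spike at one class signals underfitting
# print("underfit:", looks_underfit(probs))
```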
The obtained results turned out to contradict those calculated before:
• The FNN signal network learned during optimization to predict the value 7 in most cases.
• Observations for the signal CNN revealed that the model learned the average of all labels.
• For the signal GRU, a concentration of predictions is observed at values 6 and 7. Other values were obtained in a pattern similar to the FNN.
• The CRNN is a model that concentrates most values at one point.
• The CRNN with pretrained ResNet descriptors, compared to the CRNN model trained from random weights, focused on 3 points instead of 1; in addition, they are distributed more uniformly.
The combined model made it possible to learn the merged distribution of the previously described models. Similar to the pre-trained CRNN model, the values are concentrated in 2 main classes. As for the signal GRU, its application gave a smoothing effect to the distribution. To sum up the results of both predefined approaches, the CRNN with pretrained ResNet descriptors showed the most promising insights into the understanding of the emotion-space mapping. The most important predictors are given in Table 2.

Table 2
The most important predictors

Model name                                         Categorical   MSE      MAE     F-score   Categorical
                                                   Accuracy                                  Crossentropy
Signal FNN                                         0.172         7.175    2.0     0.17      2.5
Signal CNN                                         0.149         5.04     1.81    0.08      2.187
Signal GRU                                         0.177         7.187    2.07    0.16      2.964
Video C-RNN                                        0.11          15.498   3.01    0.06      3.115
Video C-RNN based on transferred ResNet101         0.133         12.498   2.841   0.10      2.388
Multi-input signal GRU + video C-RNN based on
transferred ResNet101                              0.186         6.168    1.936   0.14      2.63

After analyzing the prediction distribution for all models, it was found that GRU architectures cover the range of label values best, compared to the CNN and FNN, which learned the mean. It should be noted that the GRU layers contain 256 and 128 neurons, respectively (Fig. 1). This decision was made because the model was unable to optimize larger layers of 512 and 256 neurons, owing to the small number of informative characteristics in the biosignal set. The CRNN with pretrained ResNet101 weights (Fig. 2) showed better results than the competing model, namely a categorical crossentropy of 2.388 versus 3.115. This difference is due to insufficient data for training the internal CNN extractors of this architecture: although 32 one-minute videos were used for each of the 22 people, the difference in information between the processed frames is insignificant, which resembles an upsampling process based on duplicating records to balance classes in a dataset while preserving variance.
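For illustration, the combined architecture evaluated above (signal GRU plus video C-RNN with a transferred ResNet101 backbone, Fig. 1) could be assembled along the following lines. Only the GRU sizes (256, 128) and the use of pretrained ResNet descriptors come from this paper; the frame count, image size, dense-layer widths, and 9-class output are assumptions.

```python
# A sketch of how the combined model (signal GRU + video C-RNN with a
# transferred ResNet backbone) might be assembled. Frame count, image size,
# dense-layer widths and the 9-class output are assumptions; only the GRU
# sizes (256, 128) and the pretrained ResNet come from the paper.
from tensorflow.keras import layers, models, applications

def build_fusion_model(timesteps=8064, channels=40,
                       frames=32, height=224, width=224, num_classes=9):
    # Signal branch: stacked GRUs over the multichannel biosignal sequence.
    signal_in = layers.Input(shape=(timesteps, channels), name="signals")
    s = layers.GRU(256, return_sequences=True)(signal_in)
    s = layers.GRU(128)(s)

    # Video branch: a pretrained ResNet applied per frame, followed by a GRU.
    video_in = layers.Input(shape=(frames, height, width, 3), name="frames")
    backbone = applications.ResNet101(include_top=False, weights="imagenet",
                                      pooling="avg")
    backbone.trainable = False  # keep the transferred descriptors fixed at first
    v = layers.TimeDistributed(backbone)(video_in)
    v = layers.GRU(128)(v)

    # Fusion module: concatenate both flows and add several dense layers.
    x = layers.Concatenate()([s, v])
    x = layers.Dense(128, activation="relu")(x)
    out = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=[signal_in, video_in], outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["categorical_accuracy", "mae"])
    return model
```

Freezing the backbone keeps the transferred descriptors intact during the first training phase; they can be unfrozen later for fine-tuning if enough video data is available.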
6. Conclusion

Emotion recognition is surely one of the essential tasks that must be solved before we can claim to understand the human body and its brain. It plays a major role in solutions that can drastically improve the quality of life of an ordinary person. Biotechnology, BCI, and depression management are the tools that can be seen on the horizon once progress is made in this field [20-26]. To contribute to this cause, the results of this research are presented and freely shared. This paper describes the study of multichannel signals and of a system prepared for integration with visual data. Visual emotion recognition has been on the market for a while and has shown promising results. Signals, on the other hand, differ in their origin and sources, and the techniques and methods to extract useful information from them are also unique; the complexity rises the deeper we dive into the brain. Approaches to evaluating the information flow from the channels to the classification modules were demonstrated. Effective, efficient, and interactive means of assessing model capacity, typical feature extraction functions, and the construction of hybrid models were also demonstrated.
The CRNN with pretrained ResNet descriptors demonstrated the most promising results of all the conducted experiments. The downside of this solution is its computational requirements, since it is built on two complex neural networks. On the other hand, recalling the duration of the data records makes it clear that real-time processing is not needed and that the best-performing model fully covers the required traffic of inference requests. In the worst-case scenario, the performance problem can be shifted to infrastructure scaling without changing the deployed model. Since the system is still far from perfect when compared against the ground truth labels, further modifications are proposed to be designed and tested. It is also worth mentioning that the ground truth labels cannot be considered 100% accurate, since the emotions were assessed by humans and emotion itself is a subjective rather than an objective measurement.
As mentioned, the signal filtering stage was an essential part of the model's training cycle. To determine which filtering coefficients are best suited for this work, an automatic search algorithm is suggested. In conjunction with an automated model training cycle, these automatic approaches can solve the following tasks simultaneously:
• search of filtering parameters for biosignals;
• emotion recognition model optimization;
• emotion recognition model hyperparameter tuning.
The algorithm proposed for this roadmap is based on deep reinforcement learning, which has been successfully demonstrated on similar tasks such as neural architecture search. The proposed method, however, requires a tremendous amount of computational power, since thousands of signal recognition models need to be trained repeatedly.
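A highly simplified sketch of such a search loop is shown below. It replaces the full deep-reinforcement-learning controller with a plain REINFORCE update over a small set of candidate bands, and the reward function is a placeholder for the validation accuracy of a model trained with the sampled band; every name and value here is an assumption for illustration only, not the authors' implementation.

```python
# A highly simplified sketch of the proposed automated search for filtering
# coefficients, in the spirit of RL-based neural architecture search.
# The candidate bands, the dummy reward, and the plain REINFORCE controller
# are all placeholders for illustration.
import numpy as np

CANDIDATE_BANDS = [(0.5, 4.0), (4.0, 8.0), (8.0, 13.0), (13.0, 30.0)]  # example cutoffs, Hz

def evaluate_bands(band):
    """Placeholder reward: in the real setting this would run a full training
    cycle with the sampled band and return validation categorical accuracy."""
    low, high = band
    return np.exp(-abs(low - 8.0) - abs(high - 13.0))  # dummy reward peaking at 8-13 Hz

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_search(steps=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(CANDIDATE_BANDS))          # controller parameters
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choice(len(CANDIDATE_BANDS), p=probs)
        reward = evaluate_bands(CANDIDATE_BANDS[action])
        baseline = 0.9 * baseline + 0.1 * reward      # moving-average baseline
        grad = -probs
        grad[action] += 1.0                           # d log pi(action) / d logits
        logits += lr * (reward - baseline) * grad     # REINFORCE update
    return CANDIDATE_BANDS[int(np.argmax(logits))]

# print("selected band:", reinforce_search())
```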
Figure 1: Architecture visualization for the combined model based on signals and video
Figure 2: Architecture visualization for the GRU model based on signals

7. References

[1] Lynn, Htet Myet, Sung Bum Pan, and Pankoo Kim. A deep bidirectional GRU network model for biometric electrocardiogram classification based on recurrent neural networks. IEEE Access 7 (2019): 145395-145405.
[2] Tan Chuanqi, et al., A survey on deep transfer learning, in: International Conference on Artificial Neural Networks. Springer, Cham, 2018.
[3] Edla Damodar Reddy, et al., Classification of EEG data for human mental state analysis using Random Forest Classifier, Procedia Computer Science, 132, 2018, pp. 1523-1532.
[4] Eralda Nishani and Betim Çiço, Computer vision approaches based on deep learning and neural networks: Deep neural networks for video analysis of human pose estimation, in: 6th Mediterranean Conference on Embedded Computing (MECO). IEEE, 2017.
[5] A. Craik, He Yongtian and L. Jose Contreras-Vidal, Deep learning for electroencephalogram (EEG) classification tasks: a review, Journal of Neural Engineering, 16.3, 2019.
[6] Hong-Wei Ng, et al., Deep learning for emotion recognition on small datasets using transfer learning, ACM International Conference on Multimodal Interaction, 2015.
[7] Schirrmeister, Robin Tibor, et al., Deep learning with convolutional neural networks for EEG decoding and visualization, Human Brain Mapping 38.11, 2017, pp. 5391-5420.
[8] He Kaiming, et al., Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[9] Xuanyu He and Wei Zhang, Emotion recognition by assisted learning with convolutional neural networks, Neurocomputing, 291, 2018, pp. 187-194.
[10] Hossain, M. Shamim, and Ghulam Muhammad, Emotion recognition using deep learning approach from audio-visual emotional big data, Information Fusion, 49, 2019, pp. 69-78.
[11] Chuanqi Tan, et al., Multimodal classification with deep convolutional-recurrent neural networks for electroencephalography, in: International Conference on Neural Information Processing. Springer, 2017.
[12] Sepehr Valipour, et al., Recurrent fully convolutional networks for video segmentation, in: IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017.
[13] Samarth Tripathi, et al., Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[14] Sabir, Ekraam, et al., Recurrent convolutional strategies for face manipulation detection in videos, Interfaces (GUI) 3.1 (2019).
[15] Steven Lemm, et al., Spatio-spectral filters for improving the classification of single trial EEG, IEEE Transactions on Biomedical Engineering 52.9, 2005, pp. 1541-1548.
[16] Shakhovska N., Basystiuk O., Shakhovska K. Development of the Speech-to-Text Chatbot Interface Based on Google API, in: MoMLeT, 2019, pp. 212-221.
[17] Shakhovska K., Shakhovska N., Veselý P. The Sentiment Analysis Model of Services Providers' Feedback. Electronics, 2020, 9(11), 1922.
[18] Khavalko V., Mazur A., Mykhailyshyn V., Zhelizniak R., Kovtyk I. Economic efficiency of innovative projects of CNN modified architecture application, International Workshop on Cyber Hygiene (CybHyg-2019), Kyiv, November 30, 2019, pp. 182-193.
[19] Kryvenchuk Y., Boyko N., Helzynskyy I., Helzhynska T., Danel R. Synthesis control system physiological state of a soldier on the battlefield, in: Proceedings of the 2nd International Workshop on Informatics & Data-Driven Medicine IDDM 2019, Lviv, Ukraine, November 11-13, 2019, Vol. 1, pp. 297-306.
[20] Holub S., Khymytsia N., Holub M., Fedushko S. The Intelligent Monitoring of Messages on Social Networks. CEUR Workshop Proceedings. Vol. 2616: Proceedings of the 2nd International Workshop on Control, Optimisation and Analytical Processing of Social Networks (COAPSN-2020), Lviv, Ukraine, May 21, 2020, pp. 308-317. http://ceur-ws.org/Vol-2616/paper26.pdf
[21] Fedushko S., Ortynskyy V., Reshota V., Tereshchuk V. Legal and Economic Aspects of the PR Campaign of Scientific Conference in Social Networks. CEUR Workshop Proceedings. Vol. 2616: Proceedings of the 2nd International Workshop on Control, Optimisation and Analytical Processing of Social Networks (COAPSN-2020), Lviv, Ukraine, May 21, 2020, pp. 342-352. http://ceur-ws.org/Vol-2616/paper29.pdf
[22] Wosiak A., et al. Optimisation of the cooling unit in the system for supervising the condition of large power transformers. Przegląd Elektrotechniczny, 2009, 85.12: 166-169.
[23] Lipinski P., Yatsymirskyy M. Efficient 1D and 2D Daubechies wavelet transforms with application to signal processing. In: International Conference on Adaptive and Natural Computing Algorithms. Springer, Berlin, Heidelberg, 2007, pp. 391-398.
[24] Stolarek J., Lipiński P. Improving watermark resistance against removal attacks using orthogonal wavelet adaptation. In: International Conference on Current Trends in Theory and Practice of Computer Science. Springer, Berlin, Heidelberg, 2012, pp. 588-599.
[25] Lipinski P. On domain selection for additive, blind image watermarking. Bulletin of the Polish Academy of Sciences. Technical Sciences, 2012, 60.2: 317-321.
[26] Lipinski P., Yatsymirskyy M. On synthesis of 4-tap and 6-tap reversible wavelet filters. Przegląd Elektrotechniczny, 2008, 84.12: 284-286.
[27] Glonek G., Wojciechowski A. Hybrid orientation based human limbs motion tracking method. Sensors, 2017, 17.12: 2857.
[28] R. G. Alakbarov, Method for Effective Use of Cloudlet Network Resources, IJCNIS, vol. 12, no. 5, pp. 46-55, Oct. 2020, doi: 10.5815/ijcnis.2020.05.04.
[29] Md. R. Ahmed, T. Islam Robin, and A. Ali Shafin, Automatic Environmental Sound Recognition (AESR) Using Convolutional Neural Network, IJMECS, vol. 12, no. 5, pp. 41-54, Oct. 2020, doi: 10.5815/ijmecs.2020.05.04.
[30] Opałka S. et al. Multi-Channel Convolutional Neural Networks Architecture Feeding for Effective EEG Mental Tasks Classification. Sensors, 2018, 18.10: 3451.