Identification of Modern Facial Emotion Recognition Models

Kirill Smelyakov 1, Oleksandr Bohomolov 1, Maksym Kizitskyi 1, Anastasiya Chupryna 1

1 Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, 61166, Ukraine

Abstract
The paper is devoted to the problem of developing a generalized algorithm for the effective identification of computational intelligence models used to recognize emotions from a person's facial expression. To solve this problem, a relevant dataset was selected; alternative recognition models, algorithms and machine learning technologies were identified, as well as the performance indicators and metrics used in the comparative analysis of the obtained results. A series of experiments was carried out to identify the parameters of alternative neural network models used to recognize emotions and to evaluate the effectiveness of their application. Based on a comparative analysis of the experimental results, a generalized algorithm for identifying emotions was formulated, together with recommendations on the use of particular neural network architectures for facial emotion recognition tasks.

Keywords
Computer vision, facial emotion recognition, face recognition, convolutional neural network, transfer learning

1. Introduction
Facial emotion recognition (FER) is a relatively new and fast-growing area of computer vision. Its main task is to identify what emotion a person feels from his or her facial expression. Since convolutional networks show good results in other computer vision tasks, the use of neural networks looks promising for FER as well. Many public networks are pretrained, so the question of applying transfer learning to the FER task arises.
This would reduce the uncertainty researchers face when choosing a machine learning model and would significantly speed up experiments in the field of FER and increase their effectiveness. At the same time, the issue of transferring skills from other problem areas remains little studied and is therefore promising. The aim of the work is to research the applicability of neural networks and transfer learning technology to FER problems. The goals of the work are to choose a dataset and, based on it, to plan and perform experiments whose results will allow:
● to formulate an effective algorithm for neural network identification and usage within the framework of the FER task;
● to determine which neural network architectures are better to use as a backbone for FER tasks in different situations;
● to compare the effectiveness of face recognition based backbones with standard solutions for transfer learning.

COLINS-2022: 6th International Conference on Computational Linguistics and Intelligent Systems, May 12–13, 2022, Gliwice, Poland
EMAIL: kyrylo.smelyakov@nure.ua (K. Smelyakov); oleksandr.bohomolov@nure.ua (O. Bohomolov); maksym.kizitskyi@nure.ua (M. Kizitskyi); anastasiya.chupryna@nure.ua (A. Chupryna)
ORCID: 0000-0001-9938-5489 (K. Smelyakov); 0000-0002-9539-8888 (O. Bohomolov); 0000-0001-9771-5771 (M. Kizitskyi); 0000-0003-0394-9900 (A. Chupryna)
©️ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

2. Related Works
Research in recent years has focused on the facial emotion recognition (FER) task [1-3]. Such systems often supplement face recognition systems (Azure Face API, Face, FaceReader, etc.) [4-6] and can be used in many situations: from customer satisfaction analysis and service at the checkout to tracking emotions at a psychologist's appointment [7, 8], prospective drone vision services [9], etc.
Researchers have described the most efficient approaches to the FER task, which use networks such as ResNet, AffectNet, MobileNet, etc.; to simplify access to this information, a special list was organized [10]. On the other hand, this list includes various forms of ensembling and stacking of neural networks. This gives a gain in the quality of emotion classification, but the approach also has disadvantages. Firstly, the resulting model becomes quite large and heavy, and a lot of time is spent on predictions; because of this, applying models of this kind on mobile devices or in real-time systems is very complicated. Secondly, due to the presence of several neural networks, maintaining them within a production system becomes more complicated, and updating the models while preserving the logic of their work becomes more difficult compared to an end-to-end model. Therefore, the issue of developing a model that is perhaps not as effective, but much more compact and easier to maintain for use in face recognition systems, remains relevant and open. At the same time, the wide variety of machine learning models and algorithms, as well as the high degree of uncertainty in application conditions, often create great difficulties in choosing an appropriate network architecture and tuning its parameters effectively [11-13]. Why are neural networks and transfer learning considered for solving FER problems? In recent years, neural networks have become the standard tool in computer vision [14, 15]. A large number of diverse architectural solutions (EfficientNet, ResNet, YOLOv5, etc.) and machine learning methods have been proposed to solve the problems of image classification, object detection, and recognition. Their performance is affected by the quality of the images [16, 17], the result of image segmentation [18, 19], and the architecture and hyperparameter settings of the neural networks [20].
Moreover, research on the application of convolutions is carried out to improve the effectiveness of CNNs by optimizing convolution mask parameters, the number of layers and a number of other parameters [21, 22]. For the purpose of identifying the parameters of a neural network, a wide range of machine learning algorithms is currently used. One of the most effective is transfer learning [23]. Transfer learning (fine-tuning a neural network whose weights were pre-trained on a huge dataset, for example ImageNet, to solve a specific problem) is widely used in all areas of computer vision and increases the quality of solving various kinds of problems [24, 25]. The main advantage of this approach is that, thanks to the pre-trained weights, the model transforms the input image into a smaller set of meaningful features. Because of this, the relief of the loss function is smoothed out and the model converges faster to its minimum. Recently, in the field of face recognition, state-of-the-art models are often trained to compress an image into a feature vector by which a person's face can be identified [26], which is very similar to what transfer learning is used for. That is why we decided to compare classical transfer learning models with face recognition models in more detail. Besides, this domain was selected because it is quite a popular area and many pre-trained models are publicly available [27]. For models to benefit from pre-trained weights, the task must be related to the domain on which the models were trained. The research results are important not only for FER services, but also for solving a great number of related tasks, including the development of effective integrated E-learning services and AI solutions [28], ICT solutions, network solutions and security services [12, 29, 30].
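To illustrate the embedding idea behind the face recognition models discussed above, the sketch below compares two toy feature vectors by cosine similarity. The vectors, the 0.8 threshold and the helper names are our own illustrative choices, not values from the paper; real models output vectors of 128-4096 dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(emb_a, emb_b, threshold=0.8):
    """Declare a match when the embeddings are close enough (toy threshold)."""
    return cosine_similarity(np.asarray(emb_a), np.asarray(emb_b)) >= threshold

# Toy 4-dimensional "embeddings".
anchor = [0.9, 0.1, 0.3, 0.2]
positive = [0.85, 0.15, 0.28, 0.22]  # same person, slightly different photo
negative = [0.1, 0.9, 0.2, 0.7]      # different person

print(same_person(anchor, positive))  # True
print(same_person(anchor, negative))  # False
```

Transfer learning exploits the same property: a backbone that already maps images into such meaningful feature vectors needs only a small classifier on top.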
In addition, if face recognition based models show advantages over standard approaches, this means that face recognition learning approaches can improve the quality of transfer learning models in other areas, increase learning speed and allow using less data for training. This, in turn, would allow specialists to conduct more experiments and reduce the cost of cloud learning services.

3. Methods and Materials
First of all, consider the data that will be used in the further experiments, as well as some other materials and methods proposed to solve the problem under consideration.

3.1. Dataset Description
In order to test our approach, we chose the quite well known FER2013 dataset [31]. The 2013 Facial Expression Recognition dataset (FER2013) is a Kaggle dataset introduced by Pierre-Luc Carrier and Aaron Courville at the International Conference on Machine Learning (ICML) in 2013. This dataset was chosen because it is publicly available. It also contains photographs of people of different ages, genders, races and nationalities, with different backgrounds and accessories (such as glasses and masks), which allows a better evaluation of the generalization ability in emotion recognition. The dataset contains grayscale images of faces of size 48x48 pixels. The images were created using automatic face registration, so the faces are centered and occupy nearly the same amount of space in each image. When making the comparison, we therefore assume that the images have already been preprocessed in advance, so we will not consider this issue within the framework of our paper. Each image is labeled with one of seven emotions: Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral. The Disgust expression has the smallest number of images (547), while the other labels have nearly 5,000 samples each. More detailed information is presented in Table 1.
Table 1
Number of pictures for each class

Labels                Angry   Surprise   Disgust   Neutral   Happy   Fear   Sadness   Total
Number of examples    4953    4002       547       6198      8989    5121   6077      35887

Figure 1 shows examples of randomly selected pictures from the dataset. As we can see, both men and women of different ages (from babies to old people) and of different nationalities and races are represented in the dataset.

Figure 1: Examples of images [31]

In general, this dataset provides a wide variety of face images, which will favorably affect the generalization ability of the model. However, it also has an imbalance of classes, which is why the accuracy of determining the emotion of disgust will probably be lower in comparison with the others. To split the dataset, a standard function from the sklearn package, train_test_split, was used: training set - 70% (25,121 images), validation set - 10% (3,589 images), test set - 20% (7,177 images). The partition was stratified by the emotion in the image, with random_state = 42.

3.2. Methods
We have chosen the following key metrics:
● accuracy on the training set;
● loss on the training set;
● accuracy on the validation set;
● loss on the validation set;
● mean convergence rate (MCR)

MCR = (1/n) ∑_{i=1}^{n} (Metric_i − Metric_{i−1}), (1)

where n is the number of epochs and Metric_i is the performance metric on the training dataset at the i-th epoch;
● mean overfitting rate (MOFR)

MOFR = (1/n) ∑_{i=1}^{n} ((Metric_train_i − Metric_val_i) − (Metric_train_{i−1} − Metric_val_{i−1})), (2)

where n is the number of epochs, Metric_train_i is the performance metric on the training set at the i-th epoch, and Metric_val_i is the performance metric on the validation set at the i-th epoch;
● initial accuracy - accuracy after training for 1 epoch. We chose this metric because it shows how well the pre-trained weights of the model fit the domain;
● initial loss - loss after training for 1 epoch.
In our experiment, Metric will be accuracy and loss (categorical cross entropy).

4. Experiment
This section presents the plan of the experiment.
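The MCR (1) and MOFR (2) definitions above can be computed directly from a per-epoch metric history. The sketch below is our own minimal implementation; the helper names and the toy accuracy values are illustrative, not from the paper.

```python
def mean_convergence_rate(metric):
    """MCR, Eq. (1): average per-epoch change of a training metric.

    `metric` holds per-epoch values; metric[0] plays the role of
    Metric_{i-1} for the first epoch of the window.
    """
    n = len(metric) - 1
    return sum(metric[i] - metric[i - 1] for i in range(1, n + 1)) / n

def mean_overfitting_rate(train, val):
    """MOFR, Eq. (2): average per-epoch growth of the train/validation gap."""
    n = len(train) - 1
    gap = [t - v for t, v in zip(train, val)]
    return sum(gap[i] - gap[i - 1] for i in range(1, n + 1)) / n

# Toy accuracy histories over 5 epochs.
train_acc = [0.30, 0.45, 0.55, 0.62, 0.66]
val_acc   = [0.28, 0.40, 0.47, 0.50, 0.52]

print(round(mean_convergence_rate(train_acc), 3))           # 0.09
print(round(mean_overfitting_rate(train_acc, val_acc), 3))  # 0.03
```

Both sums telescope, so MCR reduces to the total metric change divided by the number of epochs, and MOFR to the total growth of the train/validation gap divided by the number of epochs.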
In order to evaluate the effectiveness of transfer learning, we will compare several popular architectures, such as VGG-Face (Figure 2) and OpenFace (Figure 3), which are neural networks trained for face recognition. Our hypothesis is that, since the task of face recognition is in some ways similar to FER, the weights of these networks will already contain the necessary features, which will increase the learning performance. We also chose ResNet-50 and MobileNet (Figure 4), pretrained on the ImageNet dataset, because they are the standard choice as a backbone in transfer learning. In these networks, the last layer was excluded, and all layers except the last 4 were frozen.

Figure 2: VGG-Face architecture [32]
Figure 3: OpenFace architecture [33]
Figure 4: MobileNet architecture [34]

The model structures of VGG-Face and OpenFace were loaded using the deepface library [35]. The pretrained weights are available at [36-38]. ResNet-50 and MobileNet were loaded using the Keras framework [39]. Each model will be trained with a fixed set of hyperparameters: the learning rate is 10^-4 and the number of epochs is 20. Key metrics will be measured every 5 epochs. As the loss function we chose categorical cross entropy. To compare the efficiency of transfer learning, we will train the neural networks in 2 versions: with pre-trained weights and with randomly initialized weights. This approach will allow us to determine how and at what stages the pre-trained weights affect the efficiency of the model. After the experiment we will find out in which model the pre-trained weights give the greatest gain compared to random initialization, and determine which model converges faster than the others, is more resistant to overfitting, and shows the highest accuracy. Training will be carried out in the Google Colaboratory environment.

5. Results
The results of the experiments are presented in Figures 5-8 and in Tables 2-5. High resolution versions of all images are presented in [40].

5.1.
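The setup described above (last layer excluded, all but the last four layers frozen, learning rate 10^-4, categorical cross entropy) can be sketched with Keras for the MobileNet backbone. The Adam optimizer, the single-Dense classification head and the 48x48x3 input (the grayscale FER2013 images stacked to three channels) are our assumptions, not the exact configuration used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral

# Backbone pretrained on ImageNet, with the original classifier removed.
backbone = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False,
    input_shape=(48, 48, 3), pooling="avg",
)

# Freeze all layers except the last 4, as in the experiment.
for layer in backbone.layers[:-4]:
    layer.trainable = False

model = models.Sequential([
    backbone,
    layers.Dense(NUM_CLASSES, activation="softmax"),  # new FER head (our choice)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

The same pattern applies to the ResNet-50 backbone via `tf.keras.applications.ResNet50`; the VGG-Face and OpenFace backbones are loaded through the deepface library instead.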
ML Results
Figures 5-8 show the accuracy and loss during training for the MobileNet, OpenFace, ResNet-50 and VGG-Face models. Each graph shows the results of the pretrained model (solid line) and the randomly initialized model (dashed line).

Figure 5: MobileNet training process: a) accuracy change over epochs; b) loss change over epochs
Figure 6: OpenFace training process: a) accuracy change over epochs; b) loss change over epochs
Figure 7: ResNet-50 training process: a) accuracy change over epochs; b) loss change over epochs
Figure 8: VGG-Face training process: a) accuracy change over epochs; b) loss change over epochs

Table 2 shows the accuracy based MCR (1) for each model, and Table 3 the loss based MCR. Table 4 shows the accuracy based MOFR (2), and Table 5 the loss based MOFR. The MOFR shows how much the difference between the metrics on the validation and training sets grows on average per epoch, that is, how quickly the model overfits.
Table 2
Accuracy based MCR
Model                   Epoch 1-5   Epoch 6-10   Epoch 11-15   Epoch 16-20
MobileNet_pretrained    0.039937    0.02512      0.019603      0.016789
MobileNet_random        0.013814    0.021909     0.023413      0.025798
OpenFace_pretrained     0.080035    0.014538     0.003204      0.001637
OpenFace_random         0.011508    0.001289     0.002027      0.0018
ResNet50_pretrained     0.063678    0.026667     0.008366      0.001567
ResNet50_random         0.052546    0.045559     0.023246      0.007233
VGGFace_pretrained      0.001372    0.000111     0.000293      0.000453
VGGFace_random          0           0            0             0

Table 3
Loss based MCR
Model                   Epoch 1-5   Epoch 6-10   Epoch 11-15   Epoch 16-20
MobileNet_pretrained    -0.09537    -0.05194     -0.04445      -0.03934
MobileNet_random        -0.0492     -0.03906     -0.05088      -0.06358
OpenFace_pretrained     -0.20339    -0.041       -0.00892      -0.00399
OpenFace_random         -0.0145     -0.00418     -0.0031       -0.0008
ResNet50_pretrained     -0.14797    -0.06965     -0.0328       -0.01256
ResNet50_random         -0.12636    -0.11673     -0.06173      -0.02249
VGGFace_pretrained      -0.00415    -0.00199     -3.9E-05      -0.00179
VGGFace_random          -0.00034    -9.1E-05     -4E-05        -6.1E-05

Table 4
Accuracy based MOFR
Model                   Epoch 1-5   Epoch 6-10   Epoch 11-15   Epoch 16-20
MobileNet_pretrained    0.033082    0.042063     0.007035      0.026356
MobileNet_random        0.017102    0.017952     0.018843      0.021943
OpenFace_pretrained     0.100823    0.003225     -0.00725      5.79E-05
OpenFace_random         0.003176    -0.00272     0.000745      0.001056
ResNet50_pretrained     0.054287    0.02505      0.009175      0.000267
ResNet50_random         0.034293    0.040487     0.025503      -0.00187
VGGFace_pretrained      -0.00052    -0.00173     0.000293      0.001846
VGGFace_random          0           0            0             0

Table 5
Loss based MOFR
Model                   Epoch 1-5   Epoch 6-10   Epoch 11-15   Epoch 16-20
MobileNet_pretrained    -0.0504     -0.07935     -0.02334      -0.23275
MobileNet_random        -0.04781    -0.04951     -0.07605      -0.12429
OpenFace_pretrained     -0.57781    -0.0532      0.051506      -0.09709
OpenFace_random         -0.00039    0.001388     0.00041       0.002145
ResNet50_pretrained     -0.14471    -0.09968     -0.07212      -0.04759
ResNet50_random         -0.12195    -0.22848     -0.17572      0.012097
VGGFace_pretrained      -0.00339    0.004537     0.002139      -0.00136
VGGFace_random          -0.00013    -1.4E-05     -4.9E-05      -5E-05

Figures 9 and 10 show the accuracy and
loss of the models after the first epoch of training.

Figure 9: Initial accuracy
Figure 10: Initial loss

5.2. Testing Results
Figures 11-17 show the results of classifying images from the test set by the various networks. In each figure, an image with its emotion label is on the left, and a bar plot with the neural network predictions is on the right.

Figure 11: Classification result for the emotion "angry": a) an image example [26]; b) predicted emotion probabilities
Figure 12: Classification result for the emotion "disgust": a) an image example [26]; b) predicted emotion probabilities
Figure 13: Classification result for the emotion "fear": a) an image example [26]; b) predicted emotion probabilities
Figure 14: Classification result for the emotion "happy": a) an image example [26]; b) predicted emotion probabilities
Figure 15: Classification result for the emotion "neutral": a) an image example [26]; b) predicted emotion probabilities
Figure 16: Classification result for the emotion "sad": a) an image example [26]; b) predicted emotion probabilities
Figure 17: Classification result for the emotion "surprise": a) an image example [26]; b) predicted emotion probabilities

6. Discussions
The experiment revealed that the pre-trained models performed better than the randomly initialized ones in the FER task. The pre-trained models also had a higher average convergence rate in the first epochs (1-10), but then the values became the same and, in some cases, at epochs 15-20 the randomly initialized model converged faster. This is mainly because by that point the pre-trained model had reached an accuracy of more than 0.8, so its quality gains had slowed down. On the other hand, pre-trained models are more prone to overfitting; therefore, when using them, it is desirable to apply various regularization methods or data augmentation. The best model in terms of initial and final accuracy on the validation set is VGGFace_pretrained.
Therefore, its weights are initially best suited to the FER task. However, in our experiment this model had the worst performance in terms of convergence rate; for its training, other hyperparameters should be used, for example a higher learning rate or additional dense classification layers. The second face recognition model, OpenFace, shows a level of accuracy comparable to a standard transfer learning solution, ResNet-50, but has far fewer parameters (3,743,280 versus 23,587,712), so it fits and predicts faster. MobileNet has the fewest parameters (3,228,864), but its performance is lower than OpenFace's. OpenFace also has the highest convergence rate and overfitting rate in comparison with the other models. Thus, the face recognition based models proved to be at a fairly high level, in some cases even surpassing standard models like ResNet-50 and MobileNet. As can be seen from Figures 11-17, emotions such as happiness, anger, fear and surprise are recognized best, and disgust worst of all. This is because this class is the least represented in the dataset. In addition, some pictures are rather controversially labeled (for example, those in Figures 12-13); on these examples the neural networks show low confidence in the image class. Based on the results of the experiment, a final learning algorithm was developed, which can be suggested for use in FER systems.
Preprocessing:
1) apply a face detection model to the image. You can use one of the pre-trained models or train your own;
2) apply various augmentations to the images. This will balance the classes (if the original dataset is unbalanced) and also increase the stability of the model on new data.
Training:
1) select a backbone model. If speed is more important within the task and there is enough data for training, we recommend choosing OpenFace.
If the quality of recognition is more important and there are not enough resources for full model training, choose VGGFace;
2) freeze all layers of the neural network and add fully connected layers on top of them;
3) select hyperparameters and start the learning process with them.

7. Conclusions
As a result of the research, the aim and goals of the work were reached. We formulated an effective algorithm for neural network identification and usage within the framework of the FER task, determined which neural network architectures are better to use as a backbone for FER tasks in different situations, and compared the effectiveness of face recognition based backbones with standard solutions for transfer learning. We used one of the most popular datasets for the FER task, FER2013. While analyzing its structure we found that it is quite unbalanced. On the one hand this is a drawback, because models will learn to distinguish the minor class worse; on the other hand, it shows how models will behave on real-world datasets, which are often unbalanced. We then defined key metrics for analyzing network performance during learning. The proposed metrics showed the efficiency of transfer learning for each architecture and determined which pre-trained weights are most suitable for the FER task and lead to faster convergence and a lower overfitting speed. As part of this work, we organized an experiment and conducted a comparative analysis of the quality of the most popular neural network architectures for transfer learning (ResNet-50, MobileNet) against networks for face recognition (OpenFace, VGG-Face) within the FER task using various metrics. The obtained results show only the general performance of the networks, because they were all trained under the same conditions and the best set of hyperparameters was not selected. Based on the analysis of the experimental results, we recommend using the algorithm proposed in this article with a pretrained VGGFace.
Also, under the condition of limited resources and with the use of regularization methods, we recommend OpenFace as an alternative. We additionally recommend tuning the classifier for each specific task separately, because this will give a gain in quality. A deeper analysis of the effectiveness of the neural networks would require a more extensive study, which is not the purpose of this work: testing a larger class of architectures on a larger number of datasets and using various types of classifiers on top of the embeddings (including those not based on neural networks).

8. References
[1] J. Guo et al., "Dominant and Complementary Emotion Recognition from Still Images of Faces," in IEEE Access, vol. 6, pp. 26391-26403, 2018, doi: 10.1109/ACCESS.2018.2831927.
[2] H. Zhang and M. Xu, "Weakly Supervised Emotion Intensity Prediction for Recognition of Emotions in Images," in IEEE Transactions on Multimedia, vol. 23, pp. 2033-2044, 2021, doi: 10.1109/TMM.2020.3007352.
[3] J. Li, S. Qiu, Y.-Y. Shen, C.-L. Liu and H. He, "Multisource Transfer Learning for Cross-Subject EEG Emotion Recognition," in IEEE Transactions on Cybernetics, vol. 50, no. 7, pp. 3281-3293, July 2020, doi: 10.1109/TCYB.2019.2904052.
[4] K. Smelyakov, A. Datsenko, V. Skrypka and A. Akhundov, "The Efficiency of Images Reduction Algorithms with Small-Sized and Linear Details," 2019 IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&T), 2019, pp. 745-750, doi: 10.1109/PICST47496.2019.9061250.
[5] L. Li, X. Mu, S. Li and H. Peng, "A Review of Face Recognition Technology," in IEEE Access, vol. 8, pp. 139110-139120, 2020, doi: 10.1109/ACCESS.2020.3011028.
[6] J. Zhao, S. Yan and J. Feng, "Towards Age-Invariant Face Recognition," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 474-487, 1 Jan. 2022, doi: 10.1109/TPAMI.2020.3011426.
[7] N.-C. Ristea, L. C. Duţu and A.
Radoi, "Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks," 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2019, pp. 1-6, doi: 10.1109/SPED.2019.8906538.
[8] P. Partila, J. Tovarek, M. Voznak, J. Rozhon, L. Sevcik and R. Baran, "Multi-Classifier Speech Emotion Recognition System," 2018 26th Telecommunications Forum (TELFOR), 2018, pp. 1-4, doi: 10.1109/TELFOR.2018.8612050.
[9] V. Tokariev, V. Tkachov, I. Ilina and S. Partyka, "Implementation of combined method in constructing a trajectory for structure reconfiguration of a computer system with reconstructible structure and programmable logic," Selected Papers of the XIX International Scientific and Practical Conference "Information Technologies and Security" (ITS 2019), CEUR Workshop Proceedings, 28 Nov. 2019, pp. 71-81.
[10] Facial Expression Recognition. URL: https://paperswithcode.com/task/facial-expression-recognition.
[11] K. Smelyakov, M. Shupyliuk, V. Martovytskyi, D. Tovchyrechko and O. Ponomarenko, "Efficiency of image convolution," 2019 IEEE 8th International Conference on Advanced Optoelectronics and Lasers (CAOL), 2019, pp. 578-583, doi: 10.1109/CAOL46282.2019.9019450.
[12] D. C. Nguyen et al., "Enabling AI in Future Wireless Networks: A Data Life Cycle Perspective," in IEEE Communications Surveys & Tutorials, vol. 23, no. 1, pp. 553-595, Firstquarter 2021, doi: 10.1109/COMST.2020.3024783.
[13] S. Chaterji et al., "Lattice: A Vision for Machine Learning, Data Engineering, and Policy Considerations for Digital Agriculture at Scale," in IEEE Open Journal of the Computer Society, vol. 2, pp. 227-240, 2021, doi: 10.1109/OJCS.2021.3085846.
[14] G. Cao, Y. Ma, X. Meng, Y. Gao and M. Meng, "Emotion Recognition Based On CNN," 2019 Chinese Control Conference (CCC), 2019, pp. 8627-8630, doi: 10.23919/ChiCC.2019.8866540.
[15] Y.
Tian, "Artificial Intelligence Image Recognition Method Based on Convolutional Neural Network Algorithm," in IEEE Access, vol. 8, pp. 125731-125744, 2020, doi: 10.1109/ACCESS.2020.3006097.
[16] K. Smelyakov, A. Chupryna, M. Hvozdiev and D. Sandrkin, "Gradational Correction Models Efficiency Analysis of Low-Light Digital Image," 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream), 2019, pp. 1-6, doi: 10.1109/eStream.2019.8732174.
[17] A. I. Wright, C. M. Dunn, M. Hale, G. G. A. Hutchins and D. E. Treanor, "The Effect of Quality Control on Accuracy of Digital Pathology Image Analysis," in IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 2, pp. 307-314, Feb. 2021, doi: 10.1109/JBHI.2020.3046094.
[18] P. Yin, R. Yuan, Y. Cheng and Q. Wu, "Deep Guidance Network for Biomedical Image Segmentation," in IEEE Access, vol. 8, pp. 116106-116116, 2020, doi: 10.1109/ACCESS.2020.3002835.
[19] G. Wang et al., "DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1559-1572, 1 July 2019, doi: 10.1109/TPAMI.2018.2840695.
[20] C. Nunes and F. Pádua, "A Convolutional Neural Network for Learning Local Feature Descriptors on Multispectral Images," in IEEE Latin America Transactions, vol. 20, no. 2, pp. 215-222, Feb. 2022, doi: 10.1109/TLA.2022.9661460.
[21] N. Tian, Y. Liu, W. Wang and D. Meng, "Automatic CNN Compression Based on Hyper-parameter Learning," 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1-8, doi: 10.1109/IJCNN52387.2021.9533329.
[22] L. Liao, Y. Zhao, S. Wei, Y. Wei and J. Wang, "Parameter Distribution Balanced CNNs," in IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 11, pp. 4600-4609, Nov. 2020, doi: 10.1109/TNNLS.2019.2956390.
[23] R. Gonzales-Martínez, J. Machacuay, P. Rotta and C.
Chinguel, "Hyperparameters Tuning of Faster R-CNN Deep Learning Transfer for Persistent Object Detection in Radar Images," in IEEE Latin America Transactions, vol. 20, no. 4, pp. 677-685, April 2022, doi: 10.1109/TLA.2022.9675474.
[24] X. Liu, W. Yu, F. Liang, D. Griffith and N. Golmie, "Toward Deep Transfer Learning in Industrial Internet of Things," in IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12163-12175, 1 Aug. 2021, doi: 10.1109/JIOT.2021.3062482.
[25] S. Hussein, P. Kandel, C. W. Bolan, M. B. Wallace and U. Bagci, "Lung and Pancreatic Tumor Characterization in the Deep Learning Era: Novel Supervised and Unsupervised Learning Approaches," in IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1777-1787, Aug. 2019, doi: 10.1109/TMI.2019.2894349.
[26] Deep Face Recognition: A Survey. URL: https://arxiv.org/pdf/1804.06655.pdf.
[27] Deepface. URL: https://github.com/serengil/deepface.
[28] Y. Lu, Q. Mao and J. Liu, "A Deep Transfer Learning Model for Packaged Integrated Circuit Failure Detection by Terahertz Imaging," in IEEE Access, vol. 9, pp. 138608-138617, 2021, doi: 10.1109/ACCESS.2021.3118687.
[29] O. Lemeshko, O. Yeremenko and A. M. Hailan, "Two-level method of fast ReRouting in software-defined networks," 2017 4th International Scientific-Practical Conference Problems of Infocommunications. Science and Technology (PIC S&T), 2017, pp. 376-379, doi: 10.1109/INFOCOMMST.2017.8246420.
[30] I. Shubin, I. Kyrychenko, P. Goncharov and S. Snisar, "Formal representation of knowledge for infocommunication computerized training systems," 2017 IEEE 4th International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&T), 2017, pp. 287-291, doi: 10.1109/INFOCOMMST.2017.8246399.
[31] Learn facial expressions from an image. URL: https://www.kaggle.com/msambare/fer2013.
[32] VGG-Face network architecture. URL: https://www.researchgate.net/figure/VGG-Face-network-architecture_fig2_319284653.
[33] OpenFace architecture. URL: https://www.cs.cmu.edu/~satya/docdir/CMU-CS-16-118.pdf.
[34] MobileNet architecture. URL: https://arxiv.org/pdf/1704.04861.pdf.
[35] OpenFace: A general-purpose face recognition library with mobile applications. URL: http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-118.pdf.
[36] VGG-Face weights. URL: https://drive.google.com/file/d/1CPSeum3HpopfomUEK1gybeuIVoeJT_Eo/view.
[37] OpenFace weights. URL: https://drive.google.com/file/d/1LSe1YCV1x-BfNnfb7DFZTNpv_Q9jITxn/view.
[38] ResNet and ResNetV2. URL: https://keras.io/api/applications/resnet/#resnet50-function.
[39] Keras. URL: https://keras.io/api/applications/mobilenet.
[40] All images. URL: https://docs.google.com/document/d/1Z_S_FpRkv4Xf2cRAqHxo23BUv7aYqtMZ59aJrpvYf-M/edit?usp=sharing.