Mood evaluation by Remote-PPG and facial expression recognition

Andrea Manni 1, Andrea Caroppo 1, Pietro Siciliano 1 and Alessandro Leone 1

1 National Research Council of Italy, Institute for Microelectronics and Microsystems, Via per Monteroni c/o Campus Universitario Palazzina A3, Lecce, Italy

Abstract
Psychological health monitoring plays an important role in mood evaluation, especially for ageing subjects within the home environment. For this purpose, the development of innovative and easy-to-use platforms based on contact or contactless smart sensors is spreading widely. This paper presents the design and implementation of a novel framework able to evaluate mood by combining vital signs and facial expressions. A low-cost, commercial vision sensor is used to allow a wider diffusion of the proposed solution and to increase its acceptability. Specifically, a heart rate estimation algorithm and a facial expression recognition module are combined to evaluate the end user's mood. This result has been achieved through the use of deep learning and transfer learning algorithms that run in real time even on embedded hardware platforms not equipped with GPUs, consequently increasing the usability of the system. The first added value of the proposed framework is the possibility of detecting facial expressions "in the wild", independently of the selected vision sensor and of the face orientation. Another important added value lies in the implementation of a rule-based expert system that combines data acquired from the same smart sensor, but whose operation can also be maintained using information from heterogeneous sensors providing the same type of discrete input values. Due to COVID-19 restrictions, the overall system is currently being tested first in a controlled environment and will then be tested in a real environment to achieve the final goal. The findings of the preliminary experiments show promising results for heart rate and facial expression monitoring, with a low average error in terms of Root Mean Square Error for HR estimation and high accuracy for facial expression recognition.

Keywords
Mood Evaluation, Contactless Sensors, Ageing Subjects, Deep Learning, Transfer Learning, Heart Rate Monitoring, Facial Expression Recognition.

Italian Workshop on Artificial Intelligence for an Ageing Society (AIxAS 2021), November 29th, 2021
EMAIL: andrea.manni@le.imm.cnr.it (A. 1); andrea.caroppo@cnr.it (A. 2); pietro.siciliano@le.imm.cnr.it (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Mood generally reflects a person's mental state but can also have a significant relationship with physical health [1]. Consequently, its assessment can have an important impact on daily life and work. For example, it is well known that a negative mood is a key factor influencing human health, and several studies have demonstrated that a negative mood sustained over a long period of time can contribute to various health problems such as depression or heart disease [2]. Generally, there are two ways to evaluate mood. The first is based on the estimation of the emotional behavior of the person/patient through the analysis of facial expression patterns. Alternatively, mood can be evaluated by analyzing the physiological signals of the observed subject, such as heart rate (HR), HR variability, the electrocardiogram (ECG), and the electroencephalogram (EEG).
The evaluation of mood is particularly important information for monitoring the health status of ageing and/or frail subjects. In this context, a considerable advantage is obtained by building HW/SW platforms on devices (preferably commercial ones) that increase the degree of acceptability of the proposed solution. Various experiments have shown that wearable devices are poorly accepted by elderly subjects for monitoring, for example, vital signs. Furthermore, another disadvantage arises from the fact that such devices may be left unworn due to an oversight, compromising any long-term analysis. Consequently, many scientific studies have turned towards the extraction of features for mood evaluation using non-contact, minimally invasive sensors, increasing the degree of acceptability of the proposed solution.

From the analysis of the most recent scientific literature in the field, it is evident that facial expressions are the most widely employed modality for the evaluation of mood. For example, in [3] a stationary wavelet transform is used to extract features for facial expression recognition, and the selected features are then passed to a feed-forward neural network trained with a back-propagation algorithm. Moreover, in [4] a hybrid feature descriptor-based method is proposed to recognize human emotions from facial expressions.

Previous research has shown that HR is also a good indicator for the evaluation of mood, since it was demonstrated that HR fluctuates with mood changes. In [5] an experiment showed that physiological signals have unique responses to different emotions: for example, HR increased significantly when people were angry or fearful, but decreased substantially during disgust. In [6] it was demonstrated that HR during a positive mood was lower than during a neutral mood. The authors of [7] showed that the effects of relaxation and fear on HR were significantly different, and that the average HR during happiness was lower than in a sad state.

As described above, both facial expressions and HR help in mood assessment. In recent years, with more advanced techniques for fusing information from different sources, it has become possible to merge the features of reference emotional states. To automatically recognize emotions, many works have proposed fusion with audiovisual information, e.g., combining speech with facial expressions. In [8] a database (emoF-BVP) is presented, consisting of various audio and video recordings of actors expressing different emotional expressions at various intensities; four deep belief network (DBN) models are then presented which allow the generation of robust multimodal features for emotion classification in an unsupervised manner. Some studies have investigated the combination of EEG and other physiological signals. For instance, in [9] the database "Dataset for Emotion Analysis using Physiological signals" is used for the classification of emotions, with the aim of determining which of the physiological and EEG signals are most relevant for emotion recognition.

In this paper, an algorithmic pipeline that fuses HR values and facial expressions for the evaluation of mood is presented. The mood is evaluated by combining HR and facial expressions using a rule-based expert system.
The proposed system automatically detects the mood of the observed subject by integrating a low-cost commercial camera, a face detection module based on Deep Learning (DL), and a Facial Expression Recognition (FER) module based on the concept of Transfer Learning (TL). Moreover, the pipeline integrates a contactless HR detection module. A first important added value of the proposed pipeline is that it runs in real time on an embedded hardware platform without a GPU. Another important advantage is that the pipeline is independent of the image capture device and of the face orientation of the observed subject, achieving more accurate prediction of the HR and facial expression of the end user in a typical Ambient Assisted Living (AAL) context.

The remainder of this paper is organized as follows. Section 2 explains the proposed algorithmic pipeline and provides an overview of the methodology by detailing the algorithmic steps implemented for HR estimation, facial expression recognition and mood evaluation. The results obtained are reported in Section 3. Finally, Section 4 presents our conclusions and discusses ideas for future work.

2. Method

An overview of the proposed framework is depicted as a block diagram in Figure 1. The algorithmic pipeline takes as input an image acquired by a commercial vision sensor in the RGB color space. It consists of four main blocks:
1) a pre-processing step integrating a face-detection module and a series of algorithmic steps used to format the data for the following stages;
2) an HR estimation module based on the extraction of a specific Region of Interest (ROI), followed by filtering, detrending and the Fast Fourier Transform (FFT) of the obtained signals;
3) a FER module based on a pre-trained deep-learning model;
4) a final software module for the evaluation of the end user's mood.
In the last block, the estimated HR value and the corresponding facial expression are fed to an expert system that returns the subject's mood using established rules. Each component of the pipeline is detailed in the following.

Figure 1: Block diagram of the proposed algorithmic pipeline designed and implemented for mood evaluation.

2.1. Pre-Processing

The first main block of the pipeline applies the pre-processing algorithms to the acquired image. One of the most important algorithmic steps is the detection of the facial region in the video stream captured by the commercial vision sensor. It is important to underline that this block is shared by the two branches of the pipeline, serving both the estimation of the heartbeat and the recognition of the facial expression of the observed subject.

To obtain accurate real-time face detection, the latest version of the OpenCV library is used, which includes a deep neural network (DNN) module. This module allows face detection "in the wild" in real time on a PC without a GPU. Moreover, this approach allows face detection even in less-than-ideal lighting conditions. The module uses a reduced ResNet-10 model [10] and its output is the bounding-box coordinates of the facial region together with a confidence index. After face detection, only the facial region is cropped: a software procedure extracts the coordinates of the upper-left corner, the height, and the width of the face, thus removing all the information not related to the face. In addition, depending on the resolution of the facial image, either a down-sampling step or a resolution-increasing step is applied: a simple linear interpolation is used for down-sampling, while nearest-neighbor interpolation is used to increase the size of the facial images. Finally, a "normalization" step is added to stabilize the contrast and brightness of the image; normalization is performed through the application of "contrast-limited adaptive histogram equalization" (CLAHE) [11].
2.2. Heart Rate Estimation

The gold-standard techniques for measuring HR, such as ECG and photoplethysmography (PPG), require skin contact and can inevitably cause discomfort, especially in the current pandemic period. Recently, remote photoplethysmography (rPPG) has attracted increasing attention because it allows HR to be measured in a contactless way. In this section, the heart rate estimation block is explained.

After the pre-processing steps, regions of interest (ROIs) corresponding to facial areas with a strong blood-volume modulation are identified on the detected face. As described in [12] and other HR estimation research, the strength of the PPG signal differs across facial regions, and the cheeks and the forehead tend to produce the strongest signals [13]. Since the forehead may be occluded depending on the hair style, in the present version of the framework the validation of the HR measurements was carried out considering only the cheeks. To identify the ROIs, a shape predictor (with 68 landmarks) covering the face is used (Figure 2a). Then, to identify the cheeks, the areas delimited by landmarks 1-29-34-4 (right cheek) and 17-29-34-14 (left cheek) are considered (Figure 2b).

Figure 2: ROI identification: (a) shape predictor with 68 landmarks and (b) left and right cheek identification.

The next step is to filter the obtained raw RGB signals to remove frequencies that are not realistic for a human heart. Accordingly, an ideal band-pass filter is applied to remove high- and low-frequency noise. The filter removes components outside the frequency band [0.65, 3] Hz, which is commonly used in the literature and corresponds to HR values between 40 and 180 bpm. Cyclic components are identified from the power spectrum, which requires sufficient signal quality and length. However, all parameters, including the periodic component of the signal, vary over time, and the HR signal is limited to a specific time interval. Furthermore, since the noise spectrum is similar to the spectrum of the signal to be recovered, filtering to increase the signal-to-noise ratio (S/N) is critical.

In addition, the three raw RGB signals can be decomposed into underlying source signals using Blind Source Separation (BSS) algorithms. Since the raw RGB signals contain the HR information mixed across components, independent component analysis (ICA) is used to extract the source signals from these mixtures. ICA is a BSS technique that computes a linear combination, with weights W, of the available data sets y (the raw RGB color channels) so as to maximize the independence of one source at a time. The source signals x must not have a Gaussian distribution and must be mixed only linearly through an unknown mixing matrix M. Thus, the mixed data sets and the original independent components can be expressed by Equations 1 and 2:

y(t) = M x(t)    (1)
x(t) = W y(t)    (2)

where M is the unknown mixing matrix and its inverse W is the de-mixing matrix estimated by the ICA.

Several ICA algorithms are available. In the proposed system, the FastICA method [14] is used to analyze the RGB signals and to recover the original source signals, removing noise artifacts. A fourth-order moment (kurtosis) is used to identify the independent components (three components in the proposed approach). Although ICA does not impose an ordering on the components, the second component typically contained a strong plethysmographic signal and, for the sake of simplicity and automation, is therefore selected as the desired source signal. Finally, the Fast Fourier Transform (FFT) is applied to the selected component to obtain the power spectrum inside the frequency band [0.65, 3] Hz, corresponding to [40, 180] bpm. The peak of the power spectrum in this range represents the pulse frequency (Figure 3). Then, to improve the performance of the system, HR values are collected in a time window, fixed at 30 seconds in the current version of the module. To reject possible artifacts, outliers are removed and a single value is obtained by computing the median of the remaining values in the time window.

Figure 3: Feature extraction for contactless HR measurements. Three pre-processed RGB signals are extracted from the ROI and subsequently filtered. FastICA is applied to the normalized, de-trended and smoothed RGB signals to recover three independent source signals. Finally, the FFT is applied to the second component and the frequency with the highest spectral power is selected as the estimated HR.
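The processing chain just described (detrending, band-pass filtering in [0.65, 3] Hz, FastICA decomposition and FFT peak picking) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it assumes a buffer of per-frame mean RGB values of the cheek ROIs at a known frame rate, it uses scikit-learn's FastICA with the 'cube' contrast function (which corresponds to the kurtosis criterion mentioned above), and it hard-codes the selection of the second component.

```python
import numpy as np
from scipy.signal import butter, filtfilt, detrend
from sklearn.decomposition import FastICA

def estimate_hr(rgb_traces, fps, low=0.65, high=3.0):
    """rgb_traces: array of shape (n_frames, 3) holding the mean R, G, B
    values of the cheek ROIs for each frame of the analysis window."""
    x = detrend(rgb_traces, axis=0)
    x = (x - x.mean(axis=0)) / x.std(axis=0)       # normalize each channel

    # Ideal-like band-pass filter restricted to [0.65, 3] Hz (40-180 bpm).
    b, a = butter(3, [low, high], btype="band", fs=fps)
    x = filtfilt(b, a, x, axis=0)

    # Blind source separation with FastICA; note that ICA does not guarantee
    # component ordering, the second component is chosen as in the paper.
    sources = FastICA(n_components=3, fun="cube", random_state=0).fit_transform(x)
    s = sources[:, 1]

    # Power spectrum via FFT; the in-band peak gives the pulse frequency.
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fps)
    power = np.abs(np.fft.rfft(s)) ** 2
    band = (freqs >= low) & (freqs <= high)
    return 60.0 * freqs[band][np.argmax(power[band])]   # HR in bpm
```

In the full system this estimate would be computed over the 30-second window described above and the median of the outlier-filtered values would be reported.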
2.3. Facial Expression Recognition

FER aims at recognizing a small number of basic emotions, such as happiness, sadness, surprise, anger, disgust, and fear. In addition to these expressions, the neutral facial expression is also estimated in some studies. The performance of a FER method depends on the use of the most discriminating features. There are two main categories of feature extraction methods in this research area: hand-crafted feature extraction and automatic feature extraction. The former typically relies on geometric or appearance descriptors (such as the Scale-Invariant Feature Transform, Local Binary Patterns, ...). The latter is more recent and focuses on features generated automatically by a DL architecture. Among state-of-the-art algorithms, CNNs [15] have performed very well on the FER problem in unconstrained scenarios. CNNs process data with a grid-like structure, such as images, by automatically learning spatial hierarchies of features (from low- to high-level patterns). The main problem in using CNNs for FER is the availability of facial expression datasets with a very large number of labelled images, since training DL architectures with a limited number of images can lead to overfitting. One solution to this problem is to exploit the concept of TL [16]: a network is first pre-trained on an extremely large dataset and is then adapted, using a smaller dataset, to the given task of interest.

In this work, TL is used for the FER task. The well-known VGG16 architecture [17], pre-trained on ImageNet, is adapted using the Facial Emotion Recognition 2013 (FER-2013) dataset. FER-2013 was introduced in the ICML 2013 workshop's facial expression recognition challenge. The dataset is quite challenging, since faces vary greatly in age, pose and occlusion conditions [18].
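FER-2013 images are small (48 × 48 pixels) grey-level faces, while VGG16 expects larger three-channel inputs, so the data must be adapted before transfer learning. The loader below is a hypothetical sketch assuming the common fer2013.csv distribution of the dataset, in which each row stores an emotion label (0-6) and the image flattened into a space-separated pixel string; the file layout and the resizing policy are assumptions, not details given in the paper.

```python
import cv2
import numpy as np
import pandas as pd

def load_fer2013(csv_path, target_size=224):
    """Load FER-2013 images (assumed fer2013.csv layout) and adapt them to a
    VGG16-style input: resize and replicate the grey channel to RGB.
    For the full dataset a batched generator would be preferable in practice."""
    df = pd.read_csv(csv_path)
    images, labels = [], []
    for pixels, emotion in zip(df["pixels"], df["emotion"]):
        img = np.array([int(p) for p in pixels.split()],
                       dtype=np.uint8).reshape(48, 48)
        img = cv2.resize(img, (target_size, target_size))
        images.append(cv2.cvtColor(img, cv2.COLOR_GRAY2RGB))
        labels.append(int(emotion))
    return np.array(images), np.array(labels)
```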
Generally, the VGG16 architecture is structured into two main, well-separated sections: feature extraction and classification. In our work, since the feature extraction section is reused to extract features from the new dataset, the classification section, originally composed of three FC layers (named FC6, FC7 and FC8), is replaced with a single new FC layer that produces the desired output, i.e., the number of facial expression classes (seven in our case: six facial expressions plus the neutral expression). The steps used to design the proposed architecture are shown in Figure 4.

Figure 4: VGG16 Deep Convolutional Neural Network (DCNN) architecture trained on the ImageNet database. In the top row, the original network has 16 layers and classifies images into 1000 object categories; the bottom row shows the transfer-learning adaptation for FER, in which the classification layers of the original VGG16 architecture are replaced.

From this schematic representation it can be seen that the feature extraction part is the same as in the original VGG16 architecture, while a new FC layer is added to match the number of outputs to the number of classes in the new dataset, i.e., seven. In addition, a flatten layer is placed between the feature extraction section and the new FC layer; its function is to reshape the output tensor of the previous layer into a 1 × 1 tensor whose length corresponds to the input tensor volume. VGG16 is then trained with the stochastic gradient descent algorithm [19], estimating the error gradient for the current state of the model on examples from the training dataset and updating the model weights with the backpropagation algorithm. The classifier layer of the transfer-learned model can recognize seven classes of expressions: "Anger", "Disgust", "Fear", "Happy", "Sad", "Surprise", and "Neutral".

In the present work, the deep CNN features were also evaluated with three machine learning classifiers that have shown promising results in previous FER studies: Support Vector Machine (SVM), Logistic Regression (LR), and k-nearest neighbors (kNN). SVM separates categorical data in a high-dimensional space by finding the hyperplane with the maximum possible margin between the hyperplane and the samples [20]. LR is a predictive analysis tool that describes the relationship between one dependent binary variable and one or more independent variables; in contrast to linear regression, the dependent variable is binary rather than continuous. In this work, for LR we set only the parameter C (the inverse of the regularization strength λ) to 0.01 [21]. Finally, kNN computes, in a non-parametric way, the distances between an unclassified sample and the k nearest training samples, and assigns the sample to the majority class among those neighbors. Different distance metrics can be applied; the most widely used are the Euclidean distance and the Manhattan distance. Here, k was set to 2 and the Manhattan distance was used as the distance function [22].

This module returns a facial expression label with a sampling time of one second. Consequently, to make the whole system suitable for the analysis of video sequences, a decision strategy based on the temporal consistency of the FER results is introduced: the facial expression is obtained by analyzing a time window of the same length as that used for HR estimation and selecting the expression, among those with a confidence index greater than 0.8, that is most prevalent in the window.
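A possible realization of the transfer-learning scheme and of the three classifiers described above is sketched below with Keras and scikit-learn. Only the parameters stated in the text (seven output classes, SGD training, C = 0.01 for LR, k = 2 with Manhattan distance for kNN) come from the paper; the learning rate, momentum, SVM kernel and the choice to freeze the convolutional layers are assumptions.

```python
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

NUM_CLASSES = 7  # six basic expressions plus the neutral one

# Transfer-learning head: the VGG16 feature-extraction section is kept
# (ImageNet weights, frozen here as an assumption) and the original
# FC6/FC7/FC8 block is replaced by a flatten layer plus one new FC layer.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=..., validation_data=...)

# The same frozen convolutional features can be fed to the three classical
# classifiers compared in the paper, with the parameters reported in the text.
feature_extractor = tf.keras.Sequential([base, tf.keras.layers.Flatten()])
classifiers = {
    "SVM": SVC(kernel="linear"),                       # kernel is an assumption
    "LR": LogisticRegression(C=0.01, max_iter=1000),
    "kNN": KNeighborsClassifier(n_neighbors=2, metric="manhattan"),
}
# features = feature_extractor.predict(train_images)
# for name, clf in classifiers.items():
#     clf.fit(features, train_labels)
```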
2.4. Mood Evaluation

In this module, a decision strategy based on production rules, commonly used in simple expert systems in artificial intelligence, is implemented. Specifically, the HR values and the detected facial expression are combined for mood evaluation. A production rule is composed of an IF part and a THEN part and has the form:

Pi : IF X THEN Y,    (3)

where Pi represents rule i, X is the antecedent of rule i, and Y is its consequent. Here, X is composed of (x1, x2), where x1 represents the HR value and x2 represents the detected facial expression. Rules are activated when their conditions are satisfied. Table 1 reports a sample of the rules used. For instance, if the HR value is less than or equal to 70 bpm and the facial expression is "happy", then the output of the mood evaluation module is VERY POSITIVE (Rule 1 in Table 1).

Table 1
Extract of the rules.

Rule No.   HR (antecedent)    FER (antecedent)   Consequent
1          <= 70              Happy              Very positive
2          > 70 and <= 90     Happy              Positive
3          < 90               Sad                Neutral
...        ...                ...                ...
16         >= 90              Anger              Very negative
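The rule base can be implemented as a simple chain of condition/consequent pairs. The toy sketch below reproduces only the rules reported in Table 1; the thresholds of the remaining rules are not given in the paper and are therefore not included.

```python
def evaluate_mood(hr_bpm, expression):
    """Minimal sketch of the rule-based combination of HR and FER (Table 1).
    Only the rules shown in the extract are reproduced here."""
    rules = [
        (lambda hr, ex: hr <= 70 and ex == "Happy",       "Very positive"),  # Rule 1
        (lambda hr, ex: 70 < hr <= 90 and ex == "Happy",  "Positive"),       # Rule 2
        (lambda hr, ex: hr < 90 and ex == "Sad",          "Neutral"),        # Rule 3
        (lambda hr, ex: hr >= 90 and ex == "Anger",       "Very negative"),  # Rule 16
    ]
    for condition, mood in rules:
        if condition(hr_bpm, expression):
            return mood
    return None  # no rule fired (the full rule base covers the other combinations)

print(evaluate_mood(65, "Happy"))   # -> Very positive
```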
3. Results and discussion

Currently, due to COVID-19 restrictions, only the HR estimation and FER blocks of the pipeline have been tested. The validation was conducted in the laboratory of the Institute for Microelectronics and Microsystems (IMM) in Lecce, Italy. The experimental setup consisted of an embedded PC with an Intel Core i7 and 8 GB of RAM; the algorithms were developed in Python (3.7) with OpenCV. An Intel RealSense D435 camera was used for image streaming acquisition. A total of 15 participants (nine males and six females) with ages ranging from 35 to 69 years were included in this study after giving their voluntary consent.

For HR estimation, the root mean squared error (RMSE) is used to evaluate the accuracy of the HR measurements, considering a commercial pulse oximeter as ground truth. The experiments were run by varying head pose, lighting conditions, and distance from the vision sensor (from 0.5 m to 2 m). For the sake of brevity, Table 2 reports the RMSE obtained for varying head poses (from -40° to +40° for the yaw angle and from -20° to +20° for the pitch angle) and lighting conditions (in the range 30-100 lux) at a fixed distance from the vision sensor (0.5 m).

Table 2
RMSE for varying light conditions (30 and 100 lux), yaw angle (between -40° and +40°) and pitch angle (between -20° and +20°).

                       30 lux                              100 lux
Yaw angle       -40    -20     0    +20   +40       -40    -20     0    +20   +40
RMSE (mean)    5.72   4.49  2.41   4.36  5.80      4.86   2.45  1.97   2.78  4.35

Pitch angle            -20     0    +20                    -20     0    +20
RMSE (mean)           2.56  1.87   2.43                   2.35  1.57   2.25

These results show that the implemented approach allows effective HR estimation even in the presence of significant changes in head pose and lighting conditions, with the RMSE increasing slightly as the lighting intensity decreases.

For the FER module, each user involved in the trial simulated in sequence the six classical facial expressions plus the neutral expression. The performance of the FER module was evaluated using accuracy as the metric: accuracy (Acc) measures the overall classification performance in terms of True Positives (TP) and True Negatives (TN). Figures 5, 6 and 7 report the confusion matrices of the average accuracies obtained with the considered classifiers (i.e., SVM, LR and kNN). The accuracies were calculated by averaging the accuracies obtained by varying the lighting conditions and the face orientation.

Figure 5: Confusion matrices for seven classes of facial expressions using SVM as classifier at varying (a) yaw angles and (b) pitch angles.

Figure 6: Confusion matrices for seven classes of facial expressions using LR as classifier at varying (a) yaw angles and (b) pitch angles.

Figure 7: Confusion matrices for seven classes of facial expressions using kNN as classifier at varying (a) yaw angles and (b) pitch angles.

From the confusion matrices it can be seen that the SVM classifier achieves the best recognition performance for the considered facial expressions across the tested lighting conditions and pitch and yaw angles. More specifically, SVM improves classification accuracy by about 5.8% compared to the LR classifier and by about 4.6% compared to the kNN classifier with lux = 100, while the improvement with lux = 500 is smaller (+2.3% compared to LR and +2% compared to kNN).

4. Conclusion

In this work, a novel algorithmic pipeline for mood evaluation based on HR estimation and FER was proposed. The platform returns the HR and the facial expression of the subject in real time using the same input information, i.e., the facial region. Moreover, the platform implements a software module capable of evaluating the "mood" through temporal combinations of the previously extracted information. An added value lies in the fact that, in order to capture the relevant visual features of HR and facial expressions from the face, the algorithmic steps were designed and implemented considering various head poses, distances from the sensor, and variations in lighting conditions. Thanks to these algorithmic steps, the entire pipeline offers greater usability of the proposed solution, integrating well into an AAL environment where fragile or ageing subjects are generally present. From the performance point of view, the algorithmic pipeline achieved satisfactory results in terms of RMSE for HR estimation and accuracy for FER in the wild.

The next development of the proposed work will be the test of the introduced production rules, with the purpose of distinguishing at least positive, negative, and neutral moods of the observed subject. A further development of this work will involve the extraction of the breathing rate from the same facial region and the combination of this information within the rules, to provide an output mood that is as close to reality as possible.

5. Acknowledgements

This work has been carried out within the project PON Si-Robotics funded by MUR, the Italian Ministry for University and Research.

6. References
[1] K. Gouizi, C. Maaoui, F. B. Reguig, Negative emotion detection using EMG signal, in: Proceedings of the 2014 International Conference on Control, Decision and Information Technologies (CoDIT), pp. 690-695, Metz, France (3-5 November 2014).
[2] L. D. Kubzansky and I. Kawachi. "Going to the heart of the matter: do negative emotions cause coronary heart disease?", J Psychosom Res., 48(4-5) (2000).
[3] H. Qayyum, M. Majid, S. M. Anwar and B. Khan. "Facial Expression Recognition Using Stationary Wavelet Transform Features", Mathematical Problems in Engineering, vol. 2017 (2017).
[4] T. Kalsum, S. M. Anwar, M. Majid, B. Khan and S. M. Ali. "Emotion recognition from facial expressions using hybrid feature descriptors", IET Image Process., 12, pp. 1004-1012 (2018).
[5] P. Ekman. "An argument for basic emotions", Cognition and Emotion, 6(3-4), pp. 169-200 (1992).
[6] A. Britton, M. Shipley, M. Malik, K. Hnatkova, H. Hemingway and M. Marmot. "Changes in Heart Rate and Heart Rate Variability Over Time in Middle-Aged Men and Women in the General Population (from the Whitehall II Cohort Study)", The American Journal of Cardiology, 100(3), pp. 524-527 (2007).
[7] M. T. Valderas, J. Bolea, P. Laguna, M. Vallverdú and R. Bailón, Human emotion recognition using heart rate variability analysis with spectral bands based on respiration, in: Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 6134-6137, Milan, Italy (25-29 August 2015).
[8] H. Ranganathan, S. Chakraborty and S. Panchanathan, Multimodal emotion recognition using deep learning architectures, in: Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1-9, Lake Placid, NY, USA (7-10 March 2016).
[9] C. A. Torres-Valencia, H. F. García-Arias, M. A. Álvarez López and A. A. Orozco-Gutiérrez, Comparative analysis of physiological signals and electroencephalogram (EEG) for multimodal emotion recognition using generative models, in: Proceedings of the 2014 XIX Symposium on Image, Signal Processing and Artificial Vision, pp. 1-5, Armenia, Colombia (2014).
[10] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778 (2016).
[11] A. M. Reza. "Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement", Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 38(1), pp. 35-44 (2004).
[12] A. Challoner. "Photoelectric plethysmography for estimating cutaneous blood flow", Non-Invasive Physiol. Meas., 1, pp. 125-151 (1979).
[13] M. Kumar, A. Veeraraghavan and A. Sabharwal. "DistancePPG: Robust non-contact vital signs monitoring using a camera", Biomedical Optics Express, 6(5), pp. 1565-1588 (2015).
[14] A. Hyvarinen. "Fast and robust fixed-point algorithms for independent component analysis", IEEE Transactions on Neural Networks, 10(3), pp. 626-634 (1999).
[15] D. H. Hubel and T. N. Wiesel. "Receptive fields and functional architecture of monkey striate cortex", J. Physiol., 195, pp. 215-243 (1968).
[16] K. Weiss, T. M. Khoshgoftaar and D. Wang. "A survey of transfer learning", J. Big Data, 3, pp. 1-40 (2016).
[17] K. Simonyan and A. Zisserman. "Very deep convolutional networks for large-scale image recognition", arXiv preprint arXiv:1409.1556 (2014).
[18] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner and Y. Zhou, Challenges in representation learning: A report on three machine learning contests, in: International Conference on Neural Information Processing, Springer, Heidelberg, 2013, pp. 117-124.
[19] L. Bottou. "Stochastic gradient descent tricks", Neural Networks: Tricks of the Trade, Springer, Berlin, Heidelberg, pp. 421-436 (2012).
[20] J. A. Suykens and J. Vandewalle. "Least squares support vector machine classifiers", Neural Process. Lett., 9, pp. 293-300 (1999).
[21] D. W. Hosmer, S. Lemeshow Jr. and R. X. Sturdivant. "Applied Logistic Regression", John Wiley & Sons: Hoboken, NJ, USA, Volume 398 (2013).
[22] S. A. Dudani. "The distance-weighted k-nearest-neighbor rule", IEEE Transactions on Systems, Man, and Cybernetics, IEEE: New York, NY, USA, pp. 325-327 (1976).