Intelligence System For Emotional Facial State Estimation During Inspection Control

Viktor Sineglazov, Roman Pantyeyev, Ilya Boryndo
Aviation Computer-Integrated Complexes Department
National Aviation University
Kyiv, Ukraine
romanpanteevmail@gmail.com, ib.mistlemagic@gmail.com, svm@nau.edu.ua

Abstract—The problem of creating an intelligent security system for customs control is considered. The necessity of analyzing changes in passengers' facial emotions during control in airports is shown. The emotional changes of a human face correspond to the person's internal reaction to the control questions posed. To solve this problem, it is proposed to apply a convolution neural network at the stage of micro-emotion identification and a fuzzy classifier at the stage of deciding whether a passenger poses a potential threat. The NEFCLASS neural network is proposed as the fuzzy classifier. An example of a practical approach to micro-emotion recognition by means of a convolution neural network is given.

Keywords—combined convolution neural network, NEFCLASS, micro expressions, facial state recognition.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

I. INTRODUCTION

Nowadays, great importance is attached to improving aviation safety, in particular during passenger control. Commonly, the number of people per security officer is too high to deal with them in a restricted period of time. The airline employee faces a hard task: to ask a number of special questions in order to understand the emotional state of the passenger before admitting him or her to the flight. The main features that allow solving this problem are the emotional changes of the passenger during the control conversation. The most successful application of this approach was made by the airline El Al (Israel). The challenge is to detect and recognize facial micro-changes. This is a very difficult task, especially with a large number of passengers (about 20-30 people per security officer), and it demands considerable preparation.

In this work, as the solution of the given problem, it is proposed to use an intelligent analysis system that consists of two levels of deep learning neural networks based on micro-emotion recognition: at the first level a convolution neural network [1] is applied, and at the second a classifier built on a fuzzy neural network, for example NEFCLASS [2].

II. REVIEW OF EXISTING SOLUTIONS

When analyzing the recognition of the human emotional state, it is necessary to consider “profiling” [3]. This is a set of psychological approaches for analyzing and predicting a person's behavior based on particular features, appearance and verbal behavior. This technique can be successfully applied in computer vision owing to the classification of these features for potential danger prevention. Using the profiling method requires a coding system of facial movements (SKLiD) [4]. With SKLiD it is possible to create a facial model based on action units and the fixed period of time needed to express an emotion. Here, action units are the movements performed by an individual muscle or a group of muscles.

The system also has a limited number of descriptors (unitary movements performed by a group of muscles: tightening the cheeks, stretching the eyelids, raising the wings of the nose, raising the upper lip, deepening the nasolabial fold, raising the corners of the lips, dimpling the lips, lowering the corners of the mouth, lowering the lower lip, pulling off the lips) [5]. Each manifestation of a person's facial emotions can be described by a set of descriptors. Besides the apparent facial changes, micro emotions also occur. They can be taken into account in more complicated recognition approaches. Table I describes the main facial changes for the six standard types of emotions [6].

TABLE I. RELATIONS OF EMOTIONAL FACIAL FEATURES CHANGING

Emotion | Eyebrow | Mouth
Surprise | Raised | Open
Fear | Raised and wrinkled | Open and stretched
Disgust | Lowered | Raised, ends lowered
Anger | Lowered and wrinkled | Open, ends lowered
Happiness | Bent down | Ends raised
Sadness | End part lowered | Ends lowered

Motion units of the face can be conditionally divided into three groups:

• static – recognition is possible using only a photo;
• dynamic – continuous frame changes, key point initialization or obtaining the average value of distances between motion units are required;
• empty – actively participate in the manifestation of emotions but are not registered by search algorithms (dimples on cheeks).

Now it is possible to review the following methods for recognizing the human emotional state using a profiling approach. The most efficient ones are: holistic methods, local methods, methods of calculating object shapes, and methods of calculating object dynamics (Table II) [7]. For the human face it is possible to initialize up to 80 facial landmarks. Commonly, these are the borders of the eyes, mouth and eyebrows. The molar muscles are not an important feature for human expression recognition and analysis. Each emotion has unique dynamics parameters.

TABLE II. METHODS FOR FACIAL EMOTIONAL STATE RECOGNITION OF HUMAN FACE

Methods | Holistic methods | Local methods
Methods for shape calculations | Classifiers: artificial neural network, Random Forest, AdaBoost. Gabor filters. 2D face models: AAM, ASM, EBGM. | Classifiers: artificial neural network, Bayes classifier, AdaBoost. Geometric face models. Eigenvectors: PCA. Local histograms: HoG, LBP.
Methods for dynamics calculations | Optical flow. Dynamic models. | 3D dynamic models. Statistical models: HMM, DBN.

Therefore, profiling is a relevant method for identifying and analyzing an individual's emotional features, which makes it possible to significantly increase the level of security, for example, at airports.

These approaches were implemented in the following software for processing video images of a human face expressing emotions [7]:

1) Face Reader - developed by “Noldus Information Technology” (Netherlands); requires facial video for proper identification. System capabilities: recognition of emotions, determination of ethnicity, use of the Active Template method, creation of an artificial face model. Advantages: recognition with an accuracy of 89%; emotions can be determined frame by frame or over the whole video; full visualization (histograms, diagrams of emotions). Disadvantages: does not recognize children under five; inaccurate determination of emotions for a person wearing glasses; different skin colors are perceived by the system in different ways; a turned face is not detected.

2) Emotion Software and GladOrs application - developed by “Visual Recognition” (Netherlands). System capabilities: the system creates a 3D model of the face with the identification of 12 key areas, such as the corners of the eyes and the corners of the mouth. Advantages: recognizes anger, sadness, fear, surprise, disgust and happiness; the software is not demanding on the computer. Disadvantages: unknown details of the implementation algorithm (no defects identified).

3) Face Analysis System - developed by “MMER-Systems” (Germany). System features: a deformable mask is overlaid on the face, which allows the necessary parameters to be calculated in real time. Advantages: recognizes six basic emotions; determines gender, age and ethnicity; identifies the person if the photo was previously uploaded to the database; has an additional module. Disadvantages: incomplete coverage of uploaded data, since it works with a webcam; inaccurate results for uploaded data.

The task of recognizing emotions is part of a comprehensive system for image analysis. For its proper operation, it is important to clearly identify the person's face in the image and properly extract it for further processing, relying on a number of dynamic parameters.

Face detection algorithms can be divided into four categories [8]:

• empirical method;
• method of invariant features;
• recognition by a template implemented by the developer;
• detection by external features (trainable systems).

The main stages of the algorithms of the empirical approach are:

• finding the facial elements in the image: eyes, nose, mouth;
• detection of the face borders, shape, brightness, texture and color;
• combination of all the invariant features found and their verification.

The shortcoming of this algorithm is that it is very sensitive to the degree of inclination and rotation of the head.

III. MATHEMATICAL PROBLEM STATEMENT FOR FACIAL STATE RECOGNITION

The facial reactions of each person have a certain set of standard manifestation parameters and are divided into two categories: geometric and behavioral.

To describe the quantitative and qualitative parameters of the face (voluntary and involuntary), a coding system of facial movements is used. In this case, the quantitative parameter is the intensity of movement, from A to E.

The video data stream is a sequential set of frames. The goal of recognition is to partition the faces in the images into disjoint classes. The task of face recognition is formulated as follows: it is required to build a recognition function (1) whose output determines the class of an image $w$, represented by the feature vector $(x_1(w), \dots, x_n(w))$:

$$F(w) = (F_1(w), F_2(w), \dots, F_k(w)) \tag{1}$$

In this case a class is one of the six basic emotions of the person.

[Equation (2)]

The search for a solution is carried out using artificial neural networks.

An invariant is a property of some class (set) of mathematical objects that remains unchanged under transformations of a certain type [8].

The invariant moments represent characteristic features that can be found in each picture. Most often, faces in video frames are subject to the various deformations inherent to facial expressions. Under such conditions one has to speak of “pseudo-invariants” [9].

It is reasonable to convert color images to gray scale. After preprocessing and normalization, the sample is a matrix of pixels, each of which has a brightness value in the range [0…1] (Fig. 1).

IV. PROBLEM SOLUTION

The recognition of emotions is a classification problem, i.e. the neural network should assign the received data set to the emotions that correspond to the set of parameters. Let us consider the mathematical description of the recognition problem.

Let a set $M$ of face images (an emotion, for example, surprise) be given as $\{w_1, \dots, w_m\}$, each of which has a vector of feature values (facial features) [9]:

$$x(w_i) = (x_1(w_i), \dots, x_n(w_i)), \quad i = 1, \dots, m \tag{3}$$

The feature vectors are assigned by experts to classes:

[Equation (4)]

where $m_1 + m_2 + \dots + m_k = m$, $m = |M|$.

To solve this task, this work uses a combined convolution neural network, which consists of a convolution neural network, a classifier and a scanning neural network (Fig. 1). This network makes it possible to single out the emotional reaction of each of the above-mentioned features. The outputs of this network are the inputs of the fuzzy classifier, which, based on the analysis of the reaction of individual elements of the face, makes decisions concerning the threats that the passenger may represent.

The combined convolution neural network solves such problems as:

1) Recognition and description of facial features. This task requires determining the position of the facial organs (eyes, nose, mouth, etc.) on the face as well as their shape.

2) Classification. According to the information on the facial features, it is necessary to determine what type of emotion is present in the image. Then it is necessary to determine the information on mood for further development of the intelligent interface.

Fig. 1. Structural scheme of combined convolutional neural network

Fig. 2. Structural scheme of single convolutional neural network

The combined network consists of a convolution neural network, a classifier and a re-convolution (deconvolution) neural network. The conceptual structure of such a network is presented in Fig. 1. Such an architecture allows not only distinguishing image elements but also marking the recognized elements on the image. The re-convolution neural network is a mirror image of the convolution neural network.

Based on the results of testing the combined convolution neural network on a test sample, a number of significant parameters can be determined [10]:

1. Number of convolution layers.
2. Number of aggregation (pooling) layers.
3. Mutual placement of convolution layers and aggregation layers.
4. Convolution layers (for each layer separately):
• convolution kernel size;
• number of feature maps;
• padding size;
• parameter of the edge effect.
5. Aggregation layers (for each layer separately):
• aggregation kernel size;
• aggregation kernel function.
6. Fully-connected layers (for each layer separately):
• number of fully-connected layers;
• size of each layer;
• classifier type: auto encoder.
7. Presence of an extraction (dropout) operation for each layer: extraction percentage and random function.

A genetic algorithm is used to optimize the structure and parameters of the convolution neural network [10].

The ability of multilayer neural networks trained by gradient descent to build complex multidimensional regions from a large number of training examples allows applying them as a classifier for image identification.

Despite this, traditional fully connected neural networks have a number of shortcomings that lower their efficiency. First of all, there is the large size of the images (an image is understood as the graphical representation of a recognizable pattern in the form of a set of pixels), which can reach several hundred pixels. Correct training on such data requires increasing the number of hidden neurons, which increases the number of parameters and, as a result, lowers the training speed and demands a large training sample. But the biggest restriction of such networks is that they are not invariant to various deformations, for example, to translation or slight distortion of the input signal.

Convolution neural networks (ConvNets or CNNs) take an image as a set of pixels as input, find features in it and, based on them, set the parameters (weight coefficients) for the various objects in the image, so as to be able to distinguish particular objects among all others. ConvNets require less preprocessing than other processing algorithms. Unlike standard filter methods, where the filters are hand-engineered, a convolution neural network can learn them through training.

The structure of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Each neuron responds to stimuli only in a restricted area of the visual field known as the receptive field. A collection of such fields overlaps to cover the entire visual area. A ConvNet is able to capture the spatial and temporal dependencies in an input image through the application of relevant filters.
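As a minimal sketch of what applying a filter means here, the following pure-Python example slides a 3×3 filter over a small gray-scale patch with brightness values in [0, 1]. The patch, the filter values and the names conv2d_valid, patch and edge_filter are illustrative assumptions and are not taken from this work; a trained network would learn its filters rather than use a hand-picked one. Like most CNN implementations, the sketch computes a cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Each output value depends only on a kh x kw receptive field.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4x5 gray-scale patch: dark on the left, bright on the right.
patch = [
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0, 1.0],
]

# Hypothetical vertical-edge filter (hand-picked for illustration only).
edge_filter = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]

feature_map = conv2d_valid(patch, edge_filter)

# The same 9 weights are reused at every position of the patch.
shared_weights = len(edge_filter) * len(edge_filter[0])
# A fully connected layer from 20 inputs to 6 outputs needs one weight per pair.
dense_weights = (len(patch) * len(patch[0])) * (len(feature_map) * len(feature_map[0]))

print(feature_map)
print(shared_weights, dense_weights)
```

The feature map responds strongly where the dark-to-bright edge lies inside the receptive field and is zero over the uniform bright area. The weight counts illustrate the weight reuse: 9 shared weights for the filter against 120 for a fully connected mapping between the same input and output sizes.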
The CNN architecture fits the image dataset better owing to the reduction in the number of parameters involved and the reusability of weights. Therefore, the neural network can be trained to better understand the sophistication of the image. Fig. 2 presents the conceptual structure of the convolution neural network.

As input we commonly get an RGB image, where R is red, G is green and B is blue. An image can also be represented in other color spaces: gray-scale, HSV, RGB, CMYK, etc.

Consider the computational power needed to process an image of 8K (7680×4320) size: in RGB it already contains 7680 × 4320 × 3 = 99,532,800 values. The main purpose of a ConvNet is to reduce the image to a much lighter form without losing important parts (features). Care is needed when designing an architecture that has to learn features well and also handle huge data sets (Fig. 3).

Fig. 3 presents the result of the combined convolution neural network for the recognition and analysis of the human emotional state.

Fig. 3. Facial expression recognition example obtained using complex CNN

V. CONCLUSION

In this work an effective approach to recognizing the emotional state of a human face using digital image analysis is proposed. Ways of applying the combined convolution neural network to the assigned task were developed, and digital image processing algorithms were applied. For the classification task, a fuzzy neural network classifier such as “NEFCLASS” is used, whose inputs are the outputs of the CNN. The approach has an acceptable recognition level and sufficiently good accuracy. The system can be successfully applied for security purposes in airports and is able to increase the security level. However, the system has some limitations, such as sensitivity to the lighting level and dependence on the degree of head rotation. These limitations will be addressed in the continuation of this work in the following directions: better compensation of lighting differences by using a more complex adaptive filtering algorithm; expansion of the range of emotions.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” 2012.
[2] D. Nauck and R. Kruse, “NEFCLASS – a neuro-fuzzy approach for classification of data,” January 1995.
[3] Y. M. Volinsky-Bosmanov, “Profayling. Technology for preventing crime acts,” 2015, p. 220.
[4] R. Li and S. Ma, “Impact of depression on response to comedy: A dynamic facial coding analysis,” Journal of Abnormal Psychology, vol. 116, no. 4, pp. 804–809, 2007.
[5] B. A. Knyazev and Y. E. Gapanyuk, “Recognition of aberrant behavior of the person,” Journal of Engineering, 2013, p. 512.
[6] P. Ekman and W. Friesen, “Facial Action Coding System: A Technique for the Measurement of Facial Movement,” Consulting Psychologists Press, Palo Alto, 1978.
[7] D. Stutz, “Introduction to Neural Networks,” Seminar on Selected Topics in Human Language Technology and Pattern Recognition, 2014.
[8] D. A. Tatarenkov, “Analysis of face recognition methods on images,” 2015, p. 270.
[9] G. Deco and D. Obradovic, “An Information-Theoretic Approach to Neural Computing,” Springer, New York, 1996.
[10] G. K. Voronosky, S. N. Petrashev, and S. A. Sergeev, “Genetic algorithms. Artificial intelligence systems and problems of virtual reality,” 1997.