Intelligence System For Emotional Facial State
             Estimation During Inspection Control
Viktor Sineglazov
Aviation computer integrated complexes department
National Aviation University
Kyiv, Ukraine
svm@nau.edu.ua

Roman Pantyeyev
Aviation computer integrated complexes department
National Aviation University
Kyiv, Ukraine
romanpanteevmail@gmail.com

Ilya Boryndo
Aviation computer integrated complexes department
National Aviation University
Kyiv, Ukraine
ib.mistlemagic@gmail.com

    Abstract—The problem of creating an intelligent customs control security system is considered. The need to analyze changes in passengers’ facial emotions during airport control is shown. The emotional changes of a human face correspond to the person’s internal reaction to the control questions posed. To solve this problem, it is proposed to apply a convolution neural network at the stage of micro emotion identification and a fuzzy classifier at the stage of deciding whether a passenger poses a potential threat. The NEFCLASS neural network is proposed as the fuzzy classifier. An example of a practical approach to micro emotion recognition by means of a convolution neural network is given.

    Keywords—combined convolution neural network, NEFCLASS, micro expressions, facial state recognition.

                       I.    INTRODUCTION

    Nowadays, great importance is attached to improving aviation safety, in particular during passenger control. Commonly, the number of passengers per security officer is too high to deal with them in the limited time available. An airline employee faces a hard task: to ask a number of special questions in order to understand the emotional state of the passenger before admitting him or her to the flight. The main feature that allows this problem to be solved is the emotional changes of the passenger during the control conversation. The most successful application of this approach was made by the El Al airline (Israel). The challenge is to detect and recognize facial micro changes. This is a very difficult task, especially with a large number of passengers (about 20-30 people per security officer), and it demands considerable preparation. In this work, the proposed solution is an intelligent analysis system consisting of two levels of deep learning neural networks based on micro emotion recognition: at the first level a convolution neural network [1] is applied, and at the second a classifier built using a fuzzy neural network, for example NEFCLASS [2].

              II.     REVIEW OF EXISTING SOLUTIONS

    When analyzing human emotional state recognition, it is necessary to consider “profiling” [3]. This is a set of psychological approaches for analyzing and predicting a person’s behavior based on particular features, appearance and verbal behavior. This technique can be successfully applied in computer vision by classifying these features to prevent potential danger. The profiling method requires a coding system of facial movements (SKLiD) [4]. Using SKLiD, it is possible to create a facial model based on action units and the fixed period of time needed to express an emotion. Here, the action units are the movements performed by individual muscles or a group of muscles.

    The system also has a limited number of descriptors (unitary movements performed by a group of muscles: tightening the cheeks, stretching the eyelids, raising the wings of the nose, raising the upper lip, deepening the nasolabial fold, raising the corners of the lips, dimpling the lips, lowering the corners of the mouth, lowering the lower lip, pulling off the lips) [5]. Each manifestation of a person’s facial emotions can be described by a set of descriptors. Alongside the apparent facial changes, micro emotions also occur. They can be taken into account in more complicated recognition approaches. Table 1 describes the main facial changes for the six standard types of emotions [6].

    Motion units of the face can be conditionally divided into three groups:

    • static – recognition is possible using only a photo;
    • dynamic – require continuous frame changes, key point initialization or obtaining the average value of distances between motion units;
    • empty – actively participate in the manifestation of emotions but are not registered by search algorithms (dimples on cheeks).

    The following recognition methods for the human emotional state using a profiling approach can now be reviewed. The most efficient ones are: holistic methods, local methods, methods for calculating the shapes of objects, and methods for calculating the dynamics of objects (Tab. 2) [7]. Up to 80 facial landmarks can be initialized for a human face. Commonly, these are the borders of the eyes, mouth and eyebrows. Molar muscles are not an important feature for human expression recognition and analysis. Each emotion has its own dynamics parameters.
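    The dynamic group of motion units described in this section relies on the average distance between landmarks over consecutive frames. A minimal sketch of that computation follows. It assumes the landmarks have already been located in each frame; the function name `mean_landmark_distances`, the (frames, landmarks, 2) input layout and the sample coordinates are illustrative assumptions, not part of the described system.

```python
import numpy as np


def mean_landmark_distances(frames):
    """Average pairwise landmark distances over a sequence of frames.

    frames: array-like of shape (T, K, 2) holding the (x, y) positions of
            K landmarks in each of T frames (hypothetical input layout).
    Returns a (K, K) matrix whose entry [i, j] is the mean Euclidean
    distance between landmarks i and j across all frames.
    """
    pts = np.asarray(frames, dtype=float)
    # Pairwise coordinate differences for every frame: shape (T, K, K, 2).
    diffs = pts[:, :, None, :] - pts[:, None, :, :]
    # Euclidean distance per frame and landmark pair: shape (T, K, K).
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.mean(axis=0)


# Two frames with two landmarks (e.g. the mouth corners, made-up values).
frames = [
    [[0.0, 0.0], [4.0, 0.0]],  # frame 0: corners 4 units apart
    [[0.0, 0.0], [6.0, 0.0]],  # frame 1: corners 6 units apart
]
mean_dist = mean_landmark_distances(frames)
print(mean_dist[0, 1])  # -> 5.0
```

    Averaging over frames smooths out single-frame landmark jitter. A real system would likely also normalize the distances by face size so that the values do not depend on how far the passenger is from the camera.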


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
   TABLE I. RELATIONS OF EMOTIONAL FACIAL FEATURES CHANGING

 Emotion         Eyebrow                          Mouth
 Surprise        Raised                           Open
 Fear            Raised and wrinkled              Open and stretched
 Disgust         Lowered                          Raised, corners lowered
 Anger           Lowered and wrinkled             Open, corners lowered
 Happiness       Bent down                        Corners raised
 Sadness         Ends lowered                     Corners lowered

   TABLE II. METHODS FOR FACIAL EMOTIONAL STATE RECOGNITION OF HUMAN FACE

 Methods                  Holistic methods                     Local methods
 Methods for shape        Classifiers: artificial neural       Classifiers: artificial neural
 calculations             network, Random Forest, AdaBoost,    network, Bayes classifier, AdaBoost,
                          Gabor filters.                       geometric face models.
                          2D face models: AAM, ASM, EBGM.      Eigenvectors: PCA.
                                                               Local histograms: HoG, LBP.
 Methods for dynamics     Optical flow, dynamic models.        3D dynamic models.
 calculations                                                  Statistical models: HMM, DBN.

    Therefore, profiling is a relevant method for identifying and analyzing an individual’s emotional features, which makes it possible to significantly increase the level of security, for example, at airports.

    These approaches were implemented in the following software for processing video images of a human face subject to emotions [7]:

    1) Face Reader - developed by “Noldus Information Technology” (Netherlands); requires facial video for proper identification. System capabilities: recognition of emotions, determination of ethnicity, use of the Active Template method, creation of an artificial face model. Advantages: recognition with an accuracy of 89%; emotions can be determined frame by frame or over the whole video; full visualization (histograms, diagrams of emotions). Disadvantages: does not recognize children under five; inaccurate determination of emotions for a person wearing glasses; different skin colors are perceived by the system in different ways; a turned face is not detected.

    2) Emotion Software and GladOrs application - developed by “Visual Recognition” (Netherlands). System capabilities: the system creates a 3D model of the face with the identification of 12 key areas, such as the corners of the eyes and the corners of the mouth. Advantages: recognizes anger, sadness, fear, surprise, disgust and happiness; the software is not demanding on computer resources. Disadvantages: the details of the implementation algorithm are unknown (no defects identified).

    3) Face Analysis System - developed by “MMER-Systems” (Germany). System features: a deformable mask is overlaid on the face, which allows the necessary parameters to be calculated in real time. Advantages: recognizes six basic emotions; determines gender, age, and ethnicity; identifies the person if the photo was previously uploaded to the database; has an additional module. Disadvantages: incomplete coverage of the loaded data, since you can work with a webcam; inaccurate results for uploaded data.

    The task of recognizing emotions is part of a comprehensive system for image analysis. For its proper operation, it is important to clearly identify the person’s face in the image and properly extract it for further processing, relying on a number of dynamic parameters.

    Face detection algorithms can be divided into four categories [8]:

    • empirical method;
    • method of invariant features;
    • recognition by a template implemented by the developer;
    • detection by external features (trainable systems).

    The main stages of empirical-approach algorithms are:

    • finding the elements of the face in the image: eyes, nose, mouth;
    • detecting the borders of the face, its shape, brightness, texture and color;
    • combining all the invariant features found and verifying them.

    A shortcoming is that this algorithm is very sensitive to the degree of inclination and turn of the head.

    III. MATHEMATICAL PROBLEM STATEMENT FOR FACIAL STATE RECOGNITION

    Mimic reactions of each person have a certain set of standard manifestation parameters and are divided into two categories: geometric and behavioral.

    To describe the quantitative and qualitative parameters of the face (voluntary and involuntary), a coding system of facial movements is used. In this case, the quantitative parameter is the intensity of movement from A to E.

    The video data stream is a sequential set of frames. The goal of recognition is to group the faces in images into disjoint classes. The task of face recognition is formulated as follows:
it is required to build a recognition function (1) whose output determines the class of an image w represented by a feature vector (x_1(w), …, x_n(w)).

                F(w) = (F_1(w), F_2(w), …, F_k(w))                 (1)

    In this case a class is one of the six basic emotions of the person.

                [equation (2) is not recoverable from the source]  (2)

    The solution is sought using artificial neural networks.

    An invariant is a property of some class (set) of mathematical objects that remains unchanged under transformations of a certain type [8].

    Invariant moments represent characteristic features that can occur in each picture. Most often, faces in video frames are subject to various deformations inherent in facial mimicry. Under such conditions it is necessary to speak of "pseudo-invariants" [9].

    It is reasonable to convert color images to gray-scale. After preprocessing and normalization, the sample is a matrix of pixels, each of which holds a brightness value in the range [0…1] (Fig. 1).

    Fig. 1. Structural scheme of combined convolutional neural network

                       IV.    PROBLEM SOLUTION

    Emotion recognition is a classification problem, i.e. the neural network should assign the received data set to the emotions corresponding to the given set of parameters. Let us consider the mathematical description of the recognition problem.

    Let a set M of face images (of an emotion, for example, surprise) be given as {w_1, …, w_m}, each of which has a vector of feature values (mimic features) [9]:

                [equation (3) is not recoverable from the source]  (3)

    The feature vectors are assigned by experts to classes:

                [equation (4) is not recoverable from the source]  (4)

    where m_1 + m_2 + … + m_k = m, m = |M|.

    To solve this task, this work uses a combined convolution neural network, which consists of a convolution neural network, a classifier and a scanning neural network (Fig. 1). This network makes it possible to single out the emotional reaction of each of the above-mentioned features. The outputs of this network are the inputs of the fuzzy classifier, which, on the basis of an analysis of the reaction of individual elements of the face, allows decisions to be made about the threats this passenger may pose.

    The combined convolution neural network solves the following problems:

    1) Recognition and description of facial features.

    This task requires determining the position of the facial organs (eyes, nose, mouth, etc.) on the face as well as their shapes.

    2) Classification.

    Based on the information about the facial features, it is necessary to determine what type of emotion is present in the image. Then it is necessary to determine information about mood for further development of the intelligent interface.

    Fig. 2. Structural scheme of single convolutional neural network

    The combined network consists of a convolution neural network, a classifier and a re-convolution neural network. The conceptual structure of such a network is presented in Figure 1. Such an architecture makes it possible not only to distinguish image elements but also to mark the recognized elements on the image.
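    The pairing of a convolution stage with a re-convolution stage can be illustrated with a small numerical sketch. It assumes that "re-convolution" corresponds to what is commonly called a transposed convolution, which is an interpretation rather than something the text states. The functions `conv2d` and `conv2d_transpose`, the 6×6 input and the random 3×3 kernel are illustrative and do not reproduce the network used in this work.

```python
import numpy as np


def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution as used in CNN layers."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Weighted sum of the image patch under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def conv2d_transpose(feature_map, kernel):
    """Transposed convolution: scatters each feature value through the kernel."""
    kh, kw = kernel.shape
    fh, fw = feature_map.shape
    out = np.zeros((fh + kh - 1, fw + kw - 1))
    for i in range(fh):
        for j in range(fw):
            # Each feature value contributes a scaled copy of the kernel.
            out[i:i + kh, j:j + kw] += feature_map[i, j] * kernel
    return out


rng = np.random.default_rng(0)
image = rng.random((6, 6))    # gray-scale sample with brightness in [0, 1)
kernel = rng.random((3, 3))   # untrained filter weights, for illustration only

features = conv2d(image, kernel)                # 6x6 input -> 4x4 feature map
upsampled = conv2d_transpose(features, kernel)  # 4x4 map -> back to 6x6 size
print(features.shape, upsampled.shape)  # -> (4, 4) (6, 6)
```

    The transposed stage restores the spatial size of the input, not its content. That is what lets activations found by the convolution stage be mapped back onto image coordinates, which is the "marking" role attributed to the combined architecture.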
    The re-convolution neural network is a mirror reflection of the convolution neural network.

    The ability of multilayer neural networks trained by gradient descent to form complex multidimensional regions from a large number of training examples allows them to be applied as a classifier for image identification.

    Despite this, traditional fully connected neural networks have a number of shortcomings that lower their efficiency. First of all, there is the large size of images (an image is understood as the graphical representation of a recognizable pattern in the form of a set of pixels), which can reach several hundred. Correct training on such data requires increasing the number of hidden neurons, which increases the number of parameters and, as a result, lowers training speed and demands a large training set. But the biggest limitation of such networks is that they are not invariant to various deformations, for example, to shifts or slight distortion of the input signal.

    Convolution neural networks (ConvNets or CNNs) take an image as input in the form of a set of pixels, find features in it and accordingly assign parameters (weight coefficients) to various objects in the image, so that they are able to distinguish particular objects among all the others. ConvNets require less preprocessing than other processing algorithms. Unlike standard filter methods, which work as hand-engineered units, a convolution neural network can learn such filters through training.

    The structure of a ConvNet is similar to the connectivity pattern of biological neurons in the human brain and was inspired by the organization of the visual cortex. Each neuron responds to stimuli only in a restricted area of the visual field known as the receptive field. A set of such fields overlaps to cover the entire visual area. A ConvNet is able to capture the spatial and temporal dependencies in input image data through the application of relevant filters. The CNN structure fits the image dataset better thanks to the reduction in the number of parameters involved and the reusability of weights. Therefore, the neural network can be trained to understand the sophistication of the image better. Fig. 2 presents the conceptual structure of a convolution neural network.

    As input we commonly get an RGB image, where R is red, G is green and B is blue. An image can also be in other color spaces: gray-scale, HSV, RGB, CMYK, etc.

    Consider, for instance, the computational power needed to process an image of 8K (7680×4320) size: with three color channels that is 7680 × 4320 × 3 = 99,532,800 input values. The main purpose of a ConvNet is to compress the image into a much lighter form without losing important parts (features). Care is needed when designing a structure with low feature learning capacity and huge data sets (Fig. 3).

    Based on the results of testing the combined convolution neural network on a test set, a number of significant parameters can be determined [10]:

    1. Number of convolution layers.
    2. Number of aggregation (pooling) layers.
    3. Mutual placement of convolution layers and aggregation layers.
    4. Convolution layers (for each layer separately):
         •    Convolution core (kernel) size;
         •    Number of feature maps;
         •    Padding size;
         •    Parameter of ending effect.
    5. Aggregation layers (for each layer separately):
         •    Aggregation core size;
         •    Aggregation core function.
    6. Fully-connected layers (for each layer separately):
         •    Number of fully-connected layers;
         •    Sizes of each layer;
         •    Classifier type: auto encoder.
    7. Existence of extracting operation for each layer: extracting percent and random function.

    To optimize the structure and parameters of the convolution neural network, a genetic algorithm is used [10].

    Fig. 3. Facial expression recognition example obtained using complex CNN

                       V.     CONCLUSION

    In this work an effective approach to recognizing the emotional state of a human face using digital image analysis is proposed. Ways of applying the combined convolution neural network to the assigned task were developed, and digital image processing algorithms were applied. For the classification task, a fuzzy neural network classifier such as “NEFCLASS” is used, whose inputs are the outputs of the CNN. The given approach has an acceptable recognition level and good enough accuracy. This system can be successfully applied for security purposes in airports and is able to increase the security level. However, the system has some limitations, such as sensitivity to the lighting level and dependence on the degree of head rotation. These limitations will be addressed in the continuation of this work in the following directions: better compensation of lighting differences by making the adaptive filtering algorithm more sophisticated; expansion of the range of emotions. Figure 3 presents the result of the combined convolution neural network for expression recognition and analysis of the human emotional state.

                              REFERENCES
[1]  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet
     Classification with Deep Convolutional Neural Networks”, 2012.
[2]  D. Nauck, R. Kruse, “NEFCLASS – a neuro-fuzzy approach for
     classification of data”, January 1995.
[3]  Y. M. Volinsky-Bosmanov, “Profayling. Technology for preventing
     crime acts”, 2015, p. 220.
[4]  R. Li, S. Ma, “Impact of depression on response to comedy: A
     dynamic facial coding analysis”, Journal of Abnormal Psychology,
     116 (4): 804–9, 2007.
[5]  B. A. Knyazev, Y. E. Gapanyuk, “Recognition of aberrant behavior of
     the person”, Journal of engineering, 2013, p. 512.
[6]  P. Ekman and W. Friesen, “Facial Action Coding System: A
     Technique for the Measurement of Facial Movement”, Consulting
     Psychologists Press, Palo Alto, 1978.
[7]  D. Stutz, “Introduction to Neural Networks. Seminar on Selected
     Topics in Human Language Technology and Pattern Recognition”,
     2014.
[8]  D. A. Tatarenkov, “Analysis of face recognition methods on images”,
     2015, p. 270.
[9]  G. Deco and D. Obradovic, “An Information-Theoretic Approach to
     Neural Computing”, Springer, New York, 1996.
[10] G. K. Voronosky, S. N. Petrashev, S. A. Sergeev, “Genetic algorithms.
     Artificial intelligence systems and problems of virtual reality”, 1997.