<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Selection of Face Detection and Recognition Algorithms for the E-learning System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleh Shkodzinsky</string-name>
          <email>shkod@tntu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykhailo Lutskiv</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marian Smoliy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>Ternopil, 46001</addr-line>
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper substantiates the relevance of developing and implementing photo fixation and face recognition tools in e-learning systems and formulates an extended list of problems to be solved. The main algorithms and approaches to face detection and recognition are reviewed, and an effective combination of algorithms suitable for knowledge-testing tools in learning management systems is selected.</p>
      </abstract>
      <kwd-group>
        <kwd>Face recognition</kwd>
        <kwd>photo fixation</kwd>
        <kwd>knowledge testing</kwd>
        <kwd>image recognition algorithms</kwd>
        <kwd>person identification</kwd>
        <kwd>identification accuracy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The development of modern society is characterized by the powerful influence of computer
technologies that permeate all spheres of human activity. Informatization of education is an important
component of these processes. Computer technologies are becoming an integral part of the holistic
educational process, which creates the prerequisites for its transformation and for a significant
increase in its efficiency.</p>
      <p>Two leading trends dominate the informatization of the educational process:
        <list list-type="bullet">
          <list-item><p>the use of information technology products as a means of teaching and cognition;</p></list-item>
          <list-item><p>the use of information technologies to monitor the student's educational and cognitive activity.</p></list-item>
        </list>
        Together, these trends became the foundation for building the virtual educational environment of a modern educational institution [<xref ref-type="bibr" rid="ref1">1</xref>].
      </p>
      <p>Computer-based knowledge testing is increasingly used to monitor students' performance. The
introduction of testing, and later of computer-based testing, into the educational process was a
deliberate step toward eliminating subjectivity in assessing students' cognitive activity. Test-based
forms of knowledge assessment have proven to be one of the most promising means of improving the
quality management of the educational process, although knowledge testing has both supporters and
opponents. The balance between supporters and opponents correlates significantly with the quality of
the test material and of the implemented testing algorithms.</p>
      <p>Developing high-quality tests is time-consuming and requires compliance with such requirements
as: significance; scientific credibility; representativeness (the test must cover the main structural
elements of the course content to the extent required for assessment); increasing complexity of the
educational material; variability depending on the content of the studied material and the number of
hours; systematic content; validity; comprehensiveness and balance of the test; and correspondence
between content and form.</p>
      <p>However, there is another side to the issue, which opponents of testing use as an argument: testing
creates new opportunities for dishonest behavior during a knowledge test, especially in distance
learning, when test-takers are geographically dispersed and beyond the examiner's visual control. This
calls for additional tools and measures to monitor and verify the testing process and to confirm the
integrity of the test for each participant.</p>
      <p>The most common abuses are the unauthorized use of Internet or other electronic resources (for
example, in another operating system or browser) and of physical aids (books, notes, mobile devices,
etc.), as well as impersonation, where another person takes the test instead of the registered student.
Effective countermeasures include tools that monitor activity in the browser tab where the test takes
place, together with photo fixation and face recognition tools built on modern information
technologies.</p>
      <p>Another aspect of the problem is that the vast majority of educational institutions already use
web-based learning content management systems. Photo fixation and recognition tools should therefore
integrate into such systems without requiring significant investment in additional software and
hardware, and they should run on standard user platforms, including mobile devices.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement</title>
      <p>Face detection, a special case of the general object detection problem, can be defined as
determining whether a given image contains a face and, if it does, finding the location of each face
[<xref ref-type="bibr" rid="ref2">2</xref>]. Face detection is a key task because it is a prerequisite
for other tasks such as face localization [<xref ref-type="bibr" rid="ref3">3</xref>], face recognition
[<xref ref-type="bibr" rid="ref4">4</xref>], face analysis [<xref ref-type="bibr" rid="ref5">5</xref>],
face verification [<xref ref-type="bibr" rid="ref6">6</xref>], face labeling and extraction
[<xref ref-type="bibr" rid="ref7">7</xref>], face tracking [<xref ref-type="bibr" rid="ref8">8</xref>],
and recognition of emotions and facial expressions [<xref ref-type="bibr" rid="ref9">9</xref>].</p>
      <p>Face recognition is the core task in identifying or verifying an individual by the features of
their face. It is one of the most important problems in computer vision and is of great commercial
interest.</p>
      <p>There are currently several dozen computer-based face detection and recognition methods
[<xref ref-type="bibr" rid="ref10">10</xref>]. However, none of them provides 100% identification
reliability, and their recognition performance is also limited. Most early algorithms (before 2012)
could not achieve sufficient speed and accuracy because of the high variability of images. The main
recognition challenges are:
        <list list-type="bullet">
          <list-item><p>illumination: some images may be very bright or very dark relative to the background (contrast), and faces may be partially shaded;</p></list-item>
          <list-item><p>facial expressions: they can vary widely, covering the full range of human emotions and states;</p></list-item>
          <list-item><p>skin types: facial skin comes in different shades;</p></list-item>
          <list-item><p>distance: if the person is too far from the camera, the face image may be too small;</p></list-item>
          <list-item><p>orientation: the orientation and angle of the face relative to the camera can vary significantly;</p></list-item>
          <list-item><p>complex background: a large number of objects in the scene reduces detection accuracy and speed;</p></list-item>
          <list-item><p>several faces in one image: images containing many faces are very difficult to process quickly and accurately;</p></list-item>
          <list-item><p>occlusion: faces can be partially hidden by glasses, scarves, hands, hair, hats, medical masks, and other objects, which affects detection accuracy and speed;</p></list-item>
          <list-item><p>low resolution: low-resolution images, as well as images with heavy visual noise, are recognized less accurately and more slowly.</p></list-item>
        </list>
      </p>
      <p>Despite the great variety of existing methods, a common general structure of the face detection
and recognition process can be identified (see Fig. 1).</p>
      <p>The first stage detects and localizes the face in the image, that is, finds its coordinates. The
best results are achieved when the person looks directly into the camera at the moment of capture, but
modern algorithms can also detect faces when the person is not looking directly into the camera, within
certain limits. The output of this stage is the coordinates and dimensions of each detected face.</p>
      <p>At the stage of determining the basic parameters, the face image is aligned and normalized
(geometrically and by brightness), and the face is encoded into a set of basic parameters that form a
parametric vector (array). Recognition proper follows: the computed parametric vector is compared with
the vectors stored in the database of known persons. The considered algorithms differ mainly in the face
detection mechanism itself and in the mechanism for computing the basic parameters, that is, for
translating the face into a parametric vector.</p>
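      <p>The comparison step can be illustrated with a minimal Python sketch. It assumes that every face has already been encoded into a parametric vector of fixed length. It also assumes that the database of known persons is a plain dictionary keyed by hypothetical person identifiers, and that Euclidean distance with a hypothetical acceptance threshold is a suitable similarity measure. In practice, the metric and the threshold depend on the encoder and must be tuned.</p>
      <preformat preformat-type="code">
```python
from __future__ import annotations

import math
from typing import Optional


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two parametric vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query: list[float],
             database: dict[str, list[float]],
             threshold: float) -> Optional[str]:
    """Return the identifier whose stored vector is closest to the query,
    or None if even the closest one is farther away than threshold."""
    best_id: Optional[str] = None
    best_dist = math.inf
    for person_id, stored in database.items():
        d = euclidean(query, stored)
        if best_dist > d:
            best_id, best_dist = person_id, d
    # Reject the match when the nearest neighbor is still too far away.
    return None if best_dist > threshold else best_id


if __name__ == "__main__":
    db = {"student_A": [0.1, 0.9, 0.3], "student_B": [0.8, 0.2, 0.5]}
    print(identify([0.12, 0.88, 0.31], db, threshold=0.2))  # student_A
    print(identify([5.0, 5.0, 5.0], db, threshold=0.2))     # None
```
      </preformat>
      <p>The rejection threshold matters in the testing scenario. An unknown person should be reported as unidentified rather than silently matched to the nearest enrolled student.</p>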
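      <p>The most common face detection and recognition algorithms available today include:
        <list list-type="bullet">
          <list-item><p>principal component analysis (PCA);</p></list-item>
          <list-item><p>the Viola-Jones algorithm;</p></list-item>
          <list-item><p>the histogram of oriented gradients (HOG) algorithm and its combination with the support vector machine (SVM) classifier;</p></list-item>
          <list-item><p>deep convolutional neural networks (deep CNNs): AlexNet, VGG, ResNet, etc.</p></list-item>
        </list>
      </p>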
    </sec>
    <sec id="sec-3">
      <title>3. Analysis of face detection and recognition methods</title>
      <p>Let us consider the most promising of the listed face detection and recognition methods, analyze
their advantages and disadvantages, and choose the most suitable one for our project.</p>
      <p>The requirements for the algorithms in this project are:
        <list list-type="bullet">
          <list-item><p>identification accuracy above 95%;</p></list-item>
          <list-item><p>recognition time of no more than 1 second.</p></list-item>
        </list>
      </p>
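      <p>The two requirements can be expressed as a simple acceptance check. The sketch below is hypothetical. It assumes that identification accuracy is the share of correctly identified test images and that recognition speed is the worst-case time per image. The class and function names are illustrative only.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass

# Thresholds taken from the project requirements.
MIN_ACCURACY = 0.95   # identification accuracy must exceed 95 %
MAX_SECONDS = 1.0     # recognition must take at most 1 second


@dataclass
class Evaluation:
    correct: int          # correctly identified test images
    total: int            # all test images
    worst_seconds: float  # slowest single recognition, in seconds

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def meets_requirements(ev: Evaluation) -> bool:
    """True when both the accuracy and the speed requirement are satisfied."""
    return ev.accuracy > MIN_ACCURACY and MAX_SECONDS >= ev.worst_seconds


if __name__ == "__main__":
    print(meets_requirements(Evaluation(correct=970, total=1000, worst_seconds=0.4)))  # True
```
      </preformat>
      <p>Using the worst-case time rather than the average is a deliberately strict reading of the speed requirement.</p>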
    </sec>
    <sec id="sec-4">
      <title>3.1. Principal component analysis</title>
      <p>Principal component analysis (PCA) is a way to reduce the dimensionality of data while losing
the minimum amount of information. Computing the principal components reduces to computing the
eigenvectors and eigenvalues of the covariance matrix of the input data, or to the singular value
decomposition (SVD) of the data matrix. PCA is a statistical method that operates not on images as such
but on vectors in a linear space [<xref ref-type="bibr" rid="ref11">11</xref>]. When the illumination
level or the facial expression in the image changes significantly, the effectiveness of the method drops
considerably [<xref ref-type="bibr" rid="ref11">11</xref>].</p>
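      <p>The eigenvector route can be illustrated without external dependencies. The sketch below centers a set of flattened face vectors, builds their sample covariance matrix, and extracts the leading principal components by power iteration with deflation. It then projects a face onto those components to obtain its parametric vector. Power iteration is a substitute chosen here for brevity, since practical eigenface implementations usually call an optimized SVD routine instead. Function names, the iteration count, and the toy two-dimensional data are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
import math
import random


def mean_vector(data):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def covariance(data):
    """Sample covariance matrix (d x d) of the row vectors in data."""
    mu = mean_vector(data)
    centered = [[x - m for x, m in zip(row, mu)] for row in data]
    n, d = len(data), len(mu)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    # A zero vector (e.g. after all variance has been deflated) is returned as is.
    return v if norm == 0.0 else [x / norm for x in v]


def principal_components(data, k, iters=200, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs of the covariance matrix,
    found by power iteration followed by deflation."""
    rng = random.Random(seed)
    cov = covariance(data)
    d = len(cov)
    components = []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            v = normalize(mat_vec(cov, v))
        # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
        eigval = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        components.append((eigval, v))
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


def project(x, mu, components):
    """Encode vector x by its coordinates along the principal components."""
    centered = [a - m for a, m in zip(x, mu)]
    return [sum(c * e for c, e in zip(centered, vec)) for _, vec in components]


if __name__ == "__main__":
    # Toy 2-D "images": points close to the line y = 2x.
    faces = [[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8]]
    comps = principal_components(faces, k=2)
    print(comps[0][0])  # largest eigenvalue, about 12.22
    print(project(faces[0], mean_vector(faces), comps[:1]))
```
      </preformat>
      <p>For real face images, each vector has one entry per pixel, so the explicit covariance matrix becomes very large. Computing the SVD of the data matrix avoids forming it, which is one reason SVD-based formulations are preferred in practice.</p>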
      <p>The main advantage of this algorithm is its low demand for computing power. However, it has
significant disadvantages. It is more sensitive than other algorithms to lighting, facial expressions,
and face angle, and it places stricter requirements on image quality. These disadvantages result in low
recognition accuracy (usually no higher than 80-90%).</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Viola-Jones algorithm</title>
      <p>The Viola-Jones algorithm is based on three elements. The first is the integral image
representation, which allows Haar-like features to be computed quickly. The second is a classifier
built with the adaptive boosting algorithm (AdaBoost). The third is a cascade structure that combines
such classifiers. The method is highly efficient at searching for objects in images and video streams in
real time and has a low probability of false face detection. It can detect faces at angles of up to 30°,
and its identification accuracy can exceed 90% [<xref ref-type="bibr" rid="ref12">12</xref>]. The
method was developed in 2001, has many implementations, and is widely used because it is simple and
effective. The Viola-Jones algorithm is also implemented in the free OpenCV library.</p>
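      <p>The integral image is what makes Haar-like features cheap to evaluate. After one pass over the image, the sum over any rectangle costs four table lookups regardless of its size. The Python sketch below illustrates only this feature-evaluation step, for a grayscale image stored as a list of rows. AdaBoost training and the cascade are not shown, and the left-minus-right two-rectangle feature is one illustrative example of the Haar-like feature family.</p>
      <preformat preformat-type="code">
```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1;
    the extra zero row and column avoid boundary checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y),
    width w and height h, using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_left_minus_right(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: sum of the left half minus sum of
    the right half (w is assumed to be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    ii = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(rect_sum(ii, 1, 1, 2, 2))  # 28 = 5 + 6 + 8 + 9
```
      </preformat>
      <p>A detector evaluates thousands of such features at many positions and scales. The constant-time rectangle sum is what makes this feasible in real time.</p>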
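      <p>Advantages over the PCA algorithm:
        <list list-type="bullet">
          <list-item><p>low false positive rate;</p></list-item>
          <list-item><p>high operating speed (processing times of a few tens of milliseconds on modern CPUs);</p></list-item>
          <list-item><p>slightly higher accuracy;</p></list-item>
          <list-item><p>ease of software implementation (thanks to the ready-made implementation in OpenCV).</p></list-item>
        </list>
      </p>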
    </sec>
    <sec id="sec-6">
      <title>3.3. HOG-algorithm with SVM-classifier</title>
      <p>The histogram of oriented gradients (HOG) descriptor combined with a support vector machine
(SVM) classifier can be used to train highly accurate object classifiers, including classifiers for
human faces. The approach was first demonstrated by N. Dalal and B. Triggs in their paper
"Histograms of Oriented Gradients for Human Detection" [<xref ref-type="bibr" rid="ref13">13</xref>].
HOG counts occurrences of gradient orientations in local regions of the image. The underlying idea is
that the distribution of local intensity gradients and their directions describes the local appearance
and shape of an object [<xref ref-type="bibr" rid="ref13">13</xref>].</p>
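      <p>The core HOG operation, counting gradient orientations in a local region, can be sketched for a single cell in plain Python. The sketch assumes a grayscale cell given as a list of rows. It follows the commonly used Dalal and Triggs settings: centered [-1, 0, 1] derivative masks and 9 unsigned orientation bins over 0-180°, with magnitude-weighted votes. For brevity, it skips interpolation between bins, block normalization, and the sliding window. The SVM step is reduced to its linear decision function, whose weights and bias are assumed to come from training that is not shown.</p>
      <preformat preformat-type="code">
```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) over the interior pixels of one cell."""
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centered [-1, 0, 1] mask
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned angle
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


def linear_svm_score(descriptor, weights, bias):
    """Decision value of a trained linear SVM; positive means "face".
    weights and bias are assumed to come from prior training."""
    return sum(d * w for d, w in zip(descriptor, weights)) + bias


if __name__ == "__main__":
    vertical_edge = [[0, 0, 100, 100]] * 4
    print(cell_histogram(vertical_edge))  # all weight falls in bin 0
```
      </preformat>
      <p>A full descriptor concatenates such histograms over a grid of overlapping, normalized blocks. A positive SVM decision value then marks the window as containing a face.</p>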
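      <p>Advantages:
        <list list-type="bullet">
          <list-item><p>much higher accuracy than Haar-like cascades (the Viola-Jones method);</p></list-item>
          <list-item><p>more stable recognition than Haar-like cascades;</p></list-item>
          <list-item><p>good operating speed (less than a second on modern CPUs).</p></list-item>
        </list>
      </p>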
      <p>Disadvantages:
        <list list-type="bullet">
          <list-item><p>sensitivity to the face angle (a frontal view is required);</p></list-item>
          <list-item><p>accuracy inferior to that of deep convolutional networks.</p></list-item>
        </list>
      </p>
      <p>This algorithm satisfies the stated requirements. It offers fairly high accuracy at sufficient
speed and can be used in this system.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4. Deep convolutional neural networks</title>
      <p>Deep convolutional neural networks (deep CNNs) are a type of artificial neural network (ANN)
inspired by the mammalian visual cortex. The main components of a CNN are convolutional filters,
nonlinear activation functions, pooling (aggregation) layers, fully connected (FC) layers, and a loss
function layer. CNNs are used in a wide range of tasks, including object and action recognition, object
detection, computational photography, and natural language processing. To date, deep CNNs have
achieved outstanding success in most computer vision tasks and have dominated many well-known
competitions, such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
[<xref ref-type="bibr" rid="ref14">14</xref>].</p>
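      <p>To make the CNN building blocks concrete, the following plain-Python sketch implements one convolutional filter, a ReLU nonlinearity, and 2x2 max pooling. As in most deep learning libraries, the "convolution" is computed as a cross-correlation without flipping the kernel. This is a didactic forward pass with a hand-picked kernel only. Real face recognition networks stack many such layers with learned weights and run on GPU-accelerated frameworks.</p>
      <preformat preformat-type="code">
```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one filter."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(fmap):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trailing rows or columns that do not
    fill a whole window are dropped."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    feature_map = relu(conv2d(image, [[1, 0], [0, 1]]))
    print(feature_map)            # [[6, 8], [12, 14]]
    print(max_pool(feature_map))  # [[14]]
```
      </preformat>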
      <p>In 2015, deep convolutional neural networks surpassed human-level performance in image
classification [<xref ref-type="bibr" rid="ref15">15</xref>].</p>
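      <p>Advantages:
        <list list-type="bullet">
          <list-item><p>very high recognition accuracy can be achieved (above 99%);</p></list-item>
          <list-item><p>robustness to data variability and input noise (face angle, lighting, shadows, etc.).</p></list-item>
        </list>
      </p>
      <p>Disadvantages:
        <list list-type="bullet">
          <list-item><p>high computational complexity; graphics processing unit (GPU) resources are required.</p></list-item>
        </list>
      </p>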
    </sec>
    <sec id="sec-8">
      <title>4. Conclusions</title>
      <p>Considering all of the above, it may be worthwhile to create hybrid methods that combine the
strengths of several of the considered algorithms and compensate for their weaknesses, in order to
improve the quality of the system under development. For example, combining the faster HOG+SVM
method with the slower but more accurate CNN method, used when the first one does not perform
satisfactorily, makes it possible to build an effective detector.</p>
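      <p>The hybrid scheme amounts to a simple control flow. The fast HOG+SVM detector runs first, and the slower CNN detector is called only when the first result is unsatisfactory. In the Python sketch below, both detectors are hypothetical callables passed in as parameters. "Unsatisfactory" is assumed to mean that no face was found or that the best detection score falls below a hypothetical threshold; the actual criterion would have to be chosen for the system.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    x: int
    y: int
    width: int
    height: int
    score: float  # detector confidence


Detector = Callable[[object], List[Detection]]


def hybrid_detect(image: object,
                  fast: Detector,
                  accurate: Detector,
                  min_score: float = 0.5) -> List[Detection]:
    """Try the fast detector first; call the accurate one only if needed."""
    faces = fast(image)
    if faces and max(f.score for f in faces) >= min_score:
        return faces
    # The fast path found nothing convincing: pay for the slower model.
    return accurate(image)
```
      </preformat>
      <p>Keeping the detectors as parameters leaves the policy independent of any particular library. For example, the dlib library provides a HOG-based detector (<monospace>get_frontal_face_detector</monospace>) and a CNN-based one (<monospace>cnn_face_detection_model_v1</monospace>). Each could be wrapped in a small adapter, not shown here, that converts its output into <monospace>Detection</monospace> objects.</p>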
      <p>Residual neural networks (ResNet), a type of CNN, surpassed human-level performance in image
classification and dominated the ImageNet competitions for several years in a row
[<xref ref-type="bibr" rid="ref15">15</xref>]. They also demonstrate high recognition accuracy at
sufficient speed. It is therefore reasonable to implement the recognition stage of this solution using
ResNet.</p>
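      <p>The defining feature of ResNet is the identity shortcut. A block learns a residual mapping F(x) and outputs F(x) + x, which is what allows very deep networks to be trained. The sketch below shows this addition for a tiny fully connected block in plain Python. Real ResNets use convolutional layers with batch normalization inside F, and the weights passed in here are arbitrary illustrative values.</p>
      <preformat preformat-type="code">
```python
def dense(weights, v):
    """Matrix-vector product: one fully connected layer without bias."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def relu(v):
    return [max(0.0, x) for x in v]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x) with F(x) = W2 * ReLU(W1 * x).
    The shortcut requires F(x) to have the same size as x."""
    f = dense(w2, relu(dense(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    print(residual_block([1.0, 2.0], identity, identity))  # [2.0, 4.0]
```
      </preformat>
      <p>With all weights set to zero, F(x) vanishes and the block reduces to ReLU(x). This illustrates why stacking such blocks need not degrade the network: each block can fall back to an identity-like mapping.</p>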
    </sec>
  </body>
  <back>
    <ref-list>
      <title>5. References</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><surname>Dyachuk</surname>, <given-names>S. F.</given-names></string-name>,
          <string-name><surname>Konovalenck</surname>, <given-names>I. V.</given-names></string-name>, and
          <string-name><surname>Shkodzinsky</surname>, <given-names>O. K.</given-names></string-name>.
          <article-title>Virtual educational environment of TNTU based on LMS ATutor</article-title>.
          In International scientific-practical seminar "Theory and practice of distance learning of foreign citizens: domestic and international experience",
          <source>NURE</source>, <volume>12</volume>: <fpage>11</fpage>-<lpage>15</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Yang</surname>, <given-names>M. H.</given-names></string-name>,
          <string-name><surname>Kriegman</surname>, <given-names>D. J.</given-names></string-name>, and
          <string-name><surname>Ahuja</surname>, <given-names>N.</given-names></string-name>,
          <year>2002</year>. <article-title>Detecting faces in images: A survey</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>,
          <volume>24</volume>(<issue>1</issue>): <fpage>34</fpage>-<lpage>58</lpage>.
          ISSN 0162-8828. doi: <pub-id pub-id-type="doi">10.1109/34.982883</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Lam</surname>, <given-names>K.-M.</given-names></string-name> and
          <string-name><surname>Yan</surname>, <given-names>H.</given-names></string-name>,
          <year>1994</year>. <article-title>Fast algorithm for locating head boundaries</article-title>.
          <source>Journal of Electronic Imaging</source>,
          <volume>3</volume>(<issue>4</issue>): <fpage>351</fpage>-<lpage>359</lpage>.
          ISSN 1017-9909. doi: <pub-id pub-id-type="doi">10.1117/12.183806</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Chi</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Zhang</surname>, <given-names>H.</given-names></string-name>, and
          <string-name><surname>Chen</surname>, <given-names>M.</given-names></string-name>,
          <year>2017</year>. <article-title>End-To-End Face Detection and Recognition</article-title>.
          <source>arXiv preprint</source> arXiv:1703.10818: <fpage>1</fpage>-<lpage>9</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Moghaddam</surname>, <given-names>B.</given-names></string-name> and
          <string-name><surname>Pentland</surname>, <given-names>A.</given-names></string-name>,
          <year>1997</year>. <article-title>Probabilistic visual learning for object representation</article-title>.
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>,
          <volume>19</volume>(<issue>7</issue>): <fpage>696</fpage>-<lpage>710</lpage>.
          ISSN 0162-8828. doi: <pub-id pub-id-type="doi">10.1109/34.598227</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Ou</surname>, <given-names>W.</given-names></string-name>,
          <string-name><surname>You</surname>, <given-names>X.</given-names></string-name>,
          <string-name><surname>Tao</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Zhang</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Tang</surname>, <given-names>Y.</given-names></string-name>, and
          <string-name><surname>Zhu</surname>, <given-names>Z.</given-names></string-name>,
          <year>2014</year>. <article-title>Robust face recognition via occlusion dictionary learning</article-title>.
          <source>Pattern Recognition</source>,
          <volume>47</volume>(<issue>4</issue>): <fpage>1559</fpage>-<lpage>1572</lpage>.
          ISSN 0031-3203. doi: <pub-id pub-id-type="doi">10.1016/j.patcog.2013.10.017</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><surname>Gao</surname>, <given-names>Y.</given-names></string-name> and
          <string-name><surname>Qi</surname>, <given-names>Y.</given-names></string-name>,
          <year>2005</year>. <article-title>Robust visual similarity retrieval in single model face databases</article-title>.
          <source>Pattern Recognition</source>,
          <volume>38</volume>(<issue>7</issue>): <fpage>1009</fpage>-<lpage>1020</lpage>.
          ISSN 0031-3203. doi: <pub-id pub-id-type="doi">10.1016/j.patcog.2004.12.006</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><surname>Essa</surname>, <given-names>I.</given-names></string-name> and
          <string-name><surname>Pentland</surname>, <given-names>A.</given-names></string-name>,
          <year>1995</year>. <article-title>Facial expression recognition using a dynamic model and motion energy</article-title>.
          In <source>Proceedings of the 5th IEEE International Conference on Computer Vision</source>, 20-23 June 1995,
          pp. <fpage>360</fpage>-<lpage>367</lpage>. IEEE.
          doi: <pub-id pub-id-type="doi">10.1109/iccv.1995.466916</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><surname>Agrawal</surname>, <given-names>S.</given-names></string-name> and
          <string-name><surname>Khatri</surname>, <given-names>P.</given-names></string-name>,
          <year>2015</year>. <article-title>Facial Expression Detection Techniques: Based on Viola and Jones Algorithm and Principal Component Analysis</article-title>.
          In <source>2015 Fifth International Conference on Advanced Computing &amp; Communication Technologies</source>,
          pp. <fpage>108</fpage>-<lpage>112</lpage>.
          doi: <pub-id pub-id-type="doi">10.1109/ACCT.2015.32</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><surname>Dang</surname>, <given-names>K.</given-names></string-name> and
          <string-name><surname>Sharma</surname>, <given-names>S.</given-names></string-name>,
          <year>2017</year>. <article-title>Review and comparison of face detection algorithms</article-title>.
          In <source>2017 7th International Conference on Cloud Computing, Data Science &amp; Engineering - Confluence</source>,
          pp. <fpage>629</fpage>-<lpage>633</lpage>.
          doi: <pub-id pub-id-type="doi">10.1109/CONFLUENCE.2017.7943228</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><surname>Kim</surname>, <given-names>K. I.</given-names></string-name>,
          <string-name><surname>Jung</surname>, <given-names>K.</given-names></string-name>, and
          <string-name><surname>Kim</surname>, <given-names>H. J.</given-names></string-name>,
          <year>2002</year>. <article-title>Face recognition using kernel principal component analysis</article-title>.
          <source>IEEE Signal Processing Letters</source>,
          <volume>9</volume>(<issue>2</issue>): <fpage>40</fpage>-<lpage>42</lpage>.
          doi: <pub-id pub-id-type="doi">10.1109/97.991133</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><surname>Wang</surname>, <given-names>Y.-Q.</given-names></string-name>,
          <year>2014</year>. <article-title>An Analysis of the Viola-Jones Face Detection Algorithm</article-title>.
          <source>Image Processing On Line</source>,
          <volume>4</volume>: <fpage>128</fpage>-<lpage>148</lpage>.
          doi: <pub-id pub-id-type="doi">10.5201/ipol.2014.104</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><surname>Dalal</surname>, <given-names>N.</given-names></string-name> and
          <string-name><surname>Triggs</surname>, <given-names>B.</given-names></string-name>,
          <year>2005</year>. <article-title>Histograms of oriented gradients for human detection</article-title>.
          In <source>2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)</source>,
          vol. <volume>1</volume>, pp. <fpage>886</fpage>-<lpage>893</lpage>.
          doi: <pub-id pub-id-type="doi">10.1109/CVPR.2005.177</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <source>ImageNet Winning CNN Architectures (ILSVRC)</source>.
          URL: <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/gettingstarted/149448">https://www.kaggle.com/gettingstarted/149448</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <source>ImageNet Large Scale Visual Recognition Challenge 2015</source>.
          URL: <ext-link ext-link-type="uri" xlink:href="http://imagenet.org/challenges/LSVRC/2015/">http://imagenet.org/challenges/LSVRC/2015/</ext-link>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>