<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>GraphiCon</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Operator's Gaze Direction Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexey Popov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vlad Shakhuro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anton Konushin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
          ,
          <addr-line>1, Leninskie Gory, 119991, Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics</institution>
          ,
          <addr-line>11, Pokrovsky boulevard, 109028, Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Samsung AI Center</institution>
          ,
          <addr-line>10, Butyrskiy Val Ulitsa, 127055, Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>31</volume>
      <fpage>27</fpage>
      <lpage>30</lpage>
      <abstract>
<p>This work is devoted to an algorithm for recognizing the direction of an operator’s gaze and considers a method for classifying the gaze direction. The algorithm proposed in this paper determines the driver’s gaze direction, which helps make the driving process safer. We review the algorithms and methods used to recognize the human gaze in various conditions, indicating the positive and negative qualities of each method. Based on the results of the review, an algorithm for classifying the car driver’s gaze direction is proposed. The proposed algorithm consists of two components: the first is responsible for the regression of the gaze direction, and the second for classifying the results of the first. Experimental evaluation of the developed algorithm has shown that it is effective for the task of classifying the gaze direction. An important advantage of this algorithm is that it does not require retraining to adapt to other gaze direction classification scenarios.</p>
      </abstract>
      <kwd-group>
<kwd>Gaze direction recognition</kwd>
        <kwd>Convolution neural network</kwd>
        <kwd>Driver gaze classification</kwd>
        <kwd>Gaze recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The task of recognizing the operator’s gaze direction is key in many computer vision systems, such as driver assistance systems, human attention analysis, determining the most interesting contextual data (for example, on a web page), intelligent compression of media information, and many others. Thus, the creation of a new, more accurate and powerful gaze direction recognition algorithm will improve the quality of these systems.</p>
<p>In this paper we propose an algorithm for solving the problem of classifying the car driver’s gaze direction. Nowadays, systems using computer vision algorithms are often used to ensure active safety, for example ADAS. A system for classifying the operator’s gaze, in particular that of a vehicle driver, is necessary for ensuring active safety, as it can prevent the driver from being dangerously distracted from driving the vehicle.</p>
<p>In many countries, rules prohibit drivers of motor vehicles from using the phone while driving. To prevent mass violations of such rules, a driver’s gaze detection system can also be used to determine whether the driver is looking at the road or distracted. If the driver is distracted by a smartphone, the system can warn him about the violation or fine him.</p>
      <p>Also, the algorithm for classifying the direction of the driver’s gaze can be used to build
a rating of taxi drivers and carsharing, which will ensure the safety of taxi passengers and
businesses in the field of carsharing.</p>
<p>Solving the problem of classifying the gaze direction requires automatic tools. Computer vision algorithms aimed at classifying the gaze direction without additional technical means are actively developing. The introduction of such algorithms into the daily life of drivers will increase the level of driving safety.</p>
      <p>At the moment, most of the methods used to determine the gaze direction, namely its
classification or regression of a three-dimensional angle, use classical approaches and do not
resort to the use of powerful, modern neural network algorithms. Based on this, the development
of a neural network algorithm for recognizing the direction of the operator’s gaze is a promising
task, since neural network methods are much more powerful and accurate than classical methods.</p>
<p>There are single-stage neural network algorithms for classifying the gaze direction, which obtain high-quality results for a fixed work scenario. To apply such algorithms in new work scenarios, a complete retraining of the algorithm is required, which is a big problem. This problem is especially noticeable in classifying the direction of the driver’s gaze, since car drivers can often change, the driver’s position in the seat can change, and the target set of classes can change depending on the usage scenario. Therefore, developing a two-stage algorithm that can adapt to new usage scenarios without retraining is an important task.</p>
<p>In this paper we propose a method for classifying the direction of the operator’s gaze from the data of a camera that captures the operator, namely the driver of a motor vehicle. The proposed method consists of several important parts. A face detector is used to localize the operator’s face, and a neural network with a new architecture is used to determine the three-dimensional gaze angle vector. Using the results of the three-dimensional gaze angle algorithm, the gaze directions can then be classified into zones; this chain solves the problem of classifying the gaze direction in this work. The proposed method was tested on several different data sets, including a self-assembled one. As a result, the implemented driver’s gaze direction classification algorithm achieves high accuracy in different scenarios of driver’s gaze direction classification.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>The existing methods for determining the direction of the gaze are divided into two categories,
the first of which is the methods for classifying the direction of the gaze, and the second is the
methods for regressing the direction of the gaze.</p>
      <p>Consider the existing methods for recognizing the gaze direction:
• Methods that classify the gaze direction to predefined sectors in a specific work scenario.
• Methods that determine the three-dimensional gaze angle.</p>
      <sec id="sec-2-1">
        <title>2.1. Gaze direction classification algorithm</title>
<p>The task of classifying the gaze direction arose quite a long time ago, so there are many different approaches to solving this problem; in each work the authors offer their own heuristics and their own experimental conclusions.</p>
        <p>
To solve the problem of classifying the gaze direction, eye models are often developed, since
the eye is easily described as a geometric object. In the article [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the authors suggest using
classical methods in a two-stage algorithm. In the first stage of the algorithm, the area
containing the face is extracted from the image using the Viola-Jones detector, and in the second
stage key points inside the eye are extracted using CLNF (Constrained Local Neural Field).
From the data obtained at the second stage, feature vectors are built that can be used to solve
the classification problem.
        </p>
        <p>
          In the article [
          <xref ref-type="bibr" rid="ref2">2</xref>
], the authors propose a neural network method for classifying the gaze
direction: they use the Dlib-ml detector of the face and facial key points to determine the
position of the eyes. After that, images of both eyes are fed to a neural network, which builds
the probability distribution over gaze direction classes from the two images.
        </p>
        <p>
          One of the first articles in which the authors proposed to replace the main stage of the
algorithm with a neural network was the article [
          <xref ref-type="bibr" rid="ref3">3</xref>
]. In particular, in this paper, the algorithm
for classifying the gaze direction was replaced by a neural network one, since, according to the
authors, the features extracted by a neural network from the image of the eye area classify the
gaze direction well, owing to the greater power of neural algorithms.
        </p>
        <p>
There are approaches in which the authors try to implement the highest-quality method
of gaze classification using only the image of the eye area. In such works, the authors pay more
attention to the classification method itself, without being distracted by the task of determining
the position of the eye and face. In the article [
          <xref ref-type="bibr" rid="ref4">4</xref>
], the authors prepared in advance a data set
containing aligned eye-area images and trained their own lightweight neural network classifier
on them.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Three-dimensional gaze angle recognition algorithm</title>
<p>Developing an algorithm for recognizing the three-dimensional angle that determines the exact
gaze direction is a difficult task. The accuracy of the existing methods increases every year.
Neural network algorithms are now increasingly used for angle regression, since this approach
can significantly increase the quality of the final method.</p>
        <p>
          More and more complex neural network models are used to accurately determine the gaze
angle. In the article [
          <xref ref-type="bibr" rid="ref5">5</xref>
], the authors propose to feed the network three images at once:
an image of each eye and the face as a whole. The main idea of the authors
is that the weight of each of these three images varies from case to case. In this way, the authors
try to force the network to independently distribute attention between the input images. To
do this, a weight regressor is added to the algorithm, which calculates the weight of each of
the input images in the final result, namely, the calculated angle of gaze direction. In order to
correctly assess the contribution of each of the images to the final result, the article proposes a
new loss function that distributes the weight between the features of each of the input images
in the final value of the viewing angle.
        </p>
        <p>
Older methods often use not only an ordinary camera but also an infrared camera to reduce
the influence of external factors on the operation of the algorithm, since classical methods are
much less powerful than neural network methods. For example, the article [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] uses
an infrared flashlight and a camera with a removed filter to make the lighting in the car more
uniform and reduce the influence of external light sources, such as lights on poles, headlights of
an oncoming car, sun glare.
        </p>
        <p>
Often, in solving the problem of recognizing the gaze direction, the main challenge for authors
is dealing with occlusions. For example, when the car is moving, the position
and tilt of the head can change significantly, and the lighting can also vary greatly, say
on a sunny day when entering a tunnel. There are many algorithms in which the authors try to
come up with a method to deal with the described problem. In the article [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a new network
architecture is proposed, in which an additional network is added before the regressor, designed
to create a gaze direction map that is invariant in various scenarios of the algorithm application.
To train this network, the authors use marked-up images that already have a gaze map, which
imposes significant restrictions on the application of this method in real conditions.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Datasets used in gaze direction recognition tasks</title>
        <p>To solve the problem of determining the gaze direction, a lot of data is required. This is due
to the fact that the problem is mainly solved using machine or deep learning methods. Many
data sets exist to solve this problem, but most of them are designed to solve highly specialized
problems of classifying the gaze direction.</p>
        <p>
          One of the main datasets with eye images is the MPIIGaze dataset proposed in the article [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
It contains 213659 images depicting 15 different people.
        </p>
        <p>
          Special methods for generating synthetic data sets are also often developed. One of the most
popular synthetic datasets for training models for determining the gaze direction is [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
There are various specific conditions in which a gaze direction method
has to work, so there are many different forms of data sets. In general, data sets differ in the
way they are collected and annotated, but there are also more significant differences. Some
data sets are captured with an infrared camera, and some are collected using a camera with depth
data, such as the data set [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>There are also simpler datasets that contain only color images of people’s faces taken with a
conventional camera. Such datasets are often more popular due to their simplicity and versatility.
It is important to note that such data sets are easier to collect, which affects their quality and
size. An example of such a dataset is XGaze.</p>
        <p>
          Some data sets are collected using new markup tools, such as using the voice commands of a
person who is in the field of view of the camera and is filmed to collect data. An example of
such a data set is Driver Gaze in the Wild, it was collected using the method described in the
article [
          <xref ref-type="bibr" rid="ref11">11</xref>
]. The Driver Gaze in the Wild data set contains 29050 images of 383 different people.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Gaze direction recognition</title>
      <p>Based on the conclusions drawn from the review of methods and the task of constructing a
universal method for determining the direction of the operator’s gaze, the following scheme of
the method was developed.</p>
      <p>The method proposed in this paper consists of two parts, the first of which is the regression of
the gaze angle, the second is the classification of the obtained gaze angle for a certain scenario
of the method.</p>
      <sec id="sec-3-1">
        <title>3.1. Gaze direction regression</title>
<p>All methods for determining the gaze direction that include a stage of predicting the gaze
direction vector share the common task of choosing the target predicted vector and the metric
in which the method’s error will be calculated during training.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Metrics and loss function</title>
<p>There are many different metrics (loss functions) for training neural networks, but each task
has its own specifics, which significantly limits the choice. In the rest of this section, the metric
used to evaluate the model also serves as the loss function when training it.</p>
<p>In our problem, we can predict the gaze direction vector as a three-dimensional vector,
namely g = (x, y, z), where x, y, z are the coordinates of the gaze direction vector in
three-dimensional space. In this problem, we only care about the direction of this vector, which
allows us to use the cosine similarity metric, expressed by formula 1, where a and b are the two
vectors compared by this metric, and n is the number of elements in each of the vectors a and b.</p>
          <p>cos(a, b) = (a · b) / (‖a‖ · ‖b‖) = Σ_{i=1..n} a_i b_i / (√(Σ_{i=1..n} a_i²) · √(Σ_{i=1..n} b_i²)) (1)</p>
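<p>As a minimal sketch in plain Python (assuming the standard cosine similarity definition, not the actual training code), formula 1 can be computed as:</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two gaze-direction vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0 regardless of vector length:
print(cosine_similarity((1.0, 0.0, 0.0), (2.0, 0.0, 0.0)))  # 1.0
```

<p>This illustrates the property noted above: the metric penalizes only divergence in direction, so vectors of any length pointing the same way are equivalent.</p>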
<p>The main advantage of such a metric is that after training the method, there is no need to
map the network output to a different dimension; it is enough to normalize the values that
the neural network outputs. But in this context, we can also consider the disadvantages of
this approach, namely: an increase in the number of network parameters and a lower speed of
calculating such a metric on GPU. It is also worth noting that training a neural network with
such a metric does not limit the length of the vectors that the network outputs, since the metric
penalizes the method only for divergence in the direction of the vector, which in turn can cause
large computational inaccuracies.</p>
<p>On the other hand, to solve the problem of determining the gaze direction, a spherical
coordinate system with a fixed sphere radius can be used. In this case, it is necessary to
predict the vector a = (θ, φ), where θ and φ are angles in a spherical coordinate system (the
notation of the angles in a three-dimensional coordinate system is shown in the corresponding
figure). Using the angles θ and φ, the system of formulas 2, and subsequent normalization of
the vector g = (x, y, z) by the formula ĝ = g / ‖g‖, it is possible to uniquely express the
three-dimensional gaze direction vector.</p>
<p>x = 1,
y = x · tan(θ),
z = √(x² + y²) · tan(φ),
g = (x, y, z)
(2)</p>
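<p>A small Python sketch of this conversion, assuming the reconstruction of formulas 2 in which x = 1, y = x · tan(θ), z = √(x² + y²) · tan(φ), followed by normalization to a unit vector:</p>

```python
import math

def angles_to_vector(theta, phi):
    """Convert gaze angles (theta, phi) to a unit gaze vector g = (x, y, z).
    Fixing x = 1 places the vector in the positive half-space along the x axis."""
    x = 1.0
    y = x * math.tan(theta)
    z = math.sqrt(x * x + y * y) * math.tan(phi)
    n = math.sqrt(x * x + y * y + z * z)  # normalize: g = g / ||g||
    return (x / n, y / n, z / n)

# Looking straight ahead (both angles zero) gives the unit x axis:
print(angles_to_vector(0.0, 0.0))  # (1.0, 0.0, 0.0)
```

<p>The returned vector always has unit length, which is the property that keeps the optimized response from growing without bound.</p>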
<p>The unambiguity of converting the angles into a vector follows from the fact that the gaze
direction vector lies in the positive half-space along the x axis.</p>
<p>Using the representation described above for the gaze direction vector has several
advantages, including: reducing the number of network parameters, simplifying the loss function
used, and optimizing the angle values in some neighborhood of the true value, which
does not allow the optimized response vector to grow indefinitely without affecting the
value of the quality metric. In this approach, the MSE quality metric is best suited, which is
described by formula 3, where a and b are the two vectors compared by this metric, and n is
the number of elements in each of the vectors a and b.</p>
<p>MSE(a, b) = (1/n) · Σ_{i=1..n} (a_i − b_i)² (3)</p>
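<p>As a hedged sketch in plain Python (not the training code), the mean squared error of formula 3 over a pair of angle vectors can be computed as:</p>

```python
def mse(a, b):
    """Mean squared error between two equal-length angle vectors."""
    n = len(a)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / n

# Identical predictions give zero error:
print(mse([0.1, 0.2], [0.1, 0.2]))  # 0.0
```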
          <p>For a more objective comparison of the models and metrics described above, a basic neural
network for predicting the vector of the gaze direction was developed to solve the problem of
determining the three-dimensional angle of the gaze direction. With the help of this neural
network, it was possible to compare the quality of these models.</p>
          <p>It is also necessary to compare the quality of the base model on color and black-and-white
images in order to optimize the hardware needs of the method described in this work. It can be
argued that for the task at hand, black and white images contain all the necessary information.</p>
<p>Thus, at this stage of the method development, the following conclusions were made: predicting
the angles θ and φ using the MSE metric shows the best quality, and the use of black-and-white
images for training the method allows increasing the final quality of the method.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Proposed method</title>
          <p>This section describes the main idea of the method of regression of the gaze direction. To
implement this idea, a new neural network architecture is proposed, which allows solving the
task with high accuracy.</p>
          <p>To determine the direction of the gaze, the proposed method uses an image of a person’s face.</p>
<p>The reason for using the face image as a whole was the problem of using only eye images
in real conditions. When using eye images, occlusions often occur, and the error introduced by
eye detection algorithms has a great influence. Figure 1 shows an example of the eye being
occluded by a pair of glasses in a real usage scenario; such an image is practically unsolvable
for a method that determines the gaze direction only from the eyes.</p>
<p>The method proposed in this paper has a simple idea: to solve the problem of determining the
gaze direction without significantly increasing the requirements for the input data. For this purpose,
an extensive neural network architecture was developed, whose input is only a black-and-white
image of the operator’s face. This network architecture reduces the error and the running time
of the method caused by inaccuracies in eye detection, and also allows training the model on
the high-quality XGaze data set.</p>
<p>The main purpose of the proposed architecture, shown in figure 2, is a one-stage fully
neural network solution to the problem of determining the three-dimensional gaze angle. To
this end, several architectures were tried that include several different-scale paths within
the network, which allow the network to focus on features of different scales. The architecture
proposed in this paper uses three branches, two of which coincide in architecture and in the
scale of the features under consideration, while the third is different and is designed to work
with larger-scale features. According to our assumption, the two identical branches should
extract small-scale features for each of the eyes from the image, and the third branch should
extract larger-scale features responsible for the position of the face, which can be used to
determine the direction of head rotation.</p>
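<p>The three-branch idea can be sketched in PyTorch as below. This is a minimal illustration only: the layer sizes, the 64×64 grayscale input, and all channel counts are our assumptions, not the architecture from figure 2.</p>

```python
import torch
import torch.nn as nn

class ThreeBranchGazeNet(nn.Module):
    """Sketch of the three-branch idea: two identical branches for
    small-scale (eye) features and one coarser branch for large-scale
    (head-pose) features, all fed the same grayscale face image.
    Layer sizes here are illustrative assumptions."""

    def __init__(self):
        super().__init__()

        def fine_branch():
            # small-scale feature extractor (one per eye)
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())

        self.eye_a = fine_branch()
        self.eye_b = fine_branch()
        # large-stride branch for coarse head-pose features
        self.coarse = nn.Sequential(
            nn.Conv2d(1, 16, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        # two fine branches (32*4*4 each) + coarse branch (16*4*4) -> (theta, phi)
        self.head = nn.Linear(2 * 32 * 4 * 4 + 16 * 4 * 4, 2)

    def forward(self, x):
        feats = torch.cat([self.eye_a(x), self.eye_b(x), self.coarse(x)], dim=1)
        return self.head(feats)

net = ThreeBranchGazeNet()
angles = net(torch.zeros(1, 1, 64, 64))  # one 64x64 grayscale face crop
print(angles.shape)  # torch.Size([1, 2])
```

<p>Note that all three branches share the same input image; the network itself is left to route small-scale and large-scale information, with no separate eye detector in the pipeline.</p>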
<p>This architecture makes it possible to avoid a more complex algorithm scheme in which the
parts of the image that depict the eyes must be cut out and three images then fed to the network
at once; moreover, with that approach the complexity of the neural network architecture would
not decrease.</p>
<p>In this paper, an alternative approach is also tested: a neural network that receives three
images at once, a face, a left eye, and a right eye. This approach does not win over the basic
approach described above in terms of the complexity of the neural network architecture, since it
also contains three branches. To implement the approach with multiple input images, it is first
necessary to select the parts of the face image that contain the eyes. Solving the eye detection
problem requires a separate method, which introduces additional noise into the described
algorithm. Based on these facts, we can conclude that the alternative approach does not win in
terms of complexity and speed, and also requires additional image processing methods.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Gaze direction classification</title>
<p>The proposed method pursues the goal of a high level of universality, namely the possibility
of applying it in various scenarios without retraining. To classify the gaze direction, the
three-dimensional gaze angle determined at the first step of the algorithm is used.</p>
<p>It is necessary to solve the problem of classifying pairs of the form (θ, φ) for fixed classes,
since the three-dimensional gaze angle is uniquely determined from the pair (θ, φ). The number
of zones into which the field of gaze direction can be divided can be arbitrary, but in this
work we consider the classification of the driver’s field of gaze into 7 zones: left mirror,
speedometer, center console, right mirror, right part of the windshield, interior rearview mirror,
left part of the windshield. An example of the location of the classes is shown in figure 3.</p>
<p>To solve the classification problem, several different classical methods are tested in this article.
A neural network approach is not well suited to this classification problem, since the geometry
of the classes is quite simple and the amount of training data is small, so classical methods
were tested: Nearest Centroid, K Nearest Neighbors, and Tree Classifier.</p>
<p>The K Nearest Neighbors method was chosen as the basic one because of its set of positive
qualities for the problem being solved: with a small amount of training data and a fairly simple
arrangement of classes, it achieves higher classification quality than the other methods.</p>
<p>To apply the proposed method, the classification method is calibrated for each new car driver:
the driver looks at each fixed zone for 2 seconds.</p>
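<p>The calibration-then-classify step can be sketched with the standard scikit-learn K Nearest Neighbors implementation. The zone names follow the list above; the calibration angles below are purely illustrative values, not measurements from the Drivers data set:</p>

```python
# Hypothetical calibration sketch: K Nearest Neighbors over (theta, phi)
# pairs collected while the driver looks at each of the 7 zones.
from sklearn.neighbors import KNeighborsClassifier

ZONES = ["left mirror", "speedometer", "center console", "right mirror",
         "right windshield", "rearview mirror", "left windshield"]

# Illustrative calibration angles (radians), one per zone:
train_angles = [(-0.9, 0.0), (-0.2, -0.4), (0.0, -0.5), (0.9, 0.0),
                (0.4, 0.1), (0.5, 0.3), (-0.3, 0.1)]
train_labels = list(range(len(ZONES)))

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(train_angles, train_labels)

# A new gaze angle from the regression stage is mapped to the nearest zone:
zone = clf.predict([(-0.85, 0.05)])[0]
print(ZONES[zone])  # left mirror
```

<p>In practice, each zone would contribute many calibration samples (e.g. the 120 per class used in the experiments), and a larger number of neighbors could be chosen accordingly.</p>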
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental evaluation</title>
      <p>
        All neural networks were trained using an Nvidia Geforce GTX 1080 Ti graphics card. The basic
model of the regression of the gaze direction was trained on the data set UnityEyes from the
article [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] using the same optimizers. The training of the classification method was carried out
on the Drivers data set, which was prepared as part of the work on this article.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Gaze direction regression</title>
<p>In this paper, several architectures are proposed for determining the three-dimensional gaze
angle from the face image. The main architecture is shown in figure 2.</p>
        <p>
The comparison of the quality of the architectures proposed in this paper can be found in
table 1; this table also presents the results obtained by the authors of the article [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] on the XGaze dataset, on which the quality of the models is compared.
        </p>
<p>According to the results in table 1, the proposed basic architecture, shown in figure 2,
shows the best results at the moment, surpassing the result obtained by the authors of the
XGaze dataset.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Gaze direction classification</title>
<p>In the proposed work, several different gaze angle classifiers are compared. Table 2 shows
the results of each of the classifiers on the part of the DriveDS data set that shows one
person.</p>
        <p>The results in the table 2 are obtained on classifiers that are trained on 120 random examples
of angles from each class. It also follows from the presented table that the K Nearest Neighbors
method is best suited for solving this problem, which shows the highest results. An important
advantage of K Nearest Neighbors is the high speed of operation.</p>
<p>To describe the classification methods in more detail, we present the class distribution
schemes on a two-dimensional plane in figure 4.</p>
<p>Also, for the K Nearest Neighbors classification method, table 3 gives the confusion matrix
for the part of the DriveDS data set that shows one person.</p>
<p>The work carried out a cross-comparison of the method on different people from the DriveDS
data set: the classification method was trained on the examples of one of the three people and
then tested on the full data set of each person. The results of cross-testing are shown in
table 4.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
<p>The algorithm proposed in this article achieves high accuracy in classifying the operator’s
gaze direction using a two-stage algorithm, the first stage of which is the regression of the
gaze direction and the second the classification of the obtained gaze direction vector. The main
advantage of the proposed method is its high level of versatility: there is no need to retrain
the method to switch to a new scenario of the classification algorithm.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N. H.</given-names>
            <surname>Jabber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Hashim</surname>
          </string-name>
          ,
<article-title>Robust eye features extraction based on eye angles for efficient gaze classification system</article-title>
          ,
          <source>in: 2018 Third Scientific Conference of Electrical Engineering (SCEE)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          . doi:10.1109/SCEE.2018.8684107.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ferdoushi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Emrose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
<string-name>
            <given-names>S. M. M.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. I.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shahnaz</surname>
          </string-name>
          ,
          <article-title>Deep learning-based eye gaze controlled robotic car</article-title>
          ,
          <source>in: 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:10.1109/R10-HTC.2018.8629836.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>George</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Routray</surname>
          </string-name>
          ,
          <article-title>Real-time eye gaze direction classification using convolutional neural network</article-title>
          ,
          <source>in: 2016 International Conference on Signal Processing and Communications (SPCOM)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:10.1109/SPCOM.2016.7746701.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Appearance-based gaze block estimation via cnn classification</article-title>
          ,
          <source>in: 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:10.1109/MMSP.2017.8122270.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <article-title>Learning a 3d gaze estimator with adaptive weighted strategy</article-title>
          ,
          <source>IEEE Access 8</source>
          (
          <year>2020</year>
          )
          <fpage>82142</fpage>
          -
          <lpage>82152</lpage>
          . doi:10.1109/ACCESS.2020.2990685.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Vicente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De la Torre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Levi</surname>
          </string-name>
          ,
          <article-title>Driver gaze tracking and eyes of the road detection system</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>16</volume>
          (
          <year>2015</year>
          )
          <fpage>2014</fpage>
          -
          <lpage>2027</lpage>
          . doi:10.1109/TITS.2015.2396031.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spurr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hilliges</surname>
          </string-name>
          ,
          <article-title>Deep pictorial gaze estimation</article-title>
          ,
          <source>Lecture Notes in Computer Science</source>
          (
          <year>2018</year>
          )
          <fpage>741</fpage>
          -
          <lpage>757</lpage>
          . URL: http://dx.doi.org/10.1007/978-3-030-01261-8_44. doi:10.1007/978-3-030-01261-8_44.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sugano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fritz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulling</surname>
          </string-name>
          ,
          <article-title>Appearance-based gaze estimation in the wild</article-title>
          ,
          <source>in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>4511</fpage>
          -
          <lpage>4520</lpage>
          . doi:10.1109/CVPR.2015.7299081.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baltrusaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Morency</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulling</surname>
          </string-name>
          ,
          <article-title>Learning an appearance-based gaze estimator from one million synthesised images</article-title>
          ,
          <year>2016</year>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>138</lpage>
          . doi:10.1145/2857491.2857492.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D. P.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <article-title>Driver gaze zone dataset with depth data</article-title>
          ,
          <source>in: 2019 14th IEEE International Conference on Automatic Face Gesture Recognition (FG 2019)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi:10.1109/FG.2019.8756592.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dhall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebe</surname>
          </string-name>
          ,
          <article-title>Speak2label: Using domain knowledge for creating a large scale driver gaze zone estimation dataset</article-title>
          ,
          <year>2021</year>
          . arXiv:2004.05973.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Beeler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hilliges</surname>
          </string-name>
          ,
          <article-title>Eth-xgaze: A large scale dataset for gaze estimation under extreme head pose and gaze variation</article-title>
          ,
          <year>2020</year>
          . arXiv:2007.15837.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>