=Paper= {{Paper |id=Vol-1940/paper11 |storemode=property |title=The System for Determining the Human Head Position and Orientation for Vehicle Simulators |pdfUrl=https://ceur-ws.org/Vol-1940/paper11.pdf |volume=Vol-1940 |authors=Alexei Zakharov,Alexei Barinov,Arkady Zhiznyakov }} ==The System for Determining the Human Head Position and Orientation for Vehicle Simulators== https://ceur-ws.org/Vol-1940/paper11.pdf
    The System for Determining the Human Head Position
           and Orientation for Vehicle Simulators




              Alexei Zakharov, Alexei Barinov and Arkady Zhiznyakov

            Murom Institute, Vladimir State University, Murom, Russia
aa-zaharov@ya.ru, alexey.barinov.murom@ya.ru, lvovich@newmail.ru



          Abstract. A system for determining the human head position and orientation
       for vehicle simulators is considered. When simulating a virtual space, it is
       necessary to take into account the current position and orientation of the
       driver's head. A model for determining the human head position and orientation
       is developed and studied. In the course of the study, the developed position and
       orientation algorithm was compared with analogs belonging to well-known groups
       of methods. The model is characterized by the joint use of three-dimensional
       reconstruction, stereo vision, the spectral theory of graphs, and spectral
       embedding of graphs into a vector subspace; it handles head rotations of up to
       50° with an accuracy of 3°, which exceeds known approaches. A system for
       determining the human head position and orientation based on stereo images,
       implementing the developed algorithms, is constructed. The system consists of
       the following modules: stereo module, initialization module, tracking module,
       module for calculating the angles of the head orientation, head detection
       module, and information transfer module.


       Keywords: Human Head Pose, Vehicle Simulators, Image Recognition.


1      Introduction

Today, simulators are an integral part of high-quality driver training: they support
the learning process through a set of dedicated driving exercises.
An integral part of modern simulators is the system for visualizing the surrounding
space. Throughout the development cycle of simulators, much attention has been paid
to developing and improving this system. Even so, the sense of presence created in a
simulator still differs greatly from what a driver experiences in a real vehicle. One
reason is that, when the environment is synthesized in a transport simulator, the
visualization system does not change the displayed picture of the world depending on
the position and orientation of the driver's head in the cabin (Fig. 1). This, in turn,
makes it impossible to realistically emulate rear-view mirrors and the entire area
around the car to the extent that would prepare a person to drive a real vehicle well.
To solve this problem, it is necessary to take into account the current position and
orientation of the driver's head when simulating a virtual space.




Fig. 1. Change in the overview when the position and orientation of the driver's head change:
a) change in the overview through the vehicle's glass; b) change in the overview in the rear-view
mirror.

Recently, a large number of methods and systems for determining the position and
orientation of the head from images have been developed. However, the existing systems
determine the parameters with low accuracy and do not provide the necessary speed. In
addition, such systems are tied to specific applications, so they are difficult to use
in common vehicle simulators.
Creating a system for determining the human head position and orientation on the basis
of stereo images will increase the effect of the trainee's presence in the synthesized
environment by changing the overview. In addition to making the display of the external
space more realistic, it will give the trainee the opportunity to master such an
important skill as monitoring the traffic situation through the rear-view mirrors. This
will improve the quality of driver training and subsequently reduce the number of
accidents.


2      The Model for Determining the Human Head Position And
       Orientation Using Stereo Images

To determine the head position and orientation, a stereo system consisting of two
cameras is used. The optical axes of the cameras are parallel to each other. The
three-dimensional coordinates x, y, z of a head point are found from the known
distance between the cameras and the camera focal length. The parameters of the stereo
rig are determined during calibration. A preliminary three-dimensional reconstruction
of the head model is performed from the stereo images using the Sum of Absolute
Differences (SAD) algorithm (Scharstein D. et al., 2002). Based on the three-dimensional
model, the current head position and orientation are calculated. To track the head
orientation, key points detected in the image with the SURF algorithm are used. The
yaw, roll and pitch angles $\alpha_{yaw}$, $\alpha_{roll}$ and $\alpha_{pitch}$ are
calculated with respect to the coordinate system of the stereo rig. Determining these
angles reduces to calculating the angle between the vectors formed by the corresponding
key points in the initial and subsequent positions. These points are: the extreme
lateral points (A, C); the extreme upper and lower points (B, D); and the central point
(E) (Fig. 2).
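The stereo step described above can be illustrated with a minimal sketch. It assumes a rectified image pair with parallel optical axes, a one-dimensional SAD search along a single image row, and a pinhole camera model; the function names, the principal point (cx, cy) and all numeric values are illustrative assumptions, not part of the described system.

```python
def sad(left_row, right_row, x_left, x_right, half_window):
    """Sum of Absolute Differences between two 1-D windows of equal size."""
    total = 0
    for offset in range(-half_window, half_window + 1):
        total += abs(left_row[x_left + offset] - right_row[x_right + offset])
    return total


def best_disparity(left_row, right_row, x, half_window, max_disparity):
    """Return the disparity d >= 0 with the lowest SAD cost at column x."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        x_right = x - d  # in a rectified pair the match shifts left in the right image
        if x_right - half_window < 0:
            break  # the window would leave the image
        cost = sad(left_row, right_row, x, x_right, half_window)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def triangulate(x_px, y_px, disparity_px, focal_px, baseline_m, cx, cy):
    """Recover (x, y, z) in metres from a pixel and its disparity.

    focal_px is the focal length in pixels, baseline_m the distance between
    the cameras, (cx, cy) the assumed principal point.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity_px
    x = (x_px - cx) * z / focal_px
    y = (y_px - cy) * z / focal_px
    return x, y, z


# Toy rows: the same intensity pattern appears 3 pixels further left on the right.
left = [0, 0, 0, 0, 0, 9, 5, 9, 0, 0]
right = [0, 0, 9, 5, 9, 0, 0, 0, 0, 0]
d = best_disparity(left, right, x=6, half_window=1, max_disparity=5)
point = triangulate(420, 240, 40, focal_px=800, baseline_m=0.1, cx=320, cy=240)
```

In a real system the focal length, baseline and principal point would come from the calibration of the stereo rig rather than being fixed by hand.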




Fig. 2. Determination of the head orientation with the help of singular points: a) angles of
rotation of the head; b) key points; c) pitch calculation; d) yaw calculation; e), f) roll calculation.


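Thus, for example, the head tilt (roll) is calculated by the formula

$$\alpha_{roll} = \arccos \frac{\overrightarrow{AC} \cdot \overrightarrow{A_C C_C}}{\left|\overrightarrow{AC}\right| \left|\overrightarrow{A_C C_C}\right|}, \qquad (1)$$

where $\overrightarrow{AC}$ is the vector formed by the points in the initial position and
$\overrightarrow{A_C C_C}$ is the vector formed by the points in the current position.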
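As a worked illustration of formula (1), the following sketch computes the angle between the vector through the lateral key points in the initial position and the same vector in the current position. The function names and the sample coordinates are assumptions chosen for the example.

```python
import math


def vector(start, end):
    """Vector from point start to point end."""
    return tuple(e - s for s, e in zip(start, end))


def angle_between_deg(u, v):
    """Angle between two vectors in degrees: arccos of the normalised dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length vector")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Lateral points A and C in the initial position.
a_init, c_init = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
# The same points after the head is tilted by 30 degrees in the image plane.
a_cur = (0.0, 0.0, 0.0)
c_cur = (math.cos(math.radians(30)), math.sin(math.radians(30)), 0.0)

roll = angle_between_deg(vector(a_init, c_init), vector(a_cur, c_cur))
```

Note that arccos returns an unsigned angle, so the direction of the tilt would have to be recovered separately, for example from the sign of a cross product.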
The model for determining the human head position and orientation is a model of
actions and consists of the following steps: initialization of the head image using
feature clustering based on the spectral theory of graphs and constraints; head
tracking in images by finding matches using graphs; head detection in the image
based on embedding the graph of singular points in a vector space; calculation of the
nod (pitch), rotation (yaw) and tilt (roll) angles.


2.1     Initialization of the Head Image using the Clustering of Features Based on
        the Spectral Theory of Graphs and Constraints
Constraints are used to control the clustering process (Barinov A.E. et al., 2016).
They rely on prior information about the proportions of the human head. It is proposed
to use a network (grid) of head proportions, an idea borrowed from the way a human head
is drawn in art. It was found that, regardless of sex and race, the proportions of the
faces of different people are the same.
The network of proportions is built from features of photographs. The network, which
contains 5 cells horizontally and 7 vertically, is overlaid on the image of the face.
To use the network of proportions during clustering, an averaged template has to be
calculated. For this purpose, 500 images of people were chosen, key points were
calculated, and the network of proportions was applied manually (Fig. 3, a).
As a result, the averaged position of the key points with respect to the network of
proportions was obtained (Fig. 3, b).
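One possible way to build such an averaged template is sketched below. It assumes that each training face is given as a bounding box plus a list of special (key) points, and that the template stores, for every cell of the 5 x 7 network, the average share of points falling into it. This representation is an assumption made for illustration; the text does not specify how the averaging was performed.

```python
import math

GRID_COLS, GRID_ROWS = 5, 7  # cells horizontally and vertically, as in the text


def cell_of(point, box):
    """Return (row, col) of the grid cell containing point, or None if outside.

    box is (left, top, width, height) of the face in pixels.
    """
    x, y = point
    left, top, width, height = box
    col = math.floor((x - left) / width * GRID_COLS)
    row = math.floor((y - top) / height * GRID_ROWS)
    if 0 <= col < GRID_COLS and 0 <= row < GRID_ROWS:
        return row, col
    return None


def average_template(faces):
    """Average the per-face cell occupancy over all faces.

    faces is a list of (box, points) pairs. Each face contributes a histogram
    normalised to sum to 1, so faces with many points do not dominate.
    """
    template = [[0.0] * GRID_COLS for _ in range(GRID_ROWS)]
    used = 0
    for box, points in faces:
        cells = [c for c in (cell_of(p, box) for p in points) if c is not None]
        if not cells:
            continue  # no usable points on this face
        used += 1
        for row, col in cells:
            template[row][col] += 1.0 / len(cells)
    if used:
        template = [[value / used for value in row] for row in template]
    return template


faces = [
    ((0, 0, 50, 70), [(5, 5), (25, 35)]),
    ((100, 100, 100, 140), [(110, 110)]),
]
template = average_template(faces)
```

Because the coordinates are normalised by the face box, faces of different sizes contribute to the same cells of the template.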




Fig. 3. The network of proportions for identifying points belonging to the human head: a) examples
of the identification of different faces; b) the averaged template of the network of proportions; c) the
results of the clustering.


2.2    The Head Tracking on Images by Finding Matches Using Graphs
The use of computer vision is limited by noise and optical effects, textured
backgrounds, and mutual overlapping of objects. To increase reliability, it is proposed
to use structured information in the form of graphs. The advantage of structural methods
is that they make it possible to analyze a large set of elements on the basis of a small
number of simple components and rules for forming the graph model. Structural methods
also allow the characteristics of an object to be described in a way that excludes its
assignment to another class, which increases the reliability of recognition. Scott and
Longuet-Higgins used graphs to find correspondences. Our method of finding
correspondences is based on the combined use of graphs and descriptors (Zakharov A. et
al., 2015) (Fig. 4).
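The general idea of combining descriptors with structural information can be illustrated by a simplified sketch. It is not the algorithm of (Zakharov A. et al., 2015): here candidate matches are mutual nearest neighbours in descriptor space, and a match is kept only if the lengths of the graph edges connecting it to most other matches are preserved between the two frames. All names, thresholds and data are assumptions.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mutual_nearest(desc_a, desc_b):
    """Index pairs (i, j) whose descriptors are each other's nearest neighbour."""
    def nearest(query, pool):
        return min(range(len(pool)), key=lambda k: squared_distance(query, pool[k]))

    forward = [nearest(d, desc_b) for d in desc_a]
    backward = [nearest(d, desc_a) for d in desc_b]
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]


def structurally_consistent(pairs, pts_a, pts_b, tolerance=0.2, min_support=0.5):
    """Keep matches whose edge lengths to most other matches are preserved."""
    kept = []
    for i, j in pairs:
        agree, others = 0, 0
        for k, l in pairs:
            if (k, l) == (i, j):
                continue
            others += 1
            len_a = math.dist(pts_a[i], pts_a[k])  # edge in the first frame
            len_b = math.dist(pts_b[j], pts_b[l])  # same edge in the second frame
            if abs(len_a - len_b) <= tolerance * max(len_a, len_b):
                agree += 1
        if others == 0 or agree / others >= min_support:
            kept.append((i, j))
    return kept


pts_a = [(0, 0), (10, 0), (0, 10), (10, 10)]
pts_b = [(5, 5), (15, 5), (5, 15), (80, 80)]  # last point is a false match
desc = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]

candidates = mutual_nearest(desc, desc)
matches = structurally_consistent(candidates, pts_a, pts_b)
```

The edge-length check tolerates translation and rotation in the image plane but not a change of scale, so a real tracker would need a scale-aware criterion.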




            Fig. 4. The head tracking on images by finding matches using graphs.


2.3    Detection of Human Head on Images Using Graph Embedding In Vector
       Space
To detect the head in the image, it is proposed to compare the spectral characteristics
of the graphs built for the current frame and for the image obtained at the
initialization stage. It is proposed to use the thermodynamic (heat) equation for this
purpose. This approach is widely used when studying temperature effects on gases, with
gradual tracking of changes in the current state. In practice, such equations can be
applied to the image matching problem.
When a graph is embedded in a vector subspace, we have to deal with differential
geometry, where composite curves are used to describe the relationships between points.
The construction of the graph is shown in Fig. 5.
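One spectral characteristic that follows from the heat equation on a graph is the trace of the heat kernel exp(-tL), where L is the graph Laplacian; it equals the sum of exp(-t * lambda_i) over the Laplacian eigenvalues and can be compared between two graphs. The sketch below is an illustrative assumption about how such a comparison could look, not the authors' implementation. It evaluates the matrix exponential by a truncated Taylor series, which is adequate only for small graphs and moderate t.

```python
import math


def laplacian(num_nodes, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(lap, t, terms=40):
    """Approximate exp(-t * L) by the first `terms` terms of its Taylor series."""
    n = len(lap)
    scaled = [[-t * value for value in row] for row in lap]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (-t L)^k / k!
        term = [[value / k for value in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def heat_trace(lap, t):
    """Trace of the heat kernel, a permutation-invariant graph signature."""
    kernel = heat_kernel(lap, t)
    return sum(kernel[i][i] for i in range(len(lap)))


edge_graph = laplacian(2, [(0, 1)])            # Laplacian eigenvalues 0 and 2
path_graph = laplacian(3, [(0, 1), (1, 2)])    # Laplacian eigenvalues 0, 1 and 3
signature = [heat_trace(path_graph, t) for t in (0.1, 0.5, 1.0)]
```

Two graphs can then be compared by the distance between their signatures sampled at the same values of t; a production implementation would more likely use an eigendecomposition from a numerical library.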




                   Fig. 5. Construction of Graphs on the Basis of Images.


    The results of graph embedding are shown in Fig. 6.




                           Fig. 6. Results of embedding graphs


3       The Structure of the System for Determining the Head
        Position and Orientation Using Stereo Images

The developed system is implemented in the Visual Studio Community 2015 programming
environment using the C# programming language. Both standard functions of the
environment and third-party libraries were used: the OpenCV library (for working with
video images), the ALGLIB library (for advanced mathematical functions), the WebCamLib
interface (for working with a video camera), and the Camera Calibration Tools (for
camera calibration).
The program settings are stored in "ini" files.
The structure of the developed system is shown in Fig. 7.
Stereo module. It is responsible for the coordinated operation of the video cameras that
make up the stereo rig. The module includes a calibration algorithm.
Initialization module. It performs the initial search for the user's head in the image
and calculates the descriptor used for later detection. It contains an algorithm for
spectral clustering of image features with constraints and an algorithm for calculating
the structural descriptor.
Tracking module. This module tracks the user's head descriptor between the frames of
the video sequence. It contains an algorithm for finding correspondences in images
using heat kernels on graphs and for calculating the model for determining the human
head position and orientation.
Module for calculating the angles of the head orientation. Using the developed model,
it calculates the current values of the orientation angles and the three-dimensional
coordinates of the head in space.
Head detection module. It searches for the head if it is lost from the field of view of
the stereo rig. It contains an algorithm of spectral clustering with constraints and an
algorithm for detecting the head based on embedding graphs in a vector space.
Information transfer module. It is an interface between the system for determining the
human head position and orientation and the transport simulator.
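The interaction of the modules listed above can be sketched as a small state machine. The sketch is written in Python for brevity, although the described system is implemented in C#. The state names, the function signature and the stub callables are hypothetical and only illustrate one plausible control flow: initialize once, track frame by frame, and fall back to detection when the head is lost.

```python
from enum import Enum


class State(Enum):
    INITIALIZING = "initializing"
    TRACKING = "tracking"
    DETECTING = "detecting"


def run_pipeline(stereo_frames, initialize, track, detect, compute_pose, send):
    """Process stereo frame pairs; every callable stands in for one module."""
    state = State.INITIALIZING
    descriptor = None
    for frame_pair in stereo_frames:          # supplied by the stereo module
        if state is State.INITIALIZING:
            descriptor = initialize(frame_pair)
            if descriptor is not None:
                state = State.TRACKING
            continue
        if state is State.DETECTING:
            if not detect(frame_pair, descriptor):
                continue                      # head still not found
            state = State.TRACKING
        points = track(frame_pair, descriptor)
        if points is None:                    # head lost, switch to detection
            state = State.DETECTING
            continue
        send(compute_pose(points))            # angles and coordinates to the simulator
    return state


# Stub modules: the head is lost on frame 3 and found again on frame 4.
sent = []
final_state = run_pipeline(
    stereo_frames=[1, 2, 3, 4, 5],
    initialize=lambda frame: "descriptor",
    track=lambda frame, descriptor: None if frame == 3 else frame,
    detect=lambda frame, descriptor: True,
    compute_pose=lambda points: points * 10,
    send=sent.append,
)
```

Passing the modules as callables keeps the control flow independent of how each module is implemented, which matches the modular structure described in the text.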


[Block diagram: two cameras, Stereo Module, Initialization Module, Head Detection Module,
Tracking Module, Module for Calculating the Angles of the Head Orientation, Information Transfer Module]

Fig. 7. The structure of the developed system for determining the human head position and
orientation with the help of stereo images using graphs

The hardware of the system includes two Logitech C300 webcams with 1.3-megapixel
sensors. They are mounted on a special bracket, which in turn is mounted on the monitor
(Fig. 8). Each camera is connected to the computer by a USB 2.0 cable.




                 Fig. 8. Fixing the stereo system on the simulator monitor


4      The Investigation of The Model for Determining the Human
       Head Position And Orientation

In the course of the study, the developed position and orientation algorithm was
compared with analogs belonging to well-known groups of methods: appearance template
methods (Sharma S., 2013), detector arrays (Jones M. et al., 2003), nonlinear regression
(Drouard V. et al., 2015), manifold embedding methods (Sundararajan K. et al., 2015),
flexible models (Chen Y. et al., 2014), geometric methods (Hatem H. et al., 2015), and
hybrid methods (Liao W.K. et al., 2010; Burger P. et al., 2013; Cabrera C.R. et al.,
2015).
The comparison was made according to the following parameters: the range of head
positions that can be determined; the maximum error; the presence of automatic
initialization, which requires no human intervention at system start-up or when the
monitored object is lost during operation; the handling of mutual overlapping of objects
in the scene; the ability to produce a correct result when a person wears glasses or has
a beard; and the type of position and orientation determination (discrete or
continuous).
It can be seen from the table that the majority of the algorithms under consideration
have a similar range of detectable rotation, nod and tilt angles. The largest maximum
error, 9°, belongs to the detector array methods, whereas the maximum error of the
developed algorithm does not exceed 3°. All hybrid methods provide automatic
initialization and search for the monitored object when it leaves the field of view of
the camera. The manifold embedding methods and the developed algorithm can also continue
to determine position and orientation when objects mutually overlap. Most of the
methods considered need to be trained before use, which can cause some difficulties.
In addition, all hybrid methods and the developed algorithm provide continuous tracking
of position and orientation, which makes it possible to obtain the rotation, nod and
tilt angles, as well as the three-dimensional head coordinates, at any time. The latter
is an important characteristic for use in transport simulators.
In terms of performance, it was established that processing is performed at an average
rate of 30 frames per second. This allows the developed algorithm to be used in real
applications.


5      Conclusion

A model for determining the human head position and orientation has been developed and
studied. The model is characterized by the joint use of three-dimensional
reconstruction, stereo vision, the spectral theory of graphs, heat kernels on graphs, and
spectral embedding of graphs into a vector subspace; it handles head rotations of up to
50° with an accuracy of 3°, which exceeds known approaches.
A system for determining the human head position and orientation based on stereo images,
implementing the developed algorithms, has been constructed. The system has a modular
structure and is implemented in Visual Studio Community 2015 using the OpenCV and ALGLIB
libraries, the WebCamLib interface and the Camera Calibration Tools functions. The
system increases the effect of the trainee's presence in the synthesized virtual
environment of the transport simulator by changing the field of view.

Acknowledgements. This work was supported by RFBR grant 16-37-00235 and by project
number 2.1950.2017/ПЧ within the basic part of the state assignment of the Russian
Ministry of Education.


References
1. Barinov A.E., Zakharov A.A.: Clustering using a random walk on graph for head pose
    estimation. 2015 International Conference on Mechanical Engineering, Automation and
    Control Systems (MEACS) (2016).
2. Burger P., Rothbucher M.: Self initializing head pose estimation with a 2D monocular USB
    camera. Technischer Bericht, Technische Universität München (2013).
3. Cabrera C.R., García-Montero M., López-Sastre R., Tuytelaars T.: Fast head pose
    estimation for human-computer interaction. Iberian Conference on Pattern Recognition
    and Image Analysis (2015).
4. Chen Y., Fu M., Yang Y., Song W.: A method of head pose estimation based on active
    shape model and stereo vision. Control Conference (2014).
5. Drouard V., Ba S., Evangelidis G., Deleforge A., Horaud R.: Head pose estimation via
    probabilistic high-dimensional regression. In Proc. IEEE International Conference on
    Image Processing (2015).
6. Hatem H., Beiji Z., Majeed R., Waleed J., Lutf M.: Head pose estimation based on
    detecting facial features. International Journal of Multimedia and Ubiquitous
    Engineering, Vol. 10, No. 3, pp. 311-322 (2015).
7. Jones M., Viola P.: Fast multi-view face detection. Mitsubishi Electric Research
    Laboratories (2003).
8. Liao W.K., Fidaleo D., Medioni G.: Robust, real-time 3D face tracking from a monocular
    view. EURASIP Journal on Image and Video Processing (2010).
9. Scharstein D., Szeliski R.: A taxonomy and evaluation of dense two-frame stereo
    correspondence algorithms. International Journal of Computer Vision, No. 47, pp. 7-42
    (2002).
10. Sharma S.: Template matching approach for face recognition system. International Journal
    of Signal Processing Systems, Vol. 1, No. 2, pp. 284-289 (2013).
11. Sundararajan K., Woodard D. L.: Head pose estimation in the wild using approximate view
    manifolds. In IEEE Conference on Computer Vision and Pattern Recognition Workshops,
    pp. 50-58 (2015).
12. Zakharov A., Tuzhilkin A., Zhiznyakov A.: Finding correspondences in images using
    descriptors and graphs. Procedia Engineering, No. 129, pp. 391-396 (2015).