Driver Monitoring Systems in Automated Interactions: A
Realtime, Thermographic-based Algorithm
   Saifeddine Aloui1, Raphaël Morvillier1, Christophe Prat1, Jaka Sodnik2, Carolina Diaz-Piedra3, Francesco Angioi3 and Leandro L. Di Stasi3

1 Univ. Grenoble Alpes, CEA, Leti, F-38000 Grenoble, France
2 University of Ljubljana, Faculty of Electrical Engineering, Tržaška c. 25, 1000 Ljubljana, Slovenia
3 Mind, Brain, and Behavior Research Center – CIMCYC, University of Granada, Campus de Cartuja s/n, Granada 18011, Spain



                Abstract
                As responsibility progressively shifts from the driver to the vehicle in automated vehicle
                technologies, driver-centered innovations have become a key point for their advancement.
                The so-called Driver Monitoring Systems (DMS) are therefore gaining importance in this
                context. One of the main aims of DMS is to estimate the driver’s arousal level in order to infer
                their cognitive state and capabilities. Even though the scientific literature offers many useful
                psychophysiological indices for estimating arousal levels [1], current arousal estimation
                relies on broad, mostly blink/gaze-related, indices. The reason is that the actual
                implementation of reliable sensors in a feasible system able to collect, analyze, and interpret
                measurements under real-life conditions remains an open challenge. One alternative signal of
                different cognitive states is facial skin temperature [2, 3]. Infrared sensors that monitor heat
                loss have proven useful for tracking the facial skin temperature changes that indicate arousal
                modulations while driving [2, 3]. However, the intensive, laborious work required to extract
                and analyze temperature changes at facial landmarks is not feasible in real-life applications
                [2]. Here, we present the preliminary results obtained with new software able to track
                drivers’ facial-skin temperature changes in real time.

                Keywords
                Driver state; Workload; Facial thermography; Real-time algorithm; Automated vehicle;
                Sensing and real-time information

1. Introduction
    As responsibility progressively shifts from the driver to the vehicle in automated vehicle
technologies, driver-centered innovations have become a key point for their advancement. The so-called
Driver Monitoring Systems (DMS) are therefore gaining importance in this context. One of the main
aims of DMS is to estimate the driver’s arousal level in order to infer their cognitive state and
capabilities. Even though the scientific literature offers many useful psychophysiological indices for
estimating arousal levels [1], current arousal estimation relies on broad, mostly blink/gaze-related,
indices. The reason is that the actual implementation of reliable sensors in a feasible system able to
collect, analyze, and interpret measurements under real-life conditions remains an open challenge. One
alternative signal of different cognitive states is facial skin temperature [2, 3]. Infrared sensors that
monitor heat loss have proven useful for tracking the facial skin temperature changes that indicate
arousal modulations while driving. However, the intensive, laborious work required to extract and
analyze temperature changes at facial landmarks is not feasible in real-life applications [2]. Facial
landmark extraction from color images has become commonplace [4] thanks to several libraries (e.g., Google's MediaPipe

Human-Computer Interaction Slovenia 2022, November 29, 2022, Ljubljana, Slovenia
EMAIL: saifeddine.aloui@cea.fr (S. Aloui); raphael.morvillier@cea.fr (R. Morvillier); christophe.prat@cea.fr (C. Prat);
jaka.sodnik@fe.uni-lj.si (J. Sodnik); dipie@ugr.es (C. Diaz-Piedra); frangioi@ugr.es (F. Angioi); distasi@ugr.es (L. L. Di Stasi)
ORCID: 0000-0001-7020-4461 (S. Aloui); 0000-0003-2868-331X (R. Morvillier); 0000-0001-8074-885X (C. Prat); 0000-0002-8915-9493
(J. Sodnik); 0000-0002-8168-2546 (C. Diaz-Piedra); 0000-0001-8231-5580 (F. Angioi); 0000-0001-6763-6546 (L. L. Di Stasi)
             ©️ 2022 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)
library [5]). However, when applied to thermographic images, these libraries produce unsatisfactory
results: the face is either not detected or the landmarks are not correctly aligned with the real face.
Two main methods have therefore been developed to perform landmark detection on thermographic
images. The first is to train a dedicated system on annotated thermographic images [see 6]. This
approach is still limited by the lack of large thermographic databases. For example, Kopaczka and
colleagues used a database containing 2,935 images [7]. A database of this kind would not be useful
for our data: in the present study, the drivers had to wear transparent face masks due to the COVID-19
pandemic, which made it harder to apply landmark detection to thermographic images where the mask
was visible. Indeed, although the masks were transparent to visible light, they were not transparent in
the wavelengths used to measure temperature, and therefore hid part of the driver’s face. The second
method uses an additional color camera to detect the facial landmarks and transfers them onto the
thermographic image (this process of aligning images from different sources is often referred to as
“image registration”). In previous studies, authors detected the edges in both color and thermal images
and matched them to align the images [8, 9]. A simpler method, based on an initial optical calibration
between the two cameras, is described in another work [10]. Goulart and colleagues used the same
principle and added a post-processing step, based on manual annotations by a trained expert, to refine
the transferred landmark positions [11]. Here, we present the preliminary results obtained with a system
based on this second method, able to track drivers’ facial-skin temperature changes automatically after
an initial calibration. It is a first step towards a fully automatic system that could run in real time in
future vehicles. We present the principle of the system and analyze its performance. In future work,
we intend to show the usefulness of extracting face temperature in an automated driving condition.

2. Material and methods
2.1. Instruments
   We used a sensorized driving simulator (Nervtech™ solution, see Figure 1) running SCANeR
Studio software (AVSimulation, v.DT2.5). Participants’ facial skin temperature was continuously
monitored with a thermographic camera (FLIR A325sc; resolution 320 × 240 pixels, NETD <
50 mK, accuracy ±2 °C or ±2% of reading) synchronized with a color camera (Intel® RealSense
infrared/color camera).




Figure 1: The driving simulator employed in the study. Left, the simulator and its dome; right, the
interior of the dome with the thermographic and color cameras on the top of the main screen.

2.2.    Face temperature extraction algorithm
   To extract participants’ facial-skin temperature at specific locations, we developed an algorithm
(Figure 2) able to identify two facial landmarks (Points of Interest [POIs]), the tip of the nose and the
forehead, as well as the background in a thermographic image. The solution was based on a dual camera
setup (i.e., color camera and thermographic camera), with a spatial correspondence between the two.
Figure 2: Architecture of the temperature extraction algorithm

Specifically, the color camera output allows the extraction of the POIs using conventional facial
landmark extraction tools. Here, we used MediaPipe [5], a state-of-the-art landmark detection
library. Once the POIs were detected, their positions were mapped into the thermographic camera
output using a geometric transformation [12]. The algorithm uses a 3 × 3 transformation matrix
T^{color→thermal} to convert each POI position from the color camera image space to the
thermographic image space. Each POI (landmark l_i) is defined by its homogeneous coordinates in the
color camera space (l_i^{color}). As detailed below, the coordinates in the thermographic camera
space (l_i^{thermal}) are obtained by multiplying l_i^{color} by the transformation matrix
T^{color→thermal} (1). This transformation matrix is the product of three matrices (2): the first
describes a translation with coordinates [t_x, t_y] (3), the second a rotation around the center of the
screen with angle θ (4), and the third a scaling with parameters [s_x, s_y] (5). Once the positions of
the landmarks in the thermographic image space were found, the POIs’ temperature values were read
from the image. Finally, we multiplied the result by the skin emissivity (0.98) to obtain the skin
temperature.

       l_i^{thermal} = T^{color \to thermal} \, l_i^{color}                                                 (1)

       T^{color \to thermal} = T^{trans} \times T^{rot} \times T^{scaling}                                  (2)

       T^{trans} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}                    (3)

       T^{rot} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}   (4)

       T^{scaling} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}                  (5)

2.3.      Calibration
The described system first needs to be calibrated to determine the parameters of the transformation
matrix T^{color→thermal}: t_x, t_y, θ, s_x, and s_y. Filippini and colleagues used a similar set-up
and performed the calibration using a custom checkerboard [10], a method we found to be less precise
in our situation. We therefore developed dedicated calibration software that allows an operator to
visualize the color and the thermographic camera outputs simultaneously, as shown in Figure 3.




Figure 3: Calibration software interface. On the left, the color image with the landmarks detected
thanks to MediaPipe [5]. On the right, the thermographic image with the corresponding landmarks
that the operator has to translate, rotate, and scale to correspond to the driver’s face.
On the color image, the operator can inspect the landmark detection performed by MediaPipe. The
thermographic image shows whether these landmarks are transferred correctly. If the result is not
satisfactory, additional translations, rotations, and scalings of the landmark “mask” can be applied
manually with the mouse. These transformations are recorded by the calibration software to compute
the matrix T^{color→thermal}. The calibration software finally saves the conversion parameters in a
dedicated file, which is used by the extraction software to automatically locate the POIs on the
thermographic output. In our experiment, we repeated the calibration procedure for each driver to
compensate for slight differences in camera and head positions among drivers.
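The saved calibration can be as simple as one parameter file per driver. The exact file format is not specified in this paper, so the JSON layout and parameter values below are assumptions for illustration:

```python
import json
import os
import tempfile

def save_calibration(path, tx, ty, theta, sx, sy):
    """Persist the operator's manual adjustments as transformation parameters."""
    with open(path, "w") as f:
        json.dump({"tx": tx, "ty": ty, "theta": theta, "sx": sx, "sy": sy}, f)

def load_calibration(path):
    """Reload the per-driver parameters for the extraction software."""
    with open(path) as f:
        return json.load(f)

# One file per driver, since camera and head positions differ slightly.
path = os.path.join(tempfile.gettempdir(), "driver_01_calibration.json")
save_calibration(path, tx=12.0, ty=-5.0, theta=0.01, sx=0.5, sy=0.5)
params = load_calibration(path)
```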

2.4.    Experimental design
    To test our algorithm, we designed a 2 (Traffic density: high vs. low) × 2 (Driving modality:
automated [ADL4] vs. manual [MD]) within-participants experiment. Thirty-five expert drivers
(mean age = 41.61 years, standard deviation = 6.26 years) drove along two virtual scenarios (~20
minutes [min] each) with varying traffic density. In both scenarios, the participants drove for 10 min
in MD and 10 min in ADL4. The order of the traffic density and driving mode conditions was
counterbalanced across drivers. During ADL4, participants were instructed to supervise the system.
We expected the drivers’ arousal levels to be modulated by these conditions, as the manual driving
mode and the high-traffic condition are more demanding than the automated mode and the low-traffic
condition, respectively.




Figure 4: Experimental design. Each participant performed the tasks as illustrated above. The
arrows indicate that traffic conditions and driving mode were randomized across participants.

2.5.    Validation method




    Figure 5: Annotation software interface. Left side: the annotator selects the POI and the frames to
be annotated. Right side: the annotator points at the location of the POI on the thermographic image
(in this example, the forehead and the nose tip are already annotated).

    To validate the proposed algorithm, we randomly selected one of the two 20-min recordings
(high or low traffic) for each driver. Then, we extracted one pair of color and thermographic images
every 20 seconds, obtaining 65 images per driver and 2,340 images in total. We then developed
annotation software to manually extract the temperature from these images. For each image, we
pointed at two landmarks: the driver’s forehead and the driver’s nose tip. Four trained annotators
performed the same procedure on the 2,340 images. Then, for each image, we computed the mean of
the four annotated positions of the nose tip and the forehead to establish the reference location of each
landmark. Finally, we extracted the temperature at these locations to define the reference temperature.
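The reference construction above amounts to averaging the annotators’ clicks and measuring the Euclidean distance against that mean; a small sketch, with made-up pixel coordinates, might be:

```python
import numpy as np

def reference_position(annotations):
    """Mean of the annotators' (x, y) clicks for one landmark in one image."""
    return np.mean(np.asarray(annotations, dtype=float), axis=0)

def position_error(predicted, reference):
    """2D Euclidean distance in pixels, as reported in Section 3.1.1."""
    return float(np.linalg.norm(np.asarray(predicted, dtype=float) - reference))

# Made-up nose-tip annotations from the four annotators (pixels).
clicks = [(101, 52), (99, 50), (100, 51), (100, 51)]
ref = reference_position(clicks)        # mean of the four clicks
err = position_error((103, 55), ref)    # algorithm output vs. the reference
```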

2.6.    Statistical study on the obtained data
     After removing the drivers for whom the algorithm performed worst (see Section 3.1.2), we used
the algorithm described in Section 2.2 to extract the face temperature of the remaining drivers (n = 28).
To obtain more measurement points, the algorithm at this stage ran at a higher frequency than in the
validation phase: one sample every 2 s instead of every 20 s. We then removed extreme values (lower
than 25 °C or higher than 37 °C) as well as outliers, by applying a moving-median thresholding
procedure. Finally, we took the mean of the remaining points in each of the four segments: High traffic
– Manual driving, High traffic – Automated driving, Low traffic – Manual driving, Low traffic –
Automated driving. This gave us four data points per driver, which we later used in our statistical
analysis.
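The moving-median thresholding is not detailed in the text, so the sketch below is one plausible reading: discard samples that deviate from a local running median by more than a fixed threshold. The window size and threshold are illustrative assumptions, not the values used in the study.

```python
import numpy as np

def clean_temperatures(temps, low=25.0, high=37.0, window=5, thresh=0.5):
    """Drop physically implausible values, then samples far from a moving median.

    `window` and `thresh` (in degrees C) are illustrative choices.
    """
    t = np.asarray(temps, dtype=float)
    t = t[(t >= low) & (t <= high)]          # remove extreme values
    if t.size < window:
        return t
    # Running median over a centered window (edges handled by clipping indices).
    medians = np.array([np.median(t[max(0, i - window // 2):i + window // 2 + 1])
                        for i in range(t.size)])
    return t[np.abs(t - medians) <= thresh]  # keep points near the local median

temps = [33.1, 33.2, 50.0, 33.0, 33.3, 21.0, 33.1, 36.9, 33.2]
cleaned = clean_temperatures(temps)          # extremes and the 36.9 outlier removed
```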

3. Results
3.1. Validation of the algorithm
    We first analyzed the algorithm’s performance in terms of position error in the thermographic
image, measured in pixels. We computed the position error between the algorithm’s output and the
mean position provided by the four annotators (see Section 2.5). For comparison, we also computed
each annotator’s position error with respect to that mean position. Then, we analyzed the
consequences of the algorithm’s position error in terms of temperature error, measured in degrees
Celsius (°C).

3.1.1. Position error
    In Figure 6, we present the error distributions of the algorithm and the annotators. As a reference,
in our setup the nose tip measures approximately 10 × 10 pixels. When pointing at the nose, the
algorithm performed worse than the annotators with respect to the annotators’ mean. The two main
causes of large mismatches were landmark estimation errors from MediaPipe and spatial
correspondence errors due to head movements (head turning or bending). Surprisingly, the algorithm
slightly outperformed the annotators on the forehead with respect to the annotators’ mean. Our
interpretation is that it can be hard for a human to define a precise location on a large area with no
reference points, such as the forehead.




Figure 6: Position error (a 2D distance in pixels) distribution for the forehead and the nose. At the top,
we compared the algorithm to the mean of the annotators. At the bottom, we compared each
annotator to the mean of the annotators.
3.1.2. Temperature error
    Figure 7 shows the errors of the final temperature values computed by the algorithm. On the
forehead, the temperature gradient was low, so the temperature error resulting from the position error
was small. On the nose, however, the temperature gradient was higher, so the temperature error was
also much higher than on the forehead. Interestingly, the temperatures computed at positions
annotated by an individual annotator were consistently lower than the temperatures computed at the
mean of the annotated positions. This is because the face temperature exhibits a local peak on the
nose, and an individual annotator’s position is farther from this peak than the mean position of the
four annotators.




Figure 7: Temperature error (in °C) distribution for the forehead and the nose. At the top, we
compared the algorithm to the mean of the annotators. At the bottom, we compared each annotator
to the mean of the annotators.

Looking at Figure 8, we see that the mean absolute error of the nose temperature depends strongly on
the driver (reaching 1.6 °C for some drivers). For the statistical study, we excluded the 6 participants
with an absolute error higher than 0.8 °C.




Figure 8: Algorithm mean absolute temperature error, for each driver (in °C).

4. Conclusion and future works
    The present work describes the first results obtained with an algorithm for tracking a driver’s facial
skin temperature during driving interactions. The algorithm consistently and effectively tracked
participants’ facial-skin temperature without interfering with their driving tasks. We analyzed the
position and temperature errors; for some drivers, tracking the nose-tip temperature remains a
challenge. Future systems should improve both the initial landmark detection and the landmark
transfer. The latter could be achieved by measuring the distance between the cameras and the driver’s
face, as in previous studies [10], or by modeling the face as a 3D shape. In addition, a calibration-free
process should be developed for deployment in a real car. Further analysis should be conducted before
publishing the results of a statistical study based on this work.

5. Acknowledgements
This study was funded by the European Union's Horizon 2020 research and innovation programme
under grant agreement No. 875597 - HADRIAN (Holistic Approach for Driver Role Integration and
Automation Allocation for European Mobility Needs) project. This document reflects only the authors'
view, the European Climate, Infrastructure and Environment Executive Agency (CINEA) is not
responsible for any use that may be made of the information it contains. We thank Leila Maboudi
(Polytechnic University of Turin, Italy) for her comments and language editing assistance.

6. References
[1] L.L. Di Stasi, E. Gianfranchi, C. Diaz-Piedra, Hand-skin temperature response to driving fatigue:
     an exploratory study, in: Krömker, H. (Eds), HCI in Mobility, Transport, and Automotive Systems.
     Driving Behavior, Urban and Smart Mobility, HCII 2020, vol 12213 of Lecture Notes in Computer
     Science, Springer, Cham, 2020, pp. 3–14. doi:10.1007/978-3-030-50537-0_1.
[2] C. Diaz-Piedra, E. Gomez-Milan, L.L. Di Stasi. Nasal skin temperature reveals changes in arousal
     levels due to time on task: An experimental thermal infrared imaging study, Applied Ergonomics
     81 (2019). doi: 10.1016/j.apergo.2019.06.001.
[3] Panasonic Corporation, Panasonic develops drowsiness-control technology by detecting and
     predicting driver’s level of drowsiness, 2017. URL:
     https://news.panasonic.com/global/press/data/2017/07/en170727-3/en170727-3.html
[4] M. Bodini. A review of facial landmark extraction in 2D images and videos using deep learning,
     big data and cognitive computing 3, 14 (2019). doi: 10.3390/bdcc3010014.
[5] C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang,
     M.G. Yong, J. Lee, W.-T. Chang, W. Hua, M. Georg, M. Grundmann, 2019. MediaPipe: A
     framework for building perception pipelines. arXiv:1906.08172.
[6] W.-T. Chu, Y.-H. Liu. Thermal facial landmark detection by deep multi-task learning, in 2019
     IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), 2019, pp. 1–6.
     doi:10.1109/MMSP.2019.8901710.
[7] M. Kopaczka, R. Kolk, D. Merhof. A fully annotated thermal face database and its application for
     thermal facial expression recognition, in 2018 IEEE International Instrumentation and
     Measurement Technology Conference (I2MTC), 2018, pp. 1–6. doi:
     10.1109/I2MTC.2018.8409768.
[8] H. Yoshikawa, A. Uchiyama, T. Higashino. ThermalWrist: Smartphone thermal camera correction
     using a wristband sensor, Sensors 19 (2019). doi: 10.3390/s19183826.
[9] L. Sun, Z. Zheng. Thermal-to-visible face alignment on edge map, IEEE Access 5 (2017) 11215–
     11227. doi: 10.1109/ACCESS.2017.2712159.
[10] C. Filippini, E. Spadolini, D. Cardone, D. Bianchi, M. Preziuso, C. Sciaretta, V. Del Cimmuto, D.
     Lisciani, A. Merla. Facilitating the child–robot interaction by endowing the robot with the
     capability of understanding the child engagement: The case of mio amico robot, International
     Journal of Social Robotics 13 (2019) 677–689. doi: 10.1007/s12369-020-00661-w.
[11] C. Goulart, C. Valadão, D. Delisle-Rodriguez, D. Funayama, A. Favarato, G. Baldo, V. Binotte,
     E. Caldeira, T. Bastos-Filho. Visual and thermal image processing for facial specific landmark
     detection to infer emotions in a child-robot interaction, Sensors (Basel) 19 (2019). doi:
     10.3390/s19132844.
[12] R. Artzy, Linear Geometry, Dover Publications, New York, NY, 1993.