=Paper=
{{Paper
|id=Vol-3777/short2
|storemode=property
|title=Estimation and Visualization of Webcam Eye Tracking for Text Reading
|pdfUrl=https://ceur-ws.org/Vol-3777/short2.pdf
|volume=Vol-3777
|authors=Anastasiia Grynenko,Olena Turuta,Ruslan Kasheparov,Olga Kalynychenko,Oleksii Turuta
|dblpUrl=https://dblp.org/rec/conf/profitai/GrynenkoTKKT24
}}
==Estimation and Visualization of Webcam Eye Tracking for Text Reading==
Anastasiia Grynenko1, Olena Turuta1, Ruslan Kasheparov1, Olga Kalynychenko1 and
Oleksii Turuta1
1 Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine
Abstract
With eye-tracking research becoming increasingly popular, we wondered whether special equipment – dedicated eye trackers – is really necessary for such experiments. Webcams are inexpensive and almost everyone has one at home. But are they precise enough to be used in text reading experiments? We tested whether an ordinary webcam can determine where a reader is looking when reading text on a screen. We found that a webcam can be used to detect the line being read at certain font sizes and line spacings. However, a webcam is not suitable for capturing individual words or letters; this requires more accurate eye trackers.
Keywords
Eye-tracking, gaze detection, NLP, Corpus, Artificial Intelligence Ethics, Diagnostic Accuracy
1. Introduction
Eye-tracking is the measurement of eye movements and the determination of the point of gaze, the location where a person is looking. Eye movement has been of interest to scientists since the 19th century. One of the first important milestones was the discovery by Louis Émile Javal that reading is not a smooth movement over a text, but a series of short stops (fixations) and quick jumps (saccades). Since the 1970s, research on gaze tracking in reading has gained popularity, notably Rayner's work [1]. An important idea is the strong eye-mind hypothesis put forward in 1980 by Just and Carpenter, according to which whatever the gaze is fixed on is currently being processed [2]. There are also other hypotheses, in particular the immediacy hypothesis, according to which each fixated word is processed immediately rather than buffered, so the eyes do not move on until processing is complete. Moreover, there are opposing ideas holding that eye movements do not really reflect moment-by-moment cognitive processing demands during reading. In any case, scientists agree that gaze tracking can provide important data about reading.
Initially, eye movements were studied to determine their role in reading and language comprehension. Later, psycholinguists began to use the technology to organize language learning for different social groups, to improve cross-cultural communication in business, and to address many theoretical questions about how people perceive language. Eye-tracking is used not only in text reading but also in listening to music, typing, and visual search. There are many applications of this technology [3].
2. Relevance
The technology is widely used in education, both in the form of student eye tracking (tasks to determine student engagement, academic performance, and skills assessment) and in the form of teacher eye tracking (professional eye tracking). In addition, there is an idea to explore the methodology of performing tasks with eye tracking, with the possibility of turning them into educational material.
ProfIT AI 2024: 4th International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2024), September 25–27, 2024, Cambridge, MA, USA
anastasiia.hrynenko@nure.ua (A. Hrynenko); olena.turuta@nure.ua (O. Turuta); ruslan.kasheparov@nure.ua (R. Kasheparov); olga.kalynychenko@nure.ua (O. Kalynychenko); oleksii.turuta@nure.ua (O. Turuta)
ORCID: 0000-0003-0263-3701 (A. Hrynenko); 0000-0002-1089-3055 (O. Turuta); 0000-0001-7526-7912 (O. Turuta); 0000-0003-1466-3967 (O. Kalynychenko); 0000-0002-0970-8617 (O. Turuta)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Eye movement research has also helped in the training of some professions, such as pilots, by developing methods for modeling complex real-life events, detecting errors, and assessing consequences.
Marketing, human-computer interaction, and neuroergonomics are also well-known application areas, where eye tracking is used for market and audience research, as an input device, and for workplace adaptation, respectively [4]. There are ideas for using tracking data to improve NLP models, to
evaluate them, and to explore the possibility of applying the models to other languages and problems
[5-8]. There are also many other applications of gaze tracking systems, such as research on
perception and cognitive processes, lie detection [9-10], and detection of deepfakes [11-12].
Another interesting use of the system is to diagnose a person's mental state, which is especially
relevant in times of full-scale war. An example of the realization of such an idea is the Ukrainian
startup Anima [13]. Indeed, many Ukrainians, especially the military, suffer from constant stress that
puts a strain on their psychological state. It is important to constantly monitor and prevent possible
deterioration of psychological health in order to understand the readiness of the military for combat.
Gaze tracking systems can help with this, as a shifted distribution of attention to stimuli of different emotional valence may indicate certain deviations. This idea can be expanded to diagnose various
other mental illnesses, as there are already publications describing ways to diagnose them by
studying a person's gaze.
3. Eye-tracking features
Calibration is necessary to adapt the system's algorithm to the specific person sitting in front of the eye tracker before use.
Some systems do not need calibration, but most still require it. Calibration is possible when the point the person is looking at is known. Calibration procedures for mobile devices vary and are described in their manuals. For screen-based calibration, you need to focus your gaze on targets shown on the screen. The more targets in different parts of the screen are analyzed, the more accurate the system will be, but this can be time-consuming and inconvenient for some experiments.
In our experiment, we want to collect data that can be useful to different professionals interested
in the application of eye tracking in their field, and to evaluate the quality and accuracy of the
resulting system.
Since our experiment will not use special high-precision equipment (which may not require calibration) and the format will be screen-based, we will calibrate with points from different parts of the screen: a 3 × 3 grid of nine points, with one point in the center of the screen and the remaining eight along its perimeter, four of them in the corners.
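For illustration only, a minimal sketch of generating such a 3 × 3 calibration grid in normalized screen coordinates might look as follows (this is not the code of our system; the 5% margin from the screen border is an assumption):

```python
# Sketch: a 3x3 calibration grid in normalized screen coordinates,
# with one point in the center, four in the corners, and four at the
# midpoints of the edges. The margin keeps points off the screen border.

def calibration_points(margin: float = 0.05) -> list[tuple[float, float]]:
    """Return nine (x, y) points in [0, 1] x [0, 1] screen coordinates."""
    positions = [margin, 0.5, 1.0 - margin]  # left/top, center, right/bottom
    return [(x, y) for y in positions for x in positions]

for i, (x, y) in enumerate(calibration_points(), start=1):
    print(f"point {i}: x={x:.2f}, y={y:.2f}")
```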
4. Datasets
Eye-tracking reading experiments often use fixations (place, duration) and saccades (length,
duration, start and end point) as captured primitives. For more detailed experiments, the number of
fixations per 100 words, whether they are progressive or regressive, and the frequency of regressions
are observed.
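To make these primitives concrete, they could be represented as simple data records, for example (a sketch of one possible representation, not the format of any particular dataset):

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float            # screen position of the fixation
    y: float
    duration_ms: float  # how long the gaze rested here

@dataclass
class Saccade:
    start: tuple[float, float]  # start point on the screen
    end: tuple[float, float]    # end point on the screen
    duration_ms: float

    @property
    def length(self) -> float:
        """Euclidean length of the saccade in screen units."""
        dx, dy = self.end[0] - self.start[0], self.end[1] - self.start[1]
        return (dx * dx + dy * dy) ** 0.5

    @property
    def is_regressive(self) -> bool:
        """In left-to-right reading, a leftward saccade is a regression."""
        return self.end[0] < self.start[0]
```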
There are eye-tracking datasets with two different types of input data: raw video accompanied by the specified characteristics of the experiment, and an already processed format in which the original eye parameters are mapped to the resulting point on the screen [14].
Two methods are used for text experiments: rapid serial visual presentation (RSVP), in which words are presented at a set rate in the same location, and self-paced reading, in which only a few words are presented on the screen at a time and readers advance the text by pressing a button. There
are variations of the self-paced method. Some experiments include the ability to look back in the
text. Some display only one word, while others display several at once. Sometimes the words are in
the same place, and sometimes they are presented spatially, as in a normal text. There are also studies
that consider a text as a time-series process [15] or based on an attention mechanism [16].
As for textual datasets, they often consist of a single sentence and are monolingual, although longer texts and multilingual resources are also available, such as the GECO [17] and MECO [18] datasets [19]. It is also important to note that most of the existing datasets are in English. Ukrainian is considered a low-resource language with very few existing datasets [20]. Therefore, Ukrainian text will be used in our experiment.
5. Hardware
Hardware differs by whether it is specialized, by human interface, by tracking area, and by technical characteristics. Special trackers are any additional gaze-detection hardware, whereas non-special means a standard webcam. The interface can be head-stabilized, remote, or mobile (also called "head-mounted"). In
terms of tracking area, devices are differentiated into those that use a computer screen as the
stimulus area; those that can operate in more complex geometries, such as a multi-screen booth; and
those that can operate in the real world. Technical characteristics include accuracy, sampling rate,
resolution, etc.
We divided trackers by human interface, additionally adding non-special devices (webcams). This yields the following groups: head-stabilized, remote, mobile, and non-special.
Head-stabilized eye trackers are usually much more accurate than other types, so they are used in neurophysiological experiments, where participants' comfort is less important than system accuracy and precision, as well as in experiments with animals. However, such trackers are
uncomfortable and not designed for experiments where natural responses with head movements are
important.
Remote systems consist of a camera and an infrared source. Most often such cameras are mounted
under the screen, as the pupil is more visible from that position due to the shape of the eye and
eyelids. Remote eye tracking is useful in experiments with infants and in natural interactions.
Limitations of such trackers include a fixed working area beyond which gaze tracking is difficult, limited tolerance to head movements, which degrades accuracy, sunlight reflections in participants' eyes, and difficulty capturing more than one participant.
Mobile gaze tracking is often in the form of goggles that include a scene recording camera, gaze
capture cameras, and illuminators. Such devices are a great option for real-world research. There is
the problem of sunlight, but it is solvable. In addition, there are problems with tracking gaze at the periphery and with the use of a relative coordinate system that differs greatly from participant to participant.
Non-special equipment has no infrared illumination or head stabilizer, so its effectiveness is orders of magnitude lower. However, a significant advantage is the ubiquity of webcams.
Comparing the different hardware, we conclude that special devices, especially head-stabilized ones, provide greater accuracy. But since webcams are much more common and require no additional costs, we want to investigate their accuracy. The camera used in the experiment has a frame rate of 30 FPS and a resolution of 1280×720.
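As a practical aside, the nominal frame rate and resolution a webcam reports can be checked, for example, with OpenCV (a sketch; device index 0 is an assumption):

```python
# Sketch: query a webcam's reported frame rate and resolution with OpenCV.
# Requires: pip install opencv-python
import cv2

cap = cv2.VideoCapture(0)  # device index 0 assumed
if not cap.isOpened():
    raise RuntimeError("No webcam found at device index 0")

fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(f"FPS: {fps}, resolution: {width}x{height}")  # e.g. 30.0, 1280x720
cap.release()
```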
6. Software
These days, there are various programs that enable eye-tracking, both with special eye trackers and with general webcams [21]. Our focus was on those that can be applied to general webcams.
A comparison of different eye-tracking software is shown in Table 1.
Table 1
Comparative characteristics of the software
| Name | Format | Programming language | Detection objects | Accessibility |
|---|---|---|---|---|
| PyGaze | Toolbox for eye-tracking | Python | Coordinates on the screen, position and duration of fixations, saccades | Open source |
| WebGazer.js | Web application | JavaScript | Coordinates on the screen | Open source |
| PyGazeAnalyser | Toolbox for analysis and plotting of eye-tracking data | Python | Position and duration of fixations, saccades, fixation map, scanpath and heatmap | Open source |
| xLabs | Browser extension for Google Chrome | JavaScript, C++ | Coordinates on the screen | No longer supported |
| OGAMA | Windows desktop application | C#.NET | Coordinates on the screen, position and duration of fixations, saccades | Open source, no longer supported |
Although some software can in principle detect saccades, this can only be achieved with special equipment, because the sampling rate of webcams is not sufficient: at 30 FPS, a webcam samples gaze only about every 33 ms, which is comparable to the duration of a typical reading saccade.
After comparing and working with the above-mentioned programs, we decided to use WebGazer.js for the experiment. Although this software is relatively coarse and cannot capture saccades and fixations, its well-developed community and available documentation make it excellent for conducting small experiments.
7. Data preparation
In preparation for the experiment, the data must be carefully prepared. To accomplish this, we developed a service that generates custom images with specific parameters: the picture resolution (to match the user's screen), the desired text, font style, font size, spacing between words and between sentences, distance to the screen border, and the position of the text on the screen.
The service produces a PNG image that precisely adheres to the specified parameters, together with a JSON file containing metadata that describes the image, including the placement coordinates of each word within it. This output provides the data needed for the experiment and enables accurate analysis of the acquired information.
The final generated image contains the text itself and additional borders. The word boundaries
calculated by the program according to the text metadata are displayed in green.
Figure 1 shows the generated final image, using the data taken from the metadata.
Figure 1: The generated final image, using the data taken from the metadata
The user perceives the image without the boundaries of individual words.
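For illustration, a service of this kind could be sketched with Pillow as follows; this is not our production code, and the font file, resolution, and spacing values are assumptions:

```python
# Sketch: render text word by word onto a PNG, outline each word's bounding
# box in green, and write the boxes to a JSON metadata file.
import json
from PIL import Image, ImageDraw, ImageFont

WIDTH, HEIGHT = 1920, 1080                 # picture resolution (assumed)
MARGIN, WORD_GAP, LINE_GAP = 100, 20, 40   # layout parameters (assumed)
TEXT = "Приклад українського тексту для експерименту"

img = Image.new("RGB", (WIDTH, HEIGHT), "white")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("calibri.ttf", 32)  # assumes the font file is present

x, y, boxes = MARGIN, MARGIN, []
for word in TEXT.split():
    left, top, right, bottom = draw.textbbox((x, y), word, font=font)
    if right > WIDTH - MARGIN:             # wrap to the next line
        x, y = MARGIN, y + (bottom - top) + LINE_GAP
        left, top, right, bottom = draw.textbbox((x, y), word, font=font)
    draw.text((x, y), word, font=font, fill="black")
    draw.rectangle((left, top, right, bottom), outline="green")  # word boundary
    boxes.append({"word": word, "box": [left, top, right, bottom]})
    x = right + WORD_GAP

img.save("stimulus.png")
with open("stimulus.json", "w", encoding="utf-8") as f:
    json.dump({"resolution": [WIDTH, HEIGHT], "words": boxes}, f, ensure_ascii=False)
```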
8. Experiment description
The primary objective of this paper is not to conduct a comprehensive study on eye movement
during text-related tasks. Instead, its main focus is to introduce a system that can potentially facilitate
such experiments. The paper includes an experiment that serves two purposes: firstly, to showcase
the viability of employing an eye-tracking system, and secondly, to examine the feasibility of
substituting costly eye trackers with a simpler webcam solution.
The goal of the experiment is to determine how useful webcams with WebGazer.js can be for identifying the position in the text on which the participant focuses. For this purpose, a Ukrainian text was displayed on a 14″ Full HD screen with different font sizes and line spacings. The outcome of the experiment was the selection of the optimal combination of text parameters.
In preparation for the experiment, 12 text variants were formed in the Calibri font, with font sizes from 24 pt to 32 pt in steps of 4 pt and line spacing from 1.5 to 3 in steps of 0.5.
The experiment involved people with different eye colors and without special features (e.g., glasses). After starting the app, each participant waited for the camera to capture their face and then underwent calibration at 9 points. During calibration, each point had to be clicked 5 times while looking at it. After a successful calibration, the system's accuracy when looking at the center point was calculated and displayed on the screen. The system-determined point of gaze was displayed as a blue dot. The text parameters could be changed from the top menu bar. During the experiment, attention was paid to how accurately the place where the gaze was directed could be determined. For this purpose, hits on a word were examined.
The percentage of hits on the word was calculated using the following equation:

P = N_h / N_t ,    (1)

where N_h is the number of hits, i.e., the number of gaze samples falling on the container that contains the word, and N_t is the total number of gaze samples, collected over 5 seconds of gaze. Hit or miss is evaluated at regular intervals of 10 milliseconds.
To activate the accuracy calculation, we used a right mouse click at the place where the gaze is actually directed.
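For illustration, the hit-rate computation of equation (1) could be implemented as in the sketch below; the word container coordinates and gaze samples are made up:

```python
# Sketch: compute the hit percentage P = N_h / N_t for one word container,
# given gaze samples taken every 10 ms over a 5-second window (500 samples).

def hit_percentage(samples, box):
    """samples: list of (x, y) gaze points; box: (left, top, right, bottom)."""
    left, top, right, bottom = box
    hits = sum(1 for x, y in samples if left <= x <= right and top <= y <= bottom)
    return 100.0 * hits / len(samples)

word_box = (420, 310, 560, 350)                           # hypothetical container
gaze = [(480 + i % 7, 330 + i % 5) for i in range(500)]   # fake samples, all on target
print(f"P = {hit_percentage(gaze, word_box):.1f}%")       # -> 100.0% for this fake data
```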
9. Heat maps
Heat maps in eye tracking play a crucial role in visualizing and analyzing gaze focus data. By utilizing
color intensity, heat maps provide a graphical representation that indicates the level of eye attention
in specific areas of an image or screen. This enables researchers and practitioners to gain insights
into the patterns of visual exploration and focus during various tasks.
Real-time display of eye movements is particularly valuable in tasks that require immediate
feedback or interaction. The inclusion of real-time heat maps allows for the visualization of the
current gaze location on the screen, providing instant visual feedback to both the user and the
experimenter.
One of the advantages of heat maps is their ability to accumulate information over time. By
highlighting areas on the screen where the gaze was directed for longer durations with stronger and
more noticeable colors, heat maps effectively capture the salient regions of interest. This feature aids
in reducing noise or transient eye movements, as the emphasis is placed on areas that received
sustained attention.
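As an illustration of this accumulation, a heat map can be built by depositing weight at each gaze sample and then applying a Gaussian blur, so that areas fixated longer become stronger hot spots. The sketch below shows one common way to do this (not our exact implementation); the resolution and blur radius are assumptions:

```python
# Sketch: accumulate gaze samples into a 2D grid and blur it into a heat map.
# Requires: pip install numpy scipy
import numpy as np
from scipy.ndimage import gaussian_filter

WIDTH, HEIGHT = 1920, 1080  # screen resolution (assumed)

def build_heatmap(samples, sigma=30.0):
    """samples: iterable of (x, y) gaze points; returns a HEIGHT x WIDTH array."""
    heat = np.zeros((HEIGHT, WIDTH), dtype=np.float32)
    for x, y in samples:
        if 0 <= x < WIDTH and 0 <= y < HEIGHT:
            heat[int(y), int(x)] += 1.0  # longer dwell -> more accumulated weight
    heat = gaussian_filter(heat, sigma=sigma)
    peak = heat.max()
    return heat / peak if peak > 0 else heat  # normalize to [0, 1] for display
```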
Heat maps can be employed in various domains, including usability testing, user experience
research, website optimization, and advertising analysis. They provide a valuable tool for
understanding visual attention patterns, optimizing user interfaces, and enhancing the overall user
experience.
Figure 2 shows the heat map display.
Figure 2: Screenshot of the heat map
The image clearly demonstrates the informative nature of heat maps, as they vividly depict the
visual information regarding the focus of attention.
10. Post-processing of received data
During the experiment, the program collects important data about the user, in particular the x and y coordinates of their gaze and the corresponding timestamps. This data is organized and stored in a structured format in a CSV (Comma-Separated Values) file and then transmitted to a server for further processing and analysis.
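For illustration, the logging step could look like the following sketch; the column names and file name are assumptions:

```python
# Sketch: write gaze samples (x, y, timestamp) to a CSV file.
import csv
import time

def log_gaze_samples(samples, path="gaze_log.csv"):
    """samples: iterable of (x, y, timestamp_ms) tuples."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y", "timestamp_ms"])  # assumed column names
        writer.writerows(samples)

# Example: three fake samples 10 ms apart
t0 = int(time.time() * 1000)
log_gaze_samples([(512.3, 340.1, t0), (514.0, 339.8, t0 + 10), (511.7, 341.2, t0 + 20)])
```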
11. Results
Figure 3 shows the application during the accuracy assessment at a particular location.
Figure 3: Screenshot of the application during the experiment
Based on the participation of several people, each of whom tested accuracy at three places in the text for every set of parameters, Table 2 was constructed, with line spacing on the rows and font size on the columns; each cell shows the percentage of hits on the line. The values are the arithmetic mean of accuracy across all participants.
Table 2
Experiment results: percentage of hits on the line

| Line spacing \ Font size | 24 pt | 28 pt | 32 pt |
|---|---|---|---|
| 1.5 | 53 | 60 | 16 |
| 2.0 | 64 | 71 | 31 |
| 2.5 | 69 | 64 | 87 |
| 3.0 | 41 | 67 | 84 |
Although we expected the results to vary monotonically with the text parameters, we obtained a different picture. We can explain this by inaccuracies of the program, the small number of trials, and internal processes of the brain that lead to uncontrolled eye movements.
According to the results of the experiment, a word can be captured with a webcam at a font size of at least 32 pt and a line spacing of at least 2.5, with an accuracy above 84%; however, a webcam is not sufficient to determine the letter at which the participant is looking.
12. Conclusion
The results of the experiment showed that a webcam can be used to recognize the line of text being read. However, compared to the results that can be obtained with special equipment, webcams are not very accurate. We plan to experiment with more accurate special trackers in order to identify, with greater precision, the letter at which the gaze is directed.
The experiment demonstrated the feasibility of utilizing webcams for eye-tracking in the context of reading texts on general devices, showing that commonly available webcams can be adapted for capturing and analyzing eye movement data. The ability to utilize webcams for eye-tracking opens up new
possibilities for researchers and practitioners, as it eliminates the need for specialized and expensive
eye-tracking equipment. This not only expands the accessibility of eye-tracking technology but also
presents opportunities for conducting eye-tracking studies on a larger scale and in diverse settings.
The implications of this finding extend beyond the scope of the experiment and have the potential
to shape future research methodologies and applications in the field of eye-tracking.
Acknowledgements
This publication is based upon work from COST Action CA21131, supported by COST (European
Cooperation in Science and Technology).
References
[1] Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research.
Psychological Bulletin, 124(3), 372–422.
[2] Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87(4), 329–354.
[3] Krakowczyk, D. G., Reich, D. R., Chwastek, J., Jakobi, D. N., Prasse, P., Süss, A., . . . Jäger, L. A.
(2023). Pymovements: A python package for eye movement data processing. Paper presented at
the Eye Tracking Research and Applications Symposium (ETRA), doi:10.1145/3588015.3590134
[4] Peysakhovich, V., Lefrançois, O., Dehais, F., & Causse, M. (2018). The Neuroergonomics of Aircraft Cockpits: The Four Stages of Eye-Tracking Integration to Enhance Flight Safety. Safety, 4(1), 8. doi:10.3390/safety4010008.
[5] Hollenstein, N., Barrett, M., & Beinborn, L. (2020). Towards Best Practices for Leveraging Human
Language Processing Signals for Natural Language Processing. LINCR.
https://drive.google.com/file/d/1FxZso4wgjz2PFrKsZC7Elb-L5PXYZdEJ/view
[6] Erdem, E., Kuyu, M., Yagcioglu, S., Frank, A., Parcalabescu, L., Plank, B., Babii, A., Turuta, O., Erdem, A., Calixto, I., Lloret, E., Apostol, E. S., Truică, C.-O., Šandrih Todorović, B., Martinčić-Ipšić, S., Berend, G., Gatt, A., & Korvel, G. (2022). Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning. Journal of Artificial Intelligence Research, 73, 1131–1207. doi:10.1613/jair.1.12918.
[7] Hahn, M., & Keller, F. (2016). Modeling Human Reading with Neural Attention. Conference on
Empirical Methods in Natural Language Processing. https://aclanthology.org/D16-1009.pdf
[8] Barkovska, O., Pyvovarova, D., Kholiev, V., Ivashchenko, H., & Rosinskyi, D. (2021). Information object storage model with accelerated text processing methods. CEUR Workshop Proceedings, 2870, 286–299.
[9] Fang, X., Sun, Y., Zheng, X., Wang, X., Deng, X., & Wang, M. (2021). Assessing Deception in
Questionnaire Surveys With Eye-Tracking. Frontiers in Psychology, 12.
https://www.frontiersin.org/articles/10.3389/fpsyg.2021.774961/full
[10] Ge, F., Yang, X. Q., Chen, Y. X., Huang, H. L., Shen, X. C., Li, Y., & Hu, J. M. (2020). Application of Eye Tracker in Lie Detection. Fa Yi Xue Za Zhi, 36(2), 229–232. https://pubmed.ncbi.nlm.nih.gov/32530172/
[11] Gupta, P., Chugh, K., Dhall, A., & Subramanian, R. (2020). The eyes know it: FakeET – An Eye-tracking Database to Understand Deepfake Perception. Proceedings of the 2020 International Conference on Multimodal Interaction. https://arxiv.org/pdf/2006.06961.pdf
[12] Demir, I., & Ciftci, U.A. (2021). Where Do Deep Fakes Look? Synthetic Face Detection via Gaze
Tracking. ACM Symposium on Eye Tracking Research and Applications.
https://arxiv.org/pdf/2101.01165.pdf
[13] Neurobiological test of mental state Anima. https://ua.anima.help/
[14] Krafka, K., Khosla, A., Kellnhofer, P., Kannan, H., Bhandarkar, S., Matusik, W., & Torralba, A. (2016). Eye Tracking for Everyone. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). arXiv:1606.05814.
[15] A. Yerokhin, O. Turuta, A. Babii, A. Nechyporenko and I. Mahdalina, "Usage of phase space
diagram to finding significant features of rhinomanometric signals," 2016 XIth International
Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT),
2016, pp. 70-72, doi: 10.1109/STC-CSIT.2016.7589871.
[16] Brandl, S., & Hollenstein, N. (2022). Every word counts: A multilingual analysis of individual
human alignment with model attention. ArXiv, abs/2210.04963.
[17] Cop, U., Dirix, N., Drieghe, D., et al. (2017). Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49, 602–615. https://doi.org/10.3758/s13428-016-0734-0
[18] The Multilingual Eye-tracking Corpus (MECO) https://meco-read.com/category/data-news/
(Siegelman et al., 2022; Kuperman et al., 2022).
[19] D. Dashenkov, K. Smelyakov and O. Turuta, "Methods of Multilanguage Question Answering,"
2021 IEEE 8th International Conference on Problems of Infocommunications, Science and
Technology (PIC S&T), 2021, pp. 251-255, doi: 10.1109/PICST54195.2021.9772145.
[20] Panchenko, D., Maksymenko, D., Turuta, O., Luzan, M., Tytarenko, S., Turuta, O. (2022).
Ukrainian News Corpus as Text Classification Benchmark. In: , et al. ICTERI 2021 Workshops.
ICTERI 2021. Communications in Computer and Information Science, vol 1635. Springer, Cham.
https://doi.org/10.1007/978-3-031-14841-5_37.
[21] Punde, P.A.; Jadhav, M.E.; Manza, R.R. A study of eye tracking technology and its applications.
In Proceedings of the 2017 1st International Conference on Intelligent Systems and Information
Management (ICISIM), Aurangabad, India, 5–6 October 2017; IEEE: Piscataway, NJ, USA, 2017;
pp. 86–90.