Analysing program source code reading skills with eye tracking technology

Vilius Turenko, Simonas Baltulionis, Mindaugas Vasiljevas, Robertas Damaševičius
Department of Software Engineering, Kaunas University of Technology, Kaunas, Lithuania
robertas.damasevicius@ktu.lt

Abstract—Many areas of software engineering require good program code reading skills. We analyse the process of program reading using gaze tracking technology. We performed a study with six subjects, who performed four code reading tasks. The errors embedded into the program source code, and the lines of code containing them, were analysed as Areas of Interest (AoI). We formulated a research hypothesis and tested it using a one-way analysis of variance (ANOVA) test. The results of the study confirmed our research hypothesis that the number of fixations on AoI is larger than the number of fixations on other areas.

Keywords—program comprehension, code reading, eye tracking, gaze tracking, human-centered computing.

I. INTRODUCTION

Program code reading skills are important in many areas of software engineering, especially in adopting good code writing practices and techniques, understanding how programs work, identifying cases of poor programming style and bad design, and delivering effective software maintenance. Examples include program tracing and searching for bugs, code smells and design anti-patterns [1]. As automatic methods for finding bugs and poor coding practices are still not very effective [2], source code reading and analysis by human experts remain as relevant as ever. Program comprehension is also a crucial part of computer science education, providing an important part of understanding the complexity of information technology (IT) systems [3]. Interest in applying gaze tracking in the context of multimedia-supported learning is on the rise [4]. Gaze data have been successfully applied to analyse changes in cognitive load during the assimilation of learning materials and are starting to be incorporated into adaptive e-Learning systems [5].

However, there are currently no effective strategies for evaluating code reading skills and assessing program comprehension. Recently, eye tracking was proposed as a viable research instrument for evaluating source code reading [6]. The outcomes of gaze tracking studies are especially relevant in the context of Evidence-based Software Engineering (EBSE), as they provide detailed insights into different practices in software engineering [7].

Eye movements are directly related to cognitive and information processing, and through these processes visual information is used to stimulate the brain and to understand the given task. There are two assumptions relating cognitive processes to fixations: 1) if a person is looking at an object (such as a word), he/she is trying to understand it; 2) a person fixates his/her gaze on an object until he/she understands it. A fixation is an aggregation of gaze points within a specified area and time span. An Area of Interest (AoI) is a part of a visual stimulus that is of special importance. Other important characteristics are the scan path, a series of fixations that indicates the path and tendency of eye movements, and the heat map, which identifies the focus of visual attention [8].

For example, Uwano et al. [9] studied graduate students conducting code reviews and discovered that their gaze patterns followed a common scanpath, first reading the code top to bottom and then rereading a few parts in more depth. Chandrika et al. [10] confirmed the positive relationship between eye tracking traits over source code lines and comments and code comprehension. Melo et al. [11] analysed how programmers debug code with embedded pre-processor directives. Jbara and Feitelson [12] analysed how code regularity affects the number of fixations in a predefined area of interest (AoI) and the total fixation time. Beelders and du Plessis [13] analysed how the number and duration of fixations are influenced by syntax highlighting. Yenigalla et al. [14] also used fixation counts and durations to analyse how programming novices understand program code.

In this paper, we describe the results of a gaze tracking study on evaluating and analysing the code reading skills of programmers, specifically focusing on the ability to find errors in program code.

II. METHODOLOGY

A. Program reading tasks

The study consisted of four tasks:

a. In Task 1, the aim was to read the program source code and determine the result it returns (prints) (Fig. 1).

b. In Task 2, the aim was to identify the purpose of the algorithm and discover a hidden error associated with the incompatibility of variable types (Fig. 2).

c. In Task 3, the aim was to find three syntactic errors related to the incorrect use of variable names, types and basic methods (Fig. 3).

d. In Task 4, the aim was to determine whether the algorithm performs the specified function, and to find a hidden semantic error (Fig. 4).

Fig. 1. Program source code with Area of Interest (AoI) highlighted for Task 1: calculate the output of a program

Fig. 2. Program source code with Area of Interest (AoI) highlighted for Task 2: find a syntactic error

Fig. 3. Program source code with Area of Interest (AoI) highlighted for Task 3: find multiple syntactic errors

Fig. 4. Program source code with Area of Interest (AoI) highlighted for Task 4: find a semantic error
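The exact code shown to the participants is reproduced in Figs. 1-4. Purely as an illustration of the style of seeded defect (the snippet below, its language and its bug are our own construction, not the stimulus used in the study), a Task 4-like fragment with a hidden semantic error might look as follows:

```python
# Illustrative only: an Armstrong-number check with one seeded semantic
# error, similar in spirit to the Task 4 stimulus; not the actual task code.
def is_armstrong(n: int) -> bool:
    digits = [int(d) for d in str(n)]
    # Seeded defect (the line that would be marked as the AoI): the digits
    # are squared instead of being raised to the number of digits,
    # i.e. the correct expression would be d ** len(digits).
    return sum(d ** 2 for d in digits) == n

print(is_armstrong(153))  # prints False, although 153 is an Armstrong number
```

A reader who has understood the algorithm is expected to fixate repeatedly on the defective line, which is the behaviour that the AoI-based analysis below quantifies.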
B. Data collected by gaze tracking

During gaze tracking we collect the number and location of fixations, i.e. gaze points directed towards a certain part of the stimulus; the part of the image of special importance is labelled as an Area of Interest (AoI). Fixations are indications of visual attention. Here we analyse how the number of fixations is distributed between the AoIs and the areas outside them. The eye movements between fixations are known as saccades, and a scan path is a directed path created by saccades between eye fixations; however, we do not use saccade data in this study.
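In our study the fixation stream is provided by the eye tracker itself (the "sensitive fixation" stream described in Section III.B). Purely as a sketch of how fixations can be derived from raw gaze samples — a standard dispersion-threshold heuristic with illustrative threshold values, not the tracker's own filter — the grouping could look like this:

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float           # centroid of the grouped gaze samples (pixels)
    y: float
    duration_ms: float

def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100.0):
    """Group raw (t_ms, x, y) gaze samples into fixations: a run of samples
    counts as one fixation if it stays within a small bounding box
    (dispersion threshold) for at least a minimum duration."""
    fixations, window = [], []
    for sample in samples:
        window.append(sample)
        xs = [x for _, x, _ in window]
        ys = [y for _, _, y in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            # The newest sample broke the dispersion limit: close the run
            # before it and start a new window from that sample.
            run, window = window[:-1], window[-1:]
            if run and run[-1][0] - run[0][0] >= min_duration_ms:
                fixations.append(Fixation(
                    x=sum(x for _, x, _ in run) / len(run),
                    y=sum(y for _, _, y in run) / len(run),
                    duration_ms=run[-1][0] - run[0][0]))
    return fixations
```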
C. Research hypotheses

We assume that subjects are thinking about the object of interest when they are looking directly at it. Based on this assumption, we formulate the following research hypothesis:

H1: The number of fixations on Areas of Interest is larger than the number of fixations on other areas.

D. Testing of hypotheses

To test the hypothesis we employ a one-way analysis of variance (ANOVA) test. This standard statistical test confirms or rejects the equality of the means of two or more samples by examining their variances: it compares the variance between the samples to the variance within each sample. If the between-sample variance is much larger than the within-sample variance, the means of the different samples cannot all be equal.
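As a minimal sketch of this test (the fixation counts below are invented for illustration and are not the study data; the study results are reported in Section III.C and Table I), the comparison of fixation counts on AoI lines against counts on the remaining lines can be run as follows:

```python
# One-way ANOVA comparing fixation counts on AoI lines vs. other lines.
# The numbers are invented for illustration; see Table I for the real results.
from scipy import stats

fixations_on_aoi  = [14, 11, 17, 9, 13, 15]   # e.g. one value per subject
fixations_off_aoi = [6, 8, 5, 7, 9, 4]

f_value, p_value = stats.f_oneway(fixations_on_aoi, fixations_off_aoi)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
# H1 is supported for a task when p < 0.05 and the AoI mean is the larger one.
```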
III. EXPERIMENTAL SETTING AND RESULTS

A. Experimental settings

Six participants (1 female and 5 male) were recruited for this study, aged between 20 and 25 years (average 22.8 years). All participants had normal or corrected-to-normal vision. All were familiar with computers, had previous experience of using the internet, and were studying or working in the field of programming. Informed consent was obtained from the subjects before the study.

All subjects used the same Dell laptop, which had an additional monitor used for the experiment and a Tobii Eye Tracker 4C device used to record eye movements and gaze fixations. The eye tracker uses infrared corneal reflection to measure the point of gaze at a data rate of 90 Hz. A 24-inch screen was used to show the slides containing the program source code. Following the supplied instructions, the eye tracker was mounted just below the visible screen area, and the operating distance between the eye tracker and the subjects' eyes was 70-75 cm. Efforts were made to ensure good lighting, and the device was calibrated before the test: for each subject the eye tracker was re-calibrated using the integrated 5-point calibration to achieve the most accurate results.

Before the start of the experiment, the subjects were asked to fill in a Google Forms questionnaire on their demographic characteristics (gender, education, age, level of programming skills). All responses were anonymized. After entering their personal characteristics, the subjects read general information about the tasks they would face in the experiment. In this way they were informed about some important rules, for example that no additional libraries or other extensions were used, and that some tasks were bug free while others contained hidden bugs; the idea was to keep the subjects focused by not telling them which tasks had bugs and which did not. After the tasks had been introduced, the presentation with the slides containing the source code of the tasks was opened. An observation session was started at the beginning of each task and stopped after the task was completed, so each task had a separate observation session. Tasks 3 and 4 included brief information about the given algorithms, for example the definitions of a palindrome and an Armstrong number, with an example of each. Subjects were given 90 seconds to complete each task. After the completion of each task, the participants were asked to provide their answers in a Google Form: what is the result of program execution (Task 1), what is the purpose of the algorithm (Task 2), and is the program correct (Tasks 3 and 4).

B. Experimental system

A gaze monitoring system was used to measure the number and duration of fixations in the Areas of Interest (AoIs). The system consists of the components listed below (see Fig. 5).

• The Data Gathering Module reads the raw gaze data from the eye tracker device via USB.
• The Data Preprocessing Module filters noise and calculates additional metrics and characteristics such as saccades.
• The Data Persistence Module saves the acquired gaze data to CSV, XML or a database.
• The Data Post-processing Module maps the persisted gaze data to AoIs and calculates additional data features such as the total and average number and duration of fixations (a sketch of this step is given after Fig. 5).
• The Configuration Module configures how data is gathered and persisted in the system.

Fig. 5. Architecture of the system
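As a sketch of the computation performed by the Data Post-processing Module — assuming, for illustration, that each AoI is stored as a screen-space rectangle; the data layout and names below are ours, not the module's actual interface — the per-AoI fixation summary can be derived as follows:

```python
# Count fixations and accumulate their durations inside each rectangular
# Area of Interest; everything else is attributed to "non-AoI".
from collections import defaultdict

def summarise_fixations(fixations, aois):
    """fixations: iterable of dicts {'x': px, 'y': px, 'duration_ms': ms};
    aois: dict mapping an AoI name to (left, top, right, bottom) in pixels."""
    summary = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for f in fixations:
        hit = "non-AoI"
        for name, (left, top, right, bottom) in aois.items():
            if left <= f["x"] <= right and top <= f["y"] <= bottom:
                hit = name
                break
        summary[hit]["count"] += 1
        summary[hit]["total_ms"] += f["duration_ms"]
    return dict(summary)

# Hypothetical example: one AoI covering the code line with the seeded error.
aois = {"line_42": (120, 510, 900, 530)}
fixes = [{"x": 300, "y": 520, "duration_ms": 180},
         {"x": 310, "y": 700, "duration_ms": 220}]
print(summarise_fixations(fixes, aois))
# {'line_42': {'count': 1, 'total_ms': 180.0}, 'non-AoI': {'count': 1, 'total_ms': 220.0}}
```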
The system offers four types of data stream, which are used to gather fixations and saccades directly from the gaze tracking device:

• Unfiltered gaze
• Lightly filtered gaze
• Sensitive fixation
• Slow fixation

For this experiment the sensitive fixation stream was chosen because of its accuracy and its reduction of unnecessary noise. In addition, the system runs in the background and has no effect on the stimulus, so the subject's attention is concentrated only on the source code.

Besides the type of data stream, before starting a gaze tracking session the user can choose to record the screen; for now this is only a prototype feature that needs to be improved for better accuracy. A session can also carry additional information about the subject, for example name, age and other descriptive details; if this is not necessary, the user can select an anonymous session. In the near future the system will offer an option to choose the screen resolution manually, which will allow concrete zones of interest to be selected.

C. Results

The results of the participants (number of fixations) are summarized by task and subject in Fig. 6.

Fig. 6. Summary of the number of fixations according to subjects and tasks

An example of the gaze path generated from the gaze tracking data is presented in Fig. 7. The gaze path shows how, and in what sequence, the subject read the code. Note that the order of reading is clearly not linear.

Fig. 7. Example of a gaze path (Task 1, Subject 1)

An example of the heatmap generated from the gaze tracking data is presented in Fig. 8. Note that most of the attention was focused on and around the Area of Interest centred on code line 42 (see also Fig. 1).

Fig. 8. Example of a gaze fixation heatmap (Task 1, Subject 1)

In Fig. 9, the average number of gaze fixations on AoI and non-AoI areas is presented. For all tasks the number of fixations on the AoIs was larger, although the difference was not statistically significant for Task 2 (see also the results of the statistical testing using ANOVA in Table I).

Fig. 9. Average number of fixations on AoI vs non-AoI source code lines

The results of the statistical testing using ANOVA are presented in Table I. We found statistically significant differences in the number of fixations on the Areas of Interest (AoI) vs non-AoI areas for Tasks 1, 3 and 4. However, we did not find such a difference for Task 2.

TABLE I. RESULTS OF STATISTICAL TESTING

Task    F-value    p-value (a)
1       37.79      0 (***)
2       0.66       0.4245
3       14.73      0.0006 (***)
4       15.58      0.0006 (***)

a. *** - statistically significant

D. Limitations and threats to validity

The study is based on the assumption that humans think about objects when they look at them; however, we cannot be sure that this assumption is correct. Our eye-tracking experiment only explores the cognitive response to the visual stimulus, without considering the quality of the answers. Moreover, due to the small sample of subjects and its gender imbalance (five of the six participants were male), we could not analyse gender and affective differences, which have been noted as significant in other gaze tracking studies [15]. To minimize threats to validity, the participants did not know about the hypothesis formulated for the research; they only knew that they would be helping us to understand how program code is read and understood.

In three of the four tasks we were able to confirm our research hypothesis; in one task the hypothesis could not be confirmed. We believe the reason was the poor design of that task, which we intend to improve in our further research.

IV. CONCLUSION

We have presented a study aimed at understanding how programmers read and debug program code. Our results indicate that gaze tracking can be used successfully to follow and assess the cognitive behaviour of programmers as they identify the errors embedded in the source code. The number of gaze fixations is a significant parameter for assessing the level of attention attributed to a particular Area of Interest. Future work will focus on methodological improvements of the study and on collecting a larger dataset from more subjects.
REFERENCES

[1] Obaidellah, U., Al Haek, M., & Cheng, P. C. (2018). A survey on the usage of eye-tracking in computer programming. ACM Computing Surveys, 51(1). doi:10.1145/3145904
[2] Gupta, A., Suri, B., Kumar, V., Misra, S., Blažauskas, T., & Damaševičius, R. (2018). Software code smell prediction model using Shannon, Rényi and Tsallis entropies. Entropy, 20(5), 372. doi:10.3390/e20050372
[3] Damaševičius, R. (2009). On the human, organizational, and technical aspects of software development and analysis. In Information Systems Development (pp. 11-19). Springer US. doi:10.1007/b137171_2
[4] Alemdag, E., & Cagiltay, K. (2018). A systematic review of eye tracking research on multimedia learning. Computers and Education, 125, 413-428. doi:10.1016/j.compedu.2018.06.023
[5] Rosch, J. L., & Vogel-Walcutt, J. J. (2013). A review of eye-tracking applications as tools for training. Cognition, Technology and Work, 15(3), 313-327. doi:10.1007/s10111-012-0234-7
[6] Busjahn, T., Schulte, C., & Busjahn, A. (2011). Analysis of code reading to gain more insight in program comprehension. In Proceedings of the 11th Koli Calling International Conference on Computing Education Research - Koli Calling '11. ACM Press. doi:10.1145/2094131.2094133
[7] Sharafi, Z., Soh, Z., & Guéhéneuc, Y. (2015). A systematic literature review on the usage of eye-tracking in software engineering. Information and Software Technology, 67, 79-107. doi:10.1016/j.infsof.2015.06.008
[8] Blascheck, T., Kurzhals, K., Raschke, M., Burch, M., Weiskopf, D., & Ertl, T. (2017). Visualization of eye tracking data: A taxonomy and survey. Computer Graphics Forum, 36(8), 260-284. doi:10.1111/cgf.13079
[9] Uwano, H., Nakamura, M., Monden, A., & Matsumoto, K. (2006). Analyzing individual performance of source code review using reviewers' eye movement. In Proceedings of the 2006 Symposium on Eye Tracking Research & Applications - ETRA '06. ACM Press. doi:10.1145/1117309.1117357
[10] Chandrika, K. R., Amudha, J., & Sudarsan, S. D. (2017). Recognizing eye tracking traits for source code review. In 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE. doi:10.1109/etfa.2017.8247637
[11] Melo, J., Narcizo, F. B., Hansen, D. W., Brabrand, C., & Wasowski, A. (2017). Variability through the eyes of the programmer. In 2017 IEEE/ACM 25th International Conference on Program Comprehension (ICPC). IEEE. doi:10.1109/icpc.2017.34
[12] Jbara, A., & Feitelson, D. G. (2015). How programmers read regular code: A controlled experiment using eye tracking. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension (ICPC '15). IEEE Press, Piscataway, NJ, USA, 244-254.
[13] Beelders, T., & du Plessis, J.-P. (2016). The influence of syntax highlighting on scanning and reading behaviour for source code. In Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists - SAICSIT '16. ACM Press. doi:10.1145/2987491.2987536
[14] Yenigalla, L., Sinha, V., Sharif, B., & Crosby, M. (2016). How novices read source code in introductory courses on programming: An eye tracking experiment. In Lecture Notes in Computer Science (pp. 120-131). Springer International Publishing. doi:10.1007/978-3-319-39952-2_13
[15] Ksiazek, K., Marszalek, Z., Capizzi, G., Napoli, C., Polap, D., & Wozniak, M. (2019). Faster image filtering via parallel programming. International Journal of Computer Science & Applications, 16(1), 55-67.
[16] Liaudanskaitė, G., Saulytė, G., Jakutavičius, J., Vaičiukynaitė, E., Zailskaitė-Jakštė, L., & Damaševičius, R. (2019). Analysis of affective and gender factors in image comprehension of visual advertisement. In Artificial Intelligence and Algorithms in Intelligent Systems (CSOC 2018), Advances in Intelligent Systems and Computing, vol. 764. Springer, Cham, 1-11. doi:10.1007/978-3-319-91189-2_1