       Analysis of Gaze Trajectories in Natural
        Reading with Hidden Markov Models

                                 Maksim Volkovich

                        Lomonosov Moscow State University



Abstract. The process of natural reading, registered with a modern eye-tracking system, generates a signal of complicated structure that can be considered as a time series of gaze point coordinates. Signal properties are supposed to depend on various properties of the presented text as well as on the current cognitive condition of a reader, such as attention focus, level of fatigue, level of text understanding, and other parameters. The task of cognitive state recognition can be approached by modeling gaze trajectories with probabilistic models whose parameters may contain information relevant to the properties of the read text and the reader's cognitive state. In this work a new approach to gaze trajectory modeling based on Hidden Markov Models is proposed. The HMM's transition probability matrix corresponds to the probabilities of saccades between words, and the emission probability functions correspond to word coordinates and overall measurement noise. Two variants of HMM are proposed: a text-related HMM models multiple gaze trajectories collected on the same text from different readers, while a subject-related parametric HMM models gaze trajectories produced by a single reader on a set of consecutive pages from the same text. A series of experiments on simulated data was performed to estimate the required sample size and the required level of measurement accuracy for a forthcoming data collection procedure.

      Keywords: Eye-tracking · Natural reading · Hidden Markov Models


1   Introduction

Natural reading is a complex task that involves eye movement processes and lexico-semantic processing, which depend on the reader's attention and on the visual features of the text being read.
    The reading process consists of rapid, relatively rare movements ("saccades") between areas of high attention, where the eyes fixate on a word for a time that depends on the reader's skill.
    Eye movements during reading are under the direct control of linguistic processing [1]. Three properties of a word influence its ease of processing: its frequency, length, and predictability in context, the so-called Big Three [2]. These word properties affect the word's processing time, the length of saccades, and the probability of the word being skipped.


Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).




Longer words require more processing time from the reader because they consist of a larger number of letters than shorter ones and thus carry more information to process. Although word frequency generally correlates with word length, it has been shown that more frequent words are processed faster than infrequent words of similar length [3]. Moreover, a word's length affects not only the duration of the current fixation but also the length of the next saccade and the duration of the next fixation [4, 5]. More predictable words are more likely to be skipped and require shorter fixations.
    Medical conditions also influence eye movements during reading. For example, schizophrenia patients read more slowly, make shorter saccades, have longer forward fixation durations, and make a greater number of regressive saccades [6]. Several studies show that children with dyslexia read more slowly, make a larger number of long progressive saccades, make twice as many fixations on long words, and skip short words less often than children without this condition [7, 8].
    Eye-tracking signals have been successfully used in a large number of different tasks. A subject's attention during reading can be classified with respect to reading type (reading, skimming, or scanning) based on features extracted from eye-tracking data: the amplitude, angularity, and velocity of saccades and the duration of fixations [9]. Eye-movement data can also be used for determining language proficiency [11]. Students' eye movements recorded during IELTS reading test completion have been analyzed, revealing a significant difference between samples taken from successful and unsuccessful attempts to pass the exam [12]. In a part-of-speech tagging task, data extracted from an eye-tracker made it possible to improve the performance of an approach based on second-order Hidden Markov Models; the extracted data was transformed into 22 features encoding information about fixation durations, probabilities, numbers of fixations, refixations, and regressions related to the current word and its neighbours [13]. The same features were found to be useful in the domain of Named Entity Recognition in an approach based on a bidirectional LSTM: using embeddings based on these features, it became possible to improve the performance of the previous state-of-the-art model [14].
    An approach based on Slip-Kalman filtering has been used to track the progression of reading. This approach works particularly well for detecting the event of changing the line being read, but it also shows good results with respect to noise reduction [15].
    In the works discussed above, different variations of Hidden Markov Models are used to determine the part of speech, the correct coordinate at which the eye is directed, etc. In this work we propose to model eye-tracking trajectories using Hidden Markov Models: we associate a set of hidden states with individual read words and fit the matrix of transition probabilities between them. We assume that this matrix can be used as a set of features for models solving various cognitive state recognition tasks.


2     Proposed Approach
2.1   Hidden Markov Models
A first-order Hidden Markov Model (HMM) is a probabilistic model based on the Markov chain. An HMM is defined by the following components:




– $X = \{x_1, \ldots, x_N\}$, $x_n \in \mathbb{R}^d$ – a sequence of observed values.
– $T = \{t_1, \ldots, t_N\}$, $t_n \in \{0, 1\}^K$, $\sum_{j=1}^{K} t_{nj} = 1$ – a sequence of hidden states corresponding to the observed values, encoded as one-hot vectors: $t_{ni} = 1$ if the model is in state $i$, and $t_{ni} = 0$ otherwise.
– A transition probability matrix $A_{K \times K}$, where $a_{ij} = p(t_{n,j} \mid t_{n-1,i})$ is the probability of a transition from hidden state $t_{n-1,i}$ to state $t_{n,j}$, with $\sum_{j=1}^{K} a_{ij} = 1 \;\; \forall i$.
– $p(x_n \mid t_n)$ – observation likelihoods, or emission probabilities, expressing the probability that the observed value $x_n$ is generated from hidden state $t_n$. It is assumed that the conditional distribution $p(x_n \mid t_n)$ is known up to parameters $\phi_k$, $k \in \{1, \ldots, K\}$: if $t_{ni} = 1$, then $x_n$ is drawn from $p(x_n \mid \phi_i)$.
– $\pi = \{\pi_1, \ldots, \pi_K\}$ – an initial probability distribution, where $\pi_i$ is the probability of the HMM starting in state $i$, with $\sum_{i=1}^{K} \pi_i = 1$.

A first-order HMM instantiates two assumptions. The first one is the Markov assumption: the value of the hidden state $t_n$ depends only on the state at the previous moment, $t_{n-1}$:

$$p(t_n \mid t_1, \ldots, t_{n-1}) = p(t_n \mid t_{n-1}) \quad (1)$$

Second, the observed value $x_n$ depends only on the current hidden state. This is known as the Output Independence assumption:

$$p(x_n \mid X, T) = p(x_n \mid t_n) \quad (2)$$
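
Taken together, the two assumptions yield the standard factorization of the joint distribution of observations and hidden states (stated here for completeness; it is the quantity whose marginal likelihood the learning procedure below maximizes):

$$p(X, T \mid \Theta) = p(t_1 \mid \pi) \prod_{n=2}^{N} p(t_n \mid t_{n-1}, A) \prod_{n=1}^{N} p(x_n \mid t_n, \phi)$$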

    Given sequences of observed values (in our case, coordinates taken from an eye-tracker), we need to determine which hidden states (read words) these coordinates were generated from. This amounts to solving the HMM learning problem: learn the transition probability matrix $A$ and the observation likelihoods $p(x_n \mid t_n)$ from the given observation sequences $X$ and a set of possible hidden states, where each hidden state represents a read word. Let us denote the set of HMM parameters as $\Theta = \{\pi, A, \phi\}$. The HMM parameters can be estimated with the Expectation-Maximization (EM) algorithm:

$$\Theta^* = \arg\max_{\Theta} p(X \mid \Theta) = \arg\max_{\Theta} \sum_{T} p(X, T \mid \Theta)$$

$$p(X \mid \Theta) = \sum_{T} p(X, T \mid \Theta) \to \max_{\Theta} \;\Leftrightarrow\; \log\left(\sum_{T} p(X, T \mid \Theta)\right) \to \max_{\Theta}$$

1. Initialization step. Set the initial parameters $\Theta = \{\pi, A, \phi\}$:
   – $\pi$ and $A$ are usually set randomly, subject to the constraints $\sum_{i=1}^{K} \pi_i = 1$ and $\sum_{j=1}^{K} a_{ij} = 1 \;\; \forall i$;
   – the initialization of $\phi$ depends on the distributions $p(x \mid \phi)$.
2. Expectation step. $\Theta_{old}$ is fixed:
   $$\mathbb{E}_{T \mid X, \Theta_{old}} \log p(X, T \mid \Theta) = \sum_{T} \log p(X, T \mid \Theta) \, p(T \mid X, \Theta_{old})$$
3. Maximization step. $p(T \mid X, \Theta_{old})$ is fixed:
   $$\Theta_{new} = \arg\max_{\Theta} \mathbb{E}_{T \mid X, \Theta_{old}} \log p(X, T \mid \Theta)$$
4. The Expectation and Maximization steps are repeated until convergence.

    The Baum-Welch algorithm is the special case of the EM algorithm for HMM training; it performs the E and M steps more efficiently thanks to optimizations enabled by HMM assumptions (1) and (2).
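
As a concrete illustration, the following minimal sketch (our own, not taken from the experiments below; the state count and the toy gaze data are placeholders) fits a Gaussian-emission HMM with the hmmlearn package [18], whose fit method runs the Baum-Welch procedure described above:

```python
# Minimal sketch: fitting a Gaussian-emission HMM with Baum-Welch (EM).
# K and the toy gaze data are illustrative placeholders.
import numpy as np
from hmmlearn import hmm

K = 5                                   # one hidden state per word
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))           # stacked 2-D gaze coordinates
lengths = [300, 300]                    # two trajectories of 300 samples each

model = hmm.GaussianHMM(n_components=K, covariance_type="diag", n_iter=100)
model.fit(X, lengths)                   # Baum-Welch (EM) runs inside fit()

print(model.transmat_)                  # learned transition matrix A
print(model.startprob_)                 # learned initial distribution pi
```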
    Our primary goal is to propose an approach for extracting from eye-tracking data both subject-dependent features that would correspond to cognitive states and a group of text-dependent features. Examples of cognitive states that could be extracted are the level of fatigue, stress, focus, the level of text understanding, and the emotional state. One example of a text feature is "difficult" words that take longer to read and require more and longer fixations, which can be explained by the required level of proficiency in the language or subject area. Alternatively, we can try to discern the influence of words that are unpredictable in context on eye movement patterns. Eye-tracking data can be represented as a sequence of coordinates taken from an eye-tracker at a rate of several hundred coordinates per second. Each coordinate in a sequence can be assigned to an area of high attention, such as a read word. It is proposed to learn an HMM using the coordinates as observed values and to determine the set of hidden states based on knowledge about the number of words. From a practical point of view, one of the two following situations is considered for further analysis: either the same text is read by a certain number of different subjects, or a single subject reads a set of texts of sufficient size. Thus, two different models are proposed for these scenarios: text-dependent and subject-dependent.

2.2   Task 1: Text Analysis
In the "same text, different readers" scenario, text-dependent features remain the same for every session, but reader-dependent features differ between readers.
    The main objective of the analysis in this scenario is to determine a vector of text-dependent parameters given a set of observation sequences related to one text fragment read by a certain number of subjects. It is assumed that observations from a large enough set of subjects can be used to estimate subject-independent HMM parameters, which can then be analysed for the purpose of extracting text-related features while subject-dependent features are suppressed. The text-dependent HMM thus consists of the following components:




– $X_j = \{x_{1j}, \ldots, x_{Nj}\}$ – a sequence of observed values taken from subject $j$.
– $T = \{t_1, \ldots, t_K\}$ – a set of hidden states determined by the number of words in the text.
– $p(x_n \mid t_n)$ – emission probabilities: a set of two-dimensional Gaussian distributions $\mathcal{N} = \{\mathcal{N}_1(\mu_1, \sigma_1), \ldots, \mathcal{N}_K(\mu_K, \sigma_K)\}$ whose means are fixed at the geometric centers of the words, $C = \{c_1, \ldots, c_K\}$, $\mu_i = c_i$, and whose standard deviation $\sigma$ represents the overall measurement noise of the eye-tracking device and is estimated from the data.
– $\pi = \{\pi_1, \ldots, \pi_K\}$ – an initial probability distribution, which can also be estimated from the samples.
– $A_{K \times K}$ – a transition probability matrix fitted from a set of eye-tracking measurements taken from different subjects.

The parameters of a model trained on samples taken from different subjects reading the same text may potentially represent such text characteristics as sentence structure, word frequency, general text complexity, etc.
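
A possible construction of this text-dependent HMM with hmmlearn [18] is sketched below: the emission means are frozen at the word centers and only $\pi$ and $A$ are re-estimated from the pooled trajectories. The word centers, the noise level, and the data layout are illustrative assumptions, not values from this paper.

```python
# Sketch of the text-dependent HMM: emissions fixed at word centers,
# only the start distribution and transition matrix are learned.
import numpy as np
from hmmlearn import hmm

word_centers = np.array([[1.0, 1.0], [4.0, 1.0], [8.0, 1.0]])  # c_i, assumed
K = len(word_centers)
sigma = 0.5                               # assumed eye-tracker noise level

model = hmm.GaussianHMM(
    n_components=K,
    covariance_type="diag",
    n_iter=100,
    init_params="st",   # randomly initialize only pi and A ...
    params="st",        # ... and re-estimate only pi and A during EM
)
model.means_ = word_centers                   # mu_i = c_i, kept fixed
model.covars_ = np.full((K, 2), sigma ** 2)   # fixed measurement noise

# Usage: stack trajectories from all subjects row-wise into X_all and
# record each trajectory's length in `lengths`, then:
# model.fit(X_all, lengths)
# model.transmat_ then holds the text-dependent transition matrix A.
```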


2.3   Task 2: Subject Analysis

Let us consider a scenario in which a single subject reads a certain number of text fragments. It may be assumed that during one session the cognitive states of a subject do not change significantly over time, and therefore the model parameters should be similar for every single page. The objective of the analysis in this scenario is to estimate these subject-dependent parameters. Since the data are presented as a set of eye-tracker trajectories collected from different text fragments, each single trajectory is assumed to be sampled from a corresponding text-dependent HMM. It is assumed that the parameters of each HMM can be represented as a function of higher-level parameters that refer to the current cognitive state of a reader. For example, the level of fatigue could be modeled through the average duration of a fixation on a word. In order to train a subject-dependent HMM on these data, a parametric family of HMMs is proposed.
    Suppose that $\theta$ is a vector of parameters that represents the cognitive state of a subject. Parameters such as the average number of fixations, the average fixation duration, and the average number of regressions can be represented as functions of the vector $\theta$ and word frequencies. In the simplest case, a parametric family of HMMs can be defined in the following form:

– $M = \{m_1, \ldots, m_P\}$ is a set of HMMs, where $m_i$ is the HMM corresponding to text $i$.
– For each HMM, the emission probabilities are defined by two-dimensional Gaussian distributions with parameters reflecting the geometry of the text and the measurement noise.
– For each HMM, the set of hidden states is determined by the number of words $W = \{w_1, \ldots, w_{K_p}\}$ presented in the text fragment.
– The reading process is modeled using probabilities $\alpha(\theta, W)$, $\beta(\theta, W)$, $\delta(\theta, W)$, $\varepsilon(\theta, W)$, where $\alpha(\theta, W)$ is the probability of continuing the current fixation, $\beta(\theta, W)$ is the probability of a saccade to the next word, $\delta(\theta, W)$ is the probability of a saccade to the previous word, and $\varepsilon(\theta, W)$ is the probability of a long forward or backward saccade.

Thus the transition probability matrices have the following form:

$$A^{K \times K} = \begin{pmatrix}
\alpha(\theta, w_1) & \beta(\theta, w_1) & \varepsilon(\theta, w_1) & \cdots & \cdots & \varepsilon(\theta, w_1) \\
\delta(\theta, w_2) & \alpha(\theta, w_2) & \beta(\theta, w_2) & \cdots & \cdots & \varepsilon(\theta, w_2) \\
\varepsilon(\theta, w_3) & \delta(\theta, w_3) & \alpha(\theta, w_3) & \cdots & \cdots & \varepsilon(\theta, w_3) \\
\vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\
\varepsilon(\theta, w_{K-1}) & \cdots & \cdots & \cdots & \alpha(\theta, w_{K-1}) & \beta(\theta, w_{K-1}) \\
\varepsilon(\theta, w_K) & \cdots & \cdots & \cdots & \delta(\theta, w_K) & \alpha(\theta, w_K)
\end{pmatrix}$$

Thus, a parametric form of the subject-dependent HMM can be defined by setting a vector of parameters $\theta$ and choosing a set of functions $\alpha, \beta, \delta, \varepsilon$. The model can be made more detailed if other text-related parameters are taken into account.
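
The sketch below (our illustration; the constant per-word probability values are placeholders) builds such a banded matrix from given $\alpha$, $\beta$, $\delta$ values, spreading the leftover mass $\varepsilon$ evenly over the long-saccade targets:

```python
# Sketch: building the banded transition matrix described above.
# alpha on the diagonal, beta on the superdiagonal, delta on the
# subdiagonal; the remaining probability mass (epsilon) is divided
# evenly among long forward/backward saccade targets.
import numpy as np

def transition_matrix(alpha, beta, delta, K):
    """alpha/beta/delta are per-word probability arrays of length K;
    each row is assumed to leave non-negative mass for epsilon."""
    A = np.zeros((K, K))
    for i in range(K):
        A[i, i] = alpha[i]
        if i + 1 < K:
            A[i, i + 1] = beta[i]
        if i - 1 >= 0:
            A[i, i - 1] = delta[i]
        far = A[i] == 0.0                  # long-saccade targets (epsilon)
        if far.any():
            A[i, far] = (1.0 - A[i].sum()) / far.sum()
    return A

# Illustrative values for a 5-word fragment: fixations dominate.
K = 5
A = transition_matrix(alpha=np.full(K, 0.90),
                      beta=np.full(K, 0.06),
                      delta=np.full(K, 0.03),
                      K=K)
assert np.allclose(A.sum(axis=1), 1.0)     # each row is a distribution
```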


3   Eye-tracking Corpora

For future studies and experiments, two existing eye-tracking datasets were chosen: the Zurich Cognitive Language Processing Corpus (ZuCo) [16] and the Ghent Eye-Tracking Corpus (GeCo) [17].


ZuCo is a publicly available dataset containing eye-tracking and EEG data recorded from 12 native English speakers reading natural English text. Subjects were presented with three tasks: a normal reading task in which participants had to give an assessment of the movie described in the text fragment they read, a normal reading task with multiple-choice questions about the content, and a task-specific reading task in which subjects had to focus on a certain semantic relation type. The corpus includes high-density eye-tracking data recorded with a calibrated infrared eye-tracker; fixations, saccades, and blinks are identified by the tracker software. The dataset also includes such gaze-trajectory statistics as the time after the first fixation on the sentence for every single fixation, the number of fixations for each word and sentence, mean pupil size, gaze duration (GD) during first-pass reading of a word, the total reading time of a word (TRT), the first fixation duration (FFD), the duration of the first and only fixation on a word (single fixation duration, SFD), and the go-past time (GPT), which is the sum of all fixations preceding a saccade to the right.


GeCo is a corpus of eye-tracking data taken from 14 monolingual and 19 bilingual participants reading a novel. The bilinguals were classified as English speakers with proficiency levels from lower-intermediate to advanced. Bilingual participants read half of the novel in their first language and the other half in their second language. The text read by each subject comprised about 5,000 sentences. As in the ZuCo dataset, in addition to the raw tracker data, word-level reading measures are provided: GD, FFD, SFD, TRT, and GPT.


4    Experiments

A set of experiments was carried out to prove the concept that eye-tracking trajectories can be generated using an HMM transition probability matrix and that an HMM can then be fitted to these data with parameters close to those of the original model. For a given set of sentences we can obtain the exact positions of words on a page; distances were measured in units of printed letter size. A Hidden Markov Model was initialized as follows. The number of hidden states was set to the number of displayed words. The transition probability matrix was generated as diagonally dominant, since the number of saccades, and therefore of hidden state transitions, has to be much smaller than the number of eye movements inside the areas of gaze fixation on a single word, where the hidden state does not change. The initial probability distribution was chosen to be a geometric distribution in order to simulate a task in which the subject must read the text from the beginning. A set of two-dimensional Gaussian distributions with means located at word centers was chosen as the set of emission probabilities.
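
The following sketch reconstructs such a simulation model with hmmlearn [18]; the word layout, the dominance level, and the geometric-distribution parameter are illustrative assumptions rather than the exact values used in our experiments.

```python
# Sketch: the simulated "true" HMM described above (assumed constants).
import numpy as np
from hmmlearn import hmm

K = 30            # number of displayed words = number of hidden states
stay = 0.95       # assumed diagonal dominance: most samples stay on a word
p = 0.5           # assumed parameter of the geometric start distribution

A = np.full((K, K), (1.0 - stay) / (K - 1))
np.fill_diagonal(A, stay)                 # diagonally dominant transitions

pi = p * (1.0 - p) ** np.arange(K)
pi /= pi.sum()                            # truncated geometric, renormalized

word_centers = np.column_stack([np.arange(K, dtype=float), np.zeros(K)])

true_model = hmm.GaussianHMM(n_components=K, covariance_type="diag")
true_model.startprob_ = pi
true_model.transmat_ = A
true_model.means_ = word_centers              # emissions centered on words
true_model.covars_ = np.full((K, 2), 0.05)    # assumed measurement noise
```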
    A simulated training dataset of observed-value sequences was generated using the defined HMM. For texts with several dozen words, each sequence consists of 300 observed values. We performed an experiment to find out how many observation sequences are needed for a precise fit of the HMM parameters: the training process was run on datasets with different numbers of observation sequences, with a new model trained on its own dataset for each training sample size. For each trained model we then measured the mean squared error between the original and the trained transition probability matrices, as well as the mean squared error between their main diagonals:
$$\mathrm{MSE} = \frac{1}{K \cdot K} \sum_{i=1}^{K} \sum_{j=1}^{K} \left(A_{ij} - A^*_{ij}\right)^2 \quad (3)$$

$$\mathrm{MSE}_{\mathrm{diagonal}} = \frac{1}{K} \sum_{i=1}^{K} \left(A_{ii} - A^*_{ii}\right)^2 \quad (4)$$
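
A condensed simulate-and-refit sketch of this experiment follows (illustrative sizes; the emission parameters are kept fixed during refitting so that hidden-state identities match between the true and the trained model):

```python
# Sketch: simulate trajectories from a "true" HMM, refit the transition
# matrix, and evaluate Eqs. (3) and (4). Sizes are illustrative.
import numpy as np
from hmmlearn import hmm

K, stay, n_seq, seq_len = 30, 0.95, 15, 300

# "True" model, constructed as in the previous sketch (condensed).
true_model = hmm.GaussianHMM(n_components=K, covariance_type="diag")
A_true = np.full((K, K), (1.0 - stay) / (K - 1))
np.fill_diagonal(A_true, stay)
pi = 0.5 * 0.5 ** np.arange(K)
true_model.startprob_ = pi / pi.sum()
true_model.transmat_ = A_true
true_model.means_ = np.column_stack([np.arange(K, dtype=float), np.zeros(K)])
true_model.covars_ = np.full((K, 2), 0.05)

# Simulate n_seq gaze trajectories of seq_len samples each.
X = np.concatenate([true_model.sample(seq_len)[0] for _ in range(n_seq)])
lengths = [seq_len] * n_seq

# Refit with emissions fixed at the known word centers so that state
# identities are preserved between the two models.
fitted = hmm.GaussianHMM(n_components=K, covariance_type="diag", n_iter=100,
                         init_params="st", params="st")
fitted.means_ = true_model.means_
fitted.covars_ = np.full((K, 2), 0.05)
fitted.fit(X, lengths)

diff = true_model.transmat_ - fitted.transmat_
print(np.mean(diff ** 2))             # Eq. (3): MSE over the full matrix
print(np.mean(np.diag(diff) ** 2))    # Eq. (4): MSE over the diagonal
```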

Mean MSE values and their 95% confidence intervals were also calculated. The confidence interval was defined as $\bar{x} \pm Q(1 - \frac{\alpha}{2}) \cdot s$, where $\bar{x}$ is the sample mean, $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ is the sample standard deviation, $\alpha$ is the significance level, $N$ is the sample size, and $Q$ is the quantile function, $Q(p) = \inf\{x \in \mathbb{R} : p \le F(x)\}$, where $F(x)$ is the cumulative distribution function of the Student's t-distribution with $N - 1$ degrees of freedom.




[Figure 1: line plot of MSE (approx. 0.004 to 0.012) versus the number of observation sequences (0 to 30).]

Fig. 1. Mean squared error between the original and a trained transition probability matrix. The red line indicates the mean MSE. The light red area shows the 95% MSE confidence interval.



[Figure 2: line plot of diagonal MSE (approx. 0.050 to 0.200) versus the number of observation sequences (0 to 30).]

Fig. 2. Mean squared error between the diagonals of the original and a trained transition probability matrix. The red line indicates the mean MSE. The light red area shows the 95% MSE confidence interval.



A confidence interval gives a more descriptive estimate of a parameter than a point estimate. The results of our experiments are presented in Fig. 1 and Fig. 2.
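
For reference, the interval defined above can be computed as in the following sketch (scipy is an assumed tooling choice here; the input MSE values are made up):

```python
# Sketch: mean and 95% band x_bar +/- Q(1 - alpha/2) * s, with Q the
# Student's t quantile function, exactly as defined in the text above.
import numpy as np
from scipy import stats

def mean_ci(x, alpha=0.05):
    x = np.asarray(x, dtype=float)
    s = x.std(ddof=1)                              # sample standard deviation
    q = stats.t.ppf(1.0 - alpha / 2.0, df=len(x) - 1)  # quantile function Q
    return x.mean(), q * s                         # center, half-width

center, half = mean_ci([0.006, 0.007, 0.005, 0.008])   # made-up MSE values
print(center - half, center + half)
```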
    The experiments were executed using the Python package hmmlearn [18]. As the figures show, our metrics reach values close to optimal at an observation count between 10 and 15; with a further increase in the number of observations, the MSE over the full transition probability matrices and the MSE over the diagonals do not decrease significantly. So, for acceptable quality, it may be enough to collect data from a dozen subjects.


5    Conclusions and Future Work

In this work, various approaches to the analysis of eye movements in text-dependent and subject-dependent scenarios have been considered. A new approach to gaze trajectory modeling based on Hidden Markov Models is proposed for both scenarios. An experiment conducted as part of a preliminary study helped us determine the approximate sample size we would need for further work. We plan to apply the proposed approach to real data from the ZuCo and GeCo datasets in one of the following tasks.

 1. A binary classification task: whether the answer given by a subject in the sentiment task from the ZuCo dataset was correct or not.
 2. If information about the answer time is available, it may be possible to evaluate this parameter using the model, because the time required to answer is supposed to depend on how thoroughly the text was read.
 3. A fatigue classification task. It is assumed that at the end of a long reading session the fatigue level is higher than at the beginning of the session; data from the GeCo corpus would be useful here.

It is also planned to collect a new dataset containing samples of eye movement recordings taken while reading text fragments in Russian. Several dozen subjects will read several texts at different levels of fatigue. The fatigue level will be measured in two ways: by interviewing the subjects and by using a binary classifier trained to distinguish the beginning of a session from its end, under the assumption that the fatigue level rises towards the end of the session.


References

1. Dambacher, M., Slattery, T. J., Yang, J., Kliegl, R., Rayner, K. Evidence for direct control of eye movements during reading. Journal of Experimental Psychology: Human Perception and Performance, 39(5):1468 (2013)
2. Clifton Jr, C., Ferreira, F., Henderson, J. M., Inhoff, A. W., Liversedge, S. P., Reichle, E. D., Schotter, E. R. Eye movements in reading and information processing: Keith Rayner's 40 year legacy. Journal of Memory and Language, 86:1-19 (2016)
3. Rayner, K., Duffy, S. A. Lexical complexity and fixation times in reading: Effects
   of word frequency, verb complexity, and lexical ambiguity. Memory & cognition
   14.3:191-201 (1986)
4. White, S. J., Rayner, K., Liversedge, S. P. The influence of parafoveal word length and contextual constraint on fixation durations and word skipping in reading. Psychonomic Bulletin & Review 12.3:466-471 (2005)




5. Juhasz, B. J., et al. Eye movements and the use of parafoveal word length information in reading. Journal of Experimental Psychology: Human Perception and Performance 34.6:1560 (2008)
6. Whitford, V., et al. Reading impairments in schizophrenia relate to individual differences in phonological processing and oculomotor control: Evidence from a gaze-contingent moving window paradigm. Journal of Experimental Psychology: General 142.1:57 (2013)
7. Olson, R. K., Kliegl, R., Davidson, B. J. Dyslexic and normal readers' eye movements. Journal of Experimental Psychology: Human Perception and Performance 9.5:816 (1983)
8. De Luca, M., et al. Eye movement patterns in linguistic and non-linguistic tasks in developmental surface dyslexia. Neuropsychologia 37.12:1407-1420 (1999)
9. Mozaffari, S. S. et al. Reading Type Classification based on Generative Models and
   Bidirectional Long Short-Term Memory. Joint Proceedings of the ACM IUI 2018
   Workshops. CEUR Workshop Proceedings 2068 (2018)
10. Kunze, K., et al. I know what you are reading: recognition of document types using mobile eye tracking. Proceedings of the 2013 International Symposium on Wearable Computers (2013)
11. Kunze, K., et al. Towards inferring language expertise using eye tracking. CHI'13 Extended Abstracts on Human Factors in Computing Systems, 217-222 (2013)
12. Bax, S. Readers’ cognitive processes during IELTS reading tests: Evidence from
   eye tracking. British Council, ELT Research Papers 13-06 (2013)
13. Barrett, M., et al. Weakly supervised part-of-speech tagging using eye-tracking data. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers (2016)
14. Hollenstein, N., Zhang, C. Entity Recognition at First Sight: Improving NER with
   Eye Movement Information. arXiv preprint arXiv:1902.10068 (2019)
15. Bottos, S., Balasingam, B. A Novel Slip-Kalman Filter to Track the Progression of
   Reading Through Eye-Gaze Measurements. arXiv preprint arXiv:1907.07232 (2019)
16. Hollenstein, N., et al. ZuCo, a simultaneous EEG and eye-tracking resource for
   natural sentence reading. Scientific data 5.1:1-13 (2018)
17. Cop, U., et al. Presenting GECO: An eyetracking corpus of monolingual and bilin-
   gual sentence reading. Behavior research methods 49.2:602-615 (2017)
18. hmmlearn library. https://hmmlearn.readthedocs.io
19. Ulutas, B. H., Özkan, N. F., Michalski, R. Application of hidden Markov models to
   eye tracking data analysis of visual quality inspection operations. Central European
   Journal of Operations Research, 28:761–777 (2020)



