<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of Gaze Trajectories in Natural Reading with Hidden Markov Models</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Lomonosov Moscow State University</institution>
        </aff>
      </contrib-group>
      <fpage>419</fpage>
      <lpage>428</lpage>
      <abstract>
        <p>The process of natural reading, registered with a modern eye-tracking system, generates a signal of complicated structure that can be considered as a time series consisting of gaze point coordinates. Signal properties are supposed to depend on various properties of the presented text as well as on the current cognitive condition of a reader, such as attention focus, level of fatigue, level of text understanding and other parameters. The task of cognitive state recognition can be approached by modeling gaze trajectories with probabilistic models, whose parameters may contain information relevant to the properties of the read text and the reader's cognitive state. In this work a new approach to gaze trajectory modeling based on Hidden Markov Models is proposed. The HMM's transition probability matrix corresponds to probabilities of saccades between words, and emission probability functions correspond to word coordinates and overall measurement noise. Two variants of HMM are proposed: a text-related HMM models multiple gaze trajectories collected on the same text from different readers, while a subject-related parametric HMM models gaze trajectories produced by a single reader on a set of consecutive pages from the same text. A series of experiments on simulated data was performed to estimate the sample size and the level of measurement accuracy required for a forthcoming data collection procedure.</p>
      </abstract>
      <kwd-group>
        <kwd>Eye-tracking</kwd>
        <kwd>Natural reading</kwd>
        <kwd>Hidden Markov Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Natural reading is a complex task that includes eye movement processes,
lexicosemantic processing dependent on reader attention and visual features of the
text being read.</p>
      <p>The process of reading consists of relatively rare long movements ("saccades")
between areas of high attention, where the eyes are fixated on a word for some time
depending on the skill of the reader.</p>
      <p>
        Eye movements during reading are under the direct control of linguistic
processing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. There are three properties of a word that influence its ease of
processing: the word's frequency, length and predictability in context, the so-called Big
Three [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These word properties affect the word's processing time, the length of
saccades and the probability of the word being skipped. Longer words require more
processing time from the reader because they consist of a larger number of letters
than shorter ones, so they carry more information to process. Although a word's
frequency is generally correlated with its length, it has been shown that
more frequent words are processed faster than infrequent words of similar
length [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Moreover, a word's length affects not only the duration of the current
fixation but also the length of the next saccade and the duration of the next fixation [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. More
predictable words are more likely to be skipped and require shorter fixations.
      </p>
      <p>
        Medical conditions also influence eye movements during reading. For
example, schizophrenia patients read more slowly, make shorter saccades, have longer
forward fixation durations and make a greater number of regressive saccades [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Several studies show that children with dyslexia read more slowly, make a larger
number of long progressive saccades, make twice as many fixations on long words and
skip short words less often than children without this condition [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        Eye-tracking signals are successfully used in a large number of different tasks.
A subject's attention during reading can be classified with respect to the type of
reading: reading, skimming and scanning, based on features extracted from
eye-tracking data: the amplitude, angularity and velocity of saccades and the duration of
fixations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Eye-movement data may be used in language proficiency
determination tasks [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Students' eye movements recorded during completion of an IELTS reading test
have been analyzed, and a significant difference between samples taken from
successful and unsuccessful attempts to pass the exam has been revealed [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In
a part-of-speech tagging task, data extracted from an eye-tracker makes it possible
to improve the performance of an approach based on second-order Hidden Markov Models.
The extracted data was transformed into 22 features encoding information about
fixation durations, probabilities, numbers of fixations, refixations and regressions
related to the current word and its neighbours [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The same features were found to
be useful in the domain of Named Entity Recognition in an approach based on a
bidirectional LSTM. Using embeddings based on these features, it became
possible to improve the performance of the previous state-of-the-art model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        An approach based on Slip-Kalman filtering is used to track the
progression of reading. This approach works particularly well for detecting
the event of changing the line being read, but also shows good results with respect to
noise reduction [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>In the works discussed above, different variations of Hidden Markov Models
are used to determine the part of speech, the correct coordinate at
which the eye is directed, etc. In this work we propose to model eye-tracking
trajectories using Hidden Markov Models: associate a set of hidden states with
individual read words and fit the matrix of transition probabilities between them.
We assume that this matrix can be used as a set of features for models solving
various cognitive state recognition tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Approach</title>
      <sec id="sec-2-1">
        <title>Hidden Markov Models</title>
        <p>A first-order Hidden Markov Model (HMM) is a probabilistic model based on
the Markov chain model. An HMM is defined by the following components:
- X = {x_1, ..., x_N}, x_n ∈ R^d - a sequence of observed values.
- T = {t_1, ..., t_N} - a sequence of hidden states that correspond to the observed
values, encoded as t_n ∈ {0, 1}^K with Σ_{j=1}^{K} t_nj = 1, so that t_ni = 1 if
the model is in state i and t_ni = 0 otherwise.
- A transition probability matrix A_{K×K}, where a_ij = p(t_n,j | t_n-1,i) is the
probability of transition from the hidden state t_n-1,i to the state t_n,j, with
Σ_{j=1}^{K} a_ij = 1 for all i.
- p(x_n | t_n) - observation likelihoods, or emission probabilities, expressing the
probability that the observed value x_n would be generated from the hidden state t_n.
It is assumed that the conditional distribution p(x_n | t_n) is known up to
parameters φ_k, k ∈ {1, ..., K}, so if t_ni = 1 then x_n is drawn from p(x_n | φ_i).
- π = {π_1, ..., π_K} - an initial probability distribution; π_i is the probability
of the HMM starting in state i, with Σ_{i=1}^{K} π_i = 1.</p>
        <p>A first-order HMM instantiates two assumptions. The first one is the Markov
assumption: the value of the hidden state t_n depends only on the state at the previous
moment t_n-1.</p>
        <p>p(t_n | t_1, ..., t_n-1) = p(t_n | t_n-1) (1)</p>
        <p>Second, the observed value x_n depends only on the current hidden
state. This is known as the Output Independence assumption.</p>
        <p>p(x_n | X, T) = p(x_n | t_n) (2)</p>
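        <p>As a minimal illustration of these components (with toy numbers chosen for this sketch, not taken from the paper), an HMM over K = 3 hidden states can be written down directly:

```python
import numpy as np

# Toy first-order HMM with K = 3 hidden states. In the paper the states
# correspond to words on a page; the numbers here are purely illustrative.
K = 3
pi = np.array([0.8, 0.15, 0.05])           # initial distribution, sums to 1
A = np.array([[0.90, 0.08, 0.02],          # transition matrix, rows sum to 1
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])
means = np.array([[0.0, 0.0],              # Gaussian emission means
                  [5.0, 0.0],              # (2-D "word center" coordinates)
                  [10.0, 0.0]])

assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
```

The two assertions encode exactly the stochasticity constraints on π and A stated above.</p>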
        <p>Given sequences of observed values (in our case, coordinates taken from an
eye-tracker), we need to determine which hidden states (read words) these
coordinates came from. This requires solving the HMM learning problem: learn the
transition probability matrix A and the observation likelihoods p(x_n | t_n) from the given
observation sequences X and the set of possible hidden states, where each hidden
state represents a read word. Let us denote the set of HMM parameters as Θ =
{π, A, φ}. For HMM parameter estimation the Expectation-Maximization
(EM) algorithm can be applied.</p>
        <p>Θ* = argmax_Θ p(X | Θ) = argmax_Θ Σ_T p(X, T | Θ)</p>
        <p>p(X | Θ) = Σ_T p(X, T | Θ) → max_Θ, equivalently log Σ_T p(X, T | Θ) → max_Θ</p>
        <p>1. Initialization step. At the beginning the initial parameters Θ = {π, A, φ} need to be set:
- π and A are usually set randomly, subject to the restrictions Σ_{i=1}^{K} π_i = 1 and Σ_{j=1}^{K} a_ij = 1 for all i;
- the initialization of φ depends on the p(x | φ) distributions.
2. Expectation step. Θ_old is fixed: compute E_{T|X,Θ_old} log p(X, T | Θ) = Σ_T log p(X, T | Θ) p(T | X, Θ_old).
3. Maximization step. p(T | X, Θ_old) is fixed: Θ_new = argmax_Θ E_{T|X,Θ_old} log p(X, T | Θ).
4. The Expectation and Maximization steps are repeated until convergence.</p>
        <p>The Baum-Welch algorithm is the special case of this EM procedure for HMM
training; it performs the E and M steps more efficiently thanks to optimizations
based on HMM assumptions (1) and (2).</p>
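        <p>A compact sketch of this EM loop for the setting used later in the paper, where the Gaussian emission means are fixed at known word centers and only the transition matrix is re-estimated (a simplified Baum-Welch variant written for illustration; the function and variable names are our own, not from any library):

```python
import numpy as np

def baum_welch_transitions(X, A, pi, means, sigma, n_iter=50):
    """EM re-estimation of the transition matrix A only, with isotropic
    Gaussian emissions fixed at known word centers.
    X: (N, 2) gaze coordinates; means: (K, 2) word centers; sigma: noise scale."""
    N, K = len(X), len(means)
    # emission likelihoods p(x_n | t_n = k), up to a constant factor
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    B = np.exp(-d2 / (2.0 * sigma ** 2))
    for _ in range(n_iter):
        # E step: scaled forward-backward recursions
        alpha = np.zeros((N, K)); beta = np.ones((N, K)); c = np.zeros(N)
        alpha[0] = pi * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for n in range(1, N):
            alpha[n] = (alpha[n - 1] @ A) * B[n]
            c[n] = alpha[n].sum(); alpha[n] /= c[n]
        for n in range(N - 2, -1, -1):
            beta[n] = (A @ (B[n + 1] * beta[n + 1])) / c[n + 1]
        # expected transition counts xi_n(i, j), summed over n
        xi = np.einsum('ni,ij,nj->ij', alpha[:-1], A,
                       B[1:] * beta[1:] / c[1:, None])
        # M step: normalize rows to obtain the new transition matrix
        A = xi / xi.sum(axis=1, keepdims=True)
    return A
```

On data clustered tightly around two word centers, the fitted matrix becomes strongly diagonally dominant, reflecting the long runs of within-word samples between saccades.</p>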
        <p>Our primary goal is to propose an approach for extracting, from eye-tracking
data, subject-dependent features that would correspond to cognitive states, and
a group of text-dependent features. Examples of cognitive states that can be
extracted are the level of fatigue, stress, focus, the level of text understanding and
emotional state. One of the features of the text can be "difficult" words that take
a longer reading time and require more and longer fixations, which
can be explained by the requirements for the level of proficiency in the
language or subject area. Alternatively, we can try to discern the influence of words that are
unpredictable in context on eye movement patterns. Eye-tracking data can be
represented as a sequence of coordinates taken from an eye-tracker with a resolution
of several hundred coordinates per second. Each coordinate in a sequence can
be assigned to an area of high attention, such as a read word. It is proposed to
learn an HMM using coordinates as observed values and to determine the set of
hidden states based on knowledge about the number of words. From a practical
point of view, one of the two following situations is considered for further analysis:
either the same text is read by a certain number of different subjects, or a
single subject reads a set of texts of sufficient size. Thus, two different models are
proposed for these scenarios: text-dependent and subject-dependent.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Task 1: Text Analysis</title>
        <p>In a "same text, different readers" scenario, text-dependent features remain the
same for every session, but reader-dependent features should differ between
readers.</p>
        <p>The main objective in the analysis of this scenario is to determine a vector
of text-dependent parameters given a set of observation sequences related to
one text fragment read by a certain number of subjects. It is assumed that
observations from a large enough set of subjects can be used to
estimate subject-independent HMM parameters, which can then be analysed for
the purpose of extracting text-related features while subject-dependent features
are suppressed. Thus the text-dependent HMM consists of the following
components:
- X_j = {x_1j, ..., x_nj} - a set of observed values taken from subject j.
- T = {t_1, ..., t_K} - a set of hidden states determined by the number of words
in the text.
- p(x_n | t_n) - emission probabilities: a set of two-dimensional Gaussian
distributions N = {N_1(μ_1, σ_1), ..., N_K(μ_K, σ_K)} with means defined by the geometric
centers of words C = {c_1, ..., c_K}, μ_i = c_i, and standard deviations representing
the overall measurement noise of the eye-tracking device, estimated
from the data.
- π = {π_1, ..., π_K} - the initial probability distribution. It can also be estimated
from the samples.
- A_{K×K} - a transition probability matrix that can be fitted from sets of
eye-tracking measurements taken from different subjects.</p>
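        <p>For illustration, the Gaussian emission means can be obtained from word bounding boxes (hypothetical (x0, y0, x1, y1) page coordinates; the helper name is our own):

```python
import numpy as np

def word_centers(boxes):
    """Geometric centers of word bounding boxes given as (x0, y0, x1, y1);
    these centers serve as the means of the 2-D Gaussian emission
    distributions of the text-dependent HMM."""
    boxes = np.asarray(boxes, dtype=float)
    return np.stack([(boxes[:, 0] + boxes[:, 2]) / 2.0,
                     (boxes[:, 1] + boxes[:, 3]) / 2.0], axis=1)
```

For example, a single box (0, 0, 4, 2) yields the center (2, 1).</p>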
        <p>Parameters of the model trained on samples taken from different subjects
reading the same text may potentially represent such text characteristics as
sentence structure, word frequency, general text complexity, etc.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Task 2: Subject Analysis</title>
        <p>Let us consider a scenario in which a single subject reads a certain number of text
fragments. It may be assumed that during one session the cognitive states of a
subject do not change significantly over time, and therefore the model parameters
should be similar for every single page. The objective of the analysis in this
scenario is to estimate these subject-dependent parameters. Since the data is
presented as a set of eye-tracker trajectories collected from different text
fragments, each single trajectory is assumed to be sampled from a corresponding
text-dependent HMM. It is assumed that the parameters of each HMM can be
presented as a function of higher-level parameters that refer to the current cognitive
state of the reader. For example, the level of fatigue can be modeled through the average
duration of a fixation on a word. In order to train a subject-dependent HMM on
these data, a parametric family of HMMs is proposed.</p>
        <p>Suppose that θ is a vector of parameters that represents the cognitive state of a
subject. Such parameters as the average number of fixations, average fixation
duration and average number of regressions could be represented as functions of
the vector θ and word frequencies. In the simplest case, a parametric family
of HMMs can be proposed in the following form:
- M = {m_1, ..., m_P} is a set of HMMs, where m_i is the HMM corresponding to text i.
- For each HMM, emission probabilities are defined by two-dimensional
Gaussian distributions with parameters reflecting the geometry of the text and the
measurement noise.
- For each HMM, the set of hidden states is determined by the number of words
W = {w_1, ..., w_Kp} presented in the text fragment.
- The reading process is modeled using probabilities α(θ, W), β(θ, W), γ(θ, W),
δ(θ, W), where α(θ, W) is the probability of continuing the current fixation,
β(θ, W) is the probability of a saccade to the next word, γ(θ, W) is the probability
of a saccade to the previous word, and δ(θ, W) is the probability of a long forward or
backward saccade.</p>
        <p>Thus the transition probability matrices have the following band form: α on the
main diagonal (continuing the current fixation), β on the superdiagonal (saccade to
the next word), γ on the subdiagonal (regression), and the remaining entries of row i
sharing the long-saccade probability δ(θ, w_i):</p>
        <p>A_{K×K} =
[ α(θ, w_1)   β(θ, w_1)   δ(θ, w_1)   ...   δ(θ, w_1) ]
[ γ(θ, w_2)   α(θ, w_2)   β(θ, w_2)   ...   δ(θ, w_2) ]
[    ...          ...         ...      ...      ...    ]
[ δ(θ, w_K)      ...      δ(θ, w_K)  γ(θ, w_K)  α(θ, w_K) ]</p>
        <p>Thus, a parametric form of the subject-dependent HMM can be defined by setting
the vector of parameters θ and choosing a set of functions α, β, γ, δ. The model can
be made more detailed if other text-related parameters are taken into account.</p>
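        <p>A sketch of how such a banded matrix could be assembled, assuming for simplicity that the per-row probabilities have already been evaluated to constants (with α + β + γ below 1) and that the leftover mass is split uniformly over long saccades; the function is illustrative, not part of the model definition:

```python
import numpy as np

def transition_matrix(alpha, beta, gamma, K):
    """Build a K x K transition matrix: alpha on the diagonal (continue the
    current fixation), beta on the superdiagonal (saccade to the next word),
    gamma on the subdiagonal (regression), and the leftover probability
    spread uniformly over long forward/backward saccades. Requires K >= 4."""
    A = np.zeros((K, K))
    for i in range(K):
        A[i, i] = alpha
        if i + 1 < K:
            A[i, i + 1] = beta
        if i - 1 >= 0:
            A[i, i - 1] = gamma
        far = A[i] == 0.0                     # long-saccade targets
        A[i, far] = (1.0 - A[i].sum()) / far.sum()
    return A
```

Each row sums to one by construction, matching the stochasticity constraint on A.</p>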
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Eye-tracking Corpora</title>
      <p>
        For future studies and experiments two existing eye-tracking datasets were
chosen: Zurich Cognitive Language Processing Corpus [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and Ghent Eye-Tracking
Corpus [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>ZuCo is a publicly available dataset containing eye-tracking and EEG data
recorded from 12 native English speakers reading natural English text. Subjects
were presented with three tasks: a normal reading task in which participants had
to give an assessment of a movie described in the text fragment they read, a normal reading
task with multiple-choice questions about the content, and a task-specific reading
task in which subjects had to focus on a certain semantic relation type. The
corpus includes high-density eye-tracking data recorded with a calibrated infrared
eye-tracker. Fixations, saccades and blinks are identified by the tracker software.
The dataset also includes such gaze trajectory statistics as the time after the first
fixation on the sentence for every single fixation, the number of fixations for each word
and sentence, mean pupil size, gaze duration (GD) during first-pass reading of a word,
the summed reading time of a word (total reading time, TRT), first fixation duration
(FFD), the duration of a first and single fixation on a word (single fixation
duration, SFD) and go-past time (GPT), which is the sum of all fixations preceding a
saccade to the right.</p>
      <p>GeCo is a corpus of eye-tracking data taken from 14 monolingual and 19
bilingual participants reading a novel. The bilinguals were classified as English speakers
with proficiency levels from lower-intermediate to advanced. Bilingual
participants read half of the novel in their first language and the other half in their
second language. The size of the text read by each subject was about 5,000
sentences. As in the ZuCo dataset, in addition to the raw data extracted by the tracker,
word-level reading measurements are provided: GD, FFD, SFD, TRT and
GPT.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>A set of experiments was executed as a proof of concept: eye-tracking
trajectories can be generated using an HMM transition probability matrix, and an HMM
can then be fitted so that the estimated parameters are close to the parameters of the
original model. For a given set of sentences we can obtain the coordinates of the exact
positions of words on a page. Distance was measured in units of the size
of a printed letter. A Hidden Markov Model was initialized in the following way,
according to our vision of what it should look like. The number of hidden
states was set according to the number of displayed words. The transition
probability matrix was generated as diagonally dominant, since the number of saccades,
and therefore of hidden state transitions, has to be much smaller than the number of
eye movements inside areas of gaze fixation on a single word, where the hidden
state does not change. The initial probability distribution was chosen to be a geometric
distribution in order to simulate a task in which the subject must read the text
from the beginning. A set of 2-D Gaussian distributions with means located at
word centers was chosen as the set of emission probabilities.</p>
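      <p>The generation step can be sketched as follows (an illustrative sampler written for this description; the names are ours, not from the hmmlearn API):

```python
import numpy as np

def sample_trajectory(pi, A, means, sigma, n=300, seed=0):
    """Sample n observed gaze points from an HMM: hidden states follow the
    transition matrix A, observations are Gaussian around word centers."""
    rng = np.random.default_rng(seed)
    K = len(pi)
    states = np.empty(n, dtype=int)
    states[0] = rng.choice(K, p=pi)          # draw the initial state from pi
    for t in range(1, n):
        # next state from the row of A for the current state
        states[t] = rng.choice(K, p=A[states[t - 1]])
    # Gaussian emissions centered on the word coordinates of each state
    X = means[states] + rng.normal(scale=sigma, size=(n, means.shape[1]))
    return states, X
```

With a diagonally dominant A, the sampled state sequence contains long runs on a single word interrupted by occasional saccades, mimicking the simulated trajectories described above.</p>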
      <p>A simulated training dataset of observation sequences was generated
using the defined HMM. For texts with several dozen words, each sequence
consists of 300 observed values. We performed an experiment to find out how many
observation sequences are needed for precise fitting of the HMM parameters. The
training process was run for datasets with different numbers of observation sequences;
for each training sample size a new model was trained on its own dataset. Then,
for each trained model, the mean squared error between the original and trained
transition probability matrices and the mean squared error between the main
diagonals of the original and trained matrices were measured.</p>
      <p>MSE = (1/K²) Σ_{i=1}^{K} Σ_{j=1}^{K} (A_ij - Â_ij)² (3)</p>
      <p>MSE_diagonal = (1/K) Σ_{i=1}^{K} (A_ii - Â_ii)² (4)</p>
      <p>where A is the original transition probability matrix and Â is the trained one.</p>
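      <p>The two metrics can be sketched directly in numpy (assuming the full-matrix error is averaged over all K² entries and the diagonal error over the K diagonal entries; A_hat denotes the trained matrix):

```python
import numpy as np

def mse_full(A, A_hat):
    """Mean squared error over all entries of the K x K transition matrices."""
    K = A.shape[0]
    return float(((A - A_hat) ** 2).sum() / K ** 2)

def mse_diag(A, A_hat):
    """Mean squared error over the main diagonals only."""
    K = A.shape[0]
    return float(((np.diag(A) - np.diag(A_hat)) ** 2).sum() / K)
```

The diagonal variant isolates the self-transition probabilities, which carry the within-word fixation behaviour.</p>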
      <p>MSE mean values and their 95% confidence intervals were also calculated. A
confidence interval was defined as x̄ ± Q(1 - α/2) · s/√N, where x̄ is the sample mean,
s = √((1/(N-1)) Σ_{i=1}^{N} (x_i - x̄)²) is the sample standard deviation, α is the
significance level, N is the sample size, and Q is the quantile function
Q(p) = inf{x ∈ R : p ≤ F(x)}, with F(x) the cumulative distribution function of
Student's t-distribution with N - 1 degrees of freedom. A confidence interval gives a
more descriptive estimate of a parameter than a point estimate. The results of our
experiments are presented in Fig. 1 and Fig. 2.</p>
      <p>(Fig. 1: MSE over the full transition probability matrix versus the number of
observation sequences. Fig. 2: MSE over the matrix diagonal versus the number of
observation sequences.)</p>
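      <p>The t-based interval described above can be computed, for example, with scipy (a sketch under the assumption that scipy is available; the function name is ours):

```python
import numpy as np
from scipy import stats

def mean_ci(xs, confidence=0.95):
    """t-based confidence interval for the mean:
    x_bar +/- Q * s / sqrt(N), where Q is the Student-t quantile at
    (1 + confidence) / 2 with N - 1 degrees of freedom."""
    xs = np.asarray(xs, dtype=float)
    n = xs.size
    x_bar = xs.mean()
    s = xs.std(ddof=1)                         # sample standard deviation
    q = stats.t.ppf((1.0 + confidence) / 2.0, df=n - 1)
    half = q * s / np.sqrt(n)
    return x_bar - half, x_bar + half
```

For 0.95 confidence this evaluates the quantile at 0.975, matching Q(1 - α/2) with α = 0.05.</p>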
      <p>
        The experiments were executed using the Python package hmmlearn [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
As shown in the figures, our metrics reach values close to optimal at between 10 and
15 observation sequences. With a further increase in the number of observations,
the MSE over the full transition probability matrices and the MSE over the diagonals do not
decrease significantly. Thus, for suboptimal quality it may be enough to collect
data from a dozen subjects.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this work, various approaches to the analysis of eye movements in different
text-dependent and subject-dependent scenarios have been considered. A new
approach to gaze trajectory modeling based on Hidden Markov Models is
proposed for both scenarios. An experiment conducted as part of a preliminary
study helped us determine the approximate sample size we would need for
further work. It is planned to apply the proposed approach to real data taken
from the ZuCo and GeCo datasets in one of the following tasks.
1. A binary classification task: whether the answer given by a subject in the sentiment
task from the ZuCo dataset was correct or not.
2. If information about the answer time is available, it may be possible to
evaluate this parameter using a model, because the time required for an answer is
supposed to depend on how thoroughly the text was read.
3. A fatigue classification task. It is assumed that at the end of a long
reading session the fatigue level is higher than at the beginning of the session.</p>
      <p>Data from the GeCo corpus would be useful here.</p>
      <p>It is also planned to collect a new dataset containing samples of eye movement
recordings taken while reading text fragments in Russian. Several dozen subjects
will read several texts at different levels of fatigue. The fatigue level will be measured
in two ways: by interviewing subjects and by using a binary classifier trained to
recognize the beginning and end of the session. With the second method, we
will assume that the fatigue level rises by the end of the session.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dambacher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slattery</surname>
            ,
            <given-names>T. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kliegl</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Evidence for direct control of eye movements during reading</article-title>
          .
          <source>Journal of Experimental Psychology: Human Perception and Performance</source>
          ,
          <volume>39</volume>
          (
          <issue>5</issue>
          ):
          <volume>1468</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Clifton</given-names>
            <surname>Jr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Henderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            ,
            <surname>Inhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            ,
            <surname>Liversedge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            ,
            <surname>Reichle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            ,
            <surname>Schotter</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. R.</surname>
          </string-name>
          <article-title>Eye movements in reading and information processing: Keith Rayner's 40 year legacy</article-title>
          .
          <source>Journal of Memory and Language</source>
          ,
          <volume>86</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , Duffy, S. A.
          <article-title>Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity</article-title>
          .
          <source>Memory &amp; cognition 14</source>
          .3:
          <fpage>191</fpage>
          -
          <lpage>201</lpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liversedge</surname>
            ,
            <given-names>S. P.</given-names>
          </string-name>
          <article-title>The influence of parafoveal word length and contextual constraint on fixation durations and word skipping in reading</article-title>
          .
          <source>Psychonomic bulletin &amp; review 12</source>
          .3:
          <fpage>466</fpage>
          -
          <lpage>471</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Juhasz</surname>
            ,
            <given-names>B. J.</given-names>
          </string-name>
          , et al.
          <article-title>Eye movements and the use of parafoveal word length information in reading</article-title>
          .
          <source>Journal of Experimental Psychology: Human Perception and Performance</source>
          <volume>34</volume>
          .6:
          <issue>1560</issue>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Whitford</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , et. al.
          <article-title>Reading impairments in schizophrenia relate to individual differences in phonological processing and oculomotor control: Evidence from a gaze-contingent moving window paradigm</article-title>
          .
          <source>Journal of Experimental Psychology: General</source>
          <volume>142</volume>
          .1:
          <issue>57</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Olson</surname>
            ,
            <given-names>R. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kliegl</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>B. J.</given-names>
          </string-name>
          <article-title>Dyslexic and normal readers' eye movements</article-title>
          .
          <source>Journal of Experimental Psychology: Human Perception and Performance 9</source>
          .5:
          <issue>816</issue>
          (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>De Luca</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et. al.
          <article-title>Eye movement patterns in linguistic and non-linguistic tasks in developmental surface dyslexia</article-title>
          .
          <source>Neuropsychologia</source>
          <volume>37</volume>
          .12:
          <fpage>1407</fpage>
          -
          <lpage>1420</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Mozaffari, S. S. et al.
          <article-title>Reading Type Classification based on Generative Models and Bidirectional Long Short-Term Memory</article-title>
          .
          <source>Joint Proceedings of the ACM IUI 2018 Workshops. CEUR Workshop Proceedings</source>
          <year>2068</year>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kunze</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et. al.
          <article-title>I know what you are reading: recognition of document types using mobile eye tracking</article-title>
          .
          <source>Proceedings of the 2013 International Symposium on Wearable Computers</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kunze</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          et. al.
          <article-title>Towards inferring language expertise using eye tracking</article-title>
          .
          <source>CHI'13 Extended Abstracts on Human Factors in Computing Systems</source>
          ,
          <volume>217</volume>
          -
          <fpage>222</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bax</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Readers' cognitive processes during IELTS reading tests: Evidence from eye tracking</article-title>
          .
          <source>British Council, ELT Research Papers 13-06</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Barrett</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
, et al.
          <article-title>Weakly supervised part-of-speech tagging using eye-tracking data</article-title>
          .
          <source>Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics</source>
, Volume
<volume>2</volume>
: Short Papers
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hollenstein</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Entity Recognition at First Sight: Improving NER with Eye Movement Information</article-title>
. arXiv preprint arXiv:1902.10068
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Bottos</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Balasingam</surname>
  ,
  <given-names>B.</given-names>
</string-name>
<article-title>A Novel Slip-Kalman Filter to Track the Progression of Reading Through Eye-Gaze Measurements</article-title>
. arXiv preprint arXiv:1907.07232
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Hollenstein</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , et al.
          <article-title>ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading</article-title>
          .
<source>Scientific Data 5</source>
          .1:
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Cop</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          , et al.
          <article-title>Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading</article-title>
          .
          <source>Behavior research methods 49</source>
          .2:
          <fpage>602</fpage>
          -
          <lpage>615</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. hmmlearn library. https://hmmlearn.readthedocs.io</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
<string-name>
  <surname>Ulutas</surname>
  ,
  <given-names>B. H.</given-names>
</string-name>
,
<string-name>
  <surname>Ozkan</surname>
  ,
  <given-names>N. F.</given-names>
</string-name>
,
<string-name>
  <surname>Michalski</surname>
  ,
  <given-names>R.</given-names>
</string-name>
          <article-title>Application of hidden Markov models to eye tracking data analysis of visual quality inspection operations</article-title>
          .
          <source>Central European Journal of Operations Research</source>
          ,
          <volume>28</volume>
          :
          <fpage>761</fpage>
-
<lpage>777</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>