   Reading Type Classification based on Generative Models
         and Bidirectional Long Short-Term Memory
Seyyed Saleh Mozaffari Chanijani, TU Kaiserslautern and German Research Center for Artificial Intelligence, mozafari@dfki.uni-kl.de
Federico Raue, TU Kaiserslautern and German Research Center for Artificial Intelligence, Federico.Raue@dfki.de
Saeid Dashti Hassanzadeh, TU Clausthal, saeid.dashti.hassanzadeh@tu-clausthal.de
Stefan Agne, German Research Center for Artificial Intelligence, Stefan.Agne@dfki.de
Syed Saqib Bukhari, German Research Center for Artificial Intelligence, Saqib.Bukhari@dfki.de
Andreas Dengel, TU Kaiserslautern and German Research Center for Artificial Intelligence, dengel@dfki.uni-kl.de


ABSTRACT
Measuring the attention of users is necessary to design smart Human Computer Interaction (HCI) systems. Particularly in reading, the reading types, namely reading, skimming, and scanning, are signs that express the degree of attentiveness. Eye movements are informative spatiotemporal data for measuring the quality of reading, and eye tracking technology is the tool to record them. Even though eye trackers are increasingly used in research, especially in psycholinguistics, collecting appropriate task-specific eye movement data is expensive and time consuming. Moreover, machine learning tools like Recurrent Neural Networks need large enough samples to be trained. Hence, designing a generative model that yields reliable, research-oriented synthetic eye movements is desirable. This paper has two main contributions. First, a generative model is developed to synthesize reading, skimming, and scanning patterns. Second, in order to evaluate the generative model, a bidirectional Long Short-Term Memory (BLSTM) is proposed. It was trained with synthetic data and tested with real-world eye movements to classify reading, skimming, and scanning, where more than 95% classification accuracy is achieved.

ACM Classification Keywords
I.5.4. Computing Methodologies: PATTERN RECOGNITION; Applications

Author Keywords
Eye tracking; reading type; classification; synthetic data; generative models; Hierarchical Hidden Markov Models; Gaussian Mixture Models; LSTM; Recurrent Neural Networks; reading; skimming; scanning

INTRODUCTION
Reading is the ability to extract visual information from the page and comprehend the meaning of the underlying text [16]. Considering attention, as presented in Figure 1, the reading types are divided into three categories: reading, skimming, and scanning. In the eye tracking context, reading is a method of moving the eyes over the text to comprehend its meaning. Skimming is a rapid eye movement over the document with the purpose of getting only the main ideas and a general overview of the document, whereas scanning rapidly covers a lot of content in order to locate a specific fact or piece of information.

The fixation progress on words, expressed in character units, must be measured in order to detect the reading type, i.e., to decide which of the reading types the observed eye movement patterns belong to [4, 12]. This approach applies in cases where the eye tracking accuracy is high enough to provide word-level resolution [16]. Example applications include ScentHighlights [6], which highlights related sentences during reading; the eyeBook [2], where ambient effects are triggered in proximity of the reading position; or QuickSkim [3], where non-content words may be faded out in real time with an increase of skimming speed to make reading more efficient.

Due to the noisy nature of the eye tracking apparatus, where the point of gaze cannot be determined exactly, it can be desirable to automatically decide to what extent eye movements resemble a reading type pattern. In this regard, a psycholinguist is able to determine which segments of the scanpath belong to reading, skimming, or scanning, even though the fixations do not match the underlying text.

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. UISTDA '18, March 11, 2018, Tokyo, Japan
Figure 1: The three reading types. A: Reading; the saccades are short and progressive over the context. B: Skimming; here the saccades are longer compared to the reading pattern. C: Scanning; compared to skimming, the saccades are less structured.

Figure 2: The top-level architecture of the eyeReading framework. The eyeReading Server and eyeReading Client components are connected via the WebSocket protocol [14].

Biedert et al. [4] proposed a reading-skimming classifier. Although the model classifies reading and skimming patterns, it does not cover scanning patterns. Covering scanning patterns is desirable in order to estimate the degree of attention in reading more accurately and to design a proper Human Document Interaction system. In addition, in the domain of information retrieval, it has been shown that acquiring implicit feedback from a reading type detection, including scanning, can significantly improve search accuracy through personalization [5].

Detecting the reading types is a sequence classification problem. The term sequence classification encompasses all tasks where sequences of data are transcribed with sequences of discrete labels [9]. The discrete labels in the reading type classification problem are shown in Figure 1. Long Short-Term Memory (LSTM) [10] is a variant of Recurrent Neural Networks (RNNs) which is suitable for classifying sequential data. However, such networks need large enough samples for training. Unfortunately, task-specific eye tracking datasets are not big enough to be used with deep networks. Hence, there is a need to synthesize task-specific eye movement patterns in order to deploy deep neural networks.

In this paper, we propose a generative model which synthesizes the reading type patterns. We also designed and evaluated a BLSTM model. The model was trained on both the original dataset and the synthetic dataset.

The paper is structured as follows. We start by presenting an experiment conducted to collect real-world eye movements in reading, in order to build a reference for data synthesization and reading type classification. Then, a two-layered Hierarchical Hidden Markov Model (HHMM) for eye movement data synthesization is proposed. Moreover, we present a BLSTM-based sequential model to detect and classify reading, skimming, and scanning. This model is built on the features described in the section Features and Annotation. In the Evaluation and Results section, we evaluate our models and describe our results. This is followed by our conclusion.

REAL-WORLD DATA ACQUISITION
The first step in constructing a system which is able to learn and distinguish the eye movement patterns of the reading types is to record real-world eye movement data during reading. The recorded data must comprise all possible state categories: reading, skimming, and scanning. To perform the task, we designed an experiment to record the eye movements of ten participants from the local university. We used this data as a reference to build a Hierarchical Hidden Markov Model (HHMM) for synthetic data generation. Furthermore, this real-world data was partially employed for testing and evaluating the classifiers.

Apparatus
In this study, we deploy the SensoMotoric Instrument iViewX scientific REDn eye tracker operating at 60 Hz. The tracking error reported by the manufacturer was less than 0.4 degrees, which makes it appropriate for fixation-based eye movement studies.^1

Experimental Setup
The experiment was designed such that all three reading types could be obtained. Two articles in plain English were chosen from Wikipedia; they are about two airplane crashes that took place in Colombia^2 and Pakistan^3 in 2016. Ten participants from the local university participated in our study. None of them were native English speakers, but they were fluent in English as a second language. In the first phase of the experiment, the participants were requested to write a comprehensive report on the article they had been given to read. Hence, they would read the selected article thoroughly. In the second phase, the participants were asked to find specific information in the second article, e.g., how many crew members were on the airplane or what the flight number was. Therefore, most of the resulting eye movement patterns were associated with skimming and scanning. Consequently, all three reading type patterns were recorded during the trials. The trials were recorded in the specialized eye-tracking interface eyeReading [14]. eyeReading facilitates research in reading psychology and provides a framework for gaze-based Human Document Interaction. Figure 2 shows the top-level architecture of eyeReading.

^1 https://www.smivision.com/eye-tracking/product/redn-scientific-eye-tracker/
^2 https://en.wikipedia.org/wiki/LaMia_Flight_2933
^3 http://en.wikipedia.org/wiki/Pakistan_International_Airlines_Flight_661

Figure 3: The feature components of our study. The fixations are shown by yellow circles. A regression is an implicit sign that the reader is having difficulty understanding the material; it is shown by the red gaze path. When processing the fixations to form forward reads, a forward read is stopped when (1) a regression is encountered, (2) a forward saccade is too large and likely a forward skip through the text, or (3) the eye gaze moves to another line of text. The last case is called a sweep return and is indicated with a dashed red line in the figure [16, 13].

FEATURES AND ANNOTATION
The recorded raw gaze information must be processed in order to extract the saccadic features associated with reading. The extracted saccadic features are the length of the saccade, the velocity of the saccadic movement, the fixation duration associated with the saccade, and the angularity of the saccade. In this section, we first demonstrate the feature extraction step, and then the process of making the ground truth is explained.

Figure 4: The application designed for the two-step annotation. A sequence buffer of window size 5 is presented to the annotator. This facilitates the annotator's decision on the state label as well as the saccade label of the red saccade. The orange circles are the fixation durations; a bigger circle implies a longer duration.

Figure 5: The first layer of the probabilistic model for the reading types data generation: an HMM where the states are our class labels and the emissions are the saccade labels. The output is the sequence of class labels with the corresponding lengths.

Figure 6: The second layer of the probabilistic model for the reading types data generation. Here, in the GMM-HMM, the states are the saccade labels, and the component (feature) values are drawn from a Gaussian distribution with respect to the covariance matrix and the mean matrix of the feature set F(ℓ, θ, ν, γ).

Features
On account of the inevitable noise in the eye tracking trials, we first applied a median filter to the input raw gaze points E' = e'_1, ..., e'_n to eliminate possible outliers:

    e_i = (med_x(e'_{i-2}, ..., e'_{i+2}), med_y(e'_{i-2}, ..., e'_{i+2}))    (1)

In the second step, the fixations were detected using the dispersion method [11]. We considered 100 ms for the temporal and 50 px for the spatial dispersion parameter.


A saccade is then considered as the transition between two consecutive fixations, with the following features:

1. amplitude (ℓ): the distance between two progressive fixations in virtual character units (vc).

2. angularity (θ): the angle of the saccade with respect to its starting point. The angle indicates the direction of the saccade in the circular domain: -180° ≤ θ ≤ 179°.

3. velocity (ν): the speed of the saccade:

    ν = ℓ / (t_e - t_s)    (2)

where t_s and t_e are the first timestamps of the start fixation and the end fixation of the saccade, in milliseconds.

4. duration (γ): the duration of the start fixation of the saccade, in milliseconds.

Therefore, F(ℓ, θ, ν, γ) are the features selected for the saccades in the collected data.
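To make the feature set concrete, here is a small Python sketch of how F(ℓ, θ, ν, γ) could be computed from two consecutive fixations; the fixation tuple layout and the pixels-per-character constant used to convert to virtual character units are assumptions for the example, as the paper does not state them.

import math

PX_PER_VC = 12.0  # assumed pixels per virtual character unit

def saccade_features(fix_a, fix_b):
    """Compute F(l, theta, nu, gamma) for the saccade between two
    consecutive fixations, each given as (x, y, t_first_ms, t_last_ms)."""
    xa, ya, ta_first, ta_last = fix_a
    xb, yb, tb_first, tb_last = fix_b
    dx, dy = xb - xa, yb - ya
    l = math.hypot(dx, dy) / PX_PER_VC          # amplitude in virtual characters
    theta = math.degrees(math.atan2(-dy, dx))   # direction; screen y grows downwards
    nu = l / (tb_first - ta_first)              # Equation (2): t_s, t_e are the first
                                                # timestamps of the two fixations
    gamma = ta_last - ta_first                  # duration of the start fixation
    return l, theta, nu, gamma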
Data Annotation
After the feature extraction step, we designed a labeling application to make the ground truth from the data. Figure 4 shows the interface of the labeling application. Ground-truthing the data was accomplished in two steps: saccade labeling and sequence labeling.

Saccade Labeling
In the first step of labeling, the expert made a judgment about the saccades with respect to the provided features F(ℓ, θ, ν, γ). The saccades were grouped into six categories or labels [13]:

• FR: a Forward Read (FR) is a progressive saccade associated with reading. The amplitude is between 7 and 10 characters [16].
• FS: a Forward Skim (FS) is also a progressive saccade, whose amplitude is larger than that of an FR but not too large.
• LS: Long Saccades (LS) are those saccades whose amplitude is bigger than the threshold considered for the context. The direction of the saccade does not apply to LS.
• RG: Regressions (RG) are regressive saccades usually associated with reading and are a sign of difficulties in reading. Their amplitude varies, and they must target the already-read context.
• SR: sweeping back to the left on the next line of text is called a Sweep Return (SR).
• SW: unstructured sweeps over the text to look up information are labeled as Sweeps (SW).

Figure 1 intuitively shows the difference between the six saccade labels.

Sequence Annotation
After all the saccades were labeled into the six categories in the previous step, the sequences made up of these saccades were annotated as reading, skimming, or scanning. In the end, 396 annotated sequences for reading, 378 for skimming, and 118 for scanning were acquired.

SYNTHETIC EYE MOVEMENTS IN READING
It is always desirable to have enough data samples to construct robust machine learning models. Especially in deep neural networks, a very big training set is usually required. Unfortunately, appropriate data acquisition in eye tracking studies is not an easy task. It needs eye trackers, which are still expensive on the market, as well as an appropriate experimental setup designed for a specific goal, i.e., reading type classification. Hence, the idea is to generate task-specific eye movement data from the real-world data, such that these synthetic data can be used to construct a better model and can even be used in other applications and research. This motivated us to design a Hierarchical Hidden Markov Model (HHMM) to generate synthetic eye movements in reading.
Ordinary Markov chains are often not flexible enough for the analysis of real-world data, as the state corresponding to a specific event (observation) has to be known. However, in many problems of interest, this is not given. Hidden Markov Models (HMMs), as originally proposed by Baum et al. (1970) [1], can be viewed as an extension of Markov chains. The only difference compared to common Markov chains is that the state sequence corresponding to a particular observation sequence, i.e., the reading types in our case, is not observable but hidden. In other words, the observation is a probabilistic function of the state, whereas the underlying state sequence itself is a hidden stochastic process [15]. That means the underlying state sequence can only be observed indirectly, through another stochastic process that emits an observable output. Hidden Markov models are extremely popular for dealing with sequential data, such as speech recognition, character recognition, and gesture recognition, as well as biological sequences. Therefore, the HMM is a right candidate for handling eye movement patterns, which are sequential and by nature stochastic. In order to synthesize our data, the graphical model should be able to generate both saccadic sequences and reading state sequences. Therefore, in this paper, a two-layered Hierarchical HMM is designed. In an HHMM, each state is considered to be a self-contained probabilistic model [8]. Briefly, in the first layer, as shown in Figure 5, we modeled reading, skimming, and scanning as the states of the Markov model, and the emissions are FR (Forward Read), FS (Forward Skim), SR (Sweep Return), RG (Regression), SW (Sweep), and LS (Long Saccade). As shown in Figure 6, each of the states in the first level is a self-contained mixture graphical model, a so-called GMM-HMM (Hidden Markov Model with a Gaussian Mixture Model). This layer is responsible for generating the values of the four mentioned saccadic features F(ℓ, θ, ν, γ).

Algorithm 1: Simulate an HMM state sequence S given the model λ = {Π, A, B}.
Data: states_gt = s_1, s_2, ..., s_n and observations_gt = o_1, o_2, ..., o_n, where n is the number of saccades in the ground truth.
Result: state sequence S = ([s_1, l_1], [s_2, l_2], ..., [s_k, l_k]), where k is the number of sequences, s_i ∈ C is the sequence label, and l_i is the length of s_i.
1 Π = (0.34, 0.33, 0.33);
2 A = {a_ij | i = 1, 2, 3; j = 1, 2, 3}: state transition probabilities, where a_ij = P(s_{t+1} = j | s_t = i) and Σ_{j=1}^{3} a_ij = 1;
3 B = {b_k(o_t) | k = 1, 2, 3; t = 1, ..., n}: observation probabilities, where b_k(o_t) = P(o_t | s_t = k);
4 Choose an initial state s_1 according to the initial state distribution Π;
5 for t = 1, ..., n do
6     Draw o_t from the probability distribution B_{s_t};
7     Go to state s_{t+1} according to the transition probabilities A_{s_t};
8     Set t = t + 1;

Algorithm 2: Generate the emission sequence S' from S.
Data: S and observations_gt = o_1, o_2, ..., o_n, where n is the number of saccades in the ground-truth data.
Result: synthetic emissions for every s ∈ S.
1 Π' = (π'_1, ..., π'_6): initial state probabilities, where π'_i = P(s'_1 = i) and Σ_{i=1}^{6} π'_i = 1;
2 A' = {a'_ij | i = 1, ..., 6; j = 1, ..., 6}: emission transition probabilities, where a'_ij = P(s'_{t+1} = j | s'_t = i) and Σ_{j=1}^{6} a'_ij = 1;
3 foreach state in {reading, skimming, scanning}, calculate the covariance matrix COV of the emissions FR, FS, RG, LS, SW, SR for the features ℓ, θ, ν, γ;
4 foreach state in {reading, skimming, scanning}, calculate the mean matrix M of the emissions FR, FS, RG, LS, SW, SR for the features ℓ, θ, ν, γ;
5 Choose an initial state s'_1 according to the initial state distribution Π';
6 for t = 1, ..., n do
7     Draw o_t from the Gaussian distribution with cov_{s_t} and mean_{s_t};
8     Go to state s'_{t+1} according to the transition probabilities A'_{s_t};
9     Set t = t + 1;

Method
The two-layered HHMM constructs the probabilistic model that generates saccades associated with the reading types. It is a top-down approach to synthesize natural reading types. The task of the first layer, which is shown in Figure 5, is to generate the sequence of the states reading, skimming, and scanning. Algorithm 1 constructs this layer. In order to build the first-layer HMM λ = {Π, A, B}, the state transition matrix A and the emission matrix B are built upon the labeled data explained in the section Data Annotation. We considered equal probabilities (about 33%) for the reading type states in Π. The states reading, skimming, and scanning are then generated based on a multinomial distribution.

Algorithm 2 presents the construction of the second layer of our graphical model, the so-called GMM-HMM, where the input is the state sequence produced by the first layer. In contrast to the first layer, the emissions (observations) associated with each sequence are generated based on a Gaussian distribution. Hence, for each state (reading, skimming, and scanning), we need to compute the transition matrix of the observations, the mean of each component (the features F(ℓ, θ, ν, γ)), as well as the covariance matrix of the features. Figure 6 presents the second layer of the model.
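For concreteness, the following Python sketch mirrors the two sampling layers of Algorithms 1 and 2, assuming the initial probabilities, transition matrices, means, and covariances have already been estimated from the annotated data; the array shapes and names are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

CLASSES = ["reading", "skimming", "scanning"]    # first-layer states
SACCADES = ["FR", "FS", "RG", "LS", "SW", "SR"]  # second-layer states

def sample_states(pi, A, n):
    """Algorithm 1: sample a length-n chain of reading-type states from
    the first-layer HMM (pi: 3-vector, A: 3x3 row-stochastic matrix)."""
    states = [rng.choice(len(CLASSES), p=pi)]
    for _ in range(n - 1):
        states.append(rng.choice(len(CLASSES), p=A[states[-1]]))
    return states

def sample_saccades(state, n, pi2, A2, means, covs):
    """Algorithm 2: inside one reading-type state, sample a saccade-label
    chain and draw each label's feature vector (l, theta, nu, gamma) from
    the Gaussian estimated for that (state, label) pair."""
    labels = [rng.choice(len(SACCADES), p=pi2[state])]
    for _ in range(n - 1):
        labels.append(rng.choice(len(SACCADES), p=A2[state][labels[-1]]))
    feats = [rng.multivariate_normal(means[state][lab], covs[state][lab])
             for lab in labels]
    return labels, np.array(feats)               # feats has shape (n, 4)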
READING TYPE CLASSIFICATION WITH BLSTM
Recurrent neural networks (RNNs) are able to access a wide range of context in sequences [17]. However, standard RNNs make use of the previous context only, whereas bidirectional RNNs (BRNNs) are able to incorporate emissions on both sides of every position in the input sequence [18]. This is useful in the problem of reading type detection, since it is often necessary to look at both sides, to the right and to the left of a given sequence, in order to identify it. A BLSTM is a BRNN whose hidden layers are made up of so-called Long Short-Term Memory (LSTM) cells. LSTM is an RNN architecture specifically designed to bridge long time delays between relevant input and target events, making it suitable for problems where long-range emission sequences are required to disambiguate individual labels [10]. In fact, BLSTM networks suit the reading type detection well.

Figure 7: BLSTM architecture: the forward (resp. backward) layer processes the input sequence in the original (resp. reverse) order. The output layer concatenates the hidden layers' values at each timestep to make a decision by considering both past and future contexts [9].

Figure 7 shows the BLSTM-RNN architecture, and Figure 8 presents the model implemented for our study with Keras [7]. It consists of two BLSTMs, one dropout layer to prevent overfitting, and two dense layers, where N is the sequence length of the input. The loss function is categorical cross-entropy, and softmax is used as the activation function.
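A minimal Keras sketch of the architecture described above (two bidirectional LSTM layers, dropout, two dense layers, softmax output, categorical cross-entropy loss) might look as follows; the layer widths, dropout rate, and optimizer are our assumptions, since the paper does not list them.

from keras.models import Sequential
from keras.layers import LSTM, Bidirectional, Dropout, Dense

N = 10          # input sequence length (number of saccades)
N_FEATURES = 4  # F(l, theta, nu, gamma)
N_CLASSES = 3   # reading, skimming, scanning

model = Sequential([
    # two stacked BLSTM layers; 64 units per direction is an assumed width
    Bidirectional(LSTM(64, return_sequences=True),
                  input_shape=(N, N_FEATURES)),
    Bidirectional(LSTM(64)),
    Dropout(0.5),                           # assumed rate; prevents overfitting
    Dense(32, activation="relu"),           # first dense layer, assumed width
    Dense(N_CLASSES, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])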
EVALUATION AND RESULTS
In this section, the evaluation of both the generative model and the classifier is presented. Two types of data are available here: the original data recorded with the eye tracker (actual data) and the synthetic data. The actual data was randomly split into a train set (60%), a validation set (10%), and a test set (30%). The first half of the actual data was used in the generative model for data synthesization; the other half was used for testing. In all cases, the train set was first fitted with a standard scaling function to scale the mean (μ) to 0 and the standard deviation (σ) to 1; the validation and test sets were then transformed with respect to the fitted data. The model was trained with different sequence lengths N = 5, 8, 10.

N     Precision   Recall   Accuracy
5     0.81        0.79     0.784
8     0.89        0.90     0.896
10    0.93        0.93     0.925

Table 1: The results of the BLSTM model on the original dataset. The 196 test sequences are distributed as 75 for reading, 83 for skimming, and 38 for scanning.
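As an illustration of this preprocessing, the short sketch below fits the scaler on the training sequences only and pads or truncates every sequence to the fixed length N; scikit-learn for scaling and the Keras padding helper are our choices for the example, and the dummy data stands in for the real annotated sequences.

import numpy as np
from sklearn.preprocessing import StandardScaler
from keras.preprocessing.sequence import pad_sequences

N = 10  # fixed sequence length

# dummy stand-ins for the real (seq_len, 4) saccade-feature sequences
rng = np.random.default_rng(0)
X_train = [rng.normal(size=(rng.integers(4, 15), 4)) for _ in range(6)]
X_val   = [rng.normal(size=(rng.integers(4, 15), 4)) for _ in range(2)]
X_test  = [rng.normal(size=(rng.integers(4, 15), 4)) for _ in range(2)]

scaler = StandardScaler()        # mean 0, std 1, fitted on the train set only
scaler.fit(np.vstack(X_train))

def to_fixed_length(seqs):
    """Scale every sequence, then pad/truncate it to exactly N timesteps."""
    scaled = [scaler.transform(s) for s in seqs]
    return pad_sequences(scaled, maxlen=N, dtype="float32",
                         padding="post", truncating="post")

X_train_p, X_val_p, X_test_p = map(to_fixed_length, (X_train, X_val, X_test))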
Baseline: Actual data and SVM-RBF classifier
The baseline is to test and evaluate the SVM-RBF classifier proposed by Biedert et al. [4]. For the test set, there are only 196 sequences to support the model: 75 for reading, 83 for skimming, and 38 for scanning. With 5-fold cross-validation, the best accuracy acquired was 69%, with parameters C = 1000 and gamma = 0.001. A closer look at the confusion matrix in Figure 9 makes it obvious that there is an unacceptable confusion in the scanning class. This problem is on account of the small number of supports for the scanning class, which makes up just 19% of the class labels. Another reason concerns the sequential characteristics of the data: it shows that Support Vector Machines are not the best machine learning model for such data.
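For reference, the baseline experiment can be sketched with scikit-learn as follows; only the kernel, C = 1000, gamma = 0.001, and the 5-fold cross-validation come from the text, while the flattening of each sequence into a fixed-length feature vector is our assumption about how the sequences were fed to the SVM.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(196, 40))     # dummy stand-in for the flattened sequences
y = rng.integers(0, 3, size=196)   # 0: reading, 1: skimming, 2: scanning

clf = SVC(kernel="rbf", C=1000, gamma=0.001)
scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))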
                                                                      eye tracking research to employ RNNs. In this paper a novel
Proposed method: Actual data and BLSTM
The model presented in Figure 8 is used to train and test on the original data. The model was trained with different sequence lengths N = 5, 8, 10. In case an input sequence has a different length, the sequence is padded or truncated to the fixed length N. Table 1 shows the accuracies for the different lengths; 92.5% accuracy is achieved for a sequence length of 10.

Proposed method: Synthetic data and BLSTM
Finally, the BLSTM model was trained with synthetic data, which was generated from the first half of the original dataset. The same half of the original dataset was also used to validate the model, and the second half was used for testing. Table 2 presents the results for different variations of data size and sequence length. While larger sequence windows yield better results for the model, an instant user interface favors smaller sequences. The length of a sequence is related to the number of fixations: if we consider the average fixation duration in reading to be 250 ms [16], the model must wait 2.5 s for an input sequence of length 10. These results support the reliability of the synthetic data; the larger the data, the better the performance, as expected in deep network frameworks.
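Reusing the names from the sketches above, training and evaluation could then be wired together as follows; the label encoding, epoch count, and batch size are assumptions for the example.

import numpy as np
from keras.utils import to_categorical

rng = np.random.default_rng(0)
y_train = rng.integers(0, 3, size=len(X_train_p))   # dummy labels for the sketch
y_val   = rng.integers(0, 3, size=len(X_val_p))
y_test  = rng.integers(0, 3, size=len(X_test_p))

model.fit(X_train_p, to_categorical(y_train, 3),
          validation_data=(X_val_p, to_categorical(y_val, 3)),
          epochs=50, batch_size=32)                 # epochs/batch size assumed
loss, acc = model.evaluate(X_test_p, to_categorical(y_test, 3))
print("test accuracy: %.3f" % acc)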
Figure 8: The BLSTM model used in our study. It consists of two BLSTMs, one dropout layer to prevent overfitting, and two fully connected layers. Here, N is the sequence length of the input. The loss function was categorical cross-entropy, and softmax is used as the activation function.

Synthetic Data Size   N    Precision   Recall   Accuracy
5K                    5    0.87        0.86     0.86
5K                    8    0.92        0.91     0.91
5K                    10   0.93        0.93     0.93
10K                   5    0.88        0.87     0.87
10K                   8    0.92        0.92     0.92 *
10K                   10   0.94        0.93     0.94
50K                   5    0.88        0.92     0.92
50K                   8    0.95        0.94     0.94
50K                   10   0.96        0.95     0.95 *

Table 2: The results of the BLSTM model trained with synthesized data. The results support the reliability of the synthetic data: the larger the data, the better the performance. Longer sequences show better results, but a realtime classifier favors shorter sequences.

Figure 9: Confusion matrices for the two baselines. The left confusion matrix shows high confusion between scanning and skimming, whereas the BLSTM confusion matrix (for synthetic data of size 10^5 with sequence length N = 5) shows the robustness of the model.
CONCLUSION
Recurrent Neural Networks (RNNs) are suitable for modeling spatiotemporal data such as eye movements in reading, but they usually need large enough data samples to be trained. On account of constraints in the experimental setups, the limited accessibility of expensive eye tracking apparatus, and the difficulty of finding appropriate and sufficiently many participants, there is usually a lack of data in eye tracking research for employing RNNs. In this paper, a novel probabilistic approach for eye movement data synthesization in reading is proposed, together with a BLSTM model that classifies the reading types (reading, skimming, and scanning) on both the original recorded data and the synthetic data. The RNN-based classifier proposed in this paper achieved more than 95% accuracy in the reading type detection, which not only outperforms the previous works but also covers the scanning reading type.

One important note concerns the strategy for selecting the sequence length N for the model. Even though a longer sequence length leads to higher accuracy, a shorter length is more desirable for designing an instant user interface, since the model has to wait for N saccades before it can classify. Depending on the application, the sequence length can be selected accordingly.

The outcome of this research is promising in that appropriate data synthesization breaks the limitations on using RNNs in eye tracking research in general. It may also offer the possibility of providing standard eye movement datasets, not only for reading type detection but also for other research on reading, e.g., research about dyslexia. It also gives insight to other eye-tracking research for generating eye movement transitions across different areas of interest, which is very helpful for distinguishing experts and novices in several domains of education. It is also desirable to explore alternatives to the HMM for data synthesization. In this regard, using an LSTM itself as a generative model for eye tracking data is on our agenda for future work.

Acknowledgment
This work was funded by the Federal Ministry of Education and Research (BMBF) for the project AICASys.
REFERENCES
1. Leonard E Baum, Ted Petrie, George Soules, and Norman Weiss. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41, 1 (1970), 164–171.
2. Ralf Biedert, Georg Buscher, and Andreas Dengel. 2010a. The eyeBook: using eye tracking to enhance the reading experience. Informatik-Spektrum 33, 3 (2010), 272–281.
3. Ralf Biedert, Georg Buscher, Sven Schwarz, Jörn Hees, and Andreas Dengel. 2010b. Text 2.0. In CHI '10 Extended Abstracts on Human Factors in Computing Systems. ACM, 4003–4008.
4. Ralf Biedert, Jörn Hees, Andreas Dengel, and Georg Buscher. 2012. A robust realtime reading-skimming classifier. In Proceedings of the Symposium on Eye Tracking Research and Applications. ACM, 123–130.
5. Georg Buscher, Andreas Dengel, Ralf Biedert, and Ludger V Elst. 2012. Attentive documents: Eye tracking as implicit feedback for information retrieval and beyond. ACM Transactions on Interactive Intelligent Systems (TiiS) 1, 2 (2012), 9.
6. Ed H Chi, Lichan Hong, Michelle Gumbrecht, and Stuart K Card. 2005. ScentHighlights: highlighting conceptually-related sentences during reading. In Proceedings of the 10th International Conference on Intelligent User Interfaces. ACM, 272–274.
7. François Chollet. 2015. keras. https://github.com/fchollet/keras. (2015).
8. Shai Fine, Yoram Singer, and Naftali Tishby. 1998. The hierarchical hidden Markov model: Analysis and applications. Machine Learning 32, 1 (1998), 41–62.
9. Alex Graves and others. 2012. Supervised Sequence Labelling with Recurrent Neural Networks. Vol. 385. Springer.
10. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
11. Kenneth Holmqvist, Marcus Nyström, Richard Andersson, Richard Dewhurst, Halszka Jarodzka, and Joost Van de Weijer. 2011. Eye Tracking: A Comprehensive Guide to Methods and Measures. OUP Oxford.
12. Aulikki Hyrskykari. 2006. Eyes in Attentive Interfaces: Experiences from Creating iDict, a Gaze-Aware Reading Aid. Tampereen yliopisto.
13. Seyyed Saleh Mozaffari Chanijani, Mohammad Al-Naser, Syed Saqib Bukhari, Damian Borth, Shanley EM Allen, and Andreas Dengel. 2016a. An eye movement study on scientific papers using wearable eye tracking technology. In Mobile Computing and Ubiquitous Networking (ICMU), 2016 Ninth International Conference on. IEEE.
14. Seyyed Saleh Mozaffari Chanijani, Syed Saqib Bukhari, and Andreas Dengel. 2016b. eyeReading: Interaction with text through eyes. In Proceedings of the Ninth International Conference on Mobile Computing and Ubiquitous Networking, Vol. 2016. 1–2.
15. Lawrence R Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 2 (1989), 257–286.
16. Keith Rayner, Alexander Pollatsek, Jane Ashby, and Charles Clifton Jr. 2012. Psychology of Reading. Psychology Press.
17. Raúl Rojas. 2013. Neural Networks: A Systematic Introduction. Springer Science & Business Media.
18. Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.