<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Spatiotemporal windows for fixation detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tyler Thrash</string-name>
          <email>tyler.thrash@gess.ethz.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iva Barisic</string-name>
          <email>iva.barisic@gess.ethz.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Cognitive Science</institution>
          ,
          <addr-line>ETH Zurich</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Eye fixations are periods of relative stability derived from continuous eye position (or eye movement) data. In order to define eye fixations, researchers often assume that the eye(s) will not move beyond a particular spatiotemporal window (i.e., a spatial area towards which the eye is directed within a particular period of time). However, exact specifications of this window vary from field to field and even from one experiment to another. Efforts to standardize these specifications have assumed (either implicitly or explicitly) that there is one appropriate window size for describing eye behavior. The present paper explores an alternative approach. Specifically, we provide a method for determining the most appropriate spatiotemporal window that can vary from participant to participant (or task to task). This approach may also be extended to provide a metric for detection algorithm comparison.</p>
      </abstract>
      <kwd-group>
<kwd>eye tracking</kwd>
        <kwd>fixation detection</kwd>
        <kwd>scene perception</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In order to be meaningful, eye tracking data needs to be classified into periods of
movement (e.g., saccades) and periods of stability (e.g., fixations). During periods of
movement, visual stimuli are usually considered inaccessible to the human observer.
This phenomenon is called saccadic suppression [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Most visual perception is
based on information that is accessible during periods of stability [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Fixation
detection algorithms attempt to determine what information is perceptually available by
inferring which eye tracking data points represent periods of stability [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        All of these algorithms essentially rely on the definition of what we call a
“spatiotemporal window” (i.e., a spatial area towards which the eye is directed within a
particular period of time). Some detection algorithms (e.g., dispersion-based algorithms;
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) emphasize the two spatial dimensions of this window by evaluating possible
fixations in terms of the dispersion of data points around possible foci. However, these
algorithms also typically incorporate lower and upper bounds for the “reasonable”
duration of a fixation. Other detection algorithms (e.g., velocity-based algorithms;
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) emphasize the temporal dimension of this window by classifying eye tracking
data in terms of velocity and/or acceleration. These algorithms also typically include
lower and upper bounds for the size of a fixation along spatial dimensions. Thus, the
three-dimensional spatiotemporal window is a critical consideration for the
implementation of both dispersion-based and velocity-based algorithms.
      </p>
      <p>
        One assumption underlying most efforts to standardize specifications of the
spatiotemporal window is that one set of parameters can be used to describe the eye
behavior of all healthy adults (e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), even though there is a good deal of variability in
this behavior both within an individual and across individuals [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The
variability not described by this set of parameters is typically considered “noise” (e.g., as
resulting from the imprecision of the eye tracking equipment). Even algorithms that
can be adapted to different noise profiles (e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) assume the same spatiotemporal
window for defining fixations. In contrast, the current approach allows for variability
in the size of the spatiotemporal window across individuals and tasks.
      </p>
      <p>
        The specification of spatiotemporal windows is especially critical when it is
difficult to define the direction of a stimulus relative to the observer objectively (i.e., without
relying on designations by other observers). This scenario is common for
investigations of naturalistic scene perception and navigation because of the lack of clear
boundaries between objects and/or the dynamic nature of the stimuli [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Except for
sophisticated computational vision algorithms, there are no established methods for
determining the objective “truth” to which a set of detected fixations (e.g., resulting
from different detection algorithms) can be compared in these scenarios. The current
approach extends a common technique for comparing mathematical models without
needing to presuppose any particular objective truth.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Current approach</title>
      <p>There are two primary applications of the current approach: the specification of the
spatiotemporal window for different observers/tasks and the comparison of different
detection algorithms.</p>
      <sec id="sec-2-1">
        <title>Specification of the spatiotemporal window</title>
        <p>Our general approach for specifying the most appropriate spatiotemporal window
is to calculate error in the data points relative to the nearest detected fixation. Error, in
this case, represents variability in the gaze data that is within the defined
spatiotemporal window but cannot be explained by the set of fixations detected by a particular
algorithm.</p>
        <p>At most, six parameters are needed to describe spatiotemporal windows that reflect
plausible (and interpretable) fixations. Researchers should start by defining the sizes
of spatial and temporal intervals. The spatial and temporal interval parameters
determine which data points are used for calculating the error term of each detected
fixation. Data points are only included in the following calculations if they fall within
both spatial and temporal intervals for any detected fixation. The distance function is
calculated using the following equation:</p>
        <disp-formula id="eq1a">
          <label>(1a)</label>
          <tex-math>d(p_1,p_2)=\left[\,w_1\lvert x_1-x_2\rvert^{m}+w_2\lvert y_1-y_2\rvert^{m}+(1-w_1-w_2)\lvert t_1-t_2\rvert^{m}\right]^{1/m}</tex-math>
        </disp-formula>
        <p>
Here, x1 and x2 represent the locations of two points along the horizontal axis, y1 and
y2 represent the locations of two points along the vertical axis, t1 and t2 represent the
locations of two points along the temporal dimension, the two w’s represent the
relative weighting of the two spatial dimensions with respect to the temporal dimension, m
represents the type of Minkowski distance metric, and d(p1,p2) represents the distance
between two points. For most applications, m should be constrained to be either 2
(resulting in a Euclidean distance metric) or 1 (resulting in a city-block distance
metric). A city-block distance metric may be appropriate if researchers consider errors
along x and y dimensions as independent of each other. Other values for m are
possible but difficult to interpret. The parameters w1 and w2 also need to be constrained so
that each weight is greater than 0 and that their sum is less than 1. Larger values for
the w’s indicate larger relative contributions for deviations along the corresponding
spatial dimensions to the fit of the resulting model. Note that this distance function
may need to accommodate differences in visual angle if, for example, two participants
are positioned at different distances from the stimulus.</p>
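        <p>As an illustration, Equation 1a translates directly into code. The following is a
minimal Python sketch (the function name and the point convention p = (x, y, t) are our
own, not part of the original specification):</p>
        <preformat>
def minkowski_distance(p1, p2, w1, w2, m=2):
    """Weighted Minkowski distance between two spatiotemporal points
    p = (x, y, t), following Equation 1a. The temporal dimension
    receives the remaining weight, 1 - w1 - w2."""
    if not (w1 > 0 and w2 > 0 and 1 - w1 - w2 > 0):
        raise ValueError("each weight must be positive, and their sum must stay below 1")
    x1, y1, t1 = p1
    x2, y2, t2 = p2
    return (w1 * abs(x1 - x2) ** m
            + w2 * abs(y1 - y2) ** m
            + (1 - w1 - w2) * abs(t1 - t2) ** m) ** (1 / m)
        </preformat>
        <p>With m = 2 the function acts as a weighted Euclidean metric; with m = 1 it treats
deviations along the x and y dimensions independently, as described above.</p>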
        <p>
          Equation 1a also assumes that the distribution of data points that represent each
fixation is uniform rather than Gaussian (see, e.g., [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]). The utility of the uniformly
distributed distance function can be compared empirically to the utility of the
following normally distributed (and Euclidean) distance function:
        </p>
        <disp-formula id="eq1b">
          <label>(1b)</label>
          <tex-math>d(p_1,p_2)=\sqrt{\,w_1\!\left(1-e^{-\frac{(x_1-x_2)^2}{2s^2}}\right)+w_2\!\left(1-e^{-\frac{(y_1-y_2)^2}{2s^2}}\right)+(1-w_1-w_2)\!\left(1-e^{-\frac{(t_1-t_2)^2}{2s^2}}\right)}</tex-math>
        </disp-formula>
        <p>
Here, the only additional parameter is s, which represents the “steepness” of the
normally distributed distance function. Note that s does not necessarily correspond to the
standard deviation of the distribution of resulting distances. The w’s should be
constrained in the same manner as for the uniformly distributed distance function.
        </p>
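        <p>A matching sketch for the normally distributed distance function follows the same
conventions (again minimal and hypothetical, mirroring our reading of Equation 1b; numpy
supplies the exponential):</p>
        <preformat>
import numpy as np

def gaussian_distance(p1, p2, w1, w2, s):
    """Normally distributed, Euclidean distance following Equation 1b.
    s controls the steepness of the Gaussian fall-off along each
    dimension; it is not the standard deviation of the resulting
    distances."""
    x1, y1, t1 = p1
    x2, y2, t2 = p2
    gx = 1 - np.exp(-((x1 - x2) ** 2) / (2 * s ** 2))
    gy = 1 - np.exp(-((y1 - y2) ** 2) / (2 * s ** 2))
    gt = 1 - np.exp(-((t1 - t2) ** 2) / (2 * s ** 2))
    return np.sqrt(w1 * gx + w2 * gy + (1 - w1 - w2) * gt)
        </preformat>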
        <p>In order to determine which of several possible specifications is most appropriate
for a particular detection algorithm, we then need to calculate the error term for each
fixation:
        </p>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>e(\mathrm{fixation})=\frac{1}{n_p}\sum_{i=1}^{n_p} d(p_i,\bar{p})</tex-math>
        </disp-formula>
        <p>Here, p represents a data point with index i, p̄ represents the centroid for all of the
data points within the spatiotemporal window, d represents the distance metric from
Equation 1a or 1b, np represents the number of data points within the spatiotemporal
window for a detected fixation, and e(fixation) represents the error term for the
detected fixation (i.e., the mean of the distances from the centroid to each data point
within the spatiotemporal window).</p>
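        <p>The error term can be sketched in the same style (the 1/np factor reflects the
description of e(fixation) as a mean; the array layout is our own convention):</p>
        <preformat>
import numpy as np

def fixation_error(points, distance, **params):
    """Error term for one detected fixation (Equation 2): the mean
    distance from the centroid of the in-window data points to each
    of those points. points is an (n_p, 3) array of (x, y, t) samples
    that fall within the spatiotemporal window."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)  # the centroid, p-bar
    return np.mean([distance(p, centroid, **params) for p in points])
        </preformat>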
        <p>If researchers are comparing sets of detected fixations with spatiotemporal windows
of the same size and shape, then sums of e(fixation) across sets of detected fixations
are sufficient for comparing different detection algorithms. Across any range of
spatial and temporal intervals, the smallest sum of e(fixation) will reveal the most
appropriate spatiotemporal window for any given detection algorithm.</p>
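        <p>For windows of a fixed shape, this search reduces to a grid search over candidate
interval sizes. A hypothetical sketch, reusing fixation_error from above (detect_fixations
stands in for any detection algorithm that returns the data points within each detected
fixation's window):</p>
        <preformat>
def best_window(data, spatial_sizes, temporal_sizes, detect_fixations,
                distance, **params):
    """Return (summed error, spatial interval, temporal interval) for the
    spatiotemporal window with the smallest summed e(fixation) across a
    grid of candidate interval sizes."""
    candidates = []
    for s_int in spatial_sizes:
        for t_int in temporal_sizes:
            fixations = detect_fixations(data, s_int, t_int)
            total = sum(fixation_error(pts, distance, **params)
                        for pts in fixations)
            candidates.append((total, s_int, t_int))
    # The window with the smallest summed error is the most appropriate.
    return min(candidates)
        </preformat>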
        <p>However, in order to compare spatiotemporal windows with different shapes or
sizes, the error term needs to be converted into a measure that accounts for the number
of free parameters or the number of detected fixations, respectively. Towards this end,
the summed and squared error terms for all of the detected fixations of a given
spatiotemporal window can be converted to Bayes’ information criterion (BIC):
        </p>
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math>\mathrm{BIC}=n_f\,\ln\!\left(\frac{\sum e(\mathrm{fixation})^2}{n_f}\right)+k\,\ln(n_f)</tex-math>
        </disp-formula>
        <p>Here, nf represents the number of detected fixations, k represents the number of free
parameters, ln represents the natural logarithm function, and e(fixation) represents the
error term from Equation 2. We consider each interval as only one parameter because
the location of the fixation along a particular dimension and both boundaries of each
interval are completely constrained by the determination of the size of the interval and
the data.</p>
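        <p>Assuming the standard least-squares form of the BIC (which matches the symbols
defined above), the conversion is brief:</p>
        <preformat>
import numpy as np

def bic(errors, k):
    """Bayes' information criterion (Equation 3) from the per-fixation
    error terms e(fixation). errors has length n_f; k counts the free
    parameters of the window or algorithm."""
    errors = np.asarray(errors, dtype=float)
    n_f = len(errors)
    return n_f * np.log(np.sum(errors ** 2) / n_f) + k * np.log(n_f)
        </preformat>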
      </sec>
      <sec id="sec-2-2">
        <title>Detection algorithm comparison</title>
        <p>
          The BIC can also be used in order to compare different fixation detection
algorithms using Equations 1-3. The primary challenge for comparing different detection
algorithms thus becomes determining which parameters are free to vary (see [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]).
The BIC should be used to penalize the fit of any parameter that could have changed
in order to improve the fit of the model to the data. Notably, this method does not
require any assumptions regarding the “true” foci in the stimulus.
        </p>
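        <p>In practice, such a comparison might look as follows (a purely hypothetical usage
sketch reusing the functions above; algorithm_a and algorithm_b stand in for any two
detection algorithms, and the weights and parameter counts are illustrative):</p>
        <preformat>
# Each algorithm returns the point sets of its detected fixations;
# k counts every parameter that was free to vary for that algorithm.
errors_a = [fixation_error(pts, minkowski_distance, w1=0.4, w2=0.4, m=2)
            for pts in algorithm_a(data)]
errors_b = [fixation_error(pts, minkowski_distance, w1=0.4, w2=0.4, m=2)
            for pts in algorithm_b(data)]
print("BIC for algorithm A:", bic(errors_a, k=4))
print("BIC for algorithm B:", bic(errors_b, k=6))
# The smaller BIC indicates the better account of the gaze data,
# without presupposing any "true" fixation locations.
        </preformat>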
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Future validation studies</title>
      <p>
        Future investigations can attempt to validate or invalidate our approach in at least
two ways. First, following [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], researchers can direct participants to focus on
individual stimuli at known coordinates. This procedure is often used by eye tracking
software for calibrating eye movement data before an experiment [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For validation
purposes, fixations may be considered the periods of time during which a participant
was asked to focus on a particular stimulus. The veracity with which the BIC metric
determines the most appropriate spatiotemporal window (or best-performing detection
algorithm) should then be reflected by similar patterns in other metrics (e.g., number
of detected fixations; [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]).
      </p>
      <p>
        Second, the mean spatiotemporal window specified across individual participants
may approximately correspond to established recommendations already in the
literature (e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]). This may occur if the primary advantage of the current approach
is to account for additional variability, but this procedure could also be misleading if
the current approach actually produces more accurate fixation detection than previous
approaches.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>The present paper provided a novel approach to the specification of spatiotemporal
windows for fixation detection algorithms. This approach may also be applied to the
comparison of different detection algorithms. Two future studies that could potentially
falsify this approach were also briefly described.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Matin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>1974</year>
          ).
          <article-title>Saccadic suppression: A review and an analysis</article-title>
          .
          <source>Psychological Bulletin</source>
          ,
          <volume>81</volume>
          ,
          <fpage>899</fpage>
          -
          <lpage>917</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Henderson</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Human gaze control during real-world scene perception</article-title>
          .
          <source>Trends in Cognitive Sciences</source>
          ,
          <volume>7</volume>
          (
          <issue>11</issue>
          ),
          <fpage>498</fpage>
          -
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Salvucci</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Identifying fixations and saccades in eye-tracking protocols</article-title>
          .
          <source>Proceedings of the Eye Tracking Research and Applications Symposium</source>
          ,
          <fpage>71</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Nyström</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Holmqvist</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>An adaptive algorithm for fixation, saccade, and glissade detection in eyetracking data</article-title>
          .
          <source>Behavior Research Methods</source>
          ,
          <volume>42</volume>
          ,
          <fpage>188</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Komogortsev</surname>
            ,
            <given-names>O. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gobert</surname>
            ,
            <given-names>D. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jayarathna</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koh</surname>
            ,
            <given-names>D. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Gowda</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Standardization of automated analyses of oculomotor fixation and saccadic behaviors</article-title>
          .
          <source>IEEE Transactions on Biomedical Engineering</source>
          ,
          <volume>57</volume>
          ,
          <fpage>2635</fpage>
          -
          <lpage>2645</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Eye movements in reading and information processing: 20 years of research</article-title>
          .
          <source>Psychological Bulletin</source>
          ,
          <volume>124</volume>
          ,
          <fpage>372</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hyönä</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorch</surname>
            ,
            <given-names>R. F., Jr.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kaakinen</surname>
            ,
            <given-names>J. K.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Individual differences in reading to summarize expository text: Evidence from eye fixation patterns</article-title>
          .
          <source>Journal of Educational Psychology</source>
          ,
          <volume>94</volume>
          (
          <issue>1</issue>
          ),
          <fpage>44</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rayner</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Raney</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>Eye movement control in reading and visual search: Effects of word frequency</article-title>
          .
          <source>Psychonomic Bulletin &amp; Review</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>245</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Henderson</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hollingworth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Eye movements during scene viewing: An overview</article-title>
          .
          <source>Eye guidance in reading and scene perception</source>
          ,
          <volume>11</volume>
          ,
          <fpage>269</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Santella</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>DeCarlo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Robust clustering of eye movement recordings for quantification of visual interest</article-title>
          .
          <source>Proceedings of the Eye Tracking Research and Applications Symposium</source>
          ,
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lewandowsky</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Farrell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Computational Modeling in Cognition: Principles and Practice</article-title>
          . Thousand Oaks, CA, USA: Sage Publications.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hornof</surname>
            ,
            <given-names>A. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Halverson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Cleaning up systematic error in eyetracking data by using required fixation locations</article-title>
          .
          <source>Behavior Research Methods, Instruments, &amp; Computers</source>
          ,
          <volume>34</volume>
          (
          <issue>4</issue>
          ),
          <fpage>592</fpage>
          -
          <lpage>604</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Salthouse</surname>
            ,
            <given-names>T. A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          (
          <year>1980</year>
          ).
          <article-title>Determinants of eye-fixation duration</article-title>
          .
          <source>The American Journal of Psychology</source>
          ,
          <volume>93</volume>
          ,
          <fpage>207</fpage>
          -
          <lpage>234</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>