<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nuwan Bandara</string-name>
          <email>pmnsbandara@smu.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thivya Kandappu</string-name>
          <email>thivyak@smu.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Archan Misra</string-name>
          <email>archanm@smu.edu.sg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing and Information Systems, Singapore Management University</institution>
          ,
          <country country="SG">Singapore</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Event-based eye tracking holds significant promise for fine-grained cognitive state inference, offering high temporal resolution and robustness to motion artifacts, critical features for decoding subtle mental states such as attention, confusion, or fatigue. In this work, we introduce a model-agnostic, inference-time refinement framework designed to enhance the output of existing event-based gaze estimation models without modifying their architecture or requiring retraining. Our method comprises two key post-processing modules: (i) Motion-Aware Median Filtering, which suppresses blink-induced spikes while preserving natural gaze dynamics, and (ii) Optical Flow-Based Local Refinement, which aligns gaze predictions with cumulative event motion to reduce spatial jitter and temporal discontinuities. To complement traditional spatial accuracy metrics, we propose a novel Jitter Metric that captures the temporal smoothness of predicted gaze trajectories based on velocity regularity and local signal complexity. Together, these contributions significantly improve the consistency of event-based gaze signals, making them better suited for downstream tasks such as micro-expression analysis and mind-state decoding. Our results demonstrate consistent improvements across multiple baseline models on controlled datasets, laying the groundwork for future integration with multimodal affect recognition systems in real-world environments. Our code implementations can be found at https://github.com/eye-tracking-for-physiological-sensing/EyeLoRiN.</p>
      </abstract>
      <kwd-group>
        <kwd>eye tracking</kwd>
        <kwd>event camera</kwd>
        <kwd>post processing</kwd>
        <kwd>local refinement</kwd>
        <kwd>model-agnostic</kwd>
        <kwd>jitter metric</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[…] input for real-time micro-expression-based inference systems that aim to detect and respond to users’ unspoken mental states.</p>
      <p>However, traditional camera-based eye tracking systems face well-known limitations in capturing these micro-level dynamics. Frame-based approaches typically operate at 30–1000 Hz and struggle with motion blur during rapid eye movements or transient behaviors such as blinking. In contrast, event-based vision sensors capture pixel-level brightness changes asynchronously with microsecond latency, yielding sparse but high-temporal-resolution data streams. These characteristics make event-based sensors ideally suited for fine-grained eye tracking, especially in contexts where temporal fidelity, motion robustness, and low-latency processing are critical.</p>
      <p>Despite these advantages, effective utilization of event data in eye tracking remains a challenge. The sparse and asynchronous nature of the data presents difficulties in processing and interpretation, requiring specialized models that can handle both the temporal and spatial dimensions of eye movements [10]. Spatio-temporal models, such as Change-Based ConvLSTM [11], graph-based event representations [12], and event binning methods [13], have emerged as promising approaches. These models aim to capture the dynamic and continuous nature of eye movements by encoding both spatial and temporal information from the event streams. This is particularly important because eye movements are inherently continuous, both in space and time, and spatio-temporal models attempt to leverage these properties to improve gaze estimation accuracy.</p>
      <p>However, while these models have shown promise, they suffer from several limitations. A key challenge in event-based eye tracking is the handling of blink artifacts, which cause interruptions in the event data and lead to erroneous gaze predictions [14]. Another limitation is the temporal inconsistency often observed in the predictions: eye movements are physiologically continuous, yet models sometimes fail to enforce this temporal smoothness, leading to abrupt gaze shifts that undermine tracking stability [15]. Additionally, existing models often fail to fully leverage local event distributions, resulting in misaligned gaze predictions. These challenges, coupled with the inherent label sparsity of event datasets, make it difficult to develop a universally robust event-based tracking system.</p>
      <p>To address these challenges, we propose a model-agnostic inference-time post-processing and local
refinement framework to enhance the accuracy and robustness of event-based eye tracking. Our
approach targets the shortcomings of existing spatio-temporal models by introducing lightweight,
post-processing techniques that can be integrated with any model without requiring retraining or
architectural changes. This makes our method flexible and easily applicable to a wide range of existing
models. The post-processing framework consists of two key components: (i) motion-aware median
filtering, which enforces temporal smoothness by taking advantage of the continuous nature of eye
movements, and (ii) optical flow-based local refinement, which improves spatial consistency by aligning
gaze predictions with dominant motion patterns in the local event neighborhood. These refinements
not only mitigate blinking artifacts but also ensure that gaze predictions remain temporally continuous
and spatially accurate, even in the presence of rapid eye movements or motion artifacts.</p>
      <p>By incorporating these post-processing and refinement techniques, our approach improves the
overall performance and robustness of event-based eye tracking. This makes it particularly valuable
in real-world applications where traditional models may fail due to the challenges posed by low-light
environments, high-speed motion, or intermittent artifacts like blinks. Furthermore, because our
framework is model-agnostic, it can be applied to any existing event-based eye-tracking model, offering
a significant boost to accuracy and stability without requiring changes to the core model.</p>
      <p>
        In this paper, we make the following key contributions:
• Model-Agnostic Post-Processing: We propose an inference-time refinement approach that enhances existing event-based pupil estimation models without modifying their architectures or requiring retraining, through (1) Motion-Aware Median Filtering, a median filtering technique that incorporates motion awareness to preserve temporal continuity in gaze predictions and mitigate blinking-induced errors, and (2) Optical Flow-Based Smooth Refinement, a local refinement strategy leveraging optical flow to ensure that predicted gaze positions align with the cumulative local event motion, reducing spatial inconsistencies in tracking results.
• Jitter Metric: We propose a complementary metric for the pupil tracking task to specifically evaluate the temporal smooth continuity of the predictions with respect to the true targets, based on (1) the global statistical distribution of pupil velocities, addressed via comparative velocity entropy, and (2) the local fine-grained frequency content of pupil velocities, addressed via spectral arc length-guided spectral entropy.
• Empirical Validation and Performance Gains: Through extensive experiments, we demonstrate that our proposed methods significantly enhance the robustness and accuracy of state-of-the-art event-based eye-tracking models across diverse conditions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Micro-expression Recognition</title>
        <p>Micro-expressions are rapid, involuntary facial movements that reveal transient emotional states often concealed from conscious awareness. Due to their subtlety and brevity, automatic recognition of micro-expressions remains a challenging task that has garnered significant interest within the computer vision and affective computing communities.</p>
        <p>Early approaches predominantly relied on handcrafted spatio-temporal features extracted from high-frame-rate facial video sequences, such as Local Binary Patterns on Three Orthogonal Planes (LBP-TOP) [16], optical flow-based descriptors [17], and optical strain [18], to capture nuanced facial muscle dynamics. While these methods laid the foundation for subsequent advances, they were often limited by their dependence on feature engineering and sensitivity to noise. More recently, deep learning architectures, including 3D Convolutional Neural Networks [19] and Long Short-Term Memory (LSTM) networks [20], have been employed to automatically learn hierarchical representations from high-frame-rate facial video sequences, significantly improving recognition accuracy. Attention mechanisms have further enhanced model capacity by focusing on discriminative spatial and temporal regions [21]. The availability of specialized datasets such as CASME [22], CASME II [2], SMIC [23], and SAMM [24] has been foundational, providing high-frame-rate facial videos annotated with micro-expression labels under controlled environments.</p>
        <p>While facial cues remain the primary modality for micro-expression analysis, recent studies underscore the complementary value of ocular signals, including pupil dynamics and saccadic eye movements, as indicators of cognitive and affective states [25, 26]. Integrating eye-tracking data can enhance the robustness and granularity of emotion recognition, particularly in applications such as deception detection, clinical assessment of affective disorders, and adaptive human-computer interaction systems that respond to user engagement and mental workload.</p>
        <p>In this paper, we specifically address the challenge of fine-grained eye tracking, an important yet often overlooked component of micro-expression analysis, and propose novel techniques to improve its accuracy and temporal consistency, thereby enhancing the reliability of eye-based cognitive and affective state inference.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Event-based Pupil and Gaze Tracking</title>
        <p>Event-based cameras offer a fundamentally different sensing paradigm compared to conventional frame-based cameras. By asynchronously recording pixel-level brightness changes with microsecond temporal resolution and an extremely high dynamic range (120 dB) while consuming minimal power (milliwatt level), these sensors are uniquely suited for applications requiring precise, low-latency motion tracking [27]. Consequently, the eye tracking community has recently begun to explore event-based approaches for pupil and gaze tracking, motivated by the limitations of traditional RGB and infrared systems in terms of temporal resolution and power efficiency [12]. Current research on event-based eye tracking can be broadly categorized into two main approaches:</p>
        <p>Hybrid event-RGB: Several works have proposed combining event data with RGB frames to leverage the spatial resolution and structural detail of conventional cameras alongside the temporal advantages of event sensors. The approaches presented in [28] exemplify this direction by using RGB frames for initial pupil detection, with event streams employed to refine temporal tracking. While effective in improving robustness, such hybrid methods are inherently constrained by the frame rate of the RGB sensor, typically on the order of tens of milliseconds, which limits the full exploitation of the asynchronous, high-frequency event data. Moreover, these methods rely on RGB imagery and thus may suffer from environmental limitations, such as variable lighting conditions, to which event sensors are inherently more resilient.</p>
        <p>Event-Only Tracking: More recent work focuses exclusively on event streams, aiming to fully harness the unique properties of event cameras for pupil and gaze estimation. Event-only approaches [29, 12, 30] aggregate events into either 2D or 3D representations that are processed by either neural networks or traditional computer vision algorithms, and thus occasionally suffer from label sparsity and inefficient representations.</p>
        <p>Bandara et al. [12] proposed EyeGraph, a novel approach that constructs spatiotemporal graphs
from event data to represent pupil contours, and addressed the issue of label sparsity by proposing an
unsupervised graph-based clustering approach to spatially localise the pupil in a 3D event volume. In
contrast, Sen et al. [29] presented EyeTrAES, an event-based adaptive slicing mechanism that adaptively
adjusts the volume of the event accumulation based on the underlying eye motion.</p>
        <p>Despite these advances, event-only pupil tracking faces several open challenges. Designing
representations that preserve the high temporal fidelity of events without overwhelming computational
resources remains an active research area. Moreover, the limited availability of synchronized ground
truth data hinders large-scale supervised training. Event sensors also produce noise from
environmental artifacts such as illumination flicker or head movement, necessitating robust filtering and model
designs [27].</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Spatio-Temporal Processing of Events</title>
        <p>Event-based eye tracking requires models that effectively capture the sparse, asynchronous nature of event streams while preserving high temporal resolution [27]. Unlike frame-based methods, event-based vision demands spatio-temporal representations that can model continuous eye movements with precision. Recent prior works have proposed several spatio-temporal event-processing models to capture the temporal evolution of the events. In this paper, we consider three classes of such models: (a) ConvGRU and ConvLSTM, (b) graph-based representations, and (c) event-binning methods.</p>
        <p>ConvLSTM and ConvGRU networks [11], originally designed for dense sequential data, struggle with the sparsity of event streams. Change-Based ConvLSTM (CB-ConvLSTM) mitigates this by leveraging change-based updates, focusing on local event dynamics rather than static frames [11]. This improves tracking accuracy by ensuring temporal continuity while maintaining event efficiency. Graph-based models encode events as nodes with spatio-temporal edges, preserving fine-grained motion patterns [12]. These approaches enhance tracking by leveraging local event dependencies, making them effective for eye movement estimation. Event binning methods aggregate events over predefined intervals to create structured inputs for deep temporal networks. While simple and efficient, methods like causal event volume binning strike a balance between temporal continuity and real-time feasibility [13].</p>
        <p>While these models enhance spatio-temporal representation, they often produce temporally inconsistent or spatially misaligned predictions. Our proposed model-agnostic inference-time refinement improves accuracy by enforcing temporal smoothness and aligning predictions with local optical flow. This enhancement operates independently of the underlying spatio-temporal model, making event-based eye tracking more robust across different architectures.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Motivation</title>
      <sec id="sec-3-1">
        <title>3.1. Inference-time Post-processing</title>
        <p>Even though current event-based pupil tracking models attempt to incorporate both spatial and temporal learning blocks within their pipelines, the empirical results show that they still suffer from poor performance when handling blink artifacts. Further, even though pupil movements are physiologically continuous and bounded, these models fail to strictly enforce this rule within the learning pipeline, leading to unstable pupil trajectories at inference time. To address these limitations, we propose a motion-aware median filtering technique as a post-processing method which penalizes trajectory outliers, whether due to blinking or tracking instability, based on an adaptive motion profiling mechanism, and thereby achieves a stable pupil trajectory. Additionally, given that the existing models often tend to prioritize global perceptible eye morphology over local heuristics deduced through event distributions, the pupil trajectory predictions often present an unavoidable offset. To circumvent this issue, we propose to utilize the optical flow around the original predictions such that, if the optical flow at the original prediction does not align with the local optical flow, an offset is assumed and subsequently corrected by shifting the prediction by a small defined margin. It is to be noted that both of these proposed techniques work in a model-agnostic fashion and thus can be flexibly applied to any existing model without it being re-trained or modified.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Jitter Metric</title>
        <p>
          Existing metrics for evaluating event-based pupil tracking performance, such as p-accuracy and pixel distances, only capture the per-sample positional accuracy of the methods and thereby neglect an essential aspect of pupil tracking: temporal smooth-continuity, which is critical for stable and user-ergonomic performance in many downstream applications such as foveated rendering, gaze-based human-computer interaction, user authentication, and affective-cognitive modelling [31, 29]. In addition, unlike gaze, which exhibits both smooth and rapid transitions that can sometimes be almost non-continuous with significantly higher angular acceleration [28], pupil movements are bounded and continuous in nature [12], which further validates the need for an explicit evaluation metric for the temporal smooth-continuity of pupil tracking. To this end, we propose a velocity-based metric which considers and weights both (1) the global statistical distribution of velocities, via the Kullback-Leibler divergence, and (2) the local velocity jaggedness, via spectral arc length-guided spectral entropy. Further, it is to be noted that our metric is comparative between the predicted and true pupil trajectories while being less susceptible to measurement or prediction noise in contrast to jerk-based smoothness scores. We theoretically and empirically show the efficacy of the proposed metric in the context of pupil tracking.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Inference-time Post-processing</title>
      <p>
        To address the shortcomings of existing methods at the inference stage, we propose to add two lightweight post-processing techniques specifically targeting the following limitations: (1) motion-aware median filtering (Algorithm 1) to (a) ensure the temporal consistency of the predictions, since eye movements are physiologically bound to be continuous in the spatial domain [12], and (b) reduce blinking artifacts; and (2) optical flow estimation in the local spatial neighbourhood (Algorithm 2) to smoothly shift the original predictions if the flow vector at the original prediction is unaligned with the cumulative local neighbourhood flow direction. Synoptically, (1) is motivated by our empirical observations, which convey the abundance of blinking artifacts within the predictions of the existing models (as shown in Fig. 1), whereas (2) is specifically inspired by our observations that the original predictions tend to neglect the event motion flow in the local neighbourhood, suggesting a lack of attention to the local event distribution in the original models.
      </p>
      <p>More descriptively, in the motion-aware filtering shown in Algorithm 1, we first estimate the local motion variance in the temporal dimension (i.e., within a set time window) using one of a set of alternative methods, including 0th- to 2nd-order kinetics, covariance, and frequency (for which the equations are defined below), and subsequently assign a median-based adaptive filter window for each time window, such that the kernel size for median filtering is adaptive and appropriate to the variability of the underlying pupil movement while also ensuring temporal consistency.</p>
      <p>[Figure 1 (a)–(e): example predictions with blink artifacts, including the global temporal variation of the x coordinates around the presented blinking case.]</p>
      <p>With respect to the local motion variance calculation, more mathematically, if the predicted pupil trajectory is $\mathbf{p}(t) = [x(t), y(t)]$ for discrete time index $t$, then the 0th-order (displacement-based) local motion variance estimate is
$$\bar{\sigma}_d(t) = \frac{1}{W} \sum_{i=-W/2}^{W/2} \| d(t+i) \|, \qquad d_t = \sqrt{(x_t - x_{t-1})^2 + (y_t - y_{t-1})^2},$$
with $W$ being the window size. In extension, velocity is the first derivative of position, $\mathbf{v}(t) = \frac{d}{dt}\mathbf{p}(t)$, and the smoothed velocity-based local variance can be calculated as $\sigma_{\text{vel}}(t) = \frac{1}{W}\sum_{i=-W/2}^{W/2} \|\mathbf{v}(t+i)\|$. Similarly, acceleration is the second derivative of the trajectory position, $\mathbf{a}(t) = \frac{d^2}{dt^2}\mathbf{p}(t)$, which leads to the local motion variance estimate $\sigma_{\text{acc}}(t) = \frac{1}{W}\sum_{i=-W/2}^{W/2} \|\mathbf{a}(t+i)\|$. With respect to the covariance-based local motion estimation, given a local window $\mathcal{W}_t = \{\mathbf{p}(t-W/2), \ldots, \mathbf{p}(t+W/2)\}$, we compute $\Sigma_t = \mathrm{Cov}(\mathcal{W}_t)$ and subsequently estimate the covariance-based motion variance as
$$\sigma_{\text{cov}}(t) = \|\Sigma_t\|_F, \quad (1)$$
where $\|\cdot\|_F$ denotes the Frobenius norm.</p>
      <p>When estimating the local motion variance through frequency features, we utilize the Short-Time Fourier Transform of the pupil trajectory signal $p(\tau)$, $S_p(t, f) = \int p(\tau)\, w(\tau - t)\, e^{-2\pi i f \tau}\, d\tau$, of which the power spectrum is $P_p(t, f) = |S_p(t, f)|^2$. Then, the frequency-domain motion variance is estimated from the power spectra of the $x$ and $y$ coordinate signals as $\sigma_{\text{freq}}(t) = \sqrt{\mathrm{Var}\big(P_x(t, f)\big) + \mathrm{Var}\big(P_y(t, f)\big)}$.</p>
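      <p>For illustration, a minimal sketch of the covariance-based variant in Eq. 1 is given below, assuming an (N, 2) NumPy array of predicted pupil coordinates; the function name and boundary handling are illustrative rather than the exact implementation.</p>
      <preformat>
# A minimal sketch of the covariance-based local motion variance of Eq. (1), assuming an
# (N, 2) numpy array of predicted pupil coordinates; the boundary handling is illustrative.
import numpy as np

def covariance_motion_variance(p, W=10):
    """sigma_cov(t): Frobenius norm of the covariance of the local window around t."""
    n = len(p)
    sigma = np.zeros(n)
    for t in range(n):
        lo, hi = max(0, t - W // 2), min(n, t + W // 2 + 1)
        window = p[lo:hi]                       # local window {p(t - W/2), ..., p(t + W/2)}
        cov = np.cov(window, rowvar=False)      # 2 x 2 covariance of x and y in the window
        sigma[t] = np.linalg.norm(cov, ord="fro")
    return sigma
      </preformat>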
      <sec id="sec-4-1">
        <title>In optical flow estimation as shown in Algorithm</title>
      </sec>
      <sec id="sec-4-2">
        <title>2, we first estimate the appropriate size for the</title>
        <p>region of interest (ROI) around the filtered prediction using the first order derivatives of
 and  and</p>
      </sec>
      <sec id="sec-4-3">
        <title>Algorithm 1 Motion-aware median filtering</title>
        <p>Require: Original predictions {  ,   }, base window for local motion variance estimation   ,
minimum allowed smoothing window   , maximum allowed smoothing window   , percentile
to determine adaptive window size  , method  (.) ∈ {displacement, velocity, acceleration, covariance,
frequency}
1: Output: filtered predictions { ( ,) ,  ( ,) }
2: local motion variance ⟵  ({  ,   },   )
3: smoothened variance ⟵ rolling mean(  , local motion variance)
4: median window ⟵ clipping(smoothened variance,   ,   )
5: adaptive windows ⟵ clipping(  ,   , rolling(median window,   ,  ))
6: { ( ,) ,  ( ,) } ⟵ rolling median({  ,   }, adaptive windows)
then, if the number of events within the selected ROI exceeds a set threshold, we accumulate and
determine the cumulative vector trajectory of the events within ROI to softly shift the filtered prediction
to further refine its spatial position.</p>
      </sec>
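      <p>A minimal sketch of the motion-aware adaptive median filter of Algorithm 1 follows, assuming NumPy arrays of raw x/y predictions and the displacement-based variance mode; the pandas rolling helpers and the exact window mapping are illustrative approximations of the steps above, not the authors' exact implementation.</p>
      <preformat>
# Motion-aware adaptive median filtering (Algorithm 1), illustrative sketch.
import numpy as np
import pandas as pd

def motion_aware_median_filter(x, y, base_window=10, w_min=5, w_max=20, pct=75):
    # step 2: local motion variance (0th-order, displacement-based) over the base window
    disp = np.hypot(np.diff(x, prepend=x[0]), np.diff(y, prepend=y[0]))
    local_var = pd.Series(disp).rolling(base_window, min_periods=1, center=True).var().fillna(0.0)

    # step 3: smooth the variance profile with a rolling mean
    smoothed = local_var.rolling(base_window, min_periods=1, center=True).mean()

    # steps 4-5: clip to the allowed window range, take a rolling percentile, clip again
    median_window = smoothed.clip(w_min, w_max)
    adaptive = median_window.rolling(base_window, min_periods=1, center=True).quantile(pct / 100.0)
    windows = adaptive.clip(w_min, w_max).round().astype(int).to_numpy()

    # step 6: per-sample rolling median with the adaptive window size
    xf = np.empty(len(x), dtype=float)
    yf = np.empty(len(y), dtype=float)
    for t, w in enumerate(windows):
        lo, hi = max(0, t - w // 2), min(len(x), t + w // 2 + 1)
        xf[t], yf[t] = np.median(x[lo:hi]), np.median(y[lo:hi])
    return xf, yf
      </preformat>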
    </sec>
    <sec id="sec-5">
      <title>5. Jitter Metric</title>
      <sec id="sec-5-1">
        <title>5.1. Background &amp; Definition</title>
        <p>Even though the existing metrics for the task at hand are beneficial to evaluate positional accuracy, none
is effective in evaluating the smooth continuity of the pupil movements, which is critical for the stable
performance in downstream applications such as foveated rendering or gaze-based interaction [31, 29],
especially given the bounded and continuous nature of oculomotor (i.e., pupil movement) activity.
Therefore, as a complementary metric, we propose the following jitter metric (Eq. 2) to specifically
evaluate the temporal smoothness of the predictions while also considering the true targets.</p>
        <p>
          When designing the jitter metric, we postulate two key premises based on pupil velocity, considering both global and local levels: if the predicted trajectory is significantly different in temporal cohesion (i.e., comparative smooth-continuity) from the true trajectory, then (1) the distributions of the motion (i.e., velocity) at the global level are statistically different from each other, suggesting that the predicted trajectory reproduces the true trajectory poorly, and (2) abrupt transitions and local jaggedness are reflected in the disparity of the frequency content, capturing the fine-grained local differences between the two trajectories.
        </p>
        <p>
          To embed premise (1) within our metric, we employ the Kullback-Leibler (KL) divergence due to its ability to measure the information loss with respect to the true velocity distribution, such that if the predicted velocities encompass a highly erratic or excessively smooth velocity distribution compared to the true velocities, the log-normalized comparative velocity entropy, measured through the KL divergence, reflects such a global difference with a higher value of our jitter metric. In contrast, to integrate premise (2) within our jitter metric, we utilize spectral arc length (SPARC)-guided spectral entropy (SPE), inspired by [32], which is a frequency-domain measure known for its lower sensitivity to noise than jerk-based smoothness metrics. Typically, a more complex frequency spectrum relates to a greater arc length and thereby a less locally smooth signal. Further, smoother velocity signals generally concentrate more of their energy at lower frequencies, and vice versa. Motivated by these observations, here we incorporate the ground-truth-anchored SPE difference between the predicted and true velocities as a reflective measure of the fine-grained local differences between the predicted and true trajectories (see Section 5.2 for our derivation of this equation).
        </p>
        <p>
$$JM\big(\text{pred}_{(x,y)}, \text{true}_{(x,y)}\big) = \alpha \cdot \frac{\left|\mathrm{SPE}(v_{\text{pred}}) - \mathrm{SPE}(v_{\text{true}})\right|}{\left|\mathrm{SPE}(v_{\text{true}})\right| + \epsilon} + (1 - \alpha) \cdot \log\Big(1 + D_{KL}\big(H[v(\text{pred})]\,\|\,H[v(\text{true})]\big)\Big) \quad (2)$$
        </p>
        <sec id="sec-5-1-1">
          <title>Algorithm 2: Rule-based optical flow estimation for smooth shifts</title>
          <p>Require: continuous event stream with $N$ events, filtered predictions $\{x_{(t,f)}, y_{(t,f)}\}$, scaling parameter $s$, count threshold $c$, difference threshold $d$.
1: Output: refined predictions $\{x_{(t,r,f)}, y_{(t,r,f)}\}$
2: timestep $\leftarrow \big(t_{(i=N)} - t_{(i=1)}\big) / |\{x_{(t,f)}, y_{(t,f)}\}|$
3: ROI size $r \leftarrow s \times 10$
4: for each filtered prediction, compute the absolute differences between the current prediction and the recent history of filtered predictions in $x$ and $y$
5: if either difference exceeds $d \times s$, collect the events that fall within the ROI of size $r$ around the filtered prediction between the previous and the current timestamp
6: if the number of events within the ROI exceeds the count threshold, accumulate the step-to-step displacements of the events in the ROI into a cumulative flow vector $(\Delta u, \Delta v)$
7: if the cumulative flow is non-zero, softly shift the filtered prediction along the normalized flow direction, i.e., $x_{(t,r,f)} \leftarrow x_{(t,f)} + \Delta u / \|\Delta u, \Delta v\|$ and $y_{(t,r,f)} \leftarrow y_{(t,f)} + \Delta v / \|\Delta u, \Delta v\|$, and update the previous timestamp.</p>
          <p>The two terms of the jitter metric in Eq. 2 are defined as follows. The comparative velocity-entropy term is the KL divergence between the velocity histograms of the predicted and true trajectories,
$$D_{KL}\big(H[v(\text{pred})]\,\|\,H[v(\text{true})]\big) = \sum_{i} H[v(\text{pred})](i)\, \log\!\left(\frac{H[v(\text{pred})](i)}{H[v(\text{true})](i)}\right), \quad (3)$$
whereas the spectral term is the SPARC-guided spectral entropy of a velocity signal $v$,
$$\mathrm{SPE}(v) = -\sum_{k} \log(f_k + \epsilon)\cdot \hat{V}(f_k). \quad (4)$$</p>
          <p>Here, $\text{pred}_{(x,y)}$ and $\text{true}_{(x,y)}$ are the predicted and true pupil trajectories, and $H[\cdot]$ is the function for velocity histogram estimation from the predicted and true trajectories. The input to SPE as in Eq. 4 is the velocity signal of either $\text{pred}_{(x,y)}$ or $\text{true}_{(x,y)}$, and $\hat{V}$ is the normalized Fourier magnitude of the respective velocity signal. The weight hyperparameter balancing the impact of the KL divergence and SPE terms is $\alpha \in [0, 1]$, and $\epsilon$ is a small constant ensuring numerical stability (i.e., avoiding division by zero). In summary, our jitter metric is computed as a weighted sum of the normalized SPE difference and the log-normalized KL divergence of the velocity histograms; by design, a lower value reflects a more similar comparative temporal smoothness between the true and predicted pupil trajectories. Further, a detailed set of theoretical analyses of the proposed jitter metric, including boundedness, formal constraints, continuity, and differentiability, is included in Section 5.3.</p>
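          <p>For clarity, a minimal sketch of the full jitter metric computation (Eqs. 2–4) is given below, assuming uniformly sampled (N, 2) NumPy arrays of predicted and true coordinates; the helper names, histogram binning, and smoothing choices are illustrative, not the authors' exact code.</p>
          <preformat>
# Illustrative sketch of the jitter metric (Eqs. 2-4).
import numpy as np

def spe(vel, fs, eps=1e-6):
    """Eq. 4: SPARC-guided spectral entropy of a 1-D velocity signal."""
    mag = np.abs(np.fft.rfft(vel))
    freqs = np.fft.rfftfreq(len(vel), d=1.0 / fs)
    v_hat = mag / max(mag.sum(), eps)            # spectrum as a distribution over frequency
    return -np.sum(np.log(freqs + eps) * v_hat)

def kl_velocity_histograms(v_pred, v_true, bins=32, eps=1e-6):
    """Eq. 3: KL divergence between eps-smoothed velocity-magnitude histograms."""
    lo = min(v_pred.min(), v_true.min())
    hi = max(v_pred.max(), v_true.max())
    p, _ = np.histogram(v_pred, bins=bins, range=(lo, hi))
    q, _ = np.histogram(v_true, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()              # smoothing keeps both histograms full-support
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def jitter_metric(pred_xy, true_xy, fs, alpha=0.75, eps=1e-6):
    """Eq. 2: weighted sum of the normalized SPE difference and the log-normalized KL term."""
    v_pred = np.linalg.norm(np.diff(pred_xy, axis=0), axis=1) * fs   # speed via finite differences
    v_true = np.linalg.norm(np.diff(true_xy, axis=0), axis=1) * fs
    spe_term = abs(spe(v_pred, fs) - spe(v_true, fs)) / (abs(spe(v_true, fs)) + eps)
    kl_term = np.log(1.0 + kl_velocity_histograms(v_pred, v_true))
    return alpha * spe_term + (1.0 - alpha) * kl_term
          </preformat>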
        </sec>
        <sec id="sec-5-1-2">
          <title>Assuming the frequency bins are uniformly spaced, we define:</title>
          <p>Δ =  +1 −   (is constant),
Δ ′ =</p>
          <p>We then simplify the arc length as with Δ  =  +1 −   and given that the spectrum is smooth
Δ  &lt;  where  (&gt; 0) is infinitesimally small and ∈ ℝ:</p>
          <p>SPARC ≈ −Δ ′ ∑</p>
          <p>1 + (
 −1
 −1
=1
=1 √
= −Δ ′ ∑ (1 +
= −( − 1)Δ ′ −
Δ 
Δ ′</p>
          <p>2
)
(
1 Δ 
2 Δ ′</p>
          <p>1
2Δ ′
 −1
=1</p>
          <p>2
) )
∑ (Δ  )2</p>
          <p>Δ

 −  1
 2
2
(using √1 +  2 ≈ 1 +
when || &lt;&lt; 1 )
detailed set of theoretical analysis on the proposed jitter metric, including the boundedness, continuity,
formal constraints, continuity, and diferentiability, is included in section
5.3.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Derivation of SPE Equation</title>
        <p>Original Discrete SPARC Definition. Let $\hat{V}(f)$ denote the normalized magnitude spectrum of the velocity signal $v(t)$, sampled at frequency $f_s$. Let $\{f_k, \hat{V}_k\}$ be the discrete points of this spectrum, with $f_k$ being the frequency bin and $\hat{V}_k$ the normalized magnitude of the Fourier transform at frequency $f_k$ (the velocity signal can optionally be filtered beforehand using a cutoff frequency and an amplitude threshold). With $K$ the number of frequency bins used, the SPARC metric is given by
$$\mathrm{SPARC} = -\sum_{k=1}^{K-1} \sqrt{\left(\frac{f_{k+1}-f_k}{f_{K}-f_1}\right)^2 + \left(\hat{V}_{k+1}-\hat{V}_k\right)^2},$$
which corresponds to the total normalized negative arc length in the frequency–magnitude plane.</p>
        <p>Step 1: Uniform Frequency Spacing Assumption. Assuming the frequency bins are uniformly spaced, we define $\Delta f = f_{k+1} - f_k$ (constant) and $\Delta f' = \Delta f / (f_{K} - f_1)$. We then simplify the arc length, with $\Delta \hat{V}_k = \hat{V}_{k+1} - \hat{V}_k$ and given that the spectrum is smooth, $\Delta \hat{V}_k &lt; \delta$ where $\delta\,(&gt;0)$ is infinitesimally small and $\in \mathbb{R}$:
$$\mathrm{SPARC} \approx -\Delta f' \sum_{k=1}^{K-1} \sqrt{1 + \left(\frac{\Delta \hat{V}_k}{\Delta f'}\right)^2} \approx -\Delta f' \sum_{k=1}^{K-1} \left(1 + \frac{1}{2}\left(\frac{\Delta \hat{V}_k}{\Delta f'}\right)^2\right) = -(K-1)\,\Delta f' - \frac{1}{2\Delta f'}\sum_{k=1}^{K-1} (\Delta \hat{V}_k)^2$$
(using $\sqrt{1+x^2} \approx 1 + \frac{x^2}{2}$ when $|x| \ll 1$). This shows that SPARC is negatively correlated with the squared variation in spectral magnitude and thus penalizes high spectral variation, which corresponds to non-smooth or jerky movements (Observation A).</p>
        <p>Step 2: Frequency–Magnitude Reinterpretation. We now reinterpret $\hat{V}(f)$ as forming a normalized (discrete) probability distribution over frequency (instead of normalizing by the maximum amplitude), penalizing higher spectral variation, such that $\hat{V}(f_k) \approx |V_k| / \sum_j |V_j|$. Motivated by Observation A, we propose a frequency-weighted sum that emphasizes smoothness by penalizing energy concentrated at high frequencies (note that the frequency-weighting term $\log(f_k + \epsilon)$ pushes the SPE towards a more negative sum for higher frequencies):
$$\mathrm{SPARC} \approx \mathrm{SPE} = -\sum_{k} \log(f_k + \epsilon)\cdot \hat{V}(f_k),$$
where $\epsilon &gt; 0$ is a small constant added for numerical stability. Even though this version drops the explicit arc-length geometry, it retains the original spirit of SPARC by favoring low-frequency spectral energy concentration, and is more computationally tractable and differentiable.</p>
        <p>Summary. We conclude with the following approximate proxy metric: $\mathrm{SPE}(v) = -\sum_{k} \log(f_k + \epsilon)\cdot \hat{V}(f_k)$.</p>
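        <p>To illustrate the proxy, the short sketch below compares the discrete SPARC value and the SPE value defined above on a toy smooth versus tremor-corrupted velocity signal; the sampling rate, the signal shapes, and the absence of a cutoff-frequency filter are illustrative assumptions.</p>
        <preformat>
# Toy comparison of discrete SPARC and the SPE proxy on two synthetic velocity signals.
import numpy as np

def normalized_spectrum(vel, fs):
    mag = np.abs(np.fft.rfft(vel))
    freqs = np.fft.rfftfreq(len(vel), d=1.0 / fs)
    return freqs, mag / max(mag.max(), 1e-12)        # magnitude normalized by its maximum

def sparc(vel, fs):
    """Discrete spectral arc length: more negative means a less smooth spectrum."""
    freqs, v_hat = normalized_spectrum(vel, fs)
    df = np.diff(freqs) / (freqs[-1] - freqs[0])      # normalized frequency increments
    dv = np.diff(v_hat)
    return -float(np.sum(np.sqrt(df ** 2 + dv ** 2)))

def spe(vel, fs, eps=1e-6):
    """SPE proxy: frequency-weighted sum over the spectrum renormalized to sum to one."""
    freqs, v_hat = normalized_spectrum(vel, fs)
    dist = v_hat / max(v_hat.sum(), eps)
    return -float(np.sum(np.log(freqs + eps) * dist))

fs = 100.0
t = np.arange(0.0, 2.0, 1.0 / fs)
smooth = np.sin(2.0 * np.pi * 1.0 * t)                    # slowly varying velocity
jerky = smooth + 0.4 * np.sin(2.0 * np.pi * 20.0 * t)     # added high-frequency tremor
for name, v in (("smooth", smooth), ("jerky", jerky)):
    print(name, round(sparc(v, fs), 3), round(spe(v, fs), 3))
        </preformat>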
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Theoretical Analysis on Jitter Metric</title>
        <sec id="sec-5-3-1">
          <title>5.3.1. Theoretical Justification</title>
          <p>Supplementary to our explanations in Section 5.1 and the derivation in Section 5.2, below we present two lemmas to show why the jitter metric captures smoothness.</p>
          <p>Lemma 5.1. Let $v(t)$ be a continuous-time signal. Then, the smoother $v(t)$ is, the more concentrated its frequency spectrum is in low frequencies. As a result, the spectral entropy is lower.</p>
          <p>Proof. The intuition behind the lemma, in high-level terms, is that a signal with higher smoothness has fewer rapid fluctuations, which in Fourier terms corresponds to the high-frequency components having lower magnitudes. The quantity $\mathrm{SPE}$ in the first term of the jitter metric reflects, in an absolute sense, how spread out the energy is across frequencies; i.e., a highly smooth signal has a low $\mathrm{SPE}$. Formally, in a more general sense, if the velocity function $v \in C^k(\mathbb{R})$ has $k$ continuous derivatives with $v, v^{(k)} \in L^1(\mathbb{R})$ and Fourier transform $\hat{v} \in L^1(\mathbb{R})$, then by the differentiation theorem, $\widehat{v'}(f) = 2\pi i f\, \hat{v}(f)$, and similarly, if $v^{(k)} \in L^1(\mathbb{R})$ for some $k \in \mathbb{N}$, then $\widehat{v^{(k)}}(f) = (2\pi i f)^k\, \hat{v}(f)$, so that $|\hat{v}(f)| \le \|v^{(k)}\|_{L^1} / |2\pi f|^{k}$. Therefore, through this decay property, it is possible to conclude that smoother signals, which admit larger $k$, have spectra that decay faster towards 0 at infinity. On the other hand, the log penalizing term in the first term of the jitter metric ensures that the greater the energy at high frequencies, the greater the contribution to $\mathrm{SPE}$ in an absolute sense, since $\log(f + \epsilon)$ is monotonically increasing for $f + \epsilon &gt; 1$. Therefore, considering the above, a smoother signal $v(t)$ has a faster-decaying spectrum and thus lower weights on high-frequency terms, and hence a lower spectral entropy.</p>
          <p>Lemma 5.2. Let $P$ and $Q$ be the discrete velocity magnitude distributions of the predicted and true trajectories. Then, $D_{KL}(P\|Q)$ quantifies how different the temporal dynamics are between the predicted and true trajectories, with larger divergence implying a greater discrepancy in movement similarity.</p>
          <p>Proof. The intuition behind the lemma, in high-level terms, is that if the predicted trajectory exhibits similar motion variability and regularity as the true trajectory, then less extra information is needed to encode samples from $P$ using a model based on $Q$. Formally, assuming both $P(x), Q(x) &gt; 0$ and $\sum_x P(x) = \sum_x Q(x) = 1$, the KL divergence has the following properties (from Gibbs' inequality): $D_{KL}(P\|Q) \ge 0$ and $D_{KL}(P\|Q) = 0 \iff P = Q$. In addition, as shown in Theorem 5.6, the KL divergence is finite only if the support of $P$ is contained within the support of $Q$; as an example, if $P$ assigns probability to velocities where $Q$ is almost 0, the divergence diverges. Based on the above properties and the log-likelihood ratio $\log\frac{P(x)}{Q(x)}$, it is possible to imply that (i) $P(x) \gg Q(x)$ for some $x$ gives $\log\frac{P(x)}{Q(x)} \gg 0$, which in turn implies that $D_{KL}$ increases, and (ii) similarly, $P(x) \ll Q(x)$ for some $x$ gives $\log\frac{P(x)}{Q(x)} \ll 0$ while its contribution is weighted by $P(x) \approx 0$. As a case study, it is possible to build a simple family of probability distributions parametrized by $\theta$. Let $P_{\theta=0} = Q$, so that $P_\theta$ differs from $Q$ as $\theta$ is modified. If the family of probability distributions is univariate Gaussian, then $Q$ is characterized by $\mathcal{N}(0, \sigma^2)$ and $P_\theta$ by $\mathcal{N}(\theta, \sigma^2)$. From the general Gaussian case, $D_{KL}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$; since in our case $\sigma_1 = \sigma_2 = \sigma$, $\mu_1 = \theta$, and $\mu_2 = 0$, we obtain $D_{KL}(P_\theta\|Q) = \frac{\theta^2}{2\sigma^2}$. Therefore, it is trivial that $D_{KL}$ increases monotonically with $\theta$, i.e., when $P_\theta$ differs more from $Q$. A more generic proof can follow Donsker and Varadhan's variational formula in this context.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>5.3.2. Formal Constraints</title>
          <p>The jitter metric is theoretically plausible under the following set of assumptions:
• The predicted and true trajectories must be at least $C^1$-continuous.
• The jitter metric must have the full support of $H[v(\text{true})]$.
• Both predicted and true trajectories must have bounded energies: $\int |\hat{v}(\cdot)|^2 &lt; \infty$.
• Additionally, as required by the task at hand, the velocity trajectories should be non-negative and temporally aligned.</p>
          <p>Lemma 5.3. Let $\text{pred}_{(x,y)}(t) \in C^1(\mathbb{R})$ be the predicted pupil trajectory. Then the velocity $\frac{d}{dt}\text{pred}_{(x,y)}(t)$ is continuous, and the velocity histogram $H_P$ is well-defined as a Radon–Nikodym derivative with respect to the Lebesgue measure.</p>
          <p>Proof. By the definition of $C^1$-continuity, $\text{pred}_{(x,y)}(t)$ is differentiable (on a closed and bounded interval) and $\frac{d}{dt}\text{pred}_{(x,y)}(t)$ is continuous; continuity of the velocity ensures that it is measurable. Since $\text{pred}_{(x,y)}(t)$ is $C^1$, the velocity is bounded on any compact interval $[a, b]$, $a, b \in \mathbb{R}$, by the extreme value theorem. Further, $\frac{d}{dt}\text{pred}_{(x,y)}(t)$ does not diverge since it has finite energy, i.e., $\int |\frac{d}{dt}\text{pred}_{(x,y)}(t)|^2\, dt &lt; \infty$ (as it is a physical trajectory, i.e., a pupil velocity trajectory). Consider the constructed velocity histogram $H_P$ on the time interval $[0, T]$, $T \in \mathbb{R}$, where $\mu$ is the Lebesgue measure and $B \subseteq \mathbb{R}$ is any measurable set:
$$H_P(B) = \frac{1}{T}\, \mu\Big(\big\{ t \in [0, T] : \tfrac{d}{dt}\text{pred}_{(x,y)}(t) \in B \big\}\Big).$$
As the velocity is continuous, hence measurable, the preimage $\{ t \in [0, T] : \frac{d}{dt}\text{pred}_{(x,y)}(t) \in B \}$ is Lebesgue measurable for any Borel set $B$. Further, as the velocity is bounded and $T &lt; \infty$ (due to being characteristics of physical trajectories), $H_P$ is a valid probability measure. Being a physical trajectory, the velocity is piecewise monotonic; therefore, it is possible to further extend that $H_P$ admits a probability density function $h(\nu)$:
$$h(\nu) = \frac{1}{T} \sum_{i} \left| \frac{d}{dt}\, v_{\text{pred}}(t_i) \right|^{-1},$$
where the sum is over the roots $t_i$ of $v_{\text{pred}}(t) - \nu = 0$. As a practical implication, the velocity of a discrete pupil trajectory (of $N$ samples) can be computed via finite differences, $v_k = \frac{\text{pred}_{(x,y)}(t_{k+1}) - \text{pred}_{(x,y)}(t_k)}{t_{k+1} - t_k}$, and if $\text{pred}_{(x,y)}(t)$ is $C^1$, the histogram of $\{v_k\}$ converges to $H_P$ as $N \to \infty$. A similar proof follows for the true pupil trajectory as well.</p>
        </sec>
        <sec id="sec-5-3-3">
          <title>5.3.3. Scale Invariance</title>
          <p>Theorem 5.4. The jitter metric is scale-invariant for all $\lambda &gt; 0$ under negligible $\epsilon$.</p>
          <p>Proof. Consider a linear and consistent scaling factor $\lambda\,(&gt;0)$ applied to both the predicted and true trajectories, i.e., $\text{pred}'_{(x,y)}(t) = \lambda \cdot \text{pred}_{(x,y)}(t)$ and $\text{true}'_{(x,y)}(t) = \lambda \cdot \text{true}_{(x,y)}(t)$. Then the velocities scale as $\frac{d}{dt}\text{pred}'_{(x,y)}(t) = \lambda \cdot \frac{d}{dt}\text{pred}_{(x,y)}(t)$ and $\frac{d}{dt}\text{true}'_{(x,y)}(t) = \lambda \cdot \frac{d}{dt}\text{true}_{(x,y)}(t)$. Considering the first term of the jitter metric, from the scaling (linearity) property of the Fourier transform, $\mathcal{F}(\lambda\, v(t)) = \lambda\, \mathcal{F}(v(t))$, so $|V'_k| = \lambda |V_k|$. Since the spectrum is renormalized before computing SPE, $\mathrm{SPE}(\lambda v) = -\sum_k \log(f_k + \epsilon) \cdot \frac{\lambda V_k}{\sum_j \lambda V_j} = -\sum_k \log(f_k + \epsilon) \cdot \frac{V_k}{\sum_j V_j} = \mathrm{SPE}(v)$ under negligible $\epsilon$. The same can be proved for the true trajectory as well; therefore, the first term is scale-invariant for $\lambda &gt; 0$. Regarding the second term, for the scaled velocity densities $p'(\nu) = \frac{1}{\lambda} p(\nu/\lambda)$ and $q'(\nu) = \frac{1}{\lambda} q(\nu/\lambda)$, adapting the continuous definition of the KL divergence and changing variables $u = \nu/\lambda$:
$$D_{KL}(P'\|Q') = \int p'(\nu) \log\frac{p'(\nu)}{q'(\nu)}\, d\nu = \int \frac{1}{\lambda}\, p\!\left(\frac{\nu}{\lambda}\right) \log\frac{p(\nu/\lambda)}{q(\nu/\lambda)}\, d\nu = \int p(u) \log\frac{p(u)}{q(u)}\, du = D_{KL}(P\|Q).$$
Therefore $D_{KL}$ is scale-invariant, and so is $\log(1 + D_{KL})$. Therefore, from all of the above, the jitter metric is scale-invariant for all $\lambda &gt; 0$ with negligible $\epsilon$.</p>
        </sec>
        <sec id="sec-5-3-4">
          <title>5.3.4. Lower Bound</title>
          <p>Theorem 5.5. The jitter metric is lower bounded. In other words, for any predicted ($\text{pred}_{(x,y)}$) and true ($\text{true}_{(x,y)}$) trajectories, the metric satisfies $JM(\text{pred}_{(x,y)}, \text{true}_{(x,y)}) \ge 0$, with equality if $\text{pred}_{(x,y)} = \text{true}_{(x,y)}$.</p>
          <p>Proof. Both the first and second terms of the jitter metric are strictly non-negative (while $\alpha \in [0, 1]$): the first term is trivially non-negative (i.e., $|\mathrm{SPE}(v_{\text{pred}}) - \mathrm{SPE}(v_{\text{true}})| \ge 0$), and the second term is also non-negative since the KL divergence is always non-negative (Gibbs' inequality, $D_{KL}(P\|Q) \ge 0$, leads to $\log(1 + x) \ge 0$ for $x \ge 0$). Regarding the condition for equality: when the predicted and true trajectories are congruent in the 1D sense (i.e., $\text{pred}_{(x,y)} = \text{true}_{(x,y)}$), then $v(\text{pred}) = v(\text{true})$, so $|\mathrm{SPE}(v_{\text{pred}}) - \mathrm{SPE}(v_{\text{true}})| = 0$ and the first term of the jitter metric vanishes; similarly, $H[v(\text{pred})] = H[v(\text{true})]$ leads to $D_{KL}(\cdot) = 0$, and since $\log(1 + 0) = 0$, the second term also vanishes. Since both terms are strictly non-negative, their weighted sum (i.e., the jitter metric value) is also strictly non-negative. Further, $JM(\cdot) = 0$ is only possible when $\text{pred}_{(x,y)} = \text{true}_{(x,y)}$, which is the lower bound of the metric.</p>
        </sec>
        <sec id="sec-5-3-5">
          <title>5.3.5. Upper Bound</title>
          <p>Theorem 5.6. The jitter metric is not upper-bounded.</p>
          <p>Proof. The first term of the jitter metric is bounded if both $\mathrm{SPE}(v_{\text{true}})$ and $\mathrm{SPE}(v_{\text{pred}})$ are finite and real. Regarding the second term: let $H[v(\text{pred})]$ be a velocity distribution with support disjoint from $H[v(\text{true})]$, i.e., $H[v(\text{pred})]$ assigns non-zero probability to events where $H[v(\text{true})]$ has zero probability. Then $D_{KL}(H[v(\text{pred})]\,\|\,H[v(\text{true})]) \to \infty$, so $JM(\cdot) \ge (1-\alpha)\cdot\log(1 + \infty) = \infty$. Therefore, if $H[v(\text{pred})]$ has support disjoint from $H[v(\text{true})]$, the second term dominates and the metric becomes unbounded above.</p>
          <p>Lemma 5.7. Given that both $H[v(\text{true})]$ and $H[v(\text{pred})]$ have the same support $S$ and $H[v(\text{pred})]$ has a finite upper bound, then $D_{KL}(H[v(\text{pred})]\,\|\,H[v(\text{true})]) &lt; \infty$. This gives a sufficient condition for the KL divergence, and hence the metric, to be upper-bounded.</p>
          <p>Proof. Since $H[v(\text{true})]$ has compact support, $m = \inf_{x \in S} H[v(\text{true})](x) &gt; 0$; similarly, since $H[v(\text{pred})]$ has compact support, $\bar{M} = \sup_{x \in S} H[v(\text{pred})](x) &gt; 0$. Since both are on the same support and $H[v(\text{pred})]$ is bounded, $0 &lt; m \le \bar{M} &lt; \infty$ and $\sup_x \log\!\big(H[v(\text{pred})](x) / H[v(\text{true})](x)\big) \le \log\bar{M} - \log m$. Let $C = \log\bar{M} - \log m$; then
$$D_{KL}\big(H[v(\text{pred})]\,\|\,H[v(\text{true})]\big) \le \big(\log\bar{M} - \log m\big) \sum_{x \in S} H[v(\text{pred})](x) = C &lt; \infty.$$</p>
        </sec>
        <sec id="sec-5-3-6">
          <title>5.3.6. Continuity</title>
          <p>Theorem 5.8. The jitter metric is continuous everywhere except when $H[v(\text{true})]$ has zero-mass bins (i.e., if not smoothed). In other words, with full support, the jitter metric is continuous.</p>
          <p>Proof. Assume that both $\text{pred}_{(x,y)}$ and $\text{true}_{(x,y)}$ are differentiable on a closed and bounded interval (note that these are valid assumptions given the typical oculomotor, i.e., pupil, activity [12]). Then both $v(\text{pred})$ and $v(\text{true})$ are continuous functions, and under the $L^2$ norm, if $\|\text{true} - \text{pred}\| \to 0$ then $\|v(\text{true}) - v(\text{pred})\| \to 0$. Based on the assumption of differentiability on a closed and bounded interval (and thereby continuity, since differentiability implies continuity), both $v(\text{pred})$ and $v(\text{true})$ are Riemann integrable, which in turn implies that they are Lebesgue integrable; therefore, both velocity trajectories are in the $L^1$ space. For $v_{\text{pred}}(t)$, let the Fourier transform of the predicted velocity trajectory be $\hat{v}_{\text{pred}}(f) = \int_{\mathbb{R}} e^{-2\pi i f t}\, v_{\text{pred}}(t)\, dt$. Note that, for simplicity, we only consider the 1D case (the $x$ coordinate), considering one variable at a time in the trajectory; this is easily extendable to the multi-variable case, as the Fourier transform on multiple variables is well-defined. For any sequence $f_n \to f$, define $g_n(t) = e^{-2\pi i f_n t}\, v_{\text{pred}}(t)$; then $g_n(t) \to e^{-2\pi i f t}\, v_{\text{pred}}(t)$ pointwise, $g_n(t)$ is measurable for all $n \in \mathbb{N}$, and $|g_n(t)| \le |v_{\text{pred}}(t)|$, which is integrable. Therefore, by the dominated convergence theorem,
$$\lim_{n \to \infty} \hat{v}_{\text{pred}}(f_n) = \lim_{n \to \infty} \int_{\mathbb{R}} g_n(t)\, dt = \int_{\mathbb{R}} \lim_{n \to \infty} g_n(t)\, dt = \hat{v}_{\text{pred}}(f),$$
so $\hat{v}_{\text{pred}}(f)$ is uniformly continuous. Similarly, we can prove this for the true trajectory under the same set of assumptions. Since taking the absolute value, normalization, multiplication, and the logarithm (for $f &gt; 0$) do not violate continuity, the first term is continuous. Assuming soft histogramming (i.e., soft binning or kernel density estimation), in other words if $H[v(\text{true})]$ has full support ($H[v(\text{true})] &gt; 0$ everywhere), then $H[v(\text{pred})], H[v(\text{true})] \in (0, 1]$ and $H[v(\text{pred})]/H[v(\text{true})]$ is continuous. As the logarithm is continuous over $(0, \infty)$, under the said assumption the second term is also continuous. As the convex sum of continuous functions is also continuous, the jitter metric is continuous with full support.</p>
        </sec>
        <sec id="sec-5-3-7">
          <title>5.3.7. Differentiability</title>
          <p>Theorem 5.9. The jitter metric is differentiable almost everywhere except when (1) $\mathrm{SPE}(v_{\text{pred}}) = \mathrm{SPE}(v_{\text{true}})$, (2) $H_k = 0$ for any bin $k$, (3) $H[v(\text{pred})]$ has support disjoint from $H[v(\text{true})]$, and (4) $H[v(\text{pred})]$ is on the simplex boundary.</p>
          <p>Proof. If $\text{pred}_{(x,y)}$ and $\text{true}_{(x,y)}$ are differentiable on a closed and bounded interval (valid assumptions given the typical oculomotor, i.e., pupil, activity [12] as a physical signal), then, as in Theorem 5.8, both velocity trajectories are in $L^1$ and $\hat{v}_{\text{pred}}(f)$ is uniformly continuous; as before, we consider the 1D case, which extends directly to multiple variables. Assuming $t\, v_{\text{pred}}(t) \in L^1(\mathbb{R})$, define $g(t) = -2\pi i\, t\, v_{\text{pred}}(t)$, which is also integrable, with $\hat{g}$ uniformly continuous. Consider
$$\hat{v}_{\text{pred}}(f) - \hat{v}_{\text{pred}}(0) = \int_{\mathbb{R}} v_{\text{pred}}(t)\left[e^{-2\pi i f t} - 1\right] dt = \int_{0}^{f} \int_{\mathbb{R}} (-2\pi i t)\, e^{-2\pi i \nu t}\, v_{\text{pred}}(t)\, dt\, d\nu = \int_{0}^{f} \hat{g}(\nu)\, d\nu.$$
Therefore, by the fundamental theorem of calculus, $\hat{v}_{\text{pred}}(f)$ is differentiable almost everywhere; similarly, we can prove this result for the true velocity trajectory as well. Upon the immediate result above, the first term of the jitter metric is differentiable almost everywhere except when $|\mathrm{SPE}(v_{\text{pred}}) - \mathrm{SPE}(v_{\text{true}})| = 0$, since the absolute value function on $\mathbb{R}$ is not differentiable at 0; further, since $v_{\text{pred}} = v_{\text{true}}$ leads to a vanishing first term, the differentiability of the jitter metric is not defined when $\text{pred}(\cdot) = \text{true}(\cdot)$. Regarding the second term of the jitter metric, assume that $H[v(\text{pred})]$ and $H[v(\text{true})]$ can be parametrized using parameters $\theta \in \mathbb{R}^{d}$ and $\phi \in \mathbb{R}^{m}$ respectively, i.e., written as $H[v(\text{pred})](x; \theta)$ and $H[v(\text{true})](x; \phi)$, both $\in C^1$ on the space $\mathcal{X}$, with $H[v(\text{true})](x; \phi) &gt; 0$ for all $(x; \phi) \in \mathcal{X}$. For simplicity, we denote them as $P_\theta$ and $Q_\phi$ respectively. Therefore,
$$D_{KL}(P_\theta \| Q_\phi) = \sum_{x} P_\theta(x) \log\!\left(\frac{P_\theta(x)}{Q_\phi(x)}\right),$$
and, for a fixed $Q_\phi$, the partial derivative of the term with respect to $\theta$ is
$$\frac{\partial}{\partial \theta} D_{KL}(P_\theta \| Q_\phi) = \sum_{x} \frac{\partial P_\theta(x)}{\partial \theta} \left( \log\frac{P_\theta(x)}{Q_\phi(x)} + 1 \right).$$
Therefore, under the assumptions of $P_\theta, Q_\phi \in C^1$ in $\theta, \phi$ respectively on $\mathcal{X}$ and $Q_\phi(x) &gt; 0$ for all $x \in \mathcal{X}$ (and given $P_\theta$ is not on the simplex boundary), the above sum uniformly converges and the derivative exists and is continuous. Similarly, the partial derivative of $D_{KL}$ with respect to $\phi$ can be proven to be
$$\frac{\partial}{\partial \phi} D_{KL}(P_\theta \| Q_\phi) = -\sum_{x} P_\theta(x)\, \frac{\partial}{\partial \phi} \log Q_\phi(x),$$
which exists and is continuous under the same set of assumptions. A more detailed proof with gradient derivations can follow as proved in variational Bayes [33].</p>
        </sec>
        <sec id="sec-5-3-8">
          <title>5.3.8. Time Complexity</title>
          <p>The dominant operation in the first term is the Fourier transform. If the fast Fourier transform (FFT) is used on an array of length $n$, the time complexity of the FFT is $O(n \log n)$. As taking the magnitude, normalization, and the weighted sum are in $O(n)$, the time complexity of the first term is $O(n \log n + n) = O(n \log n)$. For the second term, assuming histograms with $B$ bins ($B \ll n$), soft-binning is of order $O(n \cdot B)$ and the (discrete) KL divergence is of order $O(B)$. Subsequently, the time complexity of the second term is $O(n \cdot B + B) = O(n \cdot B)$, and the time complexity of the jitter metric is $O(n \log n + n \cdot B) = O(n \log n)$.</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Empirical Justification for Jitter Metric</title>
        <p>To empirically demonstrate the necessity and validity of the proposed jitter metric in the context of
pupil tracking, we present a set of controlled visual demonstrations in Fig. 2. These examples are
designed to contrast gaze prediction trajectories with varying degrees of temporal smoothness and
positional accuracy. In doing so, we highlight the limitations of conventional metrics, such as Mean
Squared Error (MSE), and illustrate how the proposed jitter metric serves as a complementary measure
that captures temporal continuity, a crucial yet often overlooked dimension in eye tracking evaluation.</p>
        <p>Each subfigure (Fig. 2b – Fig. 2g) represents a perturbed version of the predicted pupil trajectory
(Fig. 2a) obtained from the 3ET+ dataset. Perturbations were deliberately designed to simulate typical
error modes encountered in real-world event-based eye-tracking pipelines. These include: (i) low-amplitude random noise, (ii) blink-induced discontinuities, (iii) pixel shift artifacts, and (iv) high-frequency tremor-like oscillations. Each simulated prediction includes at least two of these perturbations
to reflect realistic degradation patterns observed in practice.</p>
        <p>We use the prediction in Fig. 2a as the reference trajectory. It achieves a moderate MSE of 15.83 and a jitter metric score of $JM_{\alpha=0.75} = 0.18$. This example serves as a baseline for comparing the other trajectories in terms of both spatial and temporal quality. In Fig. 2b, although the MSE is lower than that of the reference, the prediction exhibits reduced temporal smoothness. This discrepancy is effectively captured by our jitter metric, which assigns it a higher score, penalizing the temporal instability overlooked by conventional positional metrics. Conversely, the prediction in Fig. 2e demonstrates superior temporal smoothness, despite a slightly higher MSE. The jitter score reflects this improvement, underscoring the metric’s ability to reward smooth predictions even in the presence of minor spatial deviations.</p>
        <p>Figures 2c and 2f present a particularly instructive comparison: both predictions yield nearly identical MSE values but differ significantly in their temporal continuity. The proposed jitter metric distinguishes between the two, accurately assigning a lower score to the smoother prediction. These cases exemplify scenarios where traditional metrics fail to differentiate predictions with similar positional accuracy but markedly different perceptual quality.</p>
        <p>The trajectory in Fig. 2d is characterized by blink-related discontinuities and abrupt pixel
shifts—common artifacts in event-based systems. While its MSE is comparable to the reference, its elevated jitter
score reflects the increased temporal noise. This highlights the metric’s sensitivity to transient
disruptions that can compromise downstream tasks such as attention estimation or gaze-based interaction.</p>
        <p>Finally, Fig. 2g illustrates a case with relatively high MSE but excellent temporal smoothness. The
jitter metric appropriately assigns it a lower score than the reference, reinforcing its utility as a decoupled
measure of temporal fidelity.</p>
        <p>These examples validate the proposed jitter metric as a critical complement to existing positional
accuracy measures. By capturing trajectory-level smoothness, the metric provides a more holistic
evaluation of prediction quality, particularly in applications such as micro-expression recognition,
cognitive state inference, and gaze-based behavioral analytics, where temporal consistency is important.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Experiments &amp; Results</title>
      <sec id="sec-6-1">
        <title>6.1. Datasets &amp; Base Models</title>
        <p>By following the recent challenge on event-based eye tracking [10], we test our method on the 3ET+
dataset [10, 30], since 3ET+ serves as the most prominent benchmark dataset for the task at hand. In
addition, since our proposed method is presented as a post-processing step and works in a model-agnostic
fashion, we select two recent models as base models: CB-ConvLSTM [11], and bigBrains [13], to show
the impact of the proposed pipeline towards improved pupil coordinate predictions in each case. More
descriptively, CB-ConvLSTM is a change-based convolutional long short-term memory architecture
which is specifically designed for efficient spatio-temporal modelling to predict pupil coordinates from
sparse event frames, whereas bigBrains attempts to preserve causality and learn spatial relationships
using a lightweight model consisting of spatial and temporal convolutions.</p>
        <p>Figure 2: Simulated gaze trajectories used to illustrate the jitter metric, with per-panel scores:
(a) MSE 15.83, JM 0.18; (b) MSE 9.82, JM 0.23; (c) MSE 14.40, JM 0.20; (d) MSE 59.41, JM 0.32;
(e) MSE 23.07, JM 0.16; (f) MSE 16.32, JM 0.11; (g) MSE 3.39, JM 0.05.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Implementation Details</title>
        <p>We implement and run all our post-processing blocks on a single V100 GPU machine with
the following settings for each proposed algorithm. For Algorithm 1, we set its two window parameters
to 5 and 20, the percentile used to determine the adaptive window size to 75, and the default
mode of the local motion variance estimation method to be covariance-based. For
Algorithm 2, we set the scaling parameter to 8, the count threshold to 5, and the difference
threshold to 2.</p>
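        <p>For concreteness, the listing below gives a minimal Python sketch of an adaptive (motion-aware) median
filter over predicted pupil coordinates. It assumes the values 5 and 20 above act as lower and upper window
bounds and that a covariance-based estimate of local motion variance gates the window size; the names
motion_aware_median_filter, w_min, w_max, and pct are illustrative, not the released implementation.</p>
        <preformat>
import numpy as np

def motion_aware_median_filter(preds, w_min=5, w_max=20, pct=75):
    """Sketch of an adaptive median filter over (T, 2) pupil predictions.
    A covariance-based estimate of local motion variance selects a small
    window when motion is high (to preserve saccades) and a large window
    when motion is low (to suppress blink-induced spikes)."""
    preds = np.asarray(preds, dtype=float)
    T = len(preds)
    out = preds.copy()
    # covariance-based local motion variance around each timestep
    var = np.array([
        np.trace(np.cov(preds[max(0, t - w_min):t + w_min + 1].T))
        for t in range(T)
    ])
    thresh = np.percentile(var, pct)          # adaptive-window cut-off
    for t in range(T):
        w = w_min if var[t] > thresh else w_max
        lo, hi = max(0, t - w // 2), min(T, t + w // 2 + 1)
        out[t] = np.median(preds[lo:hi], axis=0)
    return out
        </preformat>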
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Baselines</title>
        <p>Trivially, we consider the two base models described in Section 6.1 as baselines against which we
compare the proposed post-processing techniques. In addition, to further extend our analysis, we compare
our method with other recent works in the literature, including EyeGraph [12], MambaPupil [34],
FreeEVs [30], and SEE [35].</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Evaluation Metrics</title>
        <p>Along with the proposed jitter metric, we implement three other metrics: p-accuracy, mean Euclidean
distance (ℓ2) and mean Manhattan distance (ℓ1), which are used in recent works in the literature [30],
to quantitatively evaluate the performance of the proposed post-processing methods. As defined in [11],
p-accuracy, presented in Eq. 50, indicates the pixel-level accuracy of the predictions by checking whether the
Euclidean distance between the predicted and the true pupil coordinates is within a specified pixel
threshold. In this work, we set the pixel thresholds to 10, 5, and 1 following [30]. Further, since pupil
coordinate prediction is a regression task, we also incorporate two well-established regression metrics:
ℓ2 in Eq. 51 and ℓ1 in Eq. 52.</p>
        <sec id="sec-6-4-1">
          <title>Euclidean distance between the predicted coordinates (</title>
          <p>) and true coordinates ( 
 ) is within a
specified pixel threshold (  ℎ ). In this work, we set the pixel thresholds to be 10, 5, and 1 following [30].
Further, since the pupil coordinate prediction is a regression task, we incorporate two well-established
regression metrics:  2 in Eq. 51 and  1 in Eq. 52 as well.</p>
          <p>with  ( 
 ,  
 ,  ℎ) = {
 −</p>
          <p>‖ ≤  ℎ
{ ℎ} =
∑  ( 
 ,</p>
          <p>,  ℎ)

1
 =1
1 if ‖ 
0 otherwise


 2 = 1
 1 = 1
 =1
 =1
∑ ‖   −  
∑ |    −  
‖
 2
 |
(50)
(51)
(52)</p>
        </sec>
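        <p>As a minimal sketch (not the authors’ exact evaluation script), the metrics above can be computed
from arrays of predicted and true coordinates as follows; the function and variable names are illustrative.</p>
        <preformat>
import numpy as np

def evaluate(pred, gt, thresholds=(10, 5, 1)):
    """p-accuracy (Eq. 50), mean Euclidean distance (Eq. 51) and mean
    Manhattan distance (Eq. 52) between predicted and true pupil
    coordinates. pred, gt: (N, 2) arrays of (x, y) positions."""
    diff = pred - gt
    eucl = np.linalg.norm(diff, axis=1)            # per-sample Euclidean distance
    metrics = {"p%d" % th: float(np.mean(th >= eucl)) for th in thresholds}
    metrics["l2"] = float(np.mean(eucl))           # mean Euclidean distance
    metrics["l1"] = float(np.mean(np.abs(diff).sum(axis=1)))  # mean Manhattan distance
    return metrics
        </preformat>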
      </sec>
      <sec id="sec-6-5">
        <title>6.5. Results</title>
        <p>In this section, we present a comprehensive evaluation of our inference-time post-processing framework
across four dimensions: standard positional accuracy metrics, our proposed jitter metric, per-component
ablation analysis, and computational complexity. We also present a set of qualitative results in Fig. 3 for
further demonstration.</p>
        <p>Positional Accuracy: Table 1 reports the performance of our inference-time refinement pipeline
on standard positional accuracy metrics, evaluated on predictions from the bigBrains model [13]. Our
post-processing techniques consistently improve the gaze localization accuracy across all benchmarks.
Notably, when applied to the base model, our method reduces the mean Euclidean distance (ℓ2) error by more than 5.1%
on average (on both validation and test datasets), outperforming all existing event-based eye tracking
approaches. These results demonstrate that our method enhances baseline predictions without any
model retraining or architectural modifications, validating its model-agnostic design.</p>
        <p>Computational Complexity: Table 2 quantifies the computational overhead introduced by our
post-processing modules. As our methods operate entirely at inference time and do not include any
trainable parameters, the additional computational burden is minimal. Specifically, motion-aware
median filtering and optical flow refinement require approximately 172 and 340 FLOPs per event frame,
respectively. Across all tested configurations, the total computational overhead remains below 0.00048%
of the base model’s cost. This confirms the practicality of our approach for real-time deployment on
edge devices with constrained resources.</p>
        <p>Temporal Smoothness via Jitter Metric: Table 3 highlights the benefits of our approach with
respect to temporal smoothness, as measured by the proposed jitter metric. Unlike traditional metrics
that emphasize spatial proximity to ground truth, the jitter metric captures the fine-grained continuity
of gaze predictions over time, an essential attribute for downstream applications such as mind-state
decoding or attention estimation. Our refinement modules yield significantly lower jitter scores
than both the base model and other state-of-the-art systems, indicating smoother and more stable
tracking output. This demonstrates that our method not only improves accuracy but also mitigates
high-frequency noise often induced by sensor sparsity or blinking.</p>
        <p>Ablation Analysis: To isolate the contribution of each refinement component, we conduct an
ablation study summarized in Table 4. We evaluate the individual effects of the motion-aware median
filtering and optical flow-based local refinement modules. The former proves effective in suppressing
transient spikes caused by blinks and sensor noise, while the latter aligns predictions more closely with
the underlying motion cues embedded in the event stream. When combined, these components exhibit
complementary benefits, leading to the highest overall improvements in both accuracy and smoothness.
These results validate the composability and robustness of our design.</p>
        <p>As shown in Tab. 4, both of our post-processing techniques consistently improve the results of the vanilla
predictions of each method. For instance, applying only the motion-aware median filtering improves the
vanilla prediction performance of [13] by reducing the ℓ2 error from 1.500 to 1.466, whereas applying both
motion-aware median filtering and optical flow-based local refinement leads to an ℓ2 of 1.423, an overall
improvement of 5.13% ((1.500 − 1.423)/1.500 ≈ 5.13%). Similarly, when we apply our model-agnostic methods
on [11]’s vanilla predictions, the performance improves from 7.922 to 7.504. These observations confirm
the validity of the proposed model-agnostic post-processing methods as a collective way of improving
existing models while also ensuring the efficacy of the individual blocks.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion &amp; Conclusion</title>
      <p>This work presents a practical, model-agnostic framework for enhancing event-based eye-tracking
pipelines at inference time. A key strength of our approach is its broad applicability: the proposed
post-processing techniques, i.e., Motion-Aware Median Filtering and Optical Flow-Based Local Refinement,
can be seamlessly applied to the output of any existing event-based pupil estimation model,
improving temporal stability and spatial coherence without retraining or modifying the model
architecture. This makes our method especially valuable in resource-constrained settings or when dealing with
black-box models, offering a lightweight way to boost performance across a wide range of real-world
applications. Additionally, the introduction of a dedicated Jitter Metric provides a complementary
measure of model quality, addressing a critical gap in evaluation criteria for time-sensitive behavioral
tracking.</p>
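      <p>Because the refinement operates purely on a model’s output trajectory, it can be expressed as a simple
chain of callables applied after inference; the sketch below is illustrative glue code under that assumption,
with refine_predictions and post_processors as hypothetical names rather than the released API.</p>
      <preformat>
import numpy as np

def refine_predictions(raw_preds, post_processors):
    """Apply a sequence of model-agnostic post-processing callables
    (e.g. a motion-aware median filter followed by an optical-flow-based
    local refinement closing over the event stream) to raw pupil
    predictions of shape (T, 2)."""
    out = np.asarray(raw_preds, dtype=float)
    for step in post_processors:
        out = step(out)     # each step maps (T, 2) predictions to (T, 2)
    return out
      </preformat>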
      <p>Despite these strengths, there are important limitations to acknowledge.</p>
      <p>• Our work currently focuses exclusively on the ocular modality. While eye movements,
particularly micro-saccades and pupil dynamics, are powerful indicators of cognitive states such as
attention, fatigue, or confusion, real-world mind-state inference typically benefits from
multimodal integration, combining gaze with facial micro-expressions, head dynamics, or physiological
signals. Extending our refinement pipeline and temporal metrics to accommodate or complement
such modalities remains an exciting direction for future work.
• Our evaluations are conducted on datasets collected in controlled laboratory settings, where
participants are relatively still, lighting is consistent, and noise in the event stream is minimal.
In contrast, real-world deployments, for example, in wearable settings or during naturalistic
interactions, introduce challenges such as head motion, background clutter, dynamic lighting,
and partial occlusions. These conditions may degrade the assumptions behind our refinement
techniques (e.g., motion coherence in flow estimation), and thus real-world validation is a critical
next step.
• While our approach improves temporal smoothness and reduces spatial jitter, it does not correct
fundamental prediction errors arising from poor base model performance. If a baseline model
consistently mispredicts pupil location due to sensor misalignment, incorrect calibration, or
biased training data, our method may smooth those errors rather than eliminate them. Therefore,
the method is best viewed as an enhancement layer for models that already offer reasonable
accuracy, rather than as a full corrective mechanism.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported by both the Ministry of Education (MOE) Academic Research Fund (AcRF)
Tier 1 grant (Grant ID: 22-SIS-SMU-044), and by Singapore Management University’s Lee Kong Chian
Professorship Award. Any opinions, findings and conclusions or recommendations expressed in this
material are those of the author(s).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-10">
      <title>References</title>
      <p>[11] Q. Chen, Z. Wang, S.-C. Liu, C. Gao, 3ET: efficient event-based eye tracking using a change-based
convlstm network, in: 2023 IEEE Biomedical Circuits and Systems Conference (BioCAS), IEEE, 2023, pp. 1–5.
[12] N. Bandara, T. Kandappu, A. Sen, I. Gokarn, A. Misra, EyeGraph: modularity-aware spatio temporal
graph clustering for continuous event-based eye tracking, Advances in Neural Information Processing
Systems 37 (2024) 120366–120380.
[13] Y. R. Pei, S. Brüers, S. Crouzet, D. McLelland, O. Coenen, A lightweight spatiotemporal network for
online eye tracking with event camera, in: Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, 2024, pp. 5780–5788.
[14] J. W. Grootjen, H. Weingärtner, S. Mayer, Highlighting the challenges of blinks in eye tracking for
interactive systems, in: Proceedings of the 2023 Symposium on Eye Tracking Research and Applications,
2023, pp. 1–7.
[15] P. D. Allopenna, J. S. Magnuson, M. K. Tanenhaus, Tracking the time course of spoken word
recognition using eye movements: Evidence for continuous mapping models, Journal of Memory and
Language 38 (1998) 419–439.
[16] C. Guo, J. Liang, G. Zhan, Z. Liu, M. Pietikäinen, L. Liu, Extended local binary patterns for efficient
and robust spontaneous facial micro-expression recognition, IEEE Access 7 (2019) 174517–174530.
[17] M. Verburg, V. Menkovski, Micro-expression detection in long videos using optical flow and
recurrent neural networks, in: 2019 14th IEEE International Conference on Automatic Face &amp;
Gesture Recognition (FG 2019), IEEE, 2019, pp. 1–6.
[18] S.-T. Liong, J. See, R. C.-W. Phan, A. C. Le Ngo, Y.-H. Oh, K. Wong, Subtle expression recognition
using optical strain weighted features, in: Computer Vision-ACCV 2014 Workshops: Singapore,
Singapore, November 1-2, 2014, Revised Selected Papers, Part II 12, Springer, 2015, pp. 644–657.
[19] Z. Xia, X. Hong, X. Gao, X. Feng, G. Zhao, Spatiotemporal recurrent convolutional networks for
recognizing spontaneous micro-expressions, IEEE Transactions on Multimedia 22 (2019) 626–640.
[20] M. Bai, R. Goecke, Investigating LSTM for micro-expression recognition, in: Companion Publication
of the 2020 International Conference on Multimodal Interaction, 2020, pp. 7–11.
[21] Y. Wang, S. Zheng, X. Sun, D. Guo, J. Lang, Micro-expression recognition with attention mechanism
and region enhancement, Multimedia Systems 29 (2023) 3095–3103.
[22] W.-J. Yan, Q. Wu, Y.-J. Liu, S.-J. Wang, X. Fu, CASME database: A dataset of spontaneous
micro-expressions collected from neutralized faces, in: 2013 10th IEEE International Conference and
Workshops on Automatic Face and Gesture Recognition (FG), IEEE, 2013, pp. 1–7.
[23] X. Li, T. Pfister, X. Huang, G. Zhao, M. Pietikäinen, A spontaneous micro-expression database:
Inducement, collection and baseline, in: 2013 10th IEEE International Conference and Workshops on
Automatic Face and Gesture Recognition (FG), IEEE, 2013, pp. 1–6.
[24] A. K. Davison, C. Lansley, N. Costen, K. Tan, M. H. Yap, SAMM: A spontaneous micro-facial
movement dataset, IEEE Transactions on Affective Computing 9 (2016) 116–129.
[25] E. H. Hess, J. M. Polt, Pupil size as related to interest value of visual stimuli, Science 132 (1960)
349–350.
[26] T. Partala, V. Surakka, Pupil size variation as an indication of affective processing, International
Journal of Human-Computer Studies 59 (2003) 185–198.
[27] G. Gallego, T. Delbrück, G. Orchard, C. Bartolozzi, B. Taba, A. Censi, S. Leutenegger, A. J. Davison,
J. Conradt, K. Daniilidis, et al., Event-based vision: A survey, IEEE Transactions on Pattern Analysis
and Machine Intelligence 44 (2020) 154–180.
[28] A. N. Angelopoulos, J. N. Martel, A. P. Kohli, J. Conradt, G. Wetzstein, Event-based near-eye gaze
tracking beyond 10,000 Hz, IEEE Transactions on Visualization and Computer Graphics 27 (2021)
2577–2586.
[29] A. Sen, N. S. Bandara, I. Gokarn, T. Kandappu, A. Misra, EyeTrAES: fine-grained, low-latency eye
tracking via adaptive event slicing, Proceedings of the ACM on Interactive, Mobile, Wearable and
Ubiquitous Technologies 8 (2024) 1–32.
[30] Z. Wang, C. Gao, Z. Wu, M. V. Conde, R. Timofte, S.-C. Liu, Q. Chen, Z.-J. Zha, W. Zhai, H. Han,
et al., Event-based eye tracking. AIS 2024 challenge survey, in: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2024, pp. 5810–5825.
[31] P. Majaranta, A. Bulling, Eye tracking and eye-based human–computer interaction, in: Advances
in Physiological Computing, Springer, 2014, pp. 39–65.
[32] S. Balasubramanian, A. Melendez-Calderon, A. Roby-Brami, E. Burdet, On the analysis of movement
smoothness, Journal of NeuroEngineering and Rehabilitation 12 (2015) 1–11.
[33] D. P. Kingma, M. Welling, Auto-encoding variational bayes, 2022. URL: https://arxiv.org/abs/1312.6114.
arXiv:1312.6114.
[34] Z. Wang, Z. Wan, H. Han, B. Liao, Y. Wu, W. Zhai, Y. Cao, Z.-J. Zha, MambaPupil: Bidirectional
selective recurrent model for event-based eye tracking, in: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, 2024, pp. 5762–5770.
[35] B. Zhang, Y. Gao, J. Li, H. K.-H. So, Co-designing a sub-millisecond latency event-based eye
tracking system with submanifold sparse CNN, in: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 2024, pp. 5771–5779.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          , Emotions revealed,
          <source>Bmj</source>
          <volume>328</volume>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.-J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Casme ii: An improved spontaneous micro-expression database and the baseline evaluation</article-title>
          ,
          <source>PloS one 9</source>
          (
          <year>2014</year>
          )
          <article-title>e86041</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          ,
          <article-title>Telling lies: Clues to deceit in the marketplace, politics, and marriage (revised edition)</article-title>
          ,
          <source>WW Norton &amp; Company</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wezowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Penton-Voak</surname>
          </string-name>
          ,
          <article-title>An open label pilot study of micro expression recognition training as an intervention for low mood</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>15</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bi</surname>
          </string-name>
          , T. Chen,
          <article-title>Micro-attention for micro-expression recognition</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>410</volume>
          (
          <year>2020</year>
          )
          <fpage>354</fpage>
          -
          <lpage>362</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alipour</surname>
          </string-name>
          , É. Céret,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dupuy-Chessa</surname>
          </string-name>
          ,
          <article-title>A framework for user interface adaptation to emotions and their temporal aspects</article-title>
          ,
          <source>Proceedings of the ACM on Human-Computer Interaction</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bixler</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>D'Mello, Automatic gaze-based detection of mind wandering with metacognitive awareness, in: User Modeling, Adaptation</article-title>
          and Personalization: 23rd International Conference,
          <string-name>
            <surname>UMAP</surname>
          </string-name>
          <year>2015</year>
          , Dublin, Ireland, June 29-July 3,
          <year>2015</year>
          . Proceedings 23, Springer,
          <year>2015</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Eckstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Guerra-Carrillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T. M.</given-names>
            <surname>Singley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Bunge</surname>
          </string-name>
          ,
          <article-title>Beyond eye gaze: What else can eyetracking reveal about cognition and cognitive development?</article-title>
          ,
          <source>Developmental cognitive neuroscience 25</source>
          (
          <year>2017</year>
          )
          <fpage>69</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Prasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Reich</surname>
          </string-name>
          , S. Makowski,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Jäger</surname>
          </string-name>
          ,
          <article-title>Improving cognitive-state analysis from eye gaze with synthetic eye-movement data</article-title>
          ,
          <source>Computers &amp; Graphics</source>
          <volume>119</volume>
          (
          <year>2024</year>
          )
          <fpage>103901</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Perrone</surname>
          </string-name>
          , et al.,
          <source>Event-Based Eye Tracking</source>
          .
          <year>2025</year>
          event
          <article-title>-based vision workshop</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>