<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>VR Remote Magnified Viewing System Using an Ultrafast Pan-Tilt Camera</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Idaku Ishii</string-name>
          <email>iishii@robotics.hiroshima-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daisuke Tahara</string-name>
          <email>d.tahara.417@ms.saitama-u.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuta Abe</string-name>
          <email>y.abe.796@ms.saitama-u.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taishi Iriyama</string-name>
          <email>iriyama@mail.saitama-u.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takashi Komuro</string-name>
          <email>komuro@mail.saitama-u.ac.jp</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kohei Shimasaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hiroshima University</institution>
          ,
          <addr-line>Higashihiroshima</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Saitama University</institution>
          ,
          <addr-line>Saitama</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a VR remote magnified viewing system that allows users to observe a wide-angle live video from a remote site while simultaneously inspecting a target region in high resolution. Our system realizes "Optical Foveation," a concept inspired by the human visual system, which provides high-acuity vision in the fovea (center of gaze) and a wide field of view in the periphery. The system combines a wide-field camera for contextual overview and an ultrafast pan-tilt camera with a galvano mirror for a magnified, gaze-contingent view. This mirror-driven mechanism achieves millisecond-level response, instantaneously aligning the magnified view with the user's head motion and ensuring a seamless transition from context to detail. To mitigate VR sickness, which is often exacerbated by the lag and visual-vestibular mismatch in magnified views, our system displays the telephoto image only on user command. This user-triggered approach minimizes exposure to high-magnification motion and enhances comfort. We describe the system's architecture and report on a prototype implementation, with experimental results confirming its responsive, comfortable, and effective operation.</p>
      </abstract>
      <kwd-group>
        <kwd>Remote magnified viewing system</kwd>
        <kwd>Ultrafast pan-tilt camera</kwd>
        <kwd>VR sickness</kwd>
        <kwd>Optical foveation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        VR remote viewing systems offer an immersive sense of presence for applications like remote tourism,
technical assistance, and site inspection [
        <xref ref-type="bibr" rid="ref1">1, 10</xref>
        ]. A critical challenge in these applications is the need to
simultaneously perceive a wide contextual overview and inspect fine details of distant objects with
high fidelity.
      </p>
      <p>Conventional approaches to this problem have major drawbacks in VR contexts. Hardware-based
solutions using mechanical Pan-Tilt-Zoom (PTZ) cameras [9, 13] suffer from significant mechanical
latency. Their physical actuation is too slow to follow a user’s rapid head movements, causing a critical
gaze-to-image lag that is a primary contributor to VR sickness, as extensive research confirms [12, 8].
On the other hand, software-based alternatives, such as foveated rendering, which computationally
prioritizes gaze points [6], or super-resolution techniques [5, 11], introduce their own challenges. These
methods can incur significant computational costs, potentially introducing new sources of latency, and
may compromise the information accuracy required for precise tasks due to their estimative nature.</p>
      <p>
        To resolve this trade-off between latency and resolution, we propose a system that implements
"Optical Foveation." Inspired by the human eye, which combines a high-resolution foveal view with a
wide peripheral view, our system uses two separate cameras. A wide-field camera provides a stable,
low-magnification peripheral image, while an ultrafast pan-tilt camera provides a high-resolution
telephoto image of the user’s gaze point. The key is a low-inertia galvano mirror that steers the telephoto
camera’s line of sight with millisecond-level response [
        <xref ref-type="bibr" rid="ref2 ref4">4, 2</xref>
        ]. This enables the magnified view to be redirected
virtually instantaneously, minimizing the head-motion-to-display mismatch that is amplified at high
magnifications. Furthermore, we empower the user to toggle the magnified view, limiting the duration
of tight head-image coupling and reducing discomfort.
      </p>
      <p>Contributions.</p>
      <p>We make the following contributions:
1. A VR remote viewing system implementing Optical Foveation, which combines a wide-field
camera for context and an ultrafast pan-tilt camera for gaze-contingent magnified detail.
2. A low-latency, low-inertia control loop that maps HMD rotation to galvano mirror angles,
achieving virtually instantaneous redirection of the magnified view.
3. A user-triggered display policy that mitigates VR sickness by minimizing exposure time to
high-magnification imagery, thereby reducing visual-vestibular conflict.
4. A prototype implementation and a qualitative evaluation demonstrating the system’s
responsiveness and usability for comfortable remote inspection tasks.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System architecture and implementation</title>
      <p>This section presents the overall design, camera module, optical parameters, and the control/display
pipeline. Figure 1 provides a high-level overview: a wide-field camera supplies the wide-angle image; an
ultrafast pan-tilt camera with a galvano mirror supplies the magnified image aligned to the user’s current
gaze direction. The user wears an HMD; head rotation from the HMD sensors is mapped to galvano
mirror angles and sent via UDP to control the magnified view. The user presses a controller button to
display or hide the magnified view. This policy preserves situational awareness in the wide-angle view
and shortens the time under high magnification.</p>
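      <p>The head-rotation-to-mirror mapping and UDP link described above can be sketched as follows. This is a minimal illustrative example: the mirror scan range, packet layout, port, and function names are assumptions for exposition, not the actual control protocol of this system.</p>

```python
import socket
import struct

# Illustrative sketch: HMD yaw/pitch (degrees) are clamped to an assumed
# galvano mirror range and sent as a UDP datagram to the camera controller.
# The range, packet format, and address below are hypothetical.
MIRROR_RANGE_DEG = (-20.0, 20.0)  # assumed optical scan range of the mirror

def head_pose_to_mirror_angles(yaw_deg, pitch_deg):
    """Clamp HMD rotation to the mirror's addressable optical range.

    A galvano mirror deflects the beam by twice its mechanical angle,
    so the commanded mechanical angle is half the optical angle.
    """
    lo, hi = MIRROR_RANGE_DEG
    pan = min(max(yaw_deg, lo), hi) / 2.0
    tilt = min(max(pitch_deg, lo), hi) / 2.0
    return pan, tilt

def send_mirror_command(sock, addr, yaw_deg, pitch_deg):
    pan, tilt = head_pose_to_mirror_angles(yaw_deg, pitch_deg)
    sock.sendto(struct.pack("<ff", pan, tilt), addr)  # two little-endian floats

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_mirror_command(sock, ("127.0.0.1", 9999), yaw_deg=12.0, pitch_deg=-35.0)
```

      <p>Because UDP is connectionless, each head-pose sample is fired as an independent datagram; a stale packet that arrives late is simply superseded by the next one, which suits a millisecond-scale control loop better than a retransmitting transport.</p>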
      <sec id="sec-2-1">
        <title>2.1. Camera module</title>
        <p>Figure 2 shows the module, consisting of a 30 fps wide-field camera for target detection and a 120 fps
ultrafast, mirror-driven pan-tilt camera for gazing. The galvano mirror enables smooth, low-inertia
redirection of the magnified camera’s line of sight, avoiding bulk camera motion.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Optics and fields of view</title>
        <p>The wide-field camera uses a 3.5 mm lens (horizontal FOV 70.7°, vertical FOV 56.6°); the ultrafast pan-tilt
camera uses a 25 mm lens (horizontal FOV 8.5°, vertical FOV 11.4°). At a distance of 5 m, a 1440 × 1080
wide-angle frame covers 5.6 × 4.2 m (≈ 3.9 mm/px), while a 480 × 640 magnified frame covers 0.6 × 0.8 m
(≈ 1.25 mm/px). Thus, the magnified view provides about 3.1× finer linear sampling (≈ 9.7× per-area
pixel density) than the wide-angle view.</p>
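        <p>The sampling figures above follow directly from the stated frame sizes and scene coverages; a quick check of the arithmetic:</p>

```python
# Reproduces the Section 2.2 sampling arithmetic from the stated horizontal
# frame widths (pixels) and scene coverages (meters) at a 5 m distance.
wide_px_w, wide_m_w = 1440, 5.6   # wide-angle frame
tele_px_w, tele_m_w = 480, 0.6    # magnified frame

wide_mm_per_px = wide_m_w / wide_px_w * 1000  # ~3.9 mm/px
tele_mm_per_px = tele_m_w / tele_px_w * 1000  # 1.25 mm/px

linear_gain = wide_mm_per_px / tele_mm_per_px  # ~3.1x finer linear sampling
area_gain = linear_gain ** 2                   # ~9.7x per-area pixel density
print(f"{wide_mm_per_px:.2f} vs {tele_mm_per_px:.2f} mm/px -> "
      f"{linear_gain:.1f}x linear, {area_gain:.1f}x per-area")
```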
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Gaze redirection</title>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Display policy and composition</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and results</title>
      <p>We conducted a qualitative evaluation of the prototype in an indoor environment to assess its
performance and user experience. Figure 5 shows the HMD view and a user operating the system.</p>
      <p>Our observations confirmed that the low-latency link between HMD motion and the ultrafast pan-tilt
camera enabled smooth and immediate gaze shifts. Users could rapidly acquire magnified details at their
point of regard without perceptible delay. The user-triggered display mechanism was reported to be a
key factor for comfort. By giving users explicit control, any residual mismatch between head motion
and the magnified image motion became significantly less distracting. This allowed for comfortable
operation over extended periods, as users were not forced into a continuous, tight coupling with the
high-magnification view. Because the magnified view was always presented at the center of the field of
view, peripheral distortions were not a salient issue. We did observe slight misalignments between the
wide-angle and magnified views when users looked toward the edges of the wide-angle image. While
calibration successfully reduced this effect, perfect alignment remains a challenge due to factors like
lens distortion and mechanical tolerances.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and future work</title>
      <p>We presented a VR remote magnified viewing system based on the principle of Optical Foveation. By
using an ultrafast pan-tilt camera with a galvano mirror, our system delivers high-resolution details at
the user’s gaze point with minimal latency, while a wide-field camera preserves the broader context.
This approach, combined with a user-triggered display policy, effectively mitigates the discomfort
typically associated with magnified views in VR, enabling rapid viewpoint control and comfortable
remote inspection.</p>
      <p>Future work will focus on two main areas. First, we plan to extend the system to support multiple
concurrent users. The high-speed capability of the pan-tilt camera allows for time-multiplexing several
magnified viewpoints, enabling different users to inspect distinct regions of interest within the same
shared scene. A key challenge will be to manage the per-user refresh rate to maintain a comfortable
experience. Second, acknowledging the limitations of our preliminary qualitative study, we will conduct
more rigorous, quantitative evaluations. This will include measuring end-to-end latency from head
motion to display update and performing controlled user studies. These studies will compare our
system against conventional baselines, such as mechanical PTZ systems and digital zoom, to assess task
performance, usability, and the reduction in VR sickness using standardized metrics like the Simulator
Sickness Questionnaire (SSQ) [7].</p>
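      <p>The per-user refresh-rate trade-off in the multi-user extension can be sketched with simple arithmetic. The comfort threshold below is an illustrative assumption, not a value established by this work:</p>

```python
# Sketch of the time-multiplexing idea: a 120 fps mirror-driven camera
# serves N users in round-robin, so each user's magnified view refreshes
# at 120/N fps. The comfort floor is a hypothetical illustrative value.
CAMERA_FPS = 120
MIN_COMFORTABLE_FPS = 20  # assumed per-user comfort floor

def per_user_fps(n_users):
    """Refresh rate each user sees when frames are shared round-robin."""
    return CAMERA_FPS / n_users

def max_supported_users():
    """Largest user count keeping everyone at or above the comfort floor."""
    return CAMERA_FPS // MIN_COMFORTABLE_FPS

for n in (1, 2, 4, 6):
    print(f"{n} user(s) -> {per_user_fps(n):.0f} fps each")
```

      <p>Under this assumed floor, the scheduler saturates at six concurrent viewpoints; a slower conventional PTZ camera would saturate almost immediately, which is why the high frame rate is the enabling factor here.</p>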
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used GPT-5 and Gemini for grammar and
spelling checking. After using these tools, the author reviewed and edited the content as needed and takes
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Aykut, Tamay, Lochbrunner, Stefan, Karimi, Mojtaba, Cizmeci, Burak, &amp; Steinbach, Eckehard. (2017). A stereoscopic vision system with delay compensation for 360 remote reality. In Proceedings of the Thematic Workshops of ACM Multimedia 2017 (pp. 201-209).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Buhler, Helmut, Misztal, Sebastian, &amp; Schild, Jonas. (2018). Reducing VR sickness through peripheral visual effects. In 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (pp. 517-519). IEEE.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Chang, Huiwen, &amp; Cohen, Michael F. (2017). Panning and zooming high-resolution panoramas in virtual reality devices. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (pp. 279-288).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Chang, Eunhee, Kim, Hyun Taek, &amp; Yoo, Byounghyun. (2020). Virtual reality sickness: a review of causes and measurements. International Journal of Human-Computer Interaction, 36(17), 1658-1682.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Farsiu, Sina, Robinson, M. Dirk, Elad, Michael, &amp; Milanfar, Peyman. (2004). Fast and robust multiframe super resolution. IEEE Transactions on Image Processing, 13(10), 1327-1344.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Guenter, Brian, Finch, Mark, Drucker, Steven, Tan, Desney, &amp; Snyder, John. (2012). Foveated 3D graphics. ACM Transactions on Graphics (TOG), 31(6), 1-10.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Kennedy, Robert S., Lane, Norman E., Berbaum, Kevin S., &amp; Lilienthal, Michael G. (1993). Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness. The International Journal of Aviation Psychology, 3(3), 203-220.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Kim, Juno, Charbel-Salloum, Andrew, Perry, Stuart, &amp; Palmisano, Stephen. (2022). Effects of display lag on vection and presence in the Oculus Rift HMD. Virtual Reality, 1-12.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Neves, Joao C., Moreno, Juan C., Barra, Silvio, &amp; Proença, Hugo. (2015). Acquiring high-resolution face images in outdoor environments: A master-slave calibration algorithm. In Proceedings of the 2015 IEEE 7th International Conference on Biometrics Theory, Applications and Systems (BTAS) (pp. 1-8).</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Ren, Yi, &amp; Fuchs, Henry. (2016). Faster feedback for remote scene viewing with pan-tilt stereo camera. In Proceedings of 2016 IEEE Virtual Reality (VR) (pp. 273-274). IEEE.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Saharia, Chitwan, Ho, Jonathan, Chan, William, Salimans, Tim, Fleet, David J., &amp; Norouzi, Mohammad. (2022). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4), 4713-4726.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Stauffert, Jan-Philipp, Niebling, Florian, &amp; Latoschik, Marc Erich. (2020). A survey on latency and its influence on presence and cybersickness. In 2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 724-727). IEEE.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Xu, Yiliang, &amp; Song, Dezhen. (2010). Systems and algorithms for autonomous and scalable crowd surveillance using robotic PTZ cameras assisted by a wide-angle camera. Autonomous Robots, 29(1), 53-66.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>