<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Battery Efficiency in Vision-Based Indoor Navigation: Energy Considerations in NaVIP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jun Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yitian Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mushu Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vinod Namboodiri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, P.C. Rossin College of Engineering and Applied Science, Lehigh University</institution>
          ,
          <addr-line>113 Research Drive, Bethlehem, PA 18015</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Department of Biostatistics and Health Data Science, College of Health, Lehigh University</institution>
          ,
          <addr-line>124 East Morton Street Suite 155, Bethlehem, PA 18015</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Wayfinding in unfamiliar indoor environments is a challenging task for visually impaired persons (VIPs) due to the limitations of mapping and satellite-based positioning. Recent indoor localization solutions have emerged with good results in terms of positioning accuracy. Not much is known, however, about how they compare in terms of energy consumption, an important metric for use on mobile devices. This study presents a systematic comparison of battery consumption across three core localization technologies—computer vision (CV)-based, inertial measurement unit (IMU)-based, and ARKit-based—within the context of our proposed iOS application, NaVIP. Using the commercial app GoodMaps as a comparison, energy usage across navigation, exploration, and idle modes is evaluated under standardized testing conditions. Results show that while CV- and ARKit-based methods offer accurate localization, they incur substantially higher power usage than IMU-only solutions. Notably, NaVIP's server-assisted CV approach consumes significantly less energy on the user's device compared to ARKit's fully on-device computation pipeline, even with live camera feed rendering enabled. These findings highlight the benefits of offloading compute-heavy tasks to backend servers. Subsequent analysis provides actionable design insights for developing energy-efficient, accessible navigation tools for durable use by mobile end users.</p>
      </abstract>
      <kwd-group>
<kwd>Indoor navigation</kwd>
        <kwd>smartphone application</kwd>
        <kwd>battery life</kwd>
        <kwd>energy efficiency</kwd>
        <kwd>offloading computation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
Vision-based localization has become a prominent approach in recent indoor navigation systems, offering
improved spatial awareness and robustness through the use of real-time camera input and computer
vision algorithms. However, the increased computational demands of these methods raise important
concerns regarding battery life—especially for applications deployed on mobile devices. This issue is
particularly critical for blind and visually impaired (BVI) users who depend on such applications for
independent mobility and cannot simply switch off the application when battery levels run low.
This population is also less likely to detect low-battery warnings or respond quickly to sudden device
shutdowns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Energy efficiency is a critical design consideration for mobile applications, particularly
in location-based services, where minimizing battery consumption is essential for maintaining usability,
ensuring user satisfaction, and supporting prolonged usage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this context, energy efficiency is not
just a technical metric but a practical requirement for ensuring safety, accessibility, and continuous
operation in real-world navigation scenarios.
      </p>
      <p>
        While existing vision-based solutions address many of the fundamental navigational challenges for
visually impaired persons (VIPs)—including infrastructure independence [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], real-time performance, and
localization accuracy—their energy efficiency remains largely underexplored. In assistive navigation,
battery longevity is not merely a technical concern but a user-centered requirement for trust and
long-term usability. Many vision-based solutions rely on continuous camera input and real-time processing,
both of which can be power-intensive by nature, particularly during extended navigation tasks. This
makes energy efficiency a crucial consideration in the design of scalable and dependable navigation
systems. To address this, we developed NaVIP (Navigation for VIPs), a deep learning-based absolute
pose regression (APR) system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] that directly regresses the user’s camera position and orientation
from a single image captured on their phone. Compared to traditional vision-based approaches such as
retrieval-heavy [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or 3D-2D feature matching methods [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], NaVIP offers a more lightweight and
flexible alternative, enabled by its end-to-end APR framework. In contrast, ARKit and
structure-from-motion (SfM) methods can achieve high localization accuracy but often impose substantial computational
and memory demands on the client device, making offloading challenging and accelerating battery drain
during real-world use. Although NaVIP can alleviate this through a streamlined end-to-end architecture,
it remains unclear how its energy consumption compares with ARKit-based or hybrid localization
methods. To fill this gap, we present an empirical study measuring battery usage across multiple
infrastructure-free navigation approaches—including the core APR-based solution of NaVIP, IMU-only
solutions, and ARKit-enabled systems—under realistic task-oriented conditions. By analyzing these
trade-offs, we aim to inform the design of more energy-efficient and reliable navigation tools for VIPs,
where power efficiency is a core component of overall usability.
      </p>
      <p>NaVIP adopts a client-server architecture that offloads the majority of computation, including camera
pose regression and path planning, to the server, thereby alleviating the energy burden on user devices.
This design significantly reduces on-device processing and leads to lower battery consumption during
use. In our experiments, we systematically evaluate battery usage on user devices and compare NaVIP’s
consumption patterns with those of alternative localization approaches. Notably, our findings show that
APR-based localization in NaVIP conserves battery life more effectively than ARKit-based solutions,
even when live camera feed rendering is enabled on the client side. Further energy savings can be
achieved by disabling on-screen rendering while continuing to transmit video frames to the server—an
approach that maintains localization accuracy while significantly reducing power draw. The underlying
cause of this discrepancy lies in architectural differences: ARKit performs all vision-based localization
and sensor fusion directly on the device, leading to consistently higher energy usage and thermal load.
In contrast, NaVIP leverages server-side inference with lightweight client-side operations, enabling
a more power-efficient experience that is well-suited for long-duration indoor navigation tasks in
real-world settings.</p>
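      <p>The client-server split can be sketched as follows. This is an illustrative sketch only: the message format, field names, and pose layout are our assumptions, not NaVIP's actual protocol.</p>

```python
# Illustrative sketch of a NaVIP-style client-server split (all names and
# message fields are assumptions, not the actual NaVIP protocol). The
# client does only cheap work: package an encoded frame and read the reply.

def client_package_frame(jpeg_bytes, frame_id):
    # Lightweight client-side step: wrap an already-encoded camera frame.
    return {"frame_id": frame_id, "payload": jpeg_bytes}

def server_estimate_pose(message):
    # Heavy lifting happens server-side: an APR model would regress a
    # 6-DoF pose here. A fixed dummy pose stands in for model output.
    position = (1.5, 0.0, 2.0)          # metres, building frame
    quaternion = (1.0, 0.0, 0.0, 0.0)   # unit quaternion (w, x, y, z)
    return {"frame_id": message["frame_id"],
            "position": position, "quaternion": quaternion}

reply = server_estimate_pose(client_package_frame(b"\xff\xd8", frame_id=42))
print(reply["frame_id"])  # reply is matched to the uploaded frame
```

      <p>Because pose regression and path planning never run on the phone, the client's per-frame cost reduces to camera capture, encoding, and network transmission.</p>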
      <p>In summary, this paper presents the first systematic study of battery consumption in indoor navigation
systems designed for visually impaired users, focusing on a comparison of CV-, IMU-, and ARKit-based
approaches. Our contributions are three-fold: (1) we propose a lightweight, server-assisted APR
framework for vision-based localization that minimizes client-side power consumption; (2) we conduct
a comprehensive, task-oriented evaluation of battery usage across CV-, IMU-, and ARKit-based methods
in real-world scenarios; and (3) we identify key architectural decisions, such as offloading inference
from the client, that significantly improve energy efficiency without compromising usability or accuracy.
Together, these contributions offer actionable insights for designing sustainable, inclusive, and
power-aware indoor navigation solutions for the visually impaired.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Indoor navigation systems have become a critical area of research, particularly in efforts to support
blind and visually impaired (BVI) users in independently navigating complex indoor environments
such as transit stations, airports, and university buildings [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], where traditional physical signage is
accessible only to sighted persons. Existing solutions employ a variety of technologies, including
Wi-Fi fingerprinting [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Bluetooth low energy (BLE) beacons [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], augmented reality (AR)-based
tracking [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], visual feature matching [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and inertial sensors [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. These systems aim to deliver
real-time guidance, improve spatial awareness, and foster autonomy for VIPs, but they vary in terms of
cost, scalability, and infrastructure dependence. Traditional approaches like WiFi and BLE beacons that
rely on installing external tags or transponders face challenges in scalability and long-term maintenance,
especially due to the need for frequent recalibration and their impact on building aesthetics. As a
result, these infrastructure-heavy solutions have gradually fallen out of favor. Depending on the choice
of transponders, their physical deployment, the localization algorithm, and the characteristics of the
indoor space, these systems report different levels of accuracy [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
        ]. (More details on NaVIP are provided in our accepted work submitted to the ICCV Workshop ACVR 2025 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].)
      </p>
      <p>
        Recent studies have evaluated indoor localization approaches based on criteria such as environmental
robustness and usability. ARKit-based systems have demonstrated strong performance in responsiveness
and spatial awareness by fusing camera and IMU data through visual-inertial odometry (VIO) [
        <xref ref-type="bibr" rid="ref12 ref17">17, 12</xref>
        ], and
applications like GoodMaps have been adopted for commercial use. Vision-based methods leveraging
image retrieval [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] or 3D-2D feature matching [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] have gained popularity due to their infrastructure
independence and reliable localization in visually complex environments. Overall, both approaches
offer viable solutions for indoor localization, each with distinct strengths and limitations. Retrieval
methods prioritize simplicity and speed but often struggle with scalability, while 3D-2D techniques
offer higher accuracy and robustness at the cost of greater computational demands. The choice between
them typically depends on the specific requirements and constraints of the navigation application. To
address these trade-offs, we developed NaVIP, a novel navigation system that uses an APR approach
to regress the user’s location directly from any single image. However, both ARKit- and vision-based
methods often involve substantial computational overhead and may pose battery challenges for mobile
navigation systems. IMU-only and SLAM-based methods are lightweight and infrastructure-free, but
they tend to suffer from drift over time and difficulty with initial localization. As a result, they are
commonly integrated into hybrid models to correct accumulated errors [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        While previous studies have extensively addressed accuracy, robustness, and usability, they rarely
examine energy consumption—an equally critical factor in real-world deployment, especially for
accessibility applications. Battery life is a key determinant of user experience in mobile computing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
and for BVI users, sudden device shutdowns can compromise safety, navigation continuity, and user
trust. Navigation aids that demand continuous camera access, real-time sensor fusion, or AR
rendering can quickly deplete battery resources, making them unsuitable for long-duration tasks. Despite
this, few studies offer empirical measurements of energy usage across different localization strategies,
leaving a major gap in our understanding of their practical viability. Our work addresses this gap by
systematically evaluating the battery consumption of three leading infrastructure-free localization
methods—vision-based (APR), ARKit-based, and IMU-based—within our assistive navigation system,
NaVIP. We integrate these modules into a unified app framework and apply a consistent evaluation
protocol across navigation, exploration, and idle states. Additionally, we use APR as a universal
initializer to support fair comparison across modalities. Our findings aim to inform not only the localization
community but also researchers and developers focused on building power-aware, reliable assistive
technologies for real-world deployment.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Procedures</title>
      <sec id="sec-3-1">
        <title>3.1. Study Objective</title>
        <p>This study aims to evaluate and compare the battery consumption of four different indoor navigation
variants for visually impaired users: (1) NaVIP, a vision-based navigation system developed by our
group; (2) an ARKit-based navigation solution leveraging Apple’s augmented reality framework; (3) an
IMU-based system relying on inertial sensors; and (4) GoodMaps, a commercial navigation app. The
primary objective is to assess each method’s energy efficiency under standardized testing conditions
in the two realistic scenarios described below.</p>
        <p>• Navigation Mode: Users specify a destination, and the system computes the shortest path,
providing continuous voice prompts and real-time guidance based on position and orientation.
• Exploration Mode: As the default mode, it offers audio feedback on nearby points of interest
(PoIs), supporting spatial awareness without a set destination.</p>
        <p>Vision-Based (APR) Navigation. APR uses camera input and a deep neural network to directly predict
6-DoF poses from image frames, eliminating the need for traditional visual localization steps such as
feature matching and optimization. Offloading inference to a backend server enables lightweight
client-side operation, making APR particularly suitable for visually impaired users due to its infrastructure-free,
map-free, and energy-efficient design.</p>
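        <p>As a concrete illustration of what APR predicts, the sketch below computes the position and orientation error between a regressed pose and ground truth. The 7-value pose layout (3-D translation plus unit quaternion) follows common APR formulations; the numbers are made up for illustration.</p>

```python
import math

# Illustrative APR pose error computation (values are made up).
# An APR network outputs a 7-vector: translation (x, y, z) in metres
# plus a unit quaternion (w, x, y, z) for orientation.

def position_error_m(t_pred, t_true):
    # Euclidean distance between predicted and true positions.
    return math.dist(t_pred, t_true)

def orientation_error_deg(q_pred, q_true):
    # Angle between two unit quaternions: theta = 2 * acos(|<q1, q2>|).
    dot = abs(sum(a * b for a, b in zip(q_pred, q_true)))
    return math.degrees(2.0 * math.acos(min(1.0, dot)))

t_pred, q_pred = (2.0, 0.0, 1.0), (0.999, 0.0, 0.044, 0.0)
t_true, q_true = (2.3, 0.0, 1.4), (1.0, 0.0, 0.0, 0.0)
print(round(position_error_m(t_pred, t_true), 2))  # 0.5 (metres)
print(round(orientation_error_deg(q_pred, q_true), 1))  # degrees
```

        <p>These are the two standard APR accuracy metrics; the energy question studied here is orthogonal to them, since the same regression can run on-device or server-side.</p>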
        <p>ARKit-Based Navigation. ARKit fuses visual-inertial odometry, SLAM, and motion sensors to estimate
position entirely on-device. It delivers accurate, low-latency localization but is resource-intensive,
leading to high battery usage and thermal buildup during extended use. While effective for route
guidance, its energy demands limit prolonged operation.</p>
        <p>IMU-Based Navigation. IMU navigation estimates motion via accelerometers and gyroscopes, offering
fast, energy-efficient tracking without external input. However, it suffers from drift and lacks spatial
awareness. In our setup, it relies on APR to initialize the starting pose before switching to inertial
tracking for short-term navigation.</p>
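        <p>Dead reckoning and its drift problem can be illustrated numerically. The sketch below double-integrates accelerometer samples; a small constant sensor bias (an assumed value) grows quadratically into position error, which is why the IMU variant needs the APR initializer and periodic correction.</p>

```python
# Illustrative one-axis dead reckoning (assumed, simplified values).
# Position comes from double-integrating acceleration, so a constant
# accelerometer bias b produces position error of roughly 0.5 * b * t^2.

def dead_reckon(accels, dt):
    v, x = 0.0, 0.0
    for a in accels:
        v += a * dt  # integrate acceleration -> velocity
        x += v * dt  # integrate velocity -> position
    return x

dt, n = 0.01, 6000                         # 100 Hz samples over 60 s
true_x = dead_reckon([0.0] * n, dt)        # user standing still
biased_x = dead_reckon([0.02] * n, dt)     # 0.02 m/s^2 sensor bias
print(round(biased_x - true_x, 1))  # metres of drift after one minute
```

        <p>Even this modest bias yields tens of metres of error within a minute, so the low power draw of IMU-only tracking comes at the cost of frequent recalibration.</p>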
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Test Sites and Participants</title>
        <p>One of the biggest challenges of comparative evaluations across indoor navigation systems is the lack
of access to set up each system at the test site. For example, GoodMaps is deployed at a few locations around
the U.S. based on commercial agreements that provided the company access to deploy its system. Similarly,
academic research typically has access only to collaborative academic environments such as university
buildings. Thus, it is important to carefully construct testing scenarios that support a fair comparison
across systems deployed at different sites. Therefore, all tests were conducted over pre-defined routes
of the same length that required a similar duration of time to complete, regardless of site location.</p>
        <p>For this work, NaVIP—including its isolated Vision-, ARKit-, and IMU-based variants—was tested
in the Health, Science, and Technology (HST) Building at Lehigh University (Bethlehem, PA), while
GoodMaps was tested at the Newark Liberty International Airport (EWR) Terminal B (Newark, NJ). All
tests were conducted by a single facilitator or trained sighted volunteer to ensure consistent handling
and movement across trials. No human subject data was collected, and no participation from visually
impaired users was required during battery profiling.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Device Configuration</title>
        <p>All tests were conducted on the same device model—an iPhone 13 Pro—to eliminate hardware variability.
Prior to each test run, the device settings were standardized as listed in Table 1.</p>
        <p>• The test app was launched, and the navigation/exploration task was initiated.
• The tester followed a predefined route while holding the phone at chest height.
• Upon reaching the destination, the test was concluded, and the following data were recorded:
– Start and end battery percentages;
– Any anomalies (e.g., thermal warnings, app crashes).</p>
        <p>In addition to navigation and exploration modes, each app was also tested in an idle state, left on the
home screen, to serve as a baseline for comparison. To ensure consistency and capture measurable
battery consumption, each test session was fixed at 30 minutes in duration.</p>
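        <p>A minimal sketch of how each session's measurements reduce to the battery-drop figure reported later (the field names are our own convention, and the percentages shown are placeholders, not results from Table 2):</p>

```python
from dataclasses import dataclass

# Minimal session log for the 30-minute battery tests (field names are
# our own convention; the example numbers are placeholders).

@dataclass
class TestSession:
    mode: str                # "navigation", "exploration", or "idle"
    start_battery_pct: int   # battery % when the run starts
    end_battery_pct: int     # battery % when the run ends
    anomalies: tuple = ()    # e.g., ("thermal warning",)

    def battery_drop(self) -> int:
        # The single number compared across systems and modes.
        return self.start_battery_pct - self.end_battery_pct

run = TestSession("navigation", start_battery_pct=100, end_battery_pct=87)
print(run.battery_drop())  # 13
```
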
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Module-Level Battery Profiling</title>
        <p>Since the two apps are built on distinct technologies—GoodMaps relies heavily on ARKit for localization,
whereas NaVIP streams video frames to a backend server for camera pose estimation—we aim to identify
which specific components contribute most to battery consumption. To enable a fair comparison, we
also integrated all major localization modules (server-side APR-based, ARKit-based, and IMU-based)
into the NaVIP app. (Note that NaVIP functions effectively without relying on ARKit and the IMU; they are
included solely for the purpose of module-level battery profiling.)</p>
        <p>To test the ARKit module, we used ARWorldTrackingConfiguration while avoiding
any additional ARKit features beyond basic user tracking. Notably, ARKit-based localization
may utilize CoreMotion for refining camera orientation predictions; therefore, we also report results
for an ARKit + IMU configuration.</p>
        <p>Each module was tested under identical conditions for a fixed duration. During each module test:
• NaVIP was run in exploration mode to avoid confounding from path planning or rerouting.
• The tester followed a predefined route while holding the phone at chest height.
• After 30 minutes, the test was concluded, and the following data were recorded:
– Start and end battery percentages
– Any anomalies (e.g., thermal warnings, app crashes)
To ensure consistency, all tests were conducted within the same app and in the same environment.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>To evaluate the energy efficiency of different indoor navigation strategies, we conducted battery profiling
experiments under realistic usage scenarios. We compared the power consumption of ARKit-based
tracking, IMU, and NaVIP’s video streaming architecture during typical user interactions. Experiments
were structured at two levels: system-level tests (covering full app activity across navigation, exploration,
and idle modes) and module-level tests (isolating individual components in controlled settings). All
tests ran for 30 minutes on identical hardware and in the same environment for fair comparison. Results
and analysis from both levels are presented below.</p>
      <sec id="sec-4-1">
        <title>4.1. System-Level Results</title>
        <p>The results presented in Table 2 reveal several important insights into the battery efficiency of different
indoor navigation solutions under varying modes of operation—navigation, exploration, and idle.</p>
        <p>Idle Mode. In the idle condition, where each app was left running on its home screen without active
navigation or exploration tasks, both NaVIP and GoodMaps exhibited minimal battery consumption,
each with a 5% drop over 30 minutes. This low energy usage confirms that neither app imposes
significant background overhead when inactive, making it an appropriate baseline for comparison.</p>
        <p>Navigation Mode. Navigation mode, which involves full system functionality, showed the highest
battery drain across all configurations. GoodMaps recorded the most substantial battery drop at 23%,
accompanied by signs of overheating. This can be attributed to its use of ARKit for localization,
combined with intensive 3D rendering, continuous map updates, voice prompts, and haptic feedback.
In contrast, NaVIP’s navigation mode consumed 13%, demonstrating better power efficiency. Although
NaVIP uses continuous upstream video streaming and client-side camera feed rendering, it offloads
pose computation and path planning to the server, reducing on-device processing demands and helping
avoid thermal issues. The difference in battery drain between these two systems underscores the
energy-saving benefits of NaVIP’s client-server architecture.</p>
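        <p>To put these drops in perspective, a linear extrapolation turns a 30-minute drop into an estimated full-charge runtime. This assumes a constant drain rate, which is a simplification (thermal throttling and battery nonlinearity are ignored):</p>

```python
# Linear extrapolation of full-charge runtime from a 30-minute battery
# drop (simplifying assumption: the drain rate stays constant).

def runtime_hours(drop_pct: float, session_min: float = 30.0) -> float:
    return (100.0 / drop_pct) * session_min / 60.0

print(round(runtime_hours(23), 1))  # GoodMaps navigation: ~2.2 h
print(round(runtime_hours(13), 1))  # NaVIP navigation:    ~3.8 h
```

        <p>Under this rough model, the 10-point difference in drop translates to well over an hour and a half of additional continuous navigation per charge.</p>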
        <p>Exploration Mode. In exploration mode, which provides spatial awareness without destination-based
guidance, both apps again showed different power profiles. GoodMaps consumed 20%, while NaVIP used 14%.
This reflects a moderate reduction from navigation mode for both systems, likely due to
the absence of continuous path planning and rerouting. However, GoodMaps still showed signs of slight
overheating, suggesting that its ARKit-driven rendering pipeline remains active even without explicit
navigation tasks. NaVIP, while maintaining upstream video streaming and rendering, benefits from
offloaded computation and showed no thermal concerns. Notably, the comparison between navigation
and exploration modes also suggests that specialized audio prompts and haptic feedback for VIPs do
not introduce additional battery overhead.</p>
        <p>Functionality-Level Impact. A breakdown of each app’s functional components—ARKit-based
localization, video streaming, 3D rendering, voice prompts, and haptics—highlights the dominant role
of visual localization and rendering in battery consumption. GoodMaps performs real-time ARKit
tracking and local 3D scene rendering on-device, placing sustained load on the CPU, GPU, and motion
co-processor. This results in higher power draw and thermal buildup during extended use. In contrast,
NaVIP reduces on-device load by offloading pose estimation to a remote server. It relies on lightweight
upstream video streaming and omits local 3D visualization, minimizing GPU usage and leveraging
the smartphone’s energy-efficient video pipelines. While both apps include voice and haptic feedback
for accessibility, our tests show these features have minimal battery impact due to their low-power,
intermittent use. Overall, the comparison emphasizes how architectural choices—especially offloading—
can significantly influence energy efficiency in mobile navigation systems.</p>
        <p>Summary. The results validate that NaVIP achieves a favorable trade-off between usability and
energy consumption. Its architecture enables continuous localization and user guidance while avoiding
the battery drain and overheating risks observed in fully on-device ARKit-based systems. These findings
demonstrate the viability of server-assisted vision-based navigation systems for visually impaired users,
especially in scenarios requiring prolonged usage.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Module-Level Results</title>
        <p>To further isolate the battery impact of individual localization components, we conducted
module-level tests by embedding all major localization technologies—server-side APR, ARKit, IMU, and
ARKit + IMU—within the NaVIP app. This controlled setup eliminated external variables such as
app-specific UI rendering or back-end integration differences, ensuring a consistent evaluation
environment. Each module was tested independently under identical conditions for 30 minutes. It is
worth noting that all the methods above rely on the APR model for initial localization and for
correcting accumulated errors when the navigation path deviates from expected use. Since APR is invoked
infrequently during typical operation, its overall energy contribution is minimal and can reasonably be
considered negligible. The results in Table 3 reveal clear differences in battery usage among modules:</p>
        <table-wrap id="tbl3">
          <label>Table 3</label>
          <caption>
            <p>Battery Usage Across Different Module Components</p>
          </caption>
          <table>
            <thead>
              <tr><th>Module Type</th><th>Test Site</th><th>Battery Drop</th><th>Notes</th></tr>
            </thead>
            <tbody>
              <tr><td>Upstream Video Streaming</td><td>HST</td><td>12%</td><td>Normal</td></tr>
              <tr><td>IMU</td><td>HST</td><td>5%</td><td>Normal</td></tr>
              <tr><td>ARKit</td><td>HST</td><td>16%</td><td>Warming</td></tr>
              <tr><td>ARKit + IMU</td><td>HST</td><td>17%</td><td>Warming</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Upstream Video Streaming. The upstream video streaming module, representing NaVIP’s default
visual input system, consumed 12% of the battery during the 30-minute test session. This moderate
energy usage is expected given the continuous activation of the camera and the real-time transmission
of video frames to the backend server for processing. Notably, the energy cost of this module does
not include any on-device inference, as all computationally intensive tasks such as pose estimation
and scene understanding are performed on the server side. This architectural choice not only reduces
battery consumption but also limits device overheating and preserves system responsiveness. The
thermally stable performance throughout the test suggests that upstream video streaming can be a
viable long-term localization strategy for mobile applications when paired with efficient server-side
infrastructure. Moreover, since upstream video streaming consumes constant bandwidth in most cases,
future work might also examine its trade-off with network availability and data plan constraints in
real-world deployment.</p>
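        <p>The bandwidth trade-off mentioned above can be roughed out with back-of-the-envelope numbers. The frame size and frame rate below are illustrative assumptions, not NaVIP's measured values:</p>

```python
# Rough upstream bandwidth estimate for continuous frame streaming.
# Frame size and frame rate are illustrative assumptions, not measured
# NaVIP values.

def upstream_mbps(jpeg_kb_per_frame: float, fps: float) -> float:
    # KB per frame -> megabits per second.
    return jpeg_kb_per_frame * 1024 * 8 * fps / 1e6

mbps = upstream_mbps(jpeg_kb_per_frame=50, fps=10)
gb_per_hour = mbps * 3600 / 8 / 1000  # Mbit/s -> GB transferred per hour
print(round(mbps, 1), round(gb_per_hour, 2))
```

        <p>Under these assumed numbers the stream stays in the low single-digit Mbit/s range, well within typical indoor Wi-Fi capacity, but the gigabytes per hour would matter on a metered cellular plan.</p>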
        <p>ARKit. When the ARKit module was activated using ARWorldTrackingConfiguration, the device
experienced a 16% battery drop over 30 minutes. This increase, relative to upstream video streaming,
reflects the higher computational demands of ARKit’s visual-inertial odometry (VIO) pipeline. ARKit
continuously fuses camera frames with gyroscope and accelerometer readings to track motion and
orientation, requiring substantial CPU and GPU cycles for real-time feature tracking, depth estimation,
and pose updates. While thermal throttling was not triggered, the device exhibited mild warming,
indicating a sustained processing load. Unlike NaVIP’s offloaded approach, ARKit performs all
computations locally, which may lead to further performance degradation over extended usage due to thermal
accumulation. This result highlights a trade-off: although ARKit offers precise short-term tracking and
responsiveness, it incurs a heavier energy burden that may not scale well for long navigation sessions
in resource-constrained scenarios.</p>
        <p>IMU. As expected, the IMU-based module demonstrated the lowest battery consumption at only
5%. This efficiency is a result of the system’s reliance solely on low-power inertial sensors, such as
accelerometers and gyroscopes, which operate independently of camera input and require minimal
computation. These sensors can estimate user motion through dead reckoning, using integrated velocity
and orientation changes over time. However, the inherent drawback is cumulative error—drift increases
with time and distance traveled, leading to degraded positional accuracy unless external correction
(e.g., APR or visual landmarks) is applied. In our testbed, the IMU module still relied on NaVIP’s APR
component to establish the initial pose, after which dead reckoning was used to track motion. While
this approach is ideal for conserving power during short sessions or in environments where camera use
is infeasible, its standalone viability for accurate indoor navigation remains limited without periodic
recalibration.</p>
        <p>ARKit + IMU. Combining ARKit with IMU integration resulted in the highest energy consumption
across all modules, with a 17% battery drop. This configuration leverages Apple’s CoreMotion framework
to further refine ARKit’s pose estimation by fusing high-frequency inertial sensor data with the visual
odometry pipeline. While this improves motion tracking robustness—particularly in fast movements or
low-light conditions where visual cues are less reliable—it significantly increases computational load on
the device. The combined system must continuously synchronize multi-modal sensor streams, maintain
real-time frame processing, and update environmental maps, all of which place sustained demand on
both CPU and GPU. The warming observed during the test, though not critical, suggests a growing
thermal footprint that may affect long-term usability and device longevity. This setup, while offering
the most accurate short-term localization, may not be practical for energy-constrained accessibility
applications without adaptive switching or load balancing strategies.</p>
        <p>Summary. These module-level insights confirm that ARKit-based localization is inherently more
power-intensive than upstream video streaming or IMU-only alternatives, particularly when paired with
additional sensor fusion. NaVIP’s architectural choice to isolate visual input and delegate processing to
the server demonstrates a strategic advantage in minimizing client-side battery drain. Future designs
may consider hybrid approaches that dynamically activate or deactivate modules based on usage context
to optimize power efficiency without compromising localization performance.</p>
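        <p>One way to realize such a hybrid approach is a simple battery-aware policy: fall back to the cheap IMU module when battery is low and drift is still tolerable, and otherwise stream frames for server-side APR. The thresholds below are illustrative assumptions, not tuned values:</p>

```python
# Illustrative battery-aware module selection (thresholds are assumed,
# not tuned values). Measured 30-minute drops: streaming 12%, IMU 5%.

def pick_module(battery_pct: float, drift_m: float) -> str:
    # Conserve power with dead reckoning while its drift stays small;
    # otherwise re-localize via the server-assisted APR stream.
    if battery_pct < 20 and drift_m < 2.0:
        return "imu"
    return "apr_streaming"

print(pick_module(battery_pct=15, drift_m=0.5))  # imu
print(pick_module(battery_pct=15, drift_m=3.0))  # apr_streaming
print(pick_module(battery_pct=80, drift_m=0.5))  # apr_streaming
```
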
        <p>System vs. Module Observations. Comparing module-level results to system-level evaluations
(Table 2), we observe that GoodMaps’ 23% battery drop in navigation mode is consistent with the
behavior of the ARKit + rendering pipeline when deployed in a full-featured app. NaVIP’s 13–14%
consumption across navigation and exploration modes aligns closely with the 12% attributed to upstream
video streaming, confirming that its energy efficiency stems from offloading compute-intensive tasks to
the server.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Limitations</title>
      <sec id="sec-5-1">
        <title>5.1. Architectural Trade-offs and Energy Implications</title>
        <p>Our findings underscore key architectural trade-offs in mobile indoor navigation, particularly around
computation placement. Offloading localization to a remote server, as in NaVIP, preserves accuracy
while significantly reducing battery usage. This client-server split avoids on-device pose regression and
path planning, supporting longer usage, which is especially important in accessibility contexts. This
design reflects a broader shift toward lightweight clients that stream sensor data to cloud-based AI
models for inference. In contrast, ARKit-based solutions offer reliable localization with low latency and
no infrastructure needs, but perform all computation (vision tracking, sensor fusion, and 3D mapping)
on-device. This leads to faster battery drain and occasional thermal issues during prolonged use, limiting
their viability for continuous assistive navigation. We also observe that pairing ARKit with IMU sensors
further increases energy consumption due to real-time sensor fusion. In comparison, NaVIP’s video
streaming frontend remains relatively efficient, even with live rendering, making it well-suited for
vision-based, client-light localization. These results highlight that energy efficiency depends not only
on the localization algorithm but also on system architecture. As mobile apps adopt larger models and
complex pipelines, offloading will likely be key to ensuring performance and usability.</p>
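        <p>A client-light streaming frontend can also bound its own energy cost by capping the upload rate. The following sketch keeps only frames spaced far enough apart before upload; the 2 fps target is an illustrative assumption, not NaVIP’s actual streaming configuration.</p>

```python
# Minimal sketch of client-side frame throttling for server-assisted
# localization. The target rate is illustrative; NaVIP's real streaming
# parameters are not specified here.

def frames_to_upload(timestamps_s, target_fps: float):
    """Keep only frames spaced at least 1/target_fps seconds apart."""
    min_gap = 1.0 / target_fps
    kept, last = [], None
    for t in timestamps_s:
        if last is None or t - last >= min_gap:
            kept.append(t)
            last = t
    return kept

# A 30 fps camera feed downsampled to 2 fps for upload:
feed = [i / 30 for i in range(60)]       # 2 seconds of frames
print(len(frames_to_upload(feed, 2.0)))  # 4
```

        <p>Throttling of this kind trades localization update frequency for radio and encoder energy, and is one concrete lever a lightweight client can tune without changing the server-side model.</p>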
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Limitations and Future Work</title>
        <p>
          This study has several limitations. First, evaluations were conducted at different sites: NaVIP in the HST
building and GoodMaps at Newark Airport. Although route length, hardware, and usage time were
controlled, environmental differences may introduce noise. These variations do not inherently favor one
system, but they underscore the need for standardized benchmarking environments. Second, we relied
on app-level energy profiling, which, while informative, lacks the precision of system-level
instrumentation. Deeper profiling was not feasible due to proprietary restrictions in commercial apps like
GoodMaps. Third, we measured battery usage over fixed durations rather than per unit of localization
accuracy, owing to the absence of ground-truth pose data in real-world settings. We assume both systems
reliably navigate users from start to destination. For detailed accuracy benchmarking, we refer to
our companion work [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] using pseudo ground truth in curated datasets. Lastly, there is a need for
long-term, in-situ studies involving diverse users, which would capture real-world variation in usage
patterns and device behavior. However, such efforts are limited by the site-specific nature of most
systems and the lack of access to multiple commercial platforms. Future work should pursue cross-app
collaborations to enable large-scale evaluations of energy, accuracy, usability, and user satisfaction in
realistic deployments.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper presents the first empirical analysis of battery consumption across key indoor localization
technologies—CV-based, IMU-based, and ARKit-based—evaluated within a unified testbed using NaVIP
and GoodMaps. By conducting both system-level and module-level profiling under standardized
conditions, we demonstrate that localization accuracy alone is not a sufficient metric for real-world
deployment in accessibility contexts; energy efficiency must also be prioritized. Our results show that
ARKit, while providing reliable on-device localization, imposes a substantial energy cost that may hinder
its practicality for long-term use. In contrast, NaVIP’s server-assisted architecture enables
energy-efficient navigation without sacrificing localization fidelity. The ability to maintain functionality
while reducing device-side computation aligns with broader trends in edge-cloud computing and the
integration of foundation models in mobile applications. These findings point toward a design path
for future accessible navigation tools: lightweight client interfaces paired with backend intelligence
to deliver high performance with low energy consumption. Such architectures are well-suited
to support visually impaired users who rely on continuous, dependable assistance throughout
their indoor journeys.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the U.S. National Science Foundation through award #2345057.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT-4 for grammar and spelling
checking. After using this tool/service, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Simões</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Sales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>de Lucena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jazdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. F.</given-names>
            <surname>de Lucena Jr</surname>
          </string-name>
          ,
          <article-title>A review of technologies and techniques for indoor navigation systems for the visually impaired</article-title>
          ,
          <source>Sensors</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <fpage>3935</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Yuce</surname>
          </string-name>
          ,
          <article-title>Navigating battery choices in IoT: An extensive survey of technologies and their applications</article-title>
          ,
          <source>Batteries</source>
          <volume>9</volume>
          (
          <year>2023</year>
          )
          <fpage>580</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vasardani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-F.</given-names>
            <surname>Richter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khoshelham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kalantari</surname>
          </string-name>
          ,
          <article-title>Infrastructure-independent indoor localization and navigation</article-title>
          ,
          <source>ACM CSUR 52</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Namboodiri</surname>
          </string-name>
          ,
          <article-title>NaVIP: A low-cost, infrastructure-free indoor navigation solution for visually impaired persons</article-title>
          , in: ICCV Workshop on ACVR, Honolulu, USA,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kendall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grimes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cipolla</surname>
          </string-name>
          ,
          <article-title>PoseNet: A convolutional network for real-time 6-DOF camera relocalization</article-title>
          ,
          <source>in: Proceedings of the IEEE ICCV</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>2938</fpage>
          -
          <lpage>2946</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beheshti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. E.</given-names>
            <surname>Hudson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedanthan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Riewpaiboon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mongkolwat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          ,
          <article-title>UNav: An infrastructure-independent vision-based navigation system for people with blindness and low vision</article-title>
          ,
          <source>Sensors</source>
          <volume>22</volume>
          (
          <year>2022</year>
          )
          <fpage>8894</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Puttonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hyyppä</surname>
          </string-name>
          ,
          <article-title>An up-view visual-based indoor positioning method via deep learning</article-title>
          ,
          <source>Remote Sensing</source>
          <volume>16</volume>
          (
          <year>2024</year>
          )
          <fpage>1024</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mahmoud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Adham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Automated scan-to-BIM: A deep learning-based framework for indoor environments with complex furniture elements</article-title>
          ,
          <source>Journal of Building Engineering</source>
          <volume>106</volume>
          (
          <year>2025</year>
          )
          <fpage>112596</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Abidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Siddiquee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Alkhalefah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of navigation systems for visually impaired individuals</article-title>
          ,
          <source>Heliyon</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cumanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waraiet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>LCVAE-CNN: Indoor Wi-Fi fingerprinting CNN positioning method based on LCVAE</article-title>
          ,
          <source>IEEE Internet of Things Journal</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Cheraghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Almadan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Namboodiri</surname>
          </string-name>
          ,
          <article-title>CityGuide: a seamless indoor-outdoor wayfinding system for people with vision impairments</article-title>
          ,
          <source>in: Proceedings of the 21st International ACM ASSETS, Poster Session</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cervenak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Masek</surname>
          </string-name>
          ,
          <article-title>ARKit as indoor positioning system</article-title>
          ,
          <source>in: 2019 11th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Tsai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Elyasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manduchi</surname>
          </string-name>
          ,
          <article-title>All the way there and back: Inertial-based, phone-in-pocket indoor wayfinding and backtracking apps for blind travelers</article-title>
          ,
          <source>ACM Transactions on Accessible Computing 17</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>V.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tsangouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Olmschenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Seiple</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>A hybrid indoor positioning system for blind and visually impaired using bluetooth and google tango</article-title>
          ,
          <source>Journal on Technology and Persons with Disabilities</source>
          <volume>6</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Namboodiri</surname>
          </string-name>
          ,
          <article-title>Towards accessible and inclusive navigation and wayfinding</article-title>
          ,
          <source>in: Workshop on Hacking Blind Navigation at The ACM CHI Conference on HFCS</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Cheraghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Namboodiri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Arsal</surname>
          </string-name>
          ,
          <article-title>CityGuide: A seamless indoor-outdoor wayfinding system for people with vision impairments</article-title>
          ,
          <source>in: Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Naito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kitani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Asakawa</surname>
          </string-name>
          ,
          <article-title>NavCog3: An evaluation of a smartphone-based blind indoor navigation assistant with semantic features in a large-scale environment</article-title>
          ,
          <source>in: Proceedings of the 19th International ACM ASSETS, ASSETS '17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>270</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Namboodiri</surname>
          </string-name>
          ,
          <article-title>Large-scale benchmarking of vision-based indoor localization using absolute pose regression</article-title>
          ,
          <source>in: 2025 15th IPIN</source>
          , IEEE,
          <year>2025</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>