<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AR Memory Viewer: Recreating Memorable Scenes through AR Superimposition of Past Photos</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shuya Tonooka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taishi Iriyama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takashi Komuro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Saitama University</institution>
          ,
          <addr-line>Saitama</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose AR Memory Viewer, which recreates personal memory scenes through AR superimposition of past photos. Deep learning-based feature point matching enables the accurate alignment of personal memory photos with the real-world scene for AR superimposition, even when there are significant visual differences between the past photo and the real-world scene. This enables AR Memory Viewer to provide an experience in which users can view past scenes, such as landscapes in different seasons or moments when a pet was present, through their device. We conducted a user study with the prototype system and confirmed its effectiveness as a new method for recalling personal memories.</p>
      </abstract>
      <kwd-group>
        <kwd>Mobile augmented reality</kwd>
        <kwd>Reminiscence support</kwd>
        <kwd>Feature point matching</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Memories are the recollections of events and emotions we have experienced in the past,
enriching our lives. Past photos play an important role in evoking or recalling old memories.
While the spread of mobile devices has increased opportunities to take photos, it has been
pointed out that many of these photos remain unused, left behind in folders [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Therefore, there
is a growing need for methods that go beyond simply viewing photos, aiming to derive richer
experiences and greater value from them.
      </p>
      <p>
        Some studies have effectively utilized past photos taken at the same location as the user's
current position to recreate past scenes through AR superimposition [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. In this field of
research, accurately aligning past photos with the real-world scene remains a major challenge,
and various approaches have been proposed to address this challenge.
      </p>
      <p>
        One approach is to utilize GPS to enable AR superimposition of past photos corresponding to
the user's location information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, this GPS-based approach can only handle past
photos with geotags, and GPS accuracy often degrades in indoor environments.
Another approach is to perform feature point matching based on salient visual features at the
location, enabling accurate alignment within the area where such features are visible [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
However, conventional feature point matching methods may fail when salient visual features at
the location become unrecognizable due to changes in weather or lighting conditions. To address
this issue, a method has been proposed in which multiple reference images capturing the same
location under different weather conditions are prepared in advance, allowing robust detection
of visual features under varying environmental conditions [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>In this paper, we propose AR Memory Viewer, which utilizes deep learning-based feature point
matching to accurately align and present past photos as AR superimpositions, even when there
are significant visual differences between the past photo and the real-world scene. As shown in
Figure 1, AR Memory Viewer enables a new memory-recalling experience by allowing users to
view personal past scenes through their device.</p>
      <p>[Figure 1. Using AR Memory Viewer: the camera image, the selected past photo, and the user’s perspective.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. AR Memory Viewer</title>
      <sec id="sec-2-1">
        <title>2.1. Core processing</title>
        <p>An overview of the core processing of AR Memory Viewer is shown in Figure 2. Feature point
matching is performed between the camera image and all past photos in the folder, and the past
photo with the highest number of matched feature points is selected. A projective transformation
is performed using the matched feature points from the selected photo, generating an image in
which the past photo is aligned with the camera image. A transformation is applied to the
generated image to simulate a magic lens from the user's point of view. The result is then
displayed on the screen of the mobile device.</p>
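        <p>As a rough illustration of the pipeline above, the sketch below selects the past photo with the most matched feature points and warps it onto the camera image with a projective transformation. It uses OpenCV's AKAZE detector and a brute-force matcher as a classical stand-in for the SuperPoint + SuperGlue matching described in Section 2.2; all function and variable names here are ours, not the authors' implementation.</p>
        <preformat>
import cv2
import numpy as np

def to_gray(img):
    # Feature detection expects a single-channel 8-bit image
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

def select_and_align(camera_img, past_photos):
    """Pick the past photo with the most feature matches and align it to the
    camera image. Sketch only: the actual system performs this matching step
    with SuperPoint + SuperGlue on a server."""
    akaze = cv2.AKAZE_create()
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    kp_cam, des_cam = akaze.detectAndCompute(to_gray(camera_img), None)

    best = None  # (match count, matches, keypoints, photo)
    for photo in past_photos:
        kp_p, des_p = akaze.detectAndCompute(to_gray(photo), None)
        if des_p is None:
            continue
        matches = matcher.match(des_p, des_cam)
        if best is None or len(matches) > best[0]:
            best = (len(matches), matches, kp_p, photo)

    count, matches, kp_p, photo = best
    assert count >= 4, "a homography needs at least 4 correspondences"
    src = np.float32([kp_p[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_cam[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Projective transformation aligning the past photo with the camera image
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = camera_img.shape[:2]
    return cv2.warpPerspective(photo, H, (w, h))
</preformat>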
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Implementation details</title>
        <p>
          The prototype system used a Surface Pro 7 as the mobile device, and a PC equipped with an
NVIDIA GeForce GTX 1070 GPU as the server for performing deep learning-based feature point
matching. SuperPoint [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] was used for feature point extraction, with a maximum of 4,096 feature
points per image. SuperGlue [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] was used for matching, configured with the outdoor version of
the model trained on outdoor datasets. The camera image captured by the mobile device is sent
to the server via TCP communication. Feature point matching is performed on the server, and
the aligned image is sent to the mobile device. The image displayed on the mobile device was
transformed under the assumption that the distance from the user to the device is 50 cm, and
the distance from the device to the real-world scene is at infinity. After receiving the image, the
mobile device performs alignment using AKAZE [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a computationally efficient feature matching
method, enabling accurate AR superimposition even when the camera is moved.
        </p>
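        <p>The wire format of the TCP link is not detailed above, so the sketch below shows one conventional way the device side could be implemented: each camera frame is sent as a length-prefixed JPEG, and the aligned image comes back in the same framing. The host, port, and framing are illustrative assumptions, not the authors' protocol.</p>
        <preformat>
import socket
import struct
import cv2
import numpy as np

def recv_exact(sock, n):
    # Read exactly n bytes; TCP recv may return partial chunks
    buf = bytearray()
    while n > len(buf):
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before full message arrived")
        buf.extend(chunk)
    return bytes(buf)

def request_aligned_image(camera_img, host="192.168.0.10", port=5000):
    """Send a camera frame to the matching server and receive the aligned
    past photo. Hypothetical framing: 4-byte big-endian length + JPEG bytes."""
    ok, jpg = cv2.imencode(".jpg", camera_img)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    payload = jpg.tobytes()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(struct.pack(">I", len(payload)) + payload)
        (size,) = struct.unpack(">I", recv_exact(sock, 4))
        data = recv_exact(sock, size)
    return cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)
</preformat>
        <p>On receipt, the device could then re-align the returned image to newer camera frames with a fast local matcher such as AKAZE, along the lines of the routine sketched in Section 2.1.</p>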
        <p>The processing results of the prototype system are shown in Figure 3. We verified the
operation using scenes from different seasons for outdoor locations and scenes with a pet for
indoor locations. Sufficient matching between the selected past photo and the camera image was
achieved, enabling a magic lens with accurate alignment of the past photo to the real-world scene.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. User Study</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental design</title>
        <p>We conducted a comparative experiment to investigate the effectiveness of the proposed method.
As a baseline method, we employed a manual selection approach in which participants chose a
past photo similar to the real-world scene from a folder and presented it as an AR
superimposition at the center of the screen. The set of past photos in the folder consisted of 10
images: one photo used for scene reproduction and nine randomly selected photos from Flickr
using the keywords “indoor place” and “outdoor place”.</p>
        <p>
          The experiment was conducted in July 2025. The participants were 20 students from our
university (5 females; mean age = 22.9, SD = 1.74). Four past photos taken on our university
campus (two indoor locations taken in June 2025, and two outdoor locations taken in December
2023 and April 2025) were used, and participants experienced one of the two methods at each
location. The conditions were counterbalanced so that each method was experienced an equal
number of times across the four locations. After guiding the participants to each location, we provided
instructions on the appropriate camera angle when using AR Memory Viewer. We measured the
time from launching each application to performing AR superimposition of past photos, as well
as the time from the AR superimposition to exiting the application. After the experience,
participants were asked to complete a questionnaire in which the “game” section of the GEQ [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
was adapted by replacing “game” with “application,” along with three custom questions. The GEQ
has been used to evaluate AR experiences in the study by Lee et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and it is considered
effective as an indicator for measuring user experience. Memories are deeply rooted in each
user’s internal experiences, and it is difficult to directly measure the quality of recreating
memorable scenes using quantitative scales. Therefore, we adopted this evaluation method.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental results</title>
        <p>The time required from launching the application to performing AR superimposition was 18.1
seconds on average (±3.65) for the baseline method and 18.56 seconds on average (±4.31) for
the proposed method, showing no substantial difference between the two methods. Since only
10 past photos were included in the folder in this experiment, no difference was observed;
however, the advantage of the proposed method is expected to become more evident as the
number of photos increases. Regarding the time from the start of AR superimposition to exiting
the application, the baseline method required 27.82 seconds on average (±11.81), whereas the
proposed method required 43.13 seconds on average (±19.89), indicating that the proposed
method enabled significantly longer application usage (p &lt; 0.001).</p>
        <p>Next, the results of the GEQ questionnaire are shown in Figure 4. A paired t-test revealed
that the proposed method demonstrated superiority in all subscales except for Tension and
Challenge. This indicates that, compared to the baseline method, the proposed method enhances
the quality of the application experience.</p>
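        <p>For reference, the comparison above corresponds to a standard paired t-test, since each participant experienced both conditions. Below is a minimal SciPy sketch; the baseline and proposed arrays are placeholders for per-participant measurements, not the data collected in this study.</p>
        <preformat>
import numpy as np
from scipy import stats

# Placeholder per-participant values (one pair per participant);
# illustrative only, not the study's actual measurements
baseline = np.array([27.0, 31.5, 19.2, 40.1, 25.8])
proposed = np.array([45.3, 52.0, 33.8, 61.7, 41.2])

# Paired t-test: compares the two conditions within participants
t_stat, p_value = stats.ttest_rel(baseline, proposed)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
</preformat>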
        <p>Finally, the results of the custom questions are shown in Figure 5. For Q1, ratings of 7 and 6
accounted for 80% of the responses, suggesting that the concept of the proposed method was
sufficiently conveyed. For Q2, all responses were either 7 or 6, indicating that the proposed
method may be effective as a new way of referring to personal memories. For Q3, the ratings
were limited to 7, 6, and 5, implying that the burden of photo selection was reduced and that past
photos were continuously aligned with the real-world scene at the correct position.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and future work</title>
      <p>In this paper, we proposed AR Memory Viewer, which utilizes deep learning-based feature point
matching to accurately align personal memory photos with the real-world scene for AR
superimposition. AR Memory Viewer allows users to view past scenes through their device even
when the appearance of past photos differs from the real-world scene, as long as the geometric
structure is similar. The results of the evaluation experiment demonstrated that the concept of
the prototype system was sufficiently conveyed and that it is effective as a new way of referring
to personal memories.</p>
      <p>
        In the prototype system, feature point matching with SuperGlue was difficult to perform
solely on a mobile device, so communication with a desktop PC was used. However, LightGlue [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
a lightweight version, has recently been proposed, and it may enable operation solely on a mobile
device.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-5 for grammar and language
refinement. After using this tool, the authors reviewed and edited the content as needed and took
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>David</given-names>
            <surname>McGookin</surname>
          </string-name>
          .
          <article-title>Reveal: Investigating Proactive Location-Based Reminiscing with Personal Digital Photo Repositories</article-title>
          .
          <source>In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . https://doi.org/10.1145/3290605.3300665
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Tommy</given-names>
            <surname>Hasselman</surname>
          </string-name>
          , Wei Hong Lo, Tobias Langlotz, and Stefanie Zollmann.
          <article-title>ARephotography: Revisiting Historical Photographs using Augmented Reality</article-title>
          .
          <source>In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . https://doi.org/10.1145/3544549.3585646
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Gun A.</given-names>
            <surname>Lee</surname>
          </string-name>
          , Andreas Dünser, Seungwon Kim, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Billinghurst</surname>
          </string-name>
          .
          <article-title>CityViewAR: A mobile outdoor AR application for city visualization</article-title>
          .
          <source>In 2012 IEEE International Symposium on Mixed and Augmented Reality - Arts, Media, and Humanities (ISMAR-AMH '12)</source>
          . pp.
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          . https://doi.org/10.1109/ISMAR-AMH.2012.6483989
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Cavallo</surname>
          </string-name>
          , Geoffrey Alan Rhodes, and Angus Graeme Forbes.
          <article-title>Riverwalk: Incorporating Historical Photographs in Public Outdoor Augmented Reality Experiences</article-title>
          .
          <source>In Adjunct Proceedings of 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMARAdjunct '16)</source>
          . pp.
          <fpage>160</fpage>
          -
          <lpage>165</lpage>
          . https://doi.org/10.1109/ISMAR-Adjunct.2016.0068
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Silvia</given-names>
            <surname>Blanco-Pons</surname>
          </string-name>
          , Berta Carrión-Ruiz, Michelle Duong, Joshua Chartrand, Stephen Fai, and José Luis Lerma.
          <article-title>Augmented Reality Markerless Multi-Image Outdoor Tracking System for the Historical Buildings on Parliament Hill</article-title>
          .
          <source>Sustainability</source>
          <year>2019</year>
          , Vol.
          <volume>11</volume>
          , No.
          <issue>16</issue>
          , Art. no. 4268. https://doi.org/10.3390/su11164268.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>DeTone</surname>
          </string-name>
          , Tomasz Malisiewicz, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          .
          <article-title>SuperPoint: Self-Supervised Interest Point Detection and Description</article-title>
          .
          <source>In Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '18)</source>
          . pp.
          <fpage>337</fpage>
          -
          <lpage>349</lpage>
          . https://doi.org/10.48550/arXiv.1712.07629
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Paul-Edouard</given-names>
            <surname>Sarlin</surname>
          </string-name>
          , Daniel DeTone, Tomasz Malisiewicz, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          .
          <article-title>SuperGlue: Learning Feature Matching With Graph Neural Networks</article-title>
          .
          <source>In Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          .
          . pp.
          <fpage>4937</fpage>
          -
          <lpage>4946</lpage>
          . https://doi.org/10.1109/CVPR42600.2020.00499
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Pablo F.</given-names>
            <surname>Alcantarilla</surname>
          </string-name>
          , Adrien Bartoli, and
          <string-name>
            <given-names>Andrew J.</given-names>
            <surname>Davison</surname>
          </string-name>
          .
          <article-title>Fast explicit diffusion for accelerated features in nonlinear scale spaces</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</source>
          , Vol.
          <volume>34</volume>
          , No.
          <issue>7</issue>
          , pp.
          <fpage>1281</fpage>
          -
          <lpage>1298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Wijnand A.</given-names>
            <surname>IJsselsteijn</surname>
          </string-name>
          , Yvonne A. W. de Kort, and
          <string-name>
            <given-names>Karolien</given-names>
            <surname>Poels</surname>
          </string-name>
          .
          <article-title>The game experience questionnaire</article-title>
          . Technische Universiteit Eindhoven, Eindhoven, The Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Lindenberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul-Edouard</given-names>
            <surname>Sarlin</surname>
          </string-name>
          , and Marc Pollefeys.
          <article-title>LightGlue: Local feature matching at light speed</article-title>
          .
          <source>In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV '23)</source>
          .
          <fpage>17627</fpage>
          -
          <lpage>17638</lpage>
          . https://doi.org/10.48550/arXiv.2306.13643
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>