<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>X (H. Lee);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>3D Trajectory Reconstruction of Dynamic Objects in Digital Twins from Monocular Video</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bogwan Kim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haeseong Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Myungho Lee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Pusan National University</institution>
          ,
          <addr-line>2, Busandaehak-ro 63beon-gil, Geumjeong-gu, Busan</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The growing demand for remote monitoring through digital twins highlights the importance of integrating both structural accuracy and dynamic awareness of physical spaces. While 3D reconstruction technologies enable highly precise digital twin environments, they typically remain static, failing to reflect real-time changes. Conversely, CCTV systems provide live monitoring but only as separate 2D video streams, requiring users to mentally map them to the reconstructed 3D environment. To address this gap, we propose a 2D-3D projection-based pipeline that incorporates dynamic object trajectories from monocular video into a 3D reconstructed digital twin. Our method leverages widely available indoor CCTV feeds, combining them with reconstructed static scenes and camera pose information to back-project object masks and recover placement and orientation. A stabilization filter further ensures robustness against noise and mask deformation. This approach offers a practical foundation for integrating dynamic objects into digital twins, facilitating more consistent spatial perception and real-time monitoring of remote environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Twin</kwd>
        <kwd>Dynamic Object Trajectory Reconstruction</kwd>
        <kwd>Pose Estimation</kwd>
        <kwd>Video Surveillance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Digital Twin (DT) technology is gaining significant attention as an innovative paradigm that connects
the physical and digital worlds, enabling the continuous reflection of a real environment’s state, behavior,
and changes over time in a virtual environment [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Unlike traditional modeling approaches, which
were often limited to static representations or simplified simulations, DTs integrate heterogeneous data
sources—such as sensor data, image data, and simulation results—to provide a continuously updated
virtual environment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This characteristic is particularly crucial in various application domains such
as smart manufacturing, healthcare, urban infrastructure management, and autonomous driving, where
the demand for real-time monitoring, predictive analytics, and decision support is rapidly increasing
[
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. In this context, the usability and reliability of a DT are directly determined by the level of fidelity
with which the virtual model reflects the structural, spatial, and temporal characteristics of the physical
environment [
        <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
        ]. Therefore, fidelity has become a core concept in DT research, extending beyond
mere geometric representation or physical model accuracy to a comprehensive discussion that includes
the realism of dynamic interactions and behavioral patterns [
        <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
        ].
      </p>
      <p>
        While fidelity can be defined in various ways [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], it essentially refers to how accurately a DT captures
not only the static properties of a real environment but also its dynamic states and transitions over time.
For example, if a DT of a manufacturing site only reproduces the geometric shape of machinery and
fails to reflect dynamic elements such as trajectories, its utility for predictive maintenance is limited [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Similarly, if a smart city’s DT includes only static infrastructure like buildings and roads but fails to
track the movement of mobile objects such as vehicles and pedestrians, it cannot sufficiently contribute
to traffic flow analysis or safety decision support [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These examples illustrate that the value of a DT
lies not merely in creating a visually precise digital replica, but in ensuring a functionally equivalent
level to reality by securing spatiotemporal consistency between the physical and virtual environments
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, achieving such high fidelity entails several challenges. While low-fidelity models can
reduce computational resource consumption, discrepancies with reality may lead to degraded prediction
performance or erroneous judgments. Conversely, high-fidelity DTs require precise 3D reconstruction,
robust pose estimation, and reliable dynamic object tracking, thus demanding massive computational
loads and significant algorithmic complexity [
        <xref ref-type="bibr" rid="ref10">10, 11</xref>
        ]. Therefore, determining how to define and balance
the level of fidelity has emerged as a key challenge in DT research, especially in application contexts that
simultaneously require dynamic object recognition, real-time localization, and temporal consistency
[12].
      </p>
      <p>In this context, research aimed at virtually reproducing real-world scenes has continued steadily. 3D
reconstruction is a prime example. Traditional pipelines for reconstructing 3D scenes from multi-view
cameras have widely used Structure-from-Motion (SfM) [13] to estimate camera poses and sparse points,
followed by Multi-View Stereo (MVS) [14] to produce dense depth and meshes. More recently, rapid
advances in neural reconstruction methods—most notably Neural Radiance Fields (NeRF)—have made it
possible to create and update high-precision 3D models of large-scale scenes [15, 16]. In particular, as
the accuracy of visual localization and the conversion pipelines between mesh-based and
point-cloud-based representations have matured [17, 18], it has become feasible to stably perform reconstruction
and maintenance of a scene’s geometry and material properties at industrial scale. These technical
foundations provide the continuously high-quality updatable spatial models required by DTs.</p>
      <p>At the same time, progress in computer vision—object segmentation [19, 20], Multi-Object
Tracking (MOT) [21], and Human Activity Recognition (HAR) [22]—has made it possible to quantitatively
characterize and measure object states and scene events from images and video streams. In addition, the
advent of vision-language models (VLMs) supports query-centric recognition and relational/descriptive
reasoning even for object and behavior categories that are not predefined, enabling robust
integration of domain-specific knowledge into vision pipelines tailored to each use case [23]. These models
move beyond mere visualization of static scenes in Digital Twins to enable tracking, explanation, and
prediction of dynamic states.</p>
      <p>Together, advances in 3D reconstruction and object understanding now make it feasible to operate
CCTV-equipped indoor environments (e.g., manufacturing facilities) as digital twins synchronized
with their physical counterparts. By leveraging a predefined 3D scene model and visual localization,
segmented objects from video streams can be registered into the 3D scene, ensuring spatiotemporal
consistency. In this paper, we focus on synchronizing dynamic objects and propose a method to reconstruct
their motion frame by frame within a virtual environment. We assume that a reconstructed static scene,
3D mesh models of dynamic objects, and accurate camera poses are available—assumptions that align
with the current state of 3D reconstruction, modeling, and localization technologies. From the input
image sequences, we extract object masks and incorporate predefined object and spatial information to
reinforce consistency between physical and virtual spaces, enabling high-fidelity representations of
dynamic objects in DT environments. This approach provides foundational techniques for implementing
dynamic digital twins in domains with frequent motion, such as manufacturing facilities and urban
settings.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>This section introduces a pipeline for high-fidelity DT representation of dynamic objects in indoor
scenes recorded by a static camera (e.g., CCTV). To this end, we assume the following are given: (i) a
3D reconstructed mesh of the static scene, (ii) a 3D mesh of the dynamic objects, and (iii) intrinsic and
extrinsic parameters of the camera. In particular, the camera pose estimated within the DT is assumed to
be aligned—via visual localization—with the coordinate frame of the physical camera used for capture.</p>
      <p>Since the dynamic objects are predefined, we prompt SAM2 once at initialization to obtain per-frame
masks. From the pixel distribution in each mask, we compute a principal ray, which is then projected
into the world coordinate system using the camera parameters. The intersection of this ray with the
object’s mid-height plane yields the per-frame position, while the displacement between successive
positions determines the rotation, primarily yaw. To reduce inter-frame rotational instability caused
by mask deformation and noise, we apply a stabilization filter. Finally, if the position and rotation
values were not calculated due to complete mask loss, we interpolate them to maintain consistency. An
overview of the pipeline is shown in Figure 1.</p>
      <sec id="sec-2-0">
        <title>2.1. Problem Statement</title>
        <p>We define the proposed algorithm as F, as shown in Eq. 1. Here, S denotes the 3D scene mesh and O represents the target object model. The camera is defined as C = (P, K), where P = [t, R] is the 3D pose of C—with t representing translation and R representing rotation—and K denotes the intrinsic parameters. ℐ denotes the monocular RGB image sequence (i.e., video) captured by C, and I_t refers to the image at frame t.</p>
        <p>F(S, O, C, ℐ) = Q = {q_t}_{t=1}^{T}   (1)</p>
        <p>The 3D pose of O at time t is represented as {x_t, y_t, z_t, R_t}, where R_t ∈ SO(3). The pose of O on the ground plane of the scene model S at time t, computed by F and denoted as q_t, is defined in Eq. 2, where θ_t denotes the yaw angle.</p>
        <p>q_t = {x_t, 0, z_t, θ_t}   (2)</p>
      <sec id="sec-2-1">
        <title>2.1.1. 3D Mask Projection</title>
        <p>To project O from a 2D image onto the 3D scene, we first generate the target object mask M_t on the image I_t using SAM2. M_t is represented as an array of 2D pixels p_i = (u_i, v_i). For each pixel in the mask, we define a ray r_i(s) using C, as shown in Eq. 3, where s denotes the depth along the ray. Finally, we compute the ray set r = {r_1, r_2, ...} for 3D projection.</p>
        <p>r_i(s) = −R^T t + s R^T K^{−1} [u_i, v_i, 1]^T   (3)</p>
        <p>However, these rays may be affected by mask noise or camera pose errors. To ensure robustness, we compute the unit vector d̂_i of each ray, as defined in Eq. 4.</p>
        <p>d_i = R^T K^{−1} [u_i, v_i, 1]^T,   d̂_i = d_i / ||d_i||   (4)</p>
        <p>The principal ray r̄(s) is then defined as the mean of the unit vectors in r, as given in Eq. 5, where |r| denotes the size of the ray set and o = −R^T t is the camera center.</p>
        <p>r̄(s) = ( Σ_{i ∈ r} d̂_i / |r| ) s + o   (5)</p>
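        <p>As an illustration of this back-projection step, the following minimal NumPy sketch computes the per-pixel ray directions and the principal ray from a binary mask; the helper name principal_ray and the variable layout are illustrative assumptions, not part of the original pipeline.</p>
        <preformat>
import numpy as np

def principal_ray(mask, K, R, t):
    """Back-project mask pixels (Eqs. 3-5): return camera center o and mean unit direction d_bar."""
    v, u = np.nonzero(mask)                      # pixel coordinates (rows, cols) inside the mask
    pix = np.stack([u, v, np.ones_like(u)], 0)   # homogeneous pixel coordinates, shape (3, N)
    dirs = R.T @ np.linalg.inv(K) @ pix          # per-pixel ray directions d_i
    dirs = dirs / np.linalg.norm(dirs, axis=0)   # unit vectors d_hat_i (Eq. 4)
    d_bar = dirs.mean(axis=1)                    # averaged direction of the principal ray (Eq. 5)
    o = -R.T @ t                                 # camera center, i.e. the ray origin
    return o, d_bar
        </preformat>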
      </sec>
      <sec id="sec-2-2">
        <title>2.1.2. Pose Calculation</title>
        <p>As previously mentioned, the 3D mesh model of the object is predefined. Consequently, we can obtain the bounding box of the dynamic object and determine its maximum height h. We then compute the 3D coordinates of the intersection point p_t between the object's principal ray r̄ and the horizontal plane at y = h/2. The corresponding ray parameter s* is determined by solving the ray–plane intersection in Eq. 6, and the full 3D position is then calculated as Eq. 7:</p>
        <p>s* = (h/2 − o_y) / d̄_y   (6)</p>
        <p>p_t = r̄(s*)   (7)</p>
        <p>Finally, by taking only the x and z values from this point p_t, we project it onto the ground plane (y = 0) to place the object.</p>
        <p>The position calculation allows us to determine the object's placement for each frame. The object's direction of rotation is determined from the displacement vector v_t, calculated as the difference between the current frame's position, p_t, and the previous frame's position, p_{t−1}.</p>
        <p>v_t = p_t − p_{t−1} = ⟨x_t − x_{t−1}, 0, z_t − z_{t−1}⟩   (8)</p>
        <p>Although the object’s position and rotation can be computed, significant inconsistencies may arise
between consecutive frames if the masks are deformed or noisy. Such abrupt variations reduce fidelity,
as the rotation calculation directly reflects them. To address this, we apply a stabilization filter composed
of three components:
• Motion gating / deadband: Suppresses micro-jitters by treating negligible rotational changes
as zero when motion is minimal.
• Rate limiting: Constrains the maximum rotation angle per frame, ensuring smooth and
consistent turns.
• Exponential moving average (EMA) smoothing: Reduces noise by blending the newly
computed orientation with the previously filtered orientation using spherical interpolation.</p>
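        <p>The three filter stages can be combined on the yaw angle as in the sketch below; the threshold values (min_speed, deadband, max_step) and the EMA weight alpha are illustrative assumptions, and for a single yaw angle the spherical interpolation reduces to blending along the shortest arc.</p>
        <preformat>
import math

def wrap(a):
    """Map an angle to (-pi, pi] for shortest-arc differences."""
    return math.atan2(math.sin(a), math.cos(a))

def stabilize_yaw(theta_new, theta_prev, speed,
                  min_speed=0.02, deadband=0.03, max_step=0.15, alpha=0.3):
    """Motion gating/deadband, rate limiting, and EMA smoothing of the yaw estimate."""
    delta = wrap(theta_new - theta_prev)
    if speed &lt; min_speed or abs(delta) &lt; deadband:   # gate/deadband: ignore negligible changes
        return theta_prev
    delta = max(-max_step, min(max_step, delta))        # rate limit: clamp the per-frame rotation
    return wrap(theta_prev + alpha * delta)             # EMA blend toward the new orientation
        </preformat>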
        <p>By applying this process to each frame, we obtain the object's position (x_t, z_t) and yaw rotation θ_t for the sequence. However, when the mask is completely missing, position and rotation cannot be computed for those frames. To maintain temporal consistency during such dropouts, we linearly interpolate both position and rotation across short gaps of up to G consecutive frames. Let t_0 &lt; t_1 be the valid keyframes that bracket a gap of length g = t_1 − t_0 − 1 ≤ G. For any missing frame t ∈ (t_0, t_1), set α = (t − t_0) / (t_1 − t_0) and compute using Eqs. 9 and 10.</p>
        <p>p_t = (1 − α) p_{t_0} + α p_{t_1}   (9)</p>
        <p>θ_t = θ_{t_0} + α · wrap(θ_{t_1} − θ_{t_0})   (10)</p>
        <p>In Eq. 10, wrap(·) maps angles to (−π, π] to ensure shortest-arc interpolation. Interpolation of sections whose length exceeds G may cause problems such as objects penetrating the scene, so they are not interpolated and are left as post-processing targets.</p>
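        <p>The gap interpolation of Eqs. 9 and 10 can be written directly from the definitions; the sketch below assumes keyframe poses stored as (x, z, theta) tuples, which is an implementation choice rather than part of the method description.</p>
        <preformat>
import math

def wrap(a):
    """Map an angle to (-pi, pi] so the yaw is interpolated along the shortest arc."""
    return math.atan2(math.sin(a), math.cos(a))

def interpolate_gap(pose0, pose1, t, t0, t1):
    """Linear position and shortest-arc yaw interpolation for a missing frame t in (t0, t1)."""
    alpha = (t - t0) / (t1 - t0)
    x = (1 - alpha) * pose0[0] + alpha * pose1[0]         # Eq. 9, x component
    z = (1 - alpha) * pose0[1] + alpha * pose1[1]         # Eq. 9, z component
    theta = pose0[2] + alpha * wrap(pose1[2] - pose0[2])  # Eq. 10
    return x, z, theta
        </preformat>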
        <p>The full procedure is summarized in Algorithm 1.</p>
        <p>Algorithm 1: Algorithm F for dynamic object pose estimation.</p>
        <preformat>
Require: image sequence ℐ, 3D scene mesh S, object mesh O, camera C = (P, K)
Ensure: object pose sequence Q
 1: for t ← 1 to T do
 2:   M_t ← SAM2(I_t)                 // mask image from Segment Anything Model 2
 3:   r̄(s) ← PrincipalRay(M_t, C)
 4:   h ← Height(O)
 5:   s* ← (h/2 − o_y) / d̄_y
 6:   p_t ← r̄(s*)
 7:   if t &gt; 1 then
 8:     v_t ← p_t − p_{t−1}
 9:     θ_t ← Filter(v_t, θ_{t−1})    // filtering with EMA, rate limit, deadband, motion gate
10:   end if
11:   q_t ← {x_t, 0, z_t, θ_t}
12: end for
        </preformat>
      </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>To evaluate the proposed methodology, we use a synthetic scene created in Unity. The object’s position
and rotation information is logged for each frame, and these data serve as the ground truth (GT). The methodology is
then assessed by comparing and analyzing two sets of data against the GT: the data obtained with the
stabilization filter applied and the data obtained without it.</p>
      <p>In Figure 2, during frames 0–20, the object moves short distances and performs specific actions while
largely stationary. From frames 21–35, it moves backward. Subsequently, the object moves straight,
turns to the right, and then to the left, before ending the sequence. The same positions are obtained
with both the filtered and unfiltered methods; however, the unfiltered method exhibits highly sporadic
rotational directions, whereas the filtered method maintains consistency. More detailed results are
provided in Figure 3. As illustrated in Figure 4, a masking error occurs between frames 30 and 40, leading
to a substantial position error in this interval. In addition, after frame 90, an occlusion is observed,
resulting in tracking failure and a further increase in position error.</p>
      <p>As shown in Figure 3, between frames 0 and 40—where the inter-frame trajectory distance is short
and both in-place rotations and masking errors occur—the unfiltered method exhibits large rotational
fluctuations, whereas the filtered method maintains narrower fluctuations, demonstrating robustness to
noise. However, compared to the unfiltered method, the filtered method cannot immediately capture
rapid directional changes, owing to the maximum rotation speed limit, as observed during the right/left
turning section (frames 70–90).</p>
      <p>Table 1 shows that applying the filter significantly reduces errors compared to the unfiltered method.
In particular, for the maximum angular error (MaxAE), the unfiltered method produced a large error of
approximately 179°, whereas the filtered method reduced this error to about 63°.</p>
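      <p>For reference, the maximum angular error over a sequence can be computed from wrapped per-frame yaw differences as in the short sketch below; the function name and the use of NumPy are illustrative and not taken from the evaluation code.</p>
      <preformat>
import numpy as np

def max_angular_error_deg(theta_est, theta_gt):
    """Maximum absolute yaw error in degrees, using shortest-arc differences per frame."""
    diff = np.arctan2(np.sin(theta_est - theta_gt), np.cos(theta_est - theta_gt))
    return np.degrees(np.abs(diff).max())
      </preformat>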
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This study proposes a lightweight pipeline that, after extracting masks using Segment Anything Model 2
(SAM2), performs mask projection, computes positions via the intersection between a principal ray and
a plane, and approximates rotation (yaw) using frame-to-frame motion vectors. In addition, to suppress
noise in the estimated rotation and ensure continuity along the time axis, we introduce a stabilization
scheme that combines gating, deadband, rate limiting, and an exponential moving average (EMA). By
incorporating this stabilization module, the system is designed to maintain spatiotemporal consistency
even in the presence of noise and occasional errors. This design is practically meaningful in that it
achieves computational efficiency suitable for real-time processing without complex optimization or
large-scale learning.</p>
      <p>Nevertheless, the proposed approach is structurally dependent on segmentation quality. Because
positions and rotations are determined from masks produced by SAM2, a baseline level of error is inherent,
and large errors may occur when occlusions are present or when SAM2 fails due to its performance
limits. Moreover, since rotation is determined by the motion vectors, it is difficult to correctly reflect
orientation in scenarios dominated by lateral or backward motion, in-place rotation, or in-place actions.
Our method also assumes that objects remain in contact with the ground and therefore estimates only
3DoF (planar position and yaw); accordingly, it is not applicable to aerial objects (e.g., drones) or to
objects exhibiting substantial pitch/roll variations. To address these structural issues, future work
should introduce more robust methods for position and rotation estimation and extend the framework
to full 6DoF pose estimation.</p>
      <p>Furthermore, it operates under the assumption that the camera extrinsics in the digital twin coordinate
system are estimated with very high accuracy through visual localization. However, even a small pose
error can bias the principal ray-plane intersection, inducing position and rotation drift. To mitigate
this, pose-stabilization strategies—such as drift compensation using semantic landmarks and sensor
fusion with additional modalities (e.g., IMU)—should be considered. For dynamic object models with
large intra-class shape variation, the fixed-height assumption may not hold, potentially distorting position and orientation
estimates. Future work should estimate object height online from frame-by-frame observations to
preserve robustness when object models are inaccurate.</p>
      <p>The proposed method was evaluated only in a synthetic virtual scene using quantitative metrics.
For future work, in-the-wild validation is needed by applying the method to real video within a
digital twin constructed from a 3D reconstruction of the physical environment. It is desirable to
conduct multi-site, multi-scenario experiments spanning diverse indoor locations, camera setups, and
object categories, and to complement them with user studies that qualitatively assess the temporal
consistency of dynamic-object trajectories. The qualitative evaluation can use panel-based Likert-scale
ratings or pairwise comparisons. Raters inspect side-by-side overlays on the source video and top-down
trajectory visualizations, and statistical significance is assessed using appropriate tests. Such a combined
quantitative–qualitative evaluation in real settings would allow a more rigorous demonstration of the
generalizability and robustness of the proposed method.</p>
      <p>In summary, the proposed method presents a concise and portable foundation that goes beyond
the visualization of static structures in Digital Twins and aims for high-fidelity dynamic reproduction
approaching functional equivalence for dynamic objects in scenes. Its significance lies in providing
a balanced trade-off among lightweight implementation, real-time performance, and consistency in
application domains dominated by dynamic factors, such as manufacturing, logistics, and smart cities.
By pursuing the aforementioned extensions, we expect to progressively resolve challenges such as
occlusion and in-place motion, thereby further improving the reliability and applicability of dynamic
Digital Twin implementations.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the Institute of Information &amp; Communications Technology Planning
&amp; Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2024-00344883).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used GPT-5 in order to: Grammar and spelling check. The author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] ISO/IEC, Digital twin - concepts and terminology, International Standard ISO/IEC 30173:2023, 2023. URL: https://www.iso.org/standard/81442.html.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] National Academies of Sciences, Engineering, and Medicine, The digital twin landscape, in: Foundational Research Gaps and Future Directions for Digital Twins, National Academies Press (US), 2024. URL: https://www.ncbi.nlm.nih.gov/books/NBK605499/.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Fuller, et al., Digital twin: Enabling technologies, challenges and open research, IEEE Access 8 (2020) 108952–108971. doi:10.1109/ACCESS.2020.2998358.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Jones, C. Snider, A. Nassehi, J. Yon, B. Hicks, Characterising the digital twin: A systematic literature review, CIRP Journal of Manufacturing Science and Technology 29 (2020) 36–52. URL: https://www.sciencedirect.com/science/article/pii/S1755581720300110. doi:10.1016/j.cirpj.2020.02.002.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] D. M. Botín-Sanabria, A.-S. Mihaita, R. E. Peimbert-García, M. A. Ramírez-Moreno, R. A. Ramírez-Mendoza, J. d. J. Lozoya-Santos, Digital twin technology challenges and applications: A comprehensive review, Remote Sensing 14 (2022). URL: https://www.mdpi.com/2072-4292/14/6/1335. doi:10.3390/rs14061335.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] P. Muñoz, Measuring the fidelity of digital twin systems, in: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings (MODELS '22), Association for Computing Machinery, New York, NY, USA, 2022, pp. 182–188. doi:10.1145/3550356.3558516.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Digital Twin Consortium, Digital Twin Consortium defines digital twin, 2020. URL: https://www.digitaltwinconsortium.org/2020/12/digital-twin-consortium-defines-digital-twin/, accessed: 2025-08-21.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] F. Tao, M. Zhang, Y. Liu, A. Y. C. Nee, Digital twin driven prognostics and health management for complex equipment, CIRP Annals 67 (2018) 169–172. doi:10.1016/j.cirp.2018.04.055.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. S. Irfan, S. Dasgupta, M. Rahman, Toward transportation digital twin systems for traffic safety and mobility: A review, IEEE Internet of Things Journal 11 (2024) 24581–24603. doi:10.1109/JIOT.2024.3395186.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Q. Picard, S. Chevobbe, M. Darouich, J.-Y. Didier, A survey on real-time 3D scene reconstruction with SLAM methods in embedded systems, arXiv preprint arXiv:2309.05349 (2023). arXiv:2309.05349.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Y. Dai, Z. Hu, S. Zhang, L. Liu, A survey of detection-based video multi-object tracking, Displays 75 (2022) 102317. doi:10.1016/j.displa.2022.102317.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] C. Kober, M. Fette, J. P. Wulfsberg, A method for calculating optimum digital twin fidelity, Procedia CIRP 120 (2023) 1155–1160. URL: https://www.sciencedirect.com/science/article/pii/S2212827123008739. doi:10.1016/j.procir.2023.09.141. 56th CIRP International Conference on Manufacturing Systems 2023.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] J. L. Schönberger, J.-M. Frahm, Structure-from-motion revisited, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 4104–4113. doi:10.1109/CVPR.2016.445.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Y. Furukawa, J. Ponce, Accurate, dense, and robust multiview stereopsis, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 1362–1376. doi:10.1109/TPAMI.2009.161.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, R. Ng, NeRF: Representing scenes as neural radiance fields for view synthesis, Communications of the ACM 65 (2022) 99–106. doi:10.1145/3503250.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] B. Kerbl, G. Kopanas, T. Leimkühler, G. Drettakis, 3D Gaussian splatting for real-time radiance field rendering, ACM Transactions on Graphics 42 (2023) 1–14. doi:10.1145/3592433.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] C. Chen, B. Wang, C. X. Lu, N. Trigoni, A. Markham, Deep learning for visual localization and mapping: A survey, IEEE Transactions on Neural Networks and Learning Systems 35 (2024) 17000–17020. doi:10.1109/TNNLS.2023.3309809.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] W. Xiao, R. Chierchia, R. S. Cruz, X. Li, D. Ahmedt-Aristizabal, O. Salvado, C. Fookes, L. Lebrat, Neural radiance fields for the real world: A survey, 2025. URL: https://arxiv.org/abs/2501.13104. arXiv:2501.13104.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., Segment anything, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] N. Ravi, et al., SAM 2: Segment anything in images and videos, arXiv preprint arXiv:2408.00714 (2024). arXiv:2408.00714.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, T.-K. Kim, Multiple object tracking: A literature review, Artificial Intelligence 293 (2021) 103448. URL: https://www.sciencedirect.com/science/article/pii/S0004370220301958. doi:10.1016/j.artint.2020.103448.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] J. Shin, N. Hassan, A. S. M. Miah, S. Nishimura, A comprehensive methodological survey of human activity recognition across diverse data modalities, 2024. URL: https://arxiv.org/abs/2409.09678. arXiv:2409.09678.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] Z. Li, X. Wu, H. Du, F. Liu, H. Nghiem, G. Shi, A survey of state of the art large vision language models: Alignment, benchmark, evaluations and challenges, 2025. URL: https://arxiv.org/abs/2501.02189. arXiv:2501.02189.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>