<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-Domain Calibration Framework for SAR-XAI: A Systematic Approach to Trustworthy Explainable AI with Transparency Enhancements</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diego Argüello Ron</string-name>
          <email>diego.arguello@i4ri.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christyan Cruz Ulloa</string-name>
          <email>christyan.cruz.ulloa@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Orfeas Menis Mastromichalakis</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
          <xref ref-type="aff" rid="aff6">6</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristina Livitckaia</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaime Del Cerro</string-name>
          <email>j.cerro@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Garcia Perales</string-name>
          <email>oscar.garcia@i4ri.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pawel Andrzej Herman</string-name>
          <email>paherman@kth.se</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Automática y Robótica (UPM-CSIC), Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Data Analytics for Industries 4.0</institution>
          ,
          <addr-line>Xàtiva</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Division of Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology</institution>
          ,
          <addr-line>Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Information Technologies Institute, Centre for Research and Technology Hellas</institution>
          ,
          <addr-line>Thessaloniki</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Keywords Trustworthy AI, Human-Centered XAI</institution>
          ,
          <addr-line>Model Calibration, SAR Operations, EU AI Act Compliance, LLM-Inspired Calibration</addr-line>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Nerion</institution>
          ,
          <addr-line>Chios</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff6">
          <label>6</label>
          <institution>School of Electrical and Computer Engineering, National Technical University of Athens</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Search-and-Rescue robotic operations require trustworthy AI systems where severe overconfidence (D-ECE &gt; 0.9) could compromise life-critical decisions. We present a multi-domain calibration framework demonstrating improvements across synthetic, simulated, and real SAR domains. Our approach uses heuristic-based ground truth generation, enabling calibration assessment without expensive manual annotation. Crucially, we reveal that explainability method choice directly impacts calibration quality: LayerCAM achieves optimal performance with superior sparsity (0.044 ± 0.029) and calibration (D-ECE: 0.136) by creating focused attention maps that enable reliable confidence assessment. Different CAM methods produce distinct attention regions, which affect how calibration is computed and validated, making joint optimization essential for safety-critical deployment. The framework provides foundations for EU AI Act Article 13 transparency requirements while acknowledging the need for expanded validation before operational use.</p>
      </abstract>
      <kwd-group>
        <kwd>Trustworthy AI</kwd>
        <kwd>Human-Centered XAI</kwd>
        <kwd>Model Calibration</kwd>
        <kwd>SAR Operations</kwd>
        <kwd>EU AI Act Compliance</kwd>
        <kwd>LLM-Inspired Calibration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Model overconfidence occurs when neural networks assign high probability scores to incorrect
predictions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We distinguish epistemic uncertainty (reducible through additional data) from aleatoric
uncertainty (irreducible data variability). In SAR operations, overconfidence manifests as D-ECE &gt; 0.9,
where confidence severely misaligns with accuracy. For example, predicting “victim detected” with
95% confidence when actual accuracy is 20% causes operators to trust incorrect detections, wasting
resources and endangering lives.
      </p>
      <p>
        Nixon et al. established D-ECE as the standard detection calibration metric [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], with established
thresholds: D-ECE &lt; 0.15 for well-calibrated systems and D-ECE &gt; 0.9 indicating dangerous
overconfidence [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, existing calibration research assumes single-domain applications with available
labeled validation data, lacking cross-domain transfer capabilities essential for SAR operations spanning
synthetic, simulated, and real environments.
      </p>
      <p>
        SAR applications face unique explainability challenges where operators must quickly understand
AI recommendations under time pressure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. CAM techniques provide visual explanations but lack
calibration assessment [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], with LayerCAM showing promise [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Most studies treat explanation and
calibration as separate processes, problematic in SAR where operators must simultaneously interpret
prediction location and confidence level.
      </p>
      <p>This paper introduces four contributions addressing these human-centered trustworthiness
challenges:
1. Heuristic-Based Ground Truth Generation: A filename-based heuristic procedure to supply
calibration labels at scale, removing the need for manual annotation in this study.
2. Multi-Domain Calibration Framework: Novel approach with validation across synthetic
(D_LLM), simulated (D_SIM), and real (D_REAL) domains.
3. Cross-Domain Calibration Analysis: Empirical evidence that calibration improvements are
achievable across different data collection methodologies.
4. Transparency-Enhanced Implementation: Technical framework addressing EU AI Act Article
13 transparency requirements.</p>
      <p>Our evaluation demonstrates calibration improvements across domains while acknowledging
expanded validation needs before safety-critical deployment. Critically, explainability and calibration
are interdependent: different CAM methods create distinct attention regions that directly influence
calibration computation, making joint optimization essential.</p>
      <p>Throughout this manuscript, domain refers to data-source modality (D_LLM, D_SIM, D_REAL)
unless referring to broader application contexts.</p>
      <p>Table 1 summarizes our contributions and their assessed novelty levels relative to existing work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Technical Background</title>
      <sec id="sec-2-1">
        <title>2.1. Class Activation Mapping (CAM) Methods</title>
        <p>CAM techniques provide visual explanations by highlighting image regions that contribute most to
CNN predictions, essential for understanding AI decisions in safety-critical SAR operations.</p>
        <p>
          For Grad-CAM [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the importance of feature map $A^{k}$ for class $c$ is
$\alpha_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{ij}^{k}}$,
with class score $y^{c}$ and spatial normalizer $Z$. The resulting heatmap is a weighted combination of the feature maps, $L^{c} = \mathrm{ReLU}\left(\sum_{k} \alpha_{k}^{c} A^{k}\right)$.
        </p>
        <p>
          LayerCAM [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] extends this by aggregating maps across layers using element-wise positive-gradient weights,
$w_{ij}^{kc} = \mathrm{ReLU}\left(\frac{\partial y^{c}}{\partial A_{ij}^{k}}\right)$, $\hat{A}_{ij}^{k} = w_{ij}^{kc} \cdot A_{ij}^{k}$, $L^{c} = \mathrm{ReLU}\left(\sum_{k} \hat{A}^{k}\right)$,
which tends to yield finer localization, useful in SAR scenes with small, critical targets.
        </p>
        <p>EigenCAM applies Principal Component Analysis (PCA) to activation maps but often produces
fragmented attention patterns that can be difficult for human operators to interpret in time-critical SAR
scenarios.</p>
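        <p>To make the weighting schemes above concrete, the following is a minimal PyTorch sketch of Grad-CAM-style
map computation. It is a sketch under stated assumptions, not our experimental pipeline: the ResNet-18 backbone,
the choice of target layer, and the random input are placeholders.</p>
        <preformat>
# Minimal Grad-CAM sketch (PyTorch). Backbone, target layer, and input are
# placeholders; hooks capture the activations A^k and gradients dy^c/dA^k.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
target_layer = model.layer4  # last convolutional block (placeholder choice)

acts, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed SAR frame
scores = model(x)
cls = int(scores.argmax(dim=1))
scores[0, cls].backward()  # gradients of the class score w.r.t. A^k

alpha = grads["g"].mean(dim=(2, 3), keepdim=True)           # alpha_k^c: pooled gradients
cam = F.relu((alpha * acts["a"]).sum(dim=1, keepdim=True))  # ReLU(sum_k alpha_k^c A^k)
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)    # normalize to [0, 1]
        </preformat>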
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Calibration Metrics for SAR Applications</title>
        <p>Model calibration measures the alignment between predicted confidence and actual accuracy—a critical
safety requirement in life-critical operations.</p>
        <p>
          Expected Calibration Error (ECE) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] measures the gap between confidence and accuracy over $M$
confidence bins $B_{m}$:
        </p>
        <p>$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_{m}|}{n} \left| \mathrm{acc}(B_{m}) - \mathrm{conf}(B_{m}) \right|$.</p>
        <p>
          Detection Expected Calibration Error (D-ECE) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] extends ECE to object detection, incorporating
spatial information and false negatives—particularly critical in SAR where missed detections endanger
lives.
        </p>
        <p>$\mathrm{D\text{-}ECE} = \sum_{m=1}^{M} \frac{|B_{m}|}{n} \left| \mathrm{prec}(B_{m}) - \mathrm{conf}(B_{m}) \right|$.</p>
        <p>
          Interpretation Thresholds: D-ECE &lt; 0.15 indicates well-calibrated systems suitable for operational
deployment, while D-ECE &gt; 0.9 represents dangerous overconfidence requiring immediate attention
before safety-critical use [
          <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
          ].
        </p>
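        <p>A minimal NumPy sketch of the binned computation above, assuming arrays of per-detection
confidences and ground-truth match indicators (for D-ECE, the per-bin accuracy term is read as
precision over detections); the array contents are illustrative:</p>
        <preformat>
# Binned |acc - conf| gap, weighted by bin occupancy (ECE / D-ECE form).
import numpy as np

def binned_calibration_error(conf, correct, n_bins=10):
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(conf, edges) - 1, 0, n_bins - 1)
    err, n = 0.0, len(conf)
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            err += (mask.sum() / n) * gap
    return err

# Toy usage mirroring the overconfidence example in the Introduction:
# 95% confidence with 20% actual accuracy yields a large error (0.75).
conf = np.full(100, 0.95)
correct = np.r_[np.ones(20), np.zeros(80)]
print(binned_calibration_error(conf, correct))
        </preformat>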
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Heuristic-Based Synthetic Ground Truth Generation</title>
        <p>We introduce a heuristic-based approach to calibration ground truth generation, addressing SAR data
scarcity through filename-based pattern recognition for systematic calibration labels.</p>
        <p>SAR Data Scarcity: Collecting diverse SAR training datasets is problematic due to high costs, safety
risks, environmental variability, and annotation challenges in adverse conditions, hindering model
generalization.</p>
        <p>
          Synthetic Video Solution: Our framework leverages Sora, a diffusion-based model producing up to
60-second realistic videos with complex multi-entity scenarios and cinematic quality suitable for SAR
training. It also employs DeepSeek Janus-Pro (a 7B-parameter transformer) for high-quality
frame-by-frame generation, offering flexible integration into custom pipelines. Finally, Gemini Pro + Veo allows
us to generate 8-second clips with synchronized audio, enabling simulation of radio communications,
victim calls, and environmental sounds [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ].
        </p>
        <p>Mission-Critical Scenarios: We generate diverse SAR scenarios including rugged terrain traversal,
victim detection (thermal views, burial states), sensor degradation (smoke, dust, weather), and varying
environmental conditions (day/night, indoor/outdoor).</p>
        <p>We implement domain-specific calibration ground truth generation using systematic filename pattern
analysis as detailed in Algorithm 1; the algorithm treats special images and non-standard files as
positive and otherwise applies the general model. A minimal sketch follows below.</p>
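        <p>Since Algorithm 1 itself is not reproduced here, the following is a hypothetical Python sketch of
the filename-pattern labeling it describes; the regular expressions are illustrative assumptions, not the
actual patterns encoded in our datasets:</p>
        <preformat>
# Illustrative sketch of heuristic ground-truth labeling from filename
# patterns (cf. Algorithm 1). The regexes below are assumptions for this
# sketch; the paper's filename conventions encode temporal/spatial
# search information.
import re
from pathlib import Path

POSITIVE_HINTS = re.compile(r"victim|person|thermal", re.IGNORECASE)
NEGATIVE_HINTS = re.compile(r"empty|background|debris_only", re.IGNORECASE)

def heuristic_label(path):
    """Return 1 (target present) or 0, in O(1) time per file."""
    name = Path(path).stem
    if NEGATIVE_HINTS.search(name):
        return 0
    if POSITIVE_HINTS.search(name):
        return 1
    # Per Algorithm 1, special and non-standard files are assumed positive.
    return 1

labels = {p: heuristic_label(p)
          for p in ["d_real_victim_017.png", "d_sim_empty_003.png"]}
        </preformat>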
        <p>Rationale for Deterministic Labeling: Traditional calibration approaches require extensive
manually-labeled validation sets, which are prohibitively expensive and time-consuming for SAR
applications. Our heuristic labeling provides a systematic alternative for calibration assessment when
manual annotation is infeasible.</p>
        <p>1. Core Purpose: Generate balanced positive/negative samples for computing calibration metrics
without expensive manual labeling. These labels serve as proxies for actual detection outcomes,
enabling systematic calibration assessment across thousands of images that would otherwise
require expert annotation.
2. Scientific Methodology: The filename patterns in our datasets encode temporal and spatial
information that correlates with real SAR search patterns.
3. Calibration Application: These heuristic labels enable: (a) D-ECE computation by providing
accuracy baselines for confidence comparison, (b) temperature scaling parameter optimization
through gradient-based methods, and (c) cross-domain consistency analysis across D_LLM, D_SIM,
and D_REAL environments.</p>
        <p>Limitation Acknowledgment: This heuristic approach represents a practical compromise between
annotation cost and calibration assessment needs. Future work should validate these findings with
larger expert-annotated datasets and explore LLM-based label generation methods.</p>
        <p>
          Validation: Maritime SAR studies show 218% improvement in mean Average Precision when
synthetic data augments real datasets [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This O(1) complexity approach enables scalable calibration
assessment; future work should explore LLM-based label generation.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Multi-Domain Calibration Framework</title>
        <p>Our framework addresses SAR multi-domain operations with human operator understanding.</p>
        <p>Domain Architecture and Data Collection:</p>
        <p>Our multi-domain approach reflects the realistic deployment pathway for SAR AI systems, progressing
from synthetic training data through simulation validation to real-world application.
• D_LLM Domain (Synthetic): Synthetic frame sequences (≈ 163 samples per CAM method, ≈
1,141 total). These are generated using state-of-the-art AI systems: Sora for 60-second realistic
video sequences, DeepSeek Janus-Pro (7B parameter transformer) for detailed scene understanding,
and Gemini Pro+Veo for 8-second clips. Generated scenarios include collapsed buildings with
realistic debris patterns, challenging terrain navigation, and atmospheric effects (smoke, dust,
varying weather conditions).
• D_SIM Domain (Simulated): Simulated environments (≈ 35 samples per CAM method, ≈ 245
total) from physics-based simulation environments using Unity3D game engine and NVIDIA
Isaac Sim platform. These platforms provide realistic rubble dynamics, accurate thermal signature
simulation, and particle effects for dust and debris.
• D_REAL Domain (Real-world): Real-world SAR operations (≈ 73 samples per CAM method,
≈ 511 total)</p>
        <p>Uneven sample distribution reflects real-world SAR data scarcity, requiring synthetic augmentation
for safety-critical deployment.</p>
        <p>Human-Centered Calibration Process: For each domain, we implement temperature scaling with
human oversight:</p>
        <p>
$\hat{p} = \mathrm{softmax}(z / T_{d})$,
where $T_{d}$ is the domain-specific temperature parameter optimized on synthetic ground truth. We
evaluate calibration quality using D-ECE [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], with perfect calibration achieving D-ECE = 0. Following
established benchmarks [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], we target D-ECE &lt; 0.15 for operational deployment, while D-ECE &gt; 0.9
indicates dangerous overconfidence requiring immediate recalibration.
        </p>
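        <p>A minimal PyTorch sketch of the per-domain temperature fitting, assuming placeholder logits and
heuristic labels; the optimizer settings and sample counts are illustrative, not the configuration used in
our experiments:</p>
        <preformat>
# Fit a scalar temperature T_d per domain by minimizing NLL on heuristic
# calibration labels; calibrated confidences are then softmax(z / T_d).
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.01):
    T = torch.ones(1, requires_grad=True)
    opt = torch.optim.Adam([T], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / T.clamp(min=1e-3), labels)
        loss.backward()
        opt.step()
    return float(T.detach())

# Placeholder per-domain logits and labels (per-method sizes from Sec. 3.2).
domains = {"D_LLM": (torch.randn(163, 2) * 5, torch.randint(0, 2, (163,))),
           "D_SIM": (torch.randn(35, 2) * 5, torch.randint(0, 2, (35,)))}
temps = {d: fit_temperature(z, y) for d, (z, y) in domains.items()}
        </preformat>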
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Trustworthy and Explainability Integration</title>
        <p>A critical finding of our research is that explainability and calibration are not independent concerns:
the choice of CAM method fundamentally affects how well-calibrated the resulting confidence estimates
become.</p>
        <p>Mathematical Foundation: Different CAM methods alter calibration computation through their
spatial attention distribution patterns. For a given CAM method $m$ producing attention map $M_{m}$, the
calibration-weighted confidence becomes
$\hat{c} = \frac{\sum_{i,j} M_{m}(i, j) \cdot c(i, j)}{\sum_{i,j} M_{m}(i, j)}$,
where $c(i, j)$ represents the pixel-wise prediction confidence. This means that the spatial distribution
of attention directly influences the final confidence estimate used for calibration assessment.</p>
        <p>Sparsity-Calibration Relationship: We hypothesize that explanation sparsity correlates with
calibration quality, measuring sparsity as the fraction of active attention exceeding a threshold $\tau$:
$S = \frac{\sum_{i,j} \mathbb{I}[M_{ij} &gt; \tau]}{\sum_{i,j} \mathbb{I}[M_{ij} &gt; 0]}$.
Lower sparsity indicates concentrated attention, enabling reliable confidence assessment and better
human interpretation.</p>
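        <p>A minimal NumPy sketch of the attention-weighted confidence and sparsity statistics above; the
attention map, confidence map, and threshold value are illustrative assumptions:</p>
        <preformat>
import numpy as np

def weighted_confidence(cam, conf_map):
    """Attention-weighted confidence: sum(M * c) / sum(M)."""
    w = cam.clip(min=0.0)
    return float((w * conf_map).sum() / (w.sum() + 1e-8))

def sparsity(cam, tau=0.5):
    """Fraction of active attention above tau; lower means more focused."""
    active = cam > 0.0
    if not active.any():
        return 0.0
    return float((cam > tau).sum() / active.sum())

cam = np.random.rand(224, 224) ** 4   # concentrated attention example
conf_map = np.random.rand(224, 224)   # pixel-wise prediction confidence
print(weighted_confidence(cam, conf_map), sparsity(cam))
        </preformat>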
        <p>Comparative Method Analysis:
• LayerCAM: Typically produces more focused attention; see Table 3.
• GradCAM: Often yields more diffuse attention; see Table 3.
• EigenCAM: Can produce fragmented attention patterns; see Table 3.</p>
        <p>Practical Impact: Differences in explanation focus can materially affect how closely confidence
aligns with accuracy, underscoring that explanation choice and calibration should be considered jointly
in safety-critical SAR scenarios.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Model-Dominance Discovery</title>
        <p>
          Our empirical evaluation (Table 2) provides initial evidence that calibration improvements can be
achieved across different data collection methodologies, and the consistency of these improvements
suggests that systematic calibration approaches may be transferable. Our results achieve the
established benchmark threshold of D-ECE &lt; 0.15 across all
domains [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Explainable AI Performance Assessment</title>
        <p>LayerCAM emerges as the optimal method for SAR robotics applications, achieving superior
performance in both explanation focus (sparsity: 0.044 ± 0.029) and calibration quality (D-ECE: 0.136),
outperforming gradient-based methods like GradCAM (sparsity: 0.324 ± 0.121) and eigenspace approaches.
This finding suggests that methods with architectural advantages in spatial attention (LayerCAM’s
layer-specific focus) may achieve better calibration across different data collection approaches, though
expanded validation is needed.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Visual Validation of Calibration Impact</title>
        <p>The quantitative results presented in Tables 2 and 3 are corroborated by visual evidence demonstrating
that our calibration framework preserves spatial attention quality while dramatically improving
confidence reliability. Figures 1 and 2 illustrate how the 85.3% calibration improvement (D-ECE reduction
from 0.927 to 0.136) maintains operational effectiveness for human-AI collaboration in SAR scenarios.</p>
        <p>Figure 1 demonstrates a critical finding: calibration enhancement removes dangerous
overconfidence without degrading the spatial attention patterns essential for human operator decision-making.
The uncalibrated attention map exhibits severe overconfidence (D-ECE: 0.95) that could lead to false
security in life-critical situations, while the calibrated version achieves appropriate confidence levels
(D-ECE: 0.15) while preserving identical target localization accuracy. This validates our model-dominance
hypothesis: architectural calibration solutions can address overconfidence while maintaining the spatial
intelligence that makes these systems operationally valuable.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Regulatory Compliance Assessment</title>
        <p>Having demonstrated calibration improvements and visual preservation of spatial attention quality,
we now present transparency framework capabilities against regulatory requirements. Our approach
provides foundations for regulatory compliance through systematic transparency measures (Table 4).</p>
        <p>The visual validation evidence (Figures 1 and 2) supports transparency requirements by demonstrating
interpretable calibration processes, while the quantitative metrics provide measurable trustworthiness
characteristics aligned with EU AI Act Article 13 and NIST AI-RMF frameworks.</p>
        <p>Our multi-domain monitoring capabilities (D_LLM, D_REAL, D_SIM) with quantitative calibration
metrics (85% D-ECE improvement) demonstrate substantial progress beyond foundational concepts
toward operational monitoring systems. However, operational deployment requires comprehensive
regulatory assessment, expanded validation datasets, and formal compliance certification.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusions</title>
      <p>Our multi-domain calibration framework demonstrates systematic improvements across synthetic,
simulated, and real SAR domains, with LayerCAM achieving optimal performance (sparsity: 0.044±0.029,
D-ECE: 0.136) by transforming dangerous overconfidence (D-ECE &gt; 0.9) into well-calibrated predictions
(D-ECE &lt; 0.15). The heuristic-based approach provides O(1) complexity ground truth generation while
addressing EU AI Act transparency requirements.</p>
      <p>Deployment and Limitations: The framework supports progressive deployment requiring
systematic field validation and regulatory assessment. Current limitations include reliance on heuristic
labeling, which we only qualitatively verified against expert annotations—expanded validation is needed
for deployment. Uneven sample distribution (D_LLM: 1,141; D_REAL: 511; D_SIM: 245) limits statistical
power, necessitating domain-specific validation. The framework shows broader applicability for medical
imaging, autonomous systems, and industrial inspection.</p>
      <p>Impact and Future Work: The 85.3% calibration improvement across domains offers clear
practitioner guidance: choose LayerCAM for focused, well-calibrated explanations. Priority research areas
include dynamic calibration, multi-modal integration, and comprehensive safety assessment. Our
framework represents a foundational step requiring continued validation before operational deployment.
As AI increasingly supports life-critical decisions, appropriate confidence calibration becomes both a
technical and ethical imperative.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by the project "Explainable Trustworthy AI for Data Intensive Applications"
(EXTRA - BRAIN), Grant no. 101135809 - HORIZON-CL4-2023-HUMAN-01-CNECT.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used OpenAI GPT-4 and Claude Sonnet 4 for grammar
and spelling checks and for formatting assistance (LaTeX error correction).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Arrieta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Del</given-names>
            <surname>Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bennetot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barbado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benjamins</surname>
          </string-name>
          , et al.,
          <article-title>Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai</article-title>
          ,
          <source>Information fusion 58</source>
          (
          <year>2020</year>
          )
          <fpage>82</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pleiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>On calibration of modern neural networks</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1321</fpage>
          -
          <lpage>1330</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          European Parliament and Council,
          <source>Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)</source>
          ,
          <source>Technical Report L 1689</source>
          ,
          <source>Official Journal of the European Union</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          National Institute of Standards and Technology,
          <source>AI Risk Management Framework (AI RMF 1.0)</source>
          ,
          <source>Technical Report NIST AI 100-1</source>
          , NIST,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Nixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Dusenberry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jerfel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>Measuring calibration in deep object detection</article-title>
          ,
          <source>in: CVPR Workshops</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>0</fpage>
          -
          <lpage>0</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <article-title>Aerial person detection for search and rescue: Survey and benchmarks</article-title>
          ,
          <source>Journal of Remote Sensing</source>
          <volume>4</volume>
          (
          <year>2024</year>
          )
          0474. doi:10.34133/remotesensing.0474.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedantam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          , Grad-CAM:
          <article-title>Visual explanations from deep networks via gradient-based localization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.-T.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-M.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          , LayerCAM:
          <article-title>Exploring hierarchical class activation maps for localization</article-title>
          ,
          <source>in: IEEE Transactions on Image Processing</source>
          , volume
          <volume>30</volume>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>5875</fpage>
          -
          <lpage>5888</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          Workshop on Synthetic Data for Computer Vision,
          <article-title>Synthetic data for computer vision: Current state and future directions</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>2543</fpage>
          -
          <lpage>2552</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>NVIDIA</given-names>
            <surname>Corporation</surname>
          </string-name>
          ,
          <article-title>Nvidia omniverse replicator: Synthetic data generation for computer vision</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Machado</surname>
          </string-name>
          , et al.,
          <article-title>On the use of synthetic data for body detection in maritime search and rescue operations</article-title>
          ,
          <source>Engineering Applications of Artificial Intelligence</source>
          <volume>137</volume>
          (
          <year>2024</year>
          )
          109138. doi:10.1016/j.engappai.2024.109138.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>