<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings (Co-located with ECAI)</journal-title>
      </journal-title-group>
      <issn>1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Generalization in Diabetic Retinopathy: A Neuro-Symbolic Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Midhat Urooj</string-name>
          <email>murooj@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ayan Banerjee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Farhat Shaikh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuntal Thakur</string-name>
          <email>kthakur9@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandeep Gupta</string-name>
          <email>sandeep.gupta@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Impact Lab, Arizona State University</institution>
          ,
          <addr-line>Tempe, AZ</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>25</volume>
      <abstract>
        <p>Domain generalization remains a critical challenge in medical imaging, where models trained on single sources often fail under real-world distribution shifts. We propose KG-DG, a neuro-symbolic framework for diabetic retinopathy (DR) classification that integrates vision transformers with expert-guided symbolic reasoning to enable robust generalization across unseen domains. Our approach leverages clinical lesion ontologies through structured, rule-based features and retinal vessel segmentation, fusing them with deep visual representations via a confidence-weighted integration strategy. The framework addresses both Single-Domain Generalization (SDG) and Multi-Domain Generalization (MDG) by aligning high-level clinical semantics across domains, thereby improving robustness and cross-domain performance.</p>
      </abstract>
      <kwd-group kwd-group-type="author">
        <kwd>Diabetic Retinopathy</kwd>
        <kwd>Vision Transformers</kwd>
        <kwd>Out-of-Distribution Robustness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Diabetic Retinopathy (DR) is a microvascular complication
of Diabetes Mellitus that affects the retinal vasculature,
leading to hemorrhages, microaneurysms, exudates, and
cotton-wool spots which, if left untreated, can culminate
in irreversible vision loss [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Manual grading of
fundus photographs by expert ophthalmologists remains
the clinical gold standard but is both time-consuming
and subject to inter-observer variability [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Despite
the success of deep learning models—particularly Vision
Transformers (ViTs)—on single-source DR datasets [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ],
their performance suffers when confronted with domain
shifts caused by variations in imaging devices, resolution
settings, and patient demographics.
      </p>
      <p>
        Although Domain
Generalization (DG) strategies such as Empirical Risk
Minimization under the DomainBed protocol [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] offer a
baseline for robustness, they often overlook the integration
of structured clinical knowledge and realistic augmentation
techniques that are critical for reliable cross-domain
deployment.
      </p>
      <p>Neuro-symbolic learning, which integrates deep learning
with symbolic reasoning, has gained traction as a promising
strategy to improve domain generalization in medical
imaging.</p>
      <sec id="sec-1-1">
        <title>Deep models extract complex patterns from</title>
        <p>raw data, while symbolic components encode high-level
domain knowledge and constraints, thereby efec tively
guiding model behavior across varying domains.</p>
      </sec>
      <sec id="sec-1-2">
        <title>This hybrid approach can mitigate overfitting to domain-specific artifacts by enforcing consistency with known anatomical or pathological rules. For example, Han et al. introduced a</title>
        <p>nEvelop-O
In
particular,
few</p>
        <p>approaches simultaneously
both
symbolic
knowledge integration
and
address
domain
generalization in a cohesive framework. This gap motivates
our proposed</p>
        <p>method, KG-DG, a knowledge-guided
domain generalization framework that unifies structured
clinical knowledge with deep learning
models in a
scalable
manner.</p>
        <p>KG-DG encodes domain-invariant
biomarkers—such as exudates, hemorrhages, and vascular
abnormalities—directly into the learning pipeline, guiding
classification tasks while enhancing out-of-distribution
(OOD) robustness. Unlike prior approaches, our framework
generalizes across imaging modalities, as demonstrated
by strong performance in both diabetic retinopathy
classification and seizure-onset detection from MRI scans.
By embedding clinical expertise at the architectural level,
KG-DG bridges the gap between symbolic interpretability
and neural robustness, advancing the development of
domain-generalizable medical AI.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Vision transformers (ViTs) have revolutionized medical
image analysis, particularly in ophthalmology, offering a
powerful alternative to traditional convolutional neural
networks. Dosovitskiy et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] established the foundation
by demonstrating ViTs’ state-of-the-art performance on
large-scale image recognition benchmarks, catalyzing
their adoption for diabetic retinopathy (DR) detection.
Subsequent work by Kothari et al. introduced TransDR [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
enhancing ViTs with lesion-aware attention mechanisms
that improve lesion localization capabilities, though without
explicitly addressing domain shift robustness challenges.
      </p>
      <p>
        The challenge of domain generalization (DG) has
become increasingly prominent in medical AI deployment.
DomainBed benchmarks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have illuminated the
fundamental difficulty of ensuring that deep models trained
on source domains maintain reliable performance on
unseen target domains. Gulrajani and Lopez-Paz’s findings
revealed that even sophisticated methods often fail to
outperform simple empirical risk minimization (ERM)
under rigorous evaluation protocols, emphasizing the
critical importance of careful DG methodology design.
      </p>
      <p>
        The integration of structured clinical knowledge
represents a promising direction for enhancing both
interpretability and performance in medical imaging
systems. GraphDR, developed by Khandelwal et al.
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], exemplifies this approach by leveraging diabetic
retinopathy lesion ontologies within graph convolutional
networks to guide feature learning processes. Similar
knowledge-driven strategies have been successfully applied
to chest X-ray and histopathology classification tasks,
grounding models in established disease relationships.
These developments suggest that embedding lesion
ontologies into transformer self-attention mechanisms
could provide an effective pathway for learning clinically
meaningful and domain-invariant representations.
      </p>
      <p>
        Contrastive learning methodologies have emerged as
another approach for addressing domain generalization
challenges by disentangling domain-specific artifacts
from semantic features. Deep CORAL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] pioneered
domain alignment through feature covariance alignment in
convolutional neural networks, while contemporary
DG methods employ cross-domain contrastive
losses to promote separation between invariant and
domain-dependent representations. Chen et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
successfully extended these principles to medical imaging,
achieving improved generalization in histopathology
classification under distributional shifts. Nevertheless,
contrastive disentanglement approaches remain relatively
unexplored in diabetic retinopathy classification tasks.
      </p>
      <p>
        Our approach bears conceptual similarity to Concept
Bottleneck Models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and their quantitative extensions,
such as Concept Embedding Models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. While these
models learn to predict labels through intermediate
human-defined concepts, our KG-DG framework differs
by explicitly combining clinical rules with learned neural
embeddings via confidence fusion, rather than strictly
bottlenecking predictions through concepts. Compared to
GraphDR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which employs lesion ontologies in a Graph
Convolutional Network (GCN), our work emphasizes hybrid
neuro-symbolic fusion at the decision level, making it more
adaptable to diverse backbones and datasets.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Domain Generalization</title>
        <p>
          In many real-world applications, particularly in biomedical
fields, it is unrealistic to expect access to new patients’ data
before model deployment due to domain shifts between
data from different patients [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. To address this challenge,
the concept of Domain Generalization (DG) was introduced
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. DG aims to train models on data from one or more
related but distinct source domains, enabling them to
generalize effectively to unseen, out-of-distribution (OOD)
target domains. Since its formal introduction by Blanchard
et al. in 2011 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a wide range of techniques have been
proposed to tackle the DG challenge [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]–[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          These approaches include learning domain-invariant
representations by aligning source domain distributions
[
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ], simulating domain shifts during training using
meta-learning [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
          ], and generating synthetic data
through domain augmentation [
          <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
          ]. From an application
perspective, DG has been explored in various areas such as
computer vision (e.g., object recognition [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ], semantic
segmentation [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], and person re-identification [
          <xref ref-type="bibr" rid="ref15 ref21">15, 21</xref>
          ]),
speech recognition [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], natural language processing [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
medical imaging [
          <xref ref-type="bibr" rid="ref27 ref28">27, 28</xref>
          ], and reinforcement learning [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Unlike related paradigms like domain adaptation or
transfer learning, DG uniquely addresses scenarios where
no target domain data is accessible during training.
The original motivation behind DG stemmed from a
medical application known as automatic gating in flow
cytometry [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This technique involves classifying cells in
blood samples—such as distinguishing lymphocytes from
non-lymphocytes—based on measured cellular properties.
While such automation can significantly aid in diagnosis by
replacing the labor-intensive and expert-dependent manual
gating process, patient-specific distribution shifts hinder
model generalization. Collecting new labeled data for every
patient is not feasible, thereby underscoring the need for
DG solutions.
        </p>
        <p>
          In medical imaging, domain shift is especially prevalent
due to variations across clinical sites and individual
patients [
          <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
          ]. Datasets like Multi-site Prostate MRI
Segmentation [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and Chest X-rays [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] reflect this reality,
with differences in imaging equipment and acquisition
protocols introducing substantial distribution variability.
        </p>
        <p>
          Domain alignment techniques have been applied across
diverse DG tasks, including object recognition [
          <xref ref-type="bibr" rid="ref18 ref31">18, 31</xref>
          ],
action recognition [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], face anti-spoofing [
          <xref ref-type="bibr" rid="ref32 ref33">32, 33</xref>
          ], and
medical image analysis [
          <xref ref-type="bibr" rid="ref34 ref35">34, 35</xref>
          ]. Among the simplest and
most effective strategies for mitigating domain shift in
medical imaging are image transformations [
          <xref ref-type="bibr" rid="ref36 ref37 ref38">36, 37, 38</xref>
          ].
These transformations can emulate changes in color and
geometry, often caused by device heterogeneity—such as
different scanners in various medical centers. However,
care must be taken in selecting these transformations. In
some domains (e.g., digit recognition or optical character
recognition), certain transformations like horizontal or
vertical flips may alter the semantic label, leading to label
shift. Therefore, transformation-based strategies must be
chosen judiciously to preserve task relevance and integrity.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Role of Knowledge-Guided Systems in Improving Model Generalization</title>
        <p>Deep learning models, although powerful in capturing visual patterns, are notoriously sensitive to domain shifts arising from demographic, device, or protocol differences. Knowledge-guided systems mitigate this by incorporating clinically validated rules, visual biomarkers, and demographic insights into conventional learning pipelines. This approach is designed to improve robustness, interpretability, and domain generalization, addressing critical limitations commonly encountered in medical deployments where data heterogeneity, distribution shifts, and limited supervision can degrade model performance.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.1. Knowledge-Guided Augmentation of</title>
      </sec>
      <sec id="sec-2-4">
        <title>Deep Models</title>
        <p>Traditional deep learning models typically learn a predictive mapping $f_{\mathrm{DL}}: \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ denotes input modalities (e.g., retinal images) and $\mathcal{Y}$ represents target disease labels. This approach inherently lacks structured medical inductive biases, potentially limiting clinical applicability. To overcome this limitation, we propose a dual-branch architecture, integrating a structured knowledge representation $\mathcal{K}$ into deep learning-based image analysis.</p>
        <p>We formalize $\mathcal{K}$ as a set of diagnostic rules $\{r_1, r_2, \ldots, r_n\}$, each reflecting expert-validated correlations between observable clinical features and disease states. These rules incorporate visual biomarkers (e.g., exudates, hemorrhages, vascular patterns) and demographic parameters (e.g., patient age, glycemic status). For practical implementation, we develop corresponding feature extractors $E = \{e_1, e_2, \ldots, e_n\}$, instantiated via object detection models (YOLOv11), segmentation architectures, and logical rule functions.</p>
        <p>Each extractor $e_i$ outputs a quantitative feature $k_i \in \mathbb{R}$, aggregated into a structured vector $K^{*} = \{k_1, k_2, \ldots, k_n\}$. This structured vector encodes clinical attributes such as presence, severity, and spatial distribution of significant retinal lesions, facilitating symbolic reasoning aligned closely with clinical diagnostic criteria.</p>
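        <p>As a concrete illustration, the following minimal Python sketch assembles such a structured vector from scalar extractor outputs. The extractor functions shown are illustrative stand-ins, not the released implementation; in the actual pipeline they correspond to the fine-tuned YOLOv11 detector, the vessel-segmentation module, and logical rule functions.</p>
        <preformat>
import numpy as np

# Toy stand-in extractors: each e_i maps an image to a scalar k_i.
def bright_lesion_score(img):
    return float((img > 0.9).mean())   # crude proxy for exudate burden

def dark_lesion_score(img):
    return float((0.1 > img).mean())   # crude proxy for hemorrhage burden

def build_knowledge_vector(img, extractors):
    # Stack each k_i into the structured vector K* = {k_1, ..., k_n}.
    return np.array([e(img) for e in extractors], dtype=np.float32)

fundus = np.random.rand(224, 224)      # dummy grayscale fundus image
k_star = build_knowledge_vector(fundus, [bright_lesion_score, dark_lesion_score])
print(k_star.shape)                    # (2,)
        </preformat>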
        <p>A parallel knowledge-driven classifier $f_{\mathrm{KD}}: K^{*} \to \mathcal{Y}$ is trained alongside the deep learning model $f_{\mathrm{DL}}$. The final prediction can then be determined through different fusion strategies. In the simplest case, a selective fusion rule is applied:
$$\hat{y}_{\mathrm{final}} = \begin{cases} \hat{y}_{\mathrm{DL}}, &amp; \text{if } c_{\mathrm{DL}} \geq c_{\mathrm{KD}}, \\ \hat{y}_{\mathrm{KD}}, &amp; \text{otherwise}, \end{cases}$$
where $c_{\mathrm{DL}}$ and $c_{\mathrm{KD}}$ denote the maximum confidence scores from the deep and symbolic classifiers, respectively. This strategy enhances robustness by leveraging symbolic reasoning when the deep model predictions exhibit uncertainty, particularly valuable in handling out-of-distribution scenarios. Beyond this, we experimented with three additional fusion techniques:</p>
        <p>1. Max Confidence Fusion: both the neural (ViT) and symbolic classifiers output calibrated probabilities via softmax normalization. The class with the globally highest confidence is selected, irrespective of source.</p>
        <p>2. Class-wise Max Fusion: normalized per-class confidence scores are compared across models, and the prediction is made according to the higher class-specific confidence.</p>
        <p>3. Weighted Fusion: empirically tuned weights $(w_{\mathrm{DL}}, w_{\mathrm{KD}})$ are applied to balance neural and symbolic predictions. Formally,
$$\hat{y}_{\mathrm{final}} = \arg\max_{c \in \mathcal{C}} \left( w_{\mathrm{DL}} \cdot p_{\mathrm{DL}}(c) + w_{\mathrm{KD}} \cdot p_{\mathrm{KD}}(c) \right),$$
where $p_{\mathrm{DL}}(c)$ and $p_{\mathrm{KD}}(c)$ are the softmax confidence scores assigned by the deep and symbolic classifiers, respectively, for class $c$, and $\mathcal{C}$ is the set of all DR severity classes.</p>
        <p>Together, these strategies allow us to assess the trade-off between model confidence, robustness, and the influence of symbolic knowledge on final decision-making.</p>
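        <p>A minimal sketch of these fusion rules over softmax outputs is given below; the weight values are hypothetical placeholders rather than the tuned values used in our experiments. Note that, for single-label prediction, max-confidence fusion and class-wise max fusion both reduce to an argmax over the element-wise maximum of the two probability vectors.</p>
        <preformat>
import numpy as np

def selective_fusion(p_dl, p_kd):
    # Case rule: trust the branch whose maximum confidence is higher.
    if p_dl.max() >= p_kd.max():
        return int(np.argmax(p_dl))
    return int(np.argmax(p_kd))

def max_fusion(p_dl, p_kd):
    # Max-confidence / class-wise max: highest confidence wins per class.
    return int(np.argmax(np.maximum(p_dl, p_kd)))

def weighted_fusion(p_dl, p_kd, w_dl=0.6, w_kd=0.4):
    # argmax over w_dl * p_dl(c) + w_kd * p_kd(c); weights illustrative.
    return int(np.argmax(w_dl * p_dl + w_kd * p_kd))

p_dl = np.array([0.10, 0.20, 0.55, 0.10, 0.05])   # ViT branch, 5 DR grades
p_kd = np.array([0.05, 0.70, 0.15, 0.05, 0.05])   # symbolic branch
print(selective_fusion(p_dl, p_kd), weighted_fusion(p_dl, p_kd))
        </preformat>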
      </sec>
      <sec id="sec-2-5">
        <title>3.2. Diabetic Retinopathy Classification</title>
        <p>We evaluated the proposed KG-DG framework on the task
of diabetic retinopathy (DR) classification using retinal
fundus images—well-suited for knowledge-guided learning
due to the presence of clearly defined visual pathologies
such as microaneurysms, hemorrhages, exudates, and
neovascularization. Domain-specific diagnostic rules were
curated from ophthalmological guidelines (see Table 1) and
operationalized via automated feature extraction pipelines
built using two open-source, modular tools: YOLOv11 and
a retinal vessel segmentation model.</p>
        <p>
          For lesion-level localization, we employed the YOLOv11
object detection model, a state-of-the-art one-stage
detector known for its efficiency and precision in
dense object environments. YOLOv11 extends the
YOLOv5/YOLOv7 series with advanced improvements
including CSPDarkNet-based backbones, decoupled heads,
and dynamic label assignment (DLA), achieving superior
mean average precision (mAP) with real-time inference
capabilities [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. We fine-tuned YOLOv11 to detect clinically
relevant lesions such as hemorrhages, hard exudates, and
cotton wool spots. Bounding boxes produced by the model
were post-processed and validated using Intersection over
Union (IoU) scores against expert-labeled fundus images,
ensuring medical fidelity.
        </p>
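        <p>A hedged sketch of this lesion-detection step follows, assuming the open-source ultralytics package and a hypothetical fine-tuned weights file; the IoU check against expert boxes is spelled out in full.</p>
        <preformat>
from ultralytics import YOLO   # assumed dependency

def iou(a, b):
    """Intersection over Union for [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

model = YOLO("dr_lesions.pt")             # hypothetical fine-tuned weights
results = model("fundus.jpg", conf=0.25)  # detect lesions in one image
expert_box = [120.0, 80.0, 180.0, 140.0]  # expert-labeled reference box
for box in results[0].boxes:
    pred = box.xyxy[0].tolist()           # predicted [x1, y1, x2, y2]
    if iou(pred, expert_box) >= 0.5:      # keep medically faithful detections
        print(pred)
        </preformat>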
        <p>In parallel, we integrated a vein segmentation module to
extract morphological vessel features. This module, trained
on the open-source DRIVE and CHASE-DB1 datasets,
uses a modified U-Net architecture with spatial attention
layers to segment retinal vessels with high sensitivity.
From these segmented maps, we extracted quantitative
features including vessel tortuosity, branching angles, and
average caliber—biomarkers strongly associated with DR
progression.</p>
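        <p>The tortuosity feature, for instance, can be sketched as an arc-length-to-chord-length ratio computed over a skeletonized vessel mask. The snippet below assumes scikit-image for skeletonization and simplifies vessel tracing to a single connected segment.</p>
        <preformat>
import numpy as np
from skimage.morphology import skeletonize   # assumed dependency

def tortuosity(path_xy):
    """path_xy: (N, 2) ordered centerline points of one vessel segment."""
    steps = np.diff(path_xy, axis=0)
    arc = np.sqrt((steps ** 2).sum(axis=1)).sum()     # arc length
    chord = np.linalg.norm(path_xy[-1] - path_xy[0])  # endpoint distance
    return arc / (chord + 1e-9)

mask = np.zeros((64, 64), dtype=bool)
mask[10:50, 32] = True                     # toy vertical "vessel"
centerline = np.argwhere(skeletonize(mask)).astype(float)
print(round(tortuosity(centerline), 3))    # about 1.0 for a straight vessel
        </preformat>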
        <p>This structured knowledge vector was passed into a
parallel symbolic classifier trained independently from the
deep model, enabling our system to rely on rule-driven
inference when the deep model exhibits uncertainty.
Various machine learning models, including Logistic
Regression, Random Forest, Support Vector Machines
(SVM), Gradient Boosting, and K-Nearest Neighbors, were
evaluated for knowledge-based classification on the feature
set ($K^{*}$). Among these, Gradient Boosting demonstrated the
best classification performance. Both YOLOv11 and the
vein segmentation module functioned solely as independent
auxiliary components to extract biomarkers from images,
facilitating symbolic reasoning. The biomarkers were
annotated by expert medical annotators on approximately
500 images, with random samples validated by respective
domain experts. These annotations were subsequently
used to fine-tune the YOLOv11 and vein segmentation
modules, which then act as knowledge extractors within the
pipeline. This accurate integration of clinical knowledge
enhances model robustness, promotes domain invariance,
and provides a solid foundation for understanding domain
shifts through distributional alignment.</p>
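        <p>A minimal scikit-learn sketch of this symbolic-branch model comparison follows; the features here are synthetic placeholders standing in for the extracted biomarker vector.</p>
        <preformat>
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((500, 6))            # placeholder lesion/vessel biomarkers
y = rng.integers(0, 5, size=500)    # 5 DR severity grades

candidates = [
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gradient_boosting", GradientBoostingClassifier(random_state=0)),
]
for name, clf in candidates:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")     # Gradient Boosting was best on the real K*
        </preformat>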
        <p>The classification results from the knowledge-based
machine learning model and the ViT model are integrated
using three main methods as shown in Figure 2: (1) selecting
the maximum confidence score across all predictions, (2)
computing the class-wise maximum confidence, and (3)
applying a weighted confidence scheme. The outcomes
of these three integration strategies are evaluated to assess
the overall performance of the final framework.</p>
      </sec>
      <sec id="sec-2-6">
        <title>3.3. Backbone Architectures and Training</title>
      </sec>
      <sec id="sec-2-7">
        <title>Strategy</title>
        <p>
          For the image-based analysis, we employed advanced
Vision Transformer (ViT) architectures. The DeiT-small
architecture, comprising approximately 22M parameters,
was used without distillation [
          <xref ref-type="bibr" rid="ref52">52</xref>
          ]. The CvT-13 model,
with 20M parameters, integrates convolutional layers with
transformer blocks to enhance spatial feature learning [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ].
Additionally, we utilized T2T-ViT-14, featuring progressive
tokenization and encompassing 21.5M parameters [
          <xref ref-type="bibr" rid="ref54">54</xref>
          ].
        </p>
        <p>All ViT models were initialized with ImageNet-pretrained
weights, and during training, encoder parameters remained
fixed to prevent overfitting. Only the classification heads
underwent optimization using class-weighted cross-entropy
loss. Training adhered to DomainBed protocols, employing
resizing to 224×224, random cropping, horizontal flipping,
color jitter, and grayscale augmentation. AdamW optimizer
was utilized with a learning rate of 5 × 10−5, and
early stopping was implemented after 10 epochs without
performance improvement.</p>
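        <p>A compact PyTorch sketch of this training recipe follows; the timm model identifier and the class-weight values are assumptions standing in for the exact configuration.</p>
        <preformat>
import timm
import torch
from torchvision import transforms

# DeiT-small backbone with a fresh 5-way head (assumed timm identifier).
model = timm.create_model("deit_small_patch16_224", pretrained=True, num_classes=5)
for p in model.parameters():
    p.requires_grad = False                 # freeze the pretrained encoder
for p in model.get_classifier().parameters():
    p.requires_grad = True                  # optimize the head only

# DomainBed-style augmentations used during training.
augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.3, 0.3, 0.3, 0.1),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
])

class_weights = torch.tensor([0.5, 1.5, 1.0, 2.0, 2.0])  # illustrative per-grade weights
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5)
        </preformat>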
      </sec>
      <sec id="sec-2-8">
        <title>3.4. Evaluation Protocol and Results</title>
        <p>
          Initially, KG-DG is evaluated on the APTOS dataset (60%
training, 20% cross-validation, and 20% testing), achieving
superior performance, exceeding a ViT benchmark by
6% (84.65% vs. 78.40%) and significantly outperforming
existing baselines. We conducted extensive evaluations in
both multi-source and single-source domain generalization
settings using publicly available DR datasets: APTOS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
EyePACS [
          <xref ref-type="bibr" rid="ref55">55</xref>
          ], MESSIDOR, and MESSIDOR2 [
          <xref ref-type="bibr" rid="ref56">56</xref>
          ]. Each
dataset constituted a distinct domain. In multi-source
experiments, we trained models on three datasets while
testing on the fourth. In single-source setups, we trained on
a single dataset and evaluated on the remaining domains.
        </p>
        <p>Our knowledge-guided framework consistently
demonstrated superior performance, achieving a +2.1%
average accuracy improvement in multi-source domain
generalization and a notable +4.2% increase in single-source
domain generalization scenarios, particularly impactful on
imbalanced data distributions (see detailed results in Table
6).</p>
        <p>The structured knowledge-driven classifier notably
improved generalization by encapsulating domain-invariant
medical reasoning, whereas the deep learning branch
effectively modeled intricate visual patterns, validating the
effectiveness of integrating clinical expertise within deep
learning frameworks.</p>
        <p>Note. Unless otherwise stated, in all tables the
best-performing value within each column is highlighted in
bold.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <sec id="sec-3-1">
        <title>4.1. Single Domain Generalization Results</title>
        <p>
          In the SDG setting, models were trained on one dataset
and evaluated on the remaining three to simulate clinical
deployment in unseen environments. Our method was
evaluated against DRGen, ERM-ViT, SD-ViT, and SPSD-ViT
using APTOS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], EyePACS [
          <xref ref-type="bibr" rid="ref55">55</xref>
], Messidor-1 and Messidor-2
[
          <xref ref-type="bibr" rid="ref56">56</xref>
          ] as source domains respectively. As shown in Tables 2-5,
our method consistently outperformed existing baselines in
three out of four training configurations.
        </p>
        <p>For instance, when trained on APTOS, the Non-Weighted
DL+KL fusion achieved the highest average accuracy
(59.9%), outperforming all transformer baselines and
showing superior generalization to diverse domains like
MESSIDOR2. Similarly, when trained on MESSIDOR2,
the Weighted DL+KL fusion delivered a performance
of 65.5%, highlighting robustness against shifts in both
demographic and imaging characteristics. These results
validate that symbolic knowledge integration enables
effective generalization from a single domain, crucial for
low-resource clinical settings.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Multi Domain Generalization Results</title>
        <p>In the MDG setting, we trained our model on three
datasets and evaluated on the unseen fourth, as per
the DomainBed protocol. Results in Table 6 show
that our KG-DG model using Clip-ViT (ViT+KL) and
symbolic classifiers significantly improved generalization
compared to popular convolutional and transformer-based
DG methods, including ERM, IRM, Fishr, and SD-ViT.
Notably, the knowledge-guided symbolic model (KL
only) achieved the best average accuracy (63.67%), while
SPSD-ViT and ERM-ViT with strong augmentations reached
65.5%. Despite having fewer parameters, our model’s
performance indicates effective utilization of symbolic
lesion features and their generalization power across
domain shifts. In particular, the KL model exceeded both
standard ViT and ResNet baselines across most target
domains, demonstrating the critical role of encoded clinical
knowledge in cross-domain settings.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Evaluation</title>
      <sec id="sec-4-1">
        <title>5.1. Benchmark Setup</title>
        <p>
          To rigorously evaluate the generalization capability of the
proposed KG-DG framework, we conducted experiments on
four publicly available diabetic retinopathy (DR) fundus
image datasets: APTOS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], EyePACS, Messidor-1, and
Messidor-2. Each dataset represents a distinct clinical
domain, differing significantly in patient demographics,
imaging devices, and image acquisition protocols. Following
the DomainBed benchmark protocol established by
Gulrajani et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we implemented two experimental
scenarios: Single-Domain Generalization (SDG), wherein
the model is trained on a single domain and evaluated on the
remaining three domains, and Multi-Domain Generalization
(MDG), where training is performed on three domains with
evaluation conducted on a separate unseen domain.
        </p>
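        <p>The two protocols amount to complementary domain splits over the four datasets, as in the sketch below; training and evaluation code is omitted and only the split logic is shown.</p>
        <preformat>
DOMAINS = ["APTOS", "EyePACS", "Messidor-1", "Messidor-2"]

def sdg_runs():
    # Single-Domain Generalization: train on one domain, test on the rest.
    for src in DOMAINS:
        yield [src], [d for d in DOMAINS if d != src]

def mdg_runs():
    # Multi-Domain Generalization: train on three, test on the held-out one.
    for held_out in DOMAINS:
        yield [d for d in DOMAINS if d != held_out], [held_out]

for sources, targets in mdg_runs():
    print(f"train on {sources} -> test on {targets}")
        </preformat>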
        <p>
For preprocessing, all images were uniformly resized
to 224 × 224 pixels and subjected to data augmentations
including center cropping, horizontal flipping, color
jitter, and grayscale conversion.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Baselines</title>
        <p>
          We evaluated our KG-DG framework against several
competitive baseline methods representative of both
convolutional neural network (CNN)-based and
transformer-based domain generalization strategies.
For convolutional architectures, we included Empirical Risk
Minimization (ERM) with ResNet-50 [
          <xref ref-type="bibr" rid="ref67">67</xref>
          ], a strong baseline
under fair evaluation standards [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Additionally, we
compared against Invariant Risk Minimization (IRM) [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ],
Group Distributionally Robust Optimization (GroupDRO)
[
          <xref ref-type="bibr" rid="ref60">60</xref>
          ], Fishr [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], and Adaptive Risk Minimization (ARM) [
          <xref ref-type="bibr" rid="ref58">58</xref>
          ],
each employing distinct strategies to enforce robustness
and domain invariance.
        </p>
        <p>
          Transformer-based models considered included ERM-ViT
with DeiT-Small [
          <xref ref-type="bibr" rid="ref68">68</xref>
          ], CvT-13 [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ], and T2T-ViT [
          <xref ref-type="bibr" rid="ref69">69</xref>
          ]. We
further included state-of-the-art transformer-based domain
generalization models, SD-ViT [
          <xref ref-type="bibr" rid="ref65">65</xref>
          ] and SPSD-ViT [
          <xref ref-type="bibr" rid="ref66">66</xref>
          ],
which utilize semantic alignment and pseudo-labeling to
enhance robustness. Lastly, we compared against DRGen
[
          <xref ref-type="bibr" rid="ref70">70</xref>
          ], a DR-specific DG method leveraging adversarial and
contrastive learning.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>5.3. Ablation Study</title>
        <p>Ablation Study I: APTOS-Trained Domain Generalization. To understand the individual and combined contributions of neural and symbolic components in our framework, we conducted a focused ablation study using the APTOS dataset as the source domain. Table 8 reports the accuracy performance on three unseen target domains (EyePACS, Messidor-1 and Messidor-2) when models were trained solely on APTOS.</p>
        <p>The neural-only baseline using Vision Transformer
(ViT) achieves a modest average accuracy of 53.9%,
indicating limited generalization under domain shift. The
symbolic-only model, based on knowledge-driven lesion
features (KL), improves the average accuracy to 56.6%,
highlighting the value of structured clinical priors. The
best performance is observed when combining both
neural and symbolic reasoning. In particular, the
non-weighted fusion approach yields the highest average
accuracy of 59.9%, outperforming both standalone models.
This result demonstrates the strength of the proposed
neuro-symbolic integration in improving robustness and
domain generalization in diabetic retinopathy classification.</p>
        <p>Ablation Study II: Performance of Symbolic Lesion
Biomarkers with and without Retinal Vein Features.
This experiment evaluates the discriminative capacity
of structured symbolic features extracted from retinal
images, focusing on four clinically validated lesion types:
exudates, hard hemorrhages, soft hemorrhages, and cotton
wool spots. The first group of results in Table 7 includes
only lesion-based features, while the second incorporates
additional vascular information derived from retinal vein
morphology—such as tortuosity, caliber, and branching
angles.</p>
        <p>Across all classifiers, models trained solely on lesion
features consistently outperform those that include both
lesions and vein information. Gradient Boosting achieves
the highest accuracy (84.65%) and macro F1-score (84.12%),
confirming the strong discriminative value of lesion-level
biomarkers. In contrast, the addition of vein-based features
leads to performance degradation, indicating that vessel
morphology introduces domain-sensitive variability that
hampers generalization.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>This paper introduces KG-DG, an improved
knowledge-guided domain generalization framework
specifically tailored for medical imaging applications,
as exemplified by diabetic retinopathy classification.
KG-DG integrates symbolic clinical reasoning and deep
visual representations through a confidence-weighted
fusion approach, significantly enhancing robustness and
interpretability. Comprehensive experimental results
on four diverse DR datasets demonstrated that KG-DG
consistently achieved superior performance compared
to strong baselines of domain generalization methods,
achieving notable improvements in both single-source and
multi-source generalization settings, with gains of up to
5.2% in cross-domain accuracy.</p>
      <p>
        Our findings underscore the importance of embedding
structured clinical knowledge within deep learning
models, thereby significantly improving generalization
and trustworthiness in clinical settings. Future directions
include adapting the KG-DG framework to additional
medical imaging modalities, such as optical coherence
tomography and histopathology, and further integrating
dynamic symbolic reasoning via neuro-symbolic
architectures, enhancing real-time decision support
capabilities in medical AI deployments. Insights: Our
observations indicate that the integration of symbolic
clinical knowledge into traditional architectures, whether
Vision Transformers (ViTs) or domain-specific models
such as DeepXSOZ [
        <xref ref-type="bibr" rid="ref71">71</xref>
        ], consistently leads to significant
improvements in classification accuracy. Furthermore,
this knowledge imputation enhances both domain
generalization and the explainability of model behavior,
addressing critical challenges in clinical deployment.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partly funded by NSF (FDT-Biotech 2436801),
and the Helmsley Charitable Trust (2-SRA-2017-503-M-B).
Declaration on Generative AI: During the preparation of
this work, the author(s) used ChatGPT solely for grammar
and spelling checking.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Siyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhaskaran</surname>
          </string-name>
          ,
          <article-title>GraphDR: Lesion ontology guided graph convolution for diabetic retinopathy classification</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kauppi</surname>
          </string-name>
          , et al.,
          <article-title>The aptos 2019 blindness detection dataset</article-title>
          ,
          <source>Kaggle</source>
          ,
          <year>2019</year>
          . https://www.kaggle.com/c/aptos2019-blindness-detection.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          , et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DDR: A diverse dataset for diabetic retinopathy classification</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <article-title>In search of lost domain generalization</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , D. Liu,
          <article-title>Neuro-symbolic generative model for medical report generation with prior knowledge</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>40</volume>
          (
          <year>2021</year>
          )
          <fpage>3436</fpage>
          -
          <lpage>3447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ozkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Boix</surname>
          </string-name>
          ,
          <article-title>On the benefits of multi-domain training for medical image analysis</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>12456</fpage>
          -
          <lpage>12466</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Diffusion-based domain augmentation for robust medical image analysis</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>12345</fpage>
          -
          <lpage>12355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <article-title>Deep coral: Correlation alignment for deep domain adaptation</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Fortifying medical image domain generalization via contrastive feature disentanglement</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>43</volume>
          (
          <year>2024</year>
          )
          <fpage>512</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mussmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pierson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Concept bottleneck models</article-title>
          ,
          <source>in: Proceedings of the 37th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Espinosa Zarlenga</surname>
          </string-name>
          , et al.,
          <article-title>Concept embedding models: Toward interpretable and accurate concept-based learning</article-title>
          ,
          <source>in: Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Muandet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Balduzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <article-title>Domain generalization via invariant feature representation</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>I-10</fpage>
          -
          <lpage>I-18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Blanchard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <article-title>Generalizing from several related classification tasks to a new unlabeled sample</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>2178</fpage>
          -
          <lpage>2186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          , T. Xiang,
          <article-title>Domain generalization with mixstyle</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          . arXiv:2012.03641.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cha</surname>
          </string-name>
          , et al.,
          <article-title>Swad: Domain generalization by seeking flat minima</article-title>
          ,
          <source>in: Proceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>22405</fpage>
          -
          <lpage>22418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kot</surname>
          </string-name>
          ,
          <article-title>Domain generalization with adversarial feature learning</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>5400</fpage>
          -
          <lpage>5409</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>Deep domain generalization via conditional invariant adversarial networks</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>647</fpage>
          -
          <lpage>663</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Learning to generalize: Meta-learning for domain generalization</article-title>
          ,
          <source>in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI)</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chellappa</surname>
          </string-name>
          , Metareg:
          <article-title>Towards domain generalization using meta-regularization</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1006</fpage>
          -
          <lpage>1016</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Learning to generate novel domains for domain generalization</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>561</fpage>
          -
          <lpage>578</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          , T. Xiang,
          <article-title>Deep domain-adversarial image generation for domain generalisation</article-title>
          ,
          <source>in: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>13025</fpage>
          -
          <lpage>13032</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Deeper, broader and artier domain generalization</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          (ICCV),
          <year>2017</year>
          , pp.
          <fpage>5543</fpage>
          -
          <lpage>5551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Feature-critic networks for heterogeneous domain generalization</article-title>
          ,
          <source>in: Proceedings of the 36th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3915</fpage>
          -
          <lpage>3924</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Volpi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murino</surname>
          </string-name>
          ,
          <article-title>Addressing model vulnerability to distributional shifts over image transformation sets</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2019</year>
          , pp.
          <fpage>7979</fpage>
          -
          <lpage>7988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Piratla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jyothi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <article-title>Generalizing across domains via cross-gradient training</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Heng</surname>
          </string-name>
          ,
          <article-title>Ms-net: Multi-site network for improving prostate segmentation with heterogeneous mri data</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>39</volume>
          (
          <year>2020</year>
          )
          <fpage>2713</fpage>
          -
          <lpage>2724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Heng</surname>
          </string-name>
          ,
          <article-title>Shape-aware meta-learning for generalizing prostate mri segmentation to unseen domains</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>475</fpage>
          -
          <lpage>485</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kamnitsas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Glocker</surname>
          </string-name>
          ,
          <article-title>Domain generalization via model-agnostic learning of semantic features</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          , volume
          <volume>32</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>579</fpage>
          -
          <lpage>589</lpage>
          . arXiv:1901.10184.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tople</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Domain generalization using causal matching</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>7313</fpage>
          -
          <lpage>7324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghifary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Kleijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Balduzzi</surname>
          </string-name>
          ,
          <article-title>Domain generalization for object recognition with multi-task autoencoders</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          (ICCV),
          <year>2015</year>
          , pp.
          <fpage>2551</fpage>
          -
          <lpage>2559</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Yuen</surname>
          </string-name>
          ,
          <article-title>Multi-adversarial discriminative deep domain generalization for face presentation attack detection</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>10015</fpage>
          -
          <lpage>10023</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Single-side domain generalization for face anti-spoofing</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8481</fpage>
          -
          <lpage>8490</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kot</surname>
          </string-name>
          ,
          <article-title>Domain generalization for medical imaging classification with linear-dependency regularization</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          , volume
          <volume>33</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>3118</fpage>
          -
          <lpage>3129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aslani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hamarneh</surname>
          </string-name>
          ,
          <article-title>Scanner invariant multiple sclerosis lesion segmentation from mri</article-title>
          ,
          <source>in: Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>781</fpage>
          -
          <lpage>785</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>S.</given-names>
            <surname>Otálora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Andrearczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Staining invariant features for improving generalization of deep convolutional neural networks in computational pathology</article-title>
          ,
          <source>Frontiers in Bioengineering and Biotechnology</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>198</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Improving the generalizability of convolutional neural network-based segmentation on cmr images</article-title>
          ,
          <source>Frontiers in Cardiovascular Medicine</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>105</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>39</volume>
          (
          <year>2020</year>
          )
          <fpage>2531</fpage>
          -
          <lpage>2540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kamboj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K. S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Expert knowledge driven human-ai collaboration for medical imaging: A study on epileptic seizure onset zone identification</article-title>
          ,
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          (
          <year>2023</year>
          ). In press.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arjovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <article-title>Invariant risk minimization</article-title>
          ,
          <source>arXiv preprint arXiv:1907.02893</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dancette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <article-title>Fishr: Invariant gradient variances for out-of-distribution generalization</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy</article-title>
          ,
          <source>New England Journal of Medicine</source>
          <volume>350</volume>
          (
          <year>2004</year>
          )
          <fpage>48</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Ferris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-D.</given-names>
            <surname>Agardh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Hammes</surname>
          </string-name>
          ,
          <article-title>Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales</article-title>
          ,
          <source>Ophthalmology</source>
          <volume>110</volume>
          (
          <year>2003</year>
          )
          <fpage>1677</fpage>
          -
          <lpage>1682</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy: An update</article-title>
          ,
          <source>Indian Journal of Ophthalmology</source>
          <volume>56</volume>
          (
          <year>2008</year>
          )
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
          . doi:10.4103/0301-4738.41167, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2636123/.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <surname>American Academy of Ophthalmology</surname>
          </string-name>
          ,
          <source>Diabetic retinopathy preferred practice pattern</source>
          ,
          <year>2023</year>
          , https://www.aao.org/preferred-practice-pattern/diabetic-retinopathy-ppp.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <surname>StatPearls Publishing</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy</article-title>
          , https://www.ncbi.nlm.nih.gov/books/NBK560805/,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <surname>Early Treatment Diabetic Retinopathy Study Research Group</surname>
          </string-name>
          ,
          <article-title>Grading diabetic retinopathy and estimating its progression</article-title>
          ,
          <source>Ophthalmology</source>
          <volume>98</volume>
          (
          <year>1991</year>
          )
          <fpage>786</fpage>
          -
          <lpage>806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tripathy</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy</article-title>
          , updated August 25,
          <year>2023</year>
          , https://www.ncbi.nlm.nih.gov/books/NBK560805/.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yanoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Duker</surname>
          </string-name>
          ,
          <source>Ophthalmology</source>
          , 5th ed., Elsevier
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <surname>American Association for Pediatric Ophthalmology and Strabismus</surname>
          </string-name>
          ,
          <article-title>Proliferative diabetic retinopathy</article-title>
          , https://aapos.org/glossary/proliferative-diabetic-retinopathy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bochkovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors</article-title>
          ,
          <source>arXiv preprint arXiv:2207.02696</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Training data-efficient image transformers &amp; distillation through attention</article-title>
          ,
          <source>in: Proceedings of the 38th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>CvT: Introducing convolutions to vision transformers</article-title>
          ,
          <source>arXiv preprint arXiv:2103.15808</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. E. H.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Tokens-to-token vit: Training vision transformers from scratch on imagenet</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <surname>Kaggle</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy detection</article-title>
          , https://www.kaggle.com/c/diabetic-retinopathy-detection,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>E.</given-names>
            <surname>Decencière</surname>
          </string-name>
          , et al.,
          <article-title>Feedback on a publicly distributed image database: The messidor database</article-title>
          ,
          <source>Image Analysis and Stereology</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <source>The Nature of Statistical Learning Theory</source>
          , Springer Science &amp; Business Media
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Marklund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dhawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <article-title>Adaptive risk minimization: Learning to adapt to domain shift</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seely</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H. S.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Siddharth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hannun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Synnaeve</surname>
          </string-name>
          ,
          <article-title>Gradient matching for domain generalization</article-title>
          ,
          <source>arXiv preprint arXiv:2104.09937</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sagawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Distributionally robust neural networks for group shifts</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>Improve unsupervised domain adaptation with mixup training</article-title>
          ,
          <source>arXiv preprint arXiv:2001.00677</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <article-title>Deep coral: Correlation alignment for deep domain adaptation</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ganin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ustinova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ajakan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Germain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Laviolette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marchand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          ,
          <article-title>Domain-adversarial training of neural networks</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <fpage>2096</fpage>
          -
          <lpage>2030</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Deep domain generalization via conditional invariant adversarial networks</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sultana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naseer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Self-distilled vision transformer for domain generalization</article-title>
          ,
          <source>in: Asian Conference on Computer Vision (ACCV)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jayanga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kuruppu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Generalizing to unseen domains in diabetic retinopathy classification</article-title>
          ,
          <source>arXiv preprint arXiv:2311.01673</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Training data-efficient image transformers &amp; distillation through attention</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          [69]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. E. H.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Tokens-to-token vit: Training vision transformers from scratch on imagenet</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          [70]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atwany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yaqub</surname>
          </string-name>
          ,
          <article-title>DRGen: Domain generalization in diabetic retinopathy classification</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>635</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          [71]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Shama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Venkataraman</surname>
          </string-name>
          ,
          <article-title>DeepSOZ: A robust deep model for joint temporal and spatial seizure onset localization from multichannel eeg data</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>