<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings (Co-located with ECAI)</journal-title>
      </journal-title-group>
      <issn>1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Generalization in Diabetic Retinopathy: A Neuro-Symbolic Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Midhat Urooj</string-name>
          <email>murooj@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ayan Banerjee</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Farhat Shaikh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuntal Thakur</string-name>
          <email>kthakur9@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandeep Gupta</string-name>
          <email>sandeep.gupta@asu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Impact Lab, Arizona State University</institution>
          ,
          <addr-line>Tempe, AZ</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>25</volume>
      <abstract>
        <p>Domain generalization remains a critical challenge in medical imaging, where models trained on single sources often fail under real-world distribution shifts. We propose KG-DG, a neuro-symbolic framework for diabetic retinopathy (DR) classification that integrates vision transformers with expert-guided symbolic reasoning to enable robust generalization across unseen domains. Our approach leverages clinical lesion ontologies through structured, rule-based features and retinal vessel segmentation, fusing them with deep visual representations via a confidence-weighted integration strategy. The framework addresses both Single-Domain Generalization (SDG) and Multi-Domain Generalization (MDG) by aligning high-level clinical semantics across domains, thereby improving robustness and cross-domain performance.</p>
      </abstract>
      <kwd-group kwd-group-type="author">
        <kwd>Diabetic Retinopathy</kwd>
        <kwd>Vision Transformers</kwd>
        <kwd>Out-of-Distribution Robustness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Diabetic Retinopathy (DR) is a microvascular complication
of Diabetes Mellitus that affects the retinal vasculature,
leading to hemorrhages, microaneurysms, exudates, and
cotton-wool spots which, if left untreated, can culminate
in irreversible vision loss [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Manual grading of
fundus photographs by expert ophthalmologists remains
the clinical gold standard but is both time-consuming
and subject to inter-observer variability [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Despite
the success of deep learning models—particularly Vision
Transformers (ViTs)—on single-source DR datasets [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ],
their performance suffers when confronted with domain
shifts caused by variations in imaging devices, resolution
settings, and patient demographics.
      </p>
      <p>
        Although Domain
Generalization (DG) strategies such as Empirical Risk
Minimization under the DomainBed protocol [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] offer a
baseline for robustness, they often overlook the integration
of structured clinical knowledge and realistic augmentation
techniques that are critical for reliable cross-domain
deployment.
      </p>
      <p>Neuro-symbolic learning, which integrates deep learning
with symbolic reasoning, has gained traction as a promising
strategy to improve domain generalization in medical
imaging.</p>
      <sec id="sec-1-1">
        <title>Deep models extract complex patterns from</title>
        <p>raw data, while symbolic components encode high-level
domain knowledge and constraints, thereby efec tively
guiding model behavior across varying domains.</p>
      </sec>
      <sec id="sec-1-2">
        <title>This hybrid approach can mitigate overfitting to domain-specific artifacts by enforcing consistency with known anatomical or pathological rules. For example, Han et al. introduced a</title>
        <p>nEvelop-O
In
particular,
few</p>
        <p>approaches simultaneously
both
symbolic
knowledge integration
and
address
domain
generalization in a cohesive framework. This gap motivates
our proposed</p>
        <p>method, KG-DG, a knowledge-guided
domain generalization framework that unifies structured
clinical knowledge with deep learning
models in a
scalable
manner.</p>
        <p>KG-DG encodes domain-invariant
biomarkers—such as exudates, hemorrhages, and vascular
abnormalities—directly into the learning pipeline, guiding
classification tasks while enhancing out-of-distribution
(OOD) robustness. Unlike prior approaches, our framework
generalizes across imaging modalities, as demonstrated
by strong performance in both diabetic retinopathy
classification and seizure-onset detection from MRI scans.
By embedding clinical expertise at the architectural level,
KG-DG bridges the gap between symbolic interpretability
and neural robustness, advancing the development of
domain-generalizable medical AI.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Vision transformers (ViTs) have revolutionized medical
image analysis, particularly in ophthalmology, offering a
powerful alternative to traditional convolutional neural
networks. Dosovitskiy et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] established the foundation
by demonstrating ViTs’ state-of-the-art performance on
large-scale image recognition benchmarks, catalyzing
their adoption for diabetic retinopathy (DR) detection.
Subsequent work by Kothari et al. introduced TransDR [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ],
enhancing ViTs with lesion-aware attention mechanisms
that improve lesion localization capabilities, though without
explicitly addressing domain shift robustness challenges.
      </p>
      <p>
        The challenge of domain generalization (DG) has
become increasingly prominent in medical AI deployment.
DomainBed benchmarks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have illuminated the
fundamental difficulty of ensuring that deep models trained
on source domains maintain reliable performance on
unseen target domains. Gulrajani and Lopez-Paz’s findings
revealed that even sophisticated methods often fail to
outperform simple empirical risk minimization (ERM)
under rigorous evaluation protocols, emphasizing the
critical importance of careful DG methodology design.
      </p>
      <p>
        The integration of structured clinical knowledge
represents a promising direction for enhancing both
interpretability and performance in medical imaging
systems. GraphDR, developed by Khandelwal et al.
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], exemplifies this approach by leveraging diabetic
retinopathy lesion ontologies within graph convolutional
networks to guide feature learning processes. Similar
knowledge-driven strategies have been successfully applied
to chest X-ray and histopathology classification tasks,
grounding models in established disease relationships.
These developments suggest that embedding lesion
ontologies into transformer self-attention mechanisms
could provide an effective pathway for learning clinically
meaningful and domain-invariant representations.
      </p>
      <p>
        Contrastive learning methodologies have emerged as
another approach for addressing domain generalization
challenges by disentangling domain-specific artifacts
from semantic features. Deep CORAL [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] pioneered
domain alignment through feature covariance alignment in
convolutional neural networks, while contemporary
DG methods employ cross-domain contrastive
losses to promote separation between invariant and
domain-dependent representations. Chen et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
successfully extended these principles to medical imaging,
achieving improved generalization in histopathology
classification under distributional shifts. Nevertheless,
contrastive disentanglement approaches remain relatively
unexplored in diabetic retinopathy classification tasks.
      </p>
      <p>
        Our approach bears conceptual similarity to Concept
Bottleneck Models [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and their quantitative extensions,
such as Concept Embedding Models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. While these
models learn to predict labels through intermediate
human-defined concepts, our KG-DG framework differs
by explicitly combining clinical rules with learned neural
embeddings via confidence fusion, rather than strictly
bottlenecking predictions through concepts. Compared to
GraphDR [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which employs lesion ontologies in a Graph
Convolutional Network (GCN), our work emphasizes hybrid
neuro-symbolic fusion at the decision level, making it more
adaptable to diverse backbones and datasets.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Domain Generalization</title>
        <p>
          In many real-world applications, particularly in biomedical
fields, it is unrealistic to expect access to new patients’ data
before model deployment due to domain shifts between
data from different patients [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. To address this challenge,
the concept of Domain Generalization (DG) was introduced
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. DG aims to train models on data from one or more
related but distinct source domains, enabling them to
generalize effectively to unseen, out-of-distribution (OOD)
target domains. Since its formal introduction by Blanchard
et al. in 2011 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a wide range of techniques have been
proposed to tackle the DG challenge [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]–[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          These approaches include learning domain-invariant
representations by aligning source domain distributions
[
          <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
          ], simulating domain shifts during training using
meta-learning [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
          ], and generating synthetic data
through domain augmentation [
          <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
          ]. From an application
perspective, DG has been explored in various areas such as
computer vision (e.g., object recognition [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ], semantic
segmentation [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], and person re-identification [
          <xref ref-type="bibr" rid="ref15 ref21">15, 21</xref>
          ]),
speech recognition [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], natural language processing [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
medical imaging [
          <xref ref-type="bibr" rid="ref27 ref28">27, 28</xref>
          ], and reinforcement learning [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Unlike related paradigms like domain adaptation or
transfer learning, DG uniquely addresses scenarios where
no target domain data is accessible during training.
The original motivation behind DG stemmed from a
medical application known as automatic gating in flow
cytometry [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This technique involves classifying cells in
blood samples—such as distinguishing lymphocytes from
non-lymphocytes—based on measured cellular properties.
While such automation can significantly aid in diagnosis by
replacing the labor-intensive and expert-dependent manual
gating process, patient-specific distribution shifts hinder
model generalization. Collecting new labeled data for every
patient is not feasible, thereby underscoring the need for
DG solutions.
        </p>
        <p>
          In medical imaging, domain shift is especially prevalent
due to variations across clinical sites and individual
patients [
          <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
          ]. Datasets like Multi-site Prostate MRI
Segmentation [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and Chest X-rays [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] reflect this reality,
with differences in imaging equipment and acquisition
protocols introducing substantial distribution variability.
        </p>
        <p>
          Domain alignment techniques have been applied across
diverse DG tasks, including object recognition [
          <xref ref-type="bibr" rid="ref18 ref31">18, 31</xref>
          ],
action recognition [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], face anti-spoofing [
          <xref ref-type="bibr" rid="ref32 ref33">32, 33</xref>
          ], and
medical image analysis [
          <xref ref-type="bibr" rid="ref34 ref35">34, 35</xref>
          ]. Among the simplest and
most effective strategies for mitigating domain shift in
medical imaging are image transformations [
          <xref ref-type="bibr" rid="ref36 ref37 ref38">36, 37, 38</xref>
          ].
These transformations can emulate changes in color and
geometry, often caused by device heterogeneity—such as
different scanners in various medical centers. However,
care must be taken in selecting these transformations. In
some domains (e.g., digit recognition or optical character
recognition), certain transformations like horizontal or
vertical flips may alter the semantic label, leading to label
shift. Therefore, transformation-based strategies must be
chosen judiciously to preserve task relevance and integrity.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Role of Knowledge-Guided Systems in Improving Model Generalization</title>
        <p>Deep learning models, although powerful in capturing visual patterns, are notoriously sensitive to domain shifts arising from demographic, device, or protocol differences. Knowledge-guided systems mitigate this by incorporating clinically validated rules, visual biomarkers, and demographic insights into conventional learning pipelines. This approach is designed to improve robustness, interpretability, and domain generalization, addressing critical limitations commonly encountered in medical deployments where data heterogeneity, distribution shifts, and limited supervision can degrade model performance.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.1. Knowledge-Guided Augmentation of</title>
      </sec>
      <sec id="sec-2-4">
        <title>Deep Models</title>
        <p>Traditional deep learning models typically learn a predictive mapping $f_{\mathrm{DL}}: \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X}$ denotes input modalities (e.g., retinal images) and $\mathcal{Y}$ represents target disease labels. This approach inherently lacks structured medical inductive biases, potentially limiting clinical applicability. To overcome this limitation, we propose a dual-branch architecture, integrating a structured knowledge representation $\mathcal{K}$ into deep learning-based image analysis.</p>
        <p>We formalize $\mathcal{K}$ as a set of diagnostic rules $\{r_1, r_2, \ldots, r_n\}$, each reflecting expert-validated correlations between observable clinical features and disease states. These rules incorporate visual biomarkers (e.g., exudates, hemorrhages, vascular patterns) and demographic parameters (e.g., patient age, glycemic status). For practical implementation, we develop corresponding feature extractors $E = \{e_1, e_2, \ldots, e_n\}$, instantiated via object detection models (YOLOv11), segmentation architectures, and logical rule functions.</p>
        <p>Each extractor $e_i$ outputs a quantitative feature $k_i \in \mathbb{R}$, aggregated into a structured vector $K^{*} = \{k_1, k_2, \ldots, k_n\}$. This structured vector encodes clinical attributes such as presence, severity, and spatial distribution of significant retinal lesions, facilitating symbolic reasoning aligned closely with clinical diagnostic criteria.</p>
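        <p>As a concrete illustration, the following minimal Python sketch assembles such a structured vector from scalar extractor outputs. The extractor functions shown are illustrative stand-ins, not the released implementation; in the actual pipeline they correspond to the fine-tuned YOLOv11 detector, the vessel-segmentation module, and logical rule functions.</p>
        <preformat>
import numpy as np

# Toy stand-in extractors: each e_i maps an image to a scalar k_i.
def bright_lesion_score(img):
    return float((img > 0.9).mean())   # crude proxy for exudate burden

def dark_lesion_score(img):
    return float((0.1 > img).mean())   # crude proxy for hemorrhage burden

def build_knowledge_vector(img, extractors):
    # Stack each k_i into the structured vector K* = {k_1, ..., k_n}.
    return np.array([e(img) for e in extractors], dtype=np.float32)

fundus = np.random.rand(224, 224)      # dummy grayscale fundus image
k_star = build_knowledge_vector(fundus, [bright_lesion_score, dark_lesion_score])
print(k_star.shape)                    # (2,)
        </preformat>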
        <p>A parallel knowledge-driven classifier $f_{\mathrm{KD}}: K^{*} \to \mathcal{Y}$ is trained alongside the deep learning model $f_{\mathrm{DL}}$. The final prediction can then be determined through different fusion strategies. In the simplest case, a selective fusion rule is applied:
$$\hat{y}_{\mathrm{final}} = \begin{cases} \hat{y}_{\mathrm{DL}}, &amp; \text{if } c_{\mathrm{DL}} \geq c_{\mathrm{KD}}, \\ \hat{y}_{\mathrm{KD}}, &amp; \text{otherwise}, \end{cases}$$
where $c_{\mathrm{DL}}$ and $c_{\mathrm{KD}}$ denote the maximum confidence scores from the deep and symbolic classifiers, respectively. This strategy enhances robustness by leveraging symbolic reasoning when the deep model predictions exhibit uncertainty, particularly valuable in handling out-of-distribution scenarios. Beyond this, we experimented with three additional fusion techniques:</p>
        <p>1. Max Confidence Fusion: both the neural (ViT) and symbolic classifiers output calibrated probabilities via softmax normalization. The class with the globally highest confidence is selected, irrespective of source.</p>
        <p>2. Class-wise Max Fusion: normalized per-class confidence scores are compared across models, and the prediction is made according to the higher class-specific confidence.</p>
        <p>3. Weighted Fusion: empirically tuned weights $(w_{\mathrm{DL}}, w_{\mathrm{KD}})$ are applied to balance neural and symbolic predictions. Formally,
$$\hat{y}_{\mathrm{final}} = \arg\max_{c \in \mathcal{C}} \left( w_{\mathrm{DL}} \cdot p_{\mathrm{DL}}(c) + w_{\mathrm{KD}} \cdot p_{\mathrm{KD}}(c) \right),$$
where $p_{\mathrm{DL}}(c)$ and $p_{\mathrm{KD}}(c)$ are the softmax confidence scores assigned by the deep and symbolic classifiers, respectively, for class $c$, and $\mathcal{C}$ is the set of all DR severity classes.</p>
        <p>Together, these strategies allow us to assess the trade-off between model confidence, robustness, and the influence of symbolic knowledge on final decision-making.</p>
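        <p>A minimal sketch of these fusion rules over softmax outputs is given below; the weight values are hypothetical placeholders rather than the tuned values used in our experiments. Note that, for single-label prediction, max-confidence fusion and class-wise max fusion both reduce to an argmax over the element-wise maximum of the two probability vectors.</p>
        <preformat>
import numpy as np

def selective_fusion(p_dl, p_kd):
    # Case rule: trust the branch whose maximum confidence is higher.
    if p_dl.max() >= p_kd.max():
        return int(np.argmax(p_dl))
    return int(np.argmax(p_kd))

def max_fusion(p_dl, p_kd):
    # Max-confidence / class-wise max: highest confidence wins per class.
    return int(np.argmax(np.maximum(p_dl, p_kd)))

def weighted_fusion(p_dl, p_kd, w_dl=0.6, w_kd=0.4):
    # argmax over w_dl * p_dl(c) + w_kd * p_kd(c); weights illustrative.
    return int(np.argmax(w_dl * p_dl + w_kd * p_kd))

p_dl = np.array([0.10, 0.20, 0.55, 0.10, 0.05])   # ViT branch, 5 DR grades
p_kd = np.array([0.05, 0.70, 0.15, 0.05, 0.05])   # symbolic branch
print(selective_fusion(p_dl, p_kd), weighted_fusion(p_dl, p_kd))
        </preformat>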
      </sec>
      <sec id="sec-2-5">
        <title>3.2. Diabetic Retinopathy Classification</title>
        <p>We evaluated the proposed KG-DG framework on the task
of diabetic retinopathy (DR) classification using retinal
fundus images—well-suited for knowledge-guided learning
due to the presence of clearly defined visual pathologies
such as microaneurysms, hemorrhages, exudates, and
neovascularization. Domain-specific diagnostic rules were
curated from ophthalmological guidelines (see Table 1) and
operationalized via automated feature extraction pipelines
built using two open-source, modular tools: YOLOv11 and
a retinal vessel segmentation model.</p>
        <p>
          For lesion-level localization, we employed the YOLOv11
object detection model, a state-of-the-art one-stage
detector known for its efficiency and precision in
dense object environments. YOLOv11 extends the
YOLOv5/YOLOv7 series with advanced improvements
including CSPDarkNet-based backbones, decoupled heads,
and dynamic label assignment (DLA), achieving superior
mean average precision (mAP) with real-time inference
capabilities [
          <xref ref-type="bibr" rid="ref51">51</xref>
          ]. We fine-tuned YOLOv11 to detect clinically
relevant lesions such as hemorrhages, hard exudates, and
cotton wool spots. Bounding boxes produced by the model
were post-processed and validated using Intersection over
Union (IoU) scores against expert-labeled fundus images,
ensuring medical fidelity.
        </p>
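        <p>A hedged sketch of this lesion-detection step follows, assuming the open-source ultralytics package and a hypothetical fine-tuned weights file; the IoU check against expert boxes is spelled out in full.</p>
        <preformat>
from ultralytics import YOLO   # assumed dependency

def iou(a, b):
    """Intersection over Union for [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

model = YOLO("dr_lesions.pt")             # hypothetical fine-tuned weights
results = model("fundus.jpg", conf=0.25)  # detect lesions in one image
expert_box = [120.0, 80.0, 180.0, 140.0]  # expert-labeled reference box
for box in results[0].boxes:
    pred = box.xyxy[0].tolist()           # predicted [x1, y1, x2, y2]
    if iou(pred, expert_box) >= 0.5:      # keep medically faithful detections
        print(pred)
        </preformat>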
        <p>In parallel, we integrated a vein segmentation module to
extract morphological vessel features. This module, trained
on the open-source DRIVE and CHASE-DB1 datasets,
uses a modified U-Net architecture with spatial attention
layers to segment retinal vessels with high sensitivity.
From these segmented maps, we extracted quantitative
features including vessel tortuosity, branching angles, and
average caliber—biomarkers strongly associated with DR
progression.</p>
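        <p>The tortuosity feature, for instance, can be sketched as an arc-length-to-chord-length ratio computed over a skeletonized vessel mask. The snippet below assumes scikit-image for skeletonization and simplifies vessel tracing to a single connected segment.</p>
        <preformat>
import numpy as np
from skimage.morphology import skeletonize   # assumed dependency

def tortuosity(path_xy):
    """path_xy: (N, 2) ordered centerline points of one vessel segment."""
    steps = np.diff(path_xy, axis=0)
    arc = np.sqrt((steps ** 2).sum(axis=1)).sum()     # arc length
    chord = np.linalg.norm(path_xy[-1] - path_xy[0])  # endpoint distance
    return arc / (chord + 1e-9)

mask = np.zeros((64, 64), dtype=bool)
mask[10:50, 32] = True                     # toy vertical "vessel"
centerline = np.argwhere(skeletonize(mask)).astype(float)
print(round(tortuosity(centerline), 3))    # about 1.0 for a straight vessel
        </preformat>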
        <p>This structured knowledge vector was passed into a
parallel symbolic classifier trained independently from the
deep model, enabling our system to rely on rule-driven
inference when the deep model exhibits uncertainty.
Various machine learning models, including Logistic
Regression, Random Forest, Support Vector Machines
(SVM), Gradient Boosting, and K-Nearest Neighbors, were
evaluated for knowledge-based classification on the feature
set ($K^{*}$). Among these, Gradient Boosting demonstrated the
best classification performance. Both YOLOv11 and the
vein segmentation module functioned solely as independent
auxiliary components to extract biomarkers from images,
facilitating symbolic reasoning. The biomarkers were
annotated by expert medical annotators on approximately
500 images, with random samples validated by respective
domain experts. These annotations were subsequently
used to fine-tune the YOLOv11 and vein segmentation
modules, which then act as knowledge extractors within the
pipeline. This accurate integration of clinical knowledge
enhances model robustness, promotes domain invariance,
and provides a solid foundation for understanding domain
shifts through distributional alignment.</p>
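        <p>A minimal scikit-learn sketch of this symbolic-branch model comparison follows; the features here are synthetic placeholders standing in for the extracted biomarker vector.</p>
        <preformat>
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((500, 6))            # placeholder lesion/vessel biomarkers
y = rng.integers(0, 5, size=500)    # 5 DR severity grades

candidates = [
    ("logistic_regression", LogisticRegression(max_iter=1000)),
    ("random_forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gradient_boosting", GradientBoostingClassifier(random_state=0)),
]
for name, clf in candidates:
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")     # Gradient Boosting was best on the real K*
        </preformat>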
        <p>The classification results from the knowledge-based
machine learning model and the ViT model are integrated
using three main methods as shown in Figure 2: (1) selecting
the maximum confidence score across all predictions, (2)
computing the class-wise maximum confidence, and (3)
applying a weighted confidence scheme. The outcomes
of these three integration strategies are evaluated to assess
the overall performance of the final framework.</p>
      </sec>
      <sec id="sec-2-6">
        <title>3.3. Backbone Architectures and Training</title>
      </sec>
      <sec id="sec-2-7">
        <title>Strategy</title>
        <p>
          For the image-based analysis, we employed advanced
Vision Transformer (ViT) architectures. The DeiT-small
architecture, comprising approximately 22M parameters,
was used without distillation [
          <xref ref-type="bibr" rid="ref52">52</xref>
          ]. The CvT-13 model,
with 20M parameters, integrates convolutional layers with
transformer blocks to enhance spatial feature learning [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ].
Additionally, we utilized T2T-ViT-14, featuring progressive
tokenization and encompassing 21.5M parameters [
          <xref ref-type="bibr" rid="ref54">54</xref>
          ].
        </p>
        <p>All ViT models were initialized with ImageNet-pretrained
weights, and during training, encoder parameters remained
fixed to prevent overfitting. Only the classification heads
underwent optimization using class-weighted cross-entropy
loss. Training adhered to DomainBed protocols, employing
resizing to 224×224, random cropping, horizontal flipping,
color jitter, and grayscale augmentation. AdamW optimizer
was utilized with a learning rate of 5 × 10−5, and
early stopping was implemented after 10 epochs without
performance improvement.</p>
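        <p>A compact PyTorch sketch of this training recipe follows; the timm model identifier and the class-weight values are assumptions standing in for the exact configuration.</p>
        <preformat>
import timm
import torch
from torchvision import transforms

# DeiT-small backbone with a fresh 5-way head (assumed timm identifier).
model = timm.create_model("deit_small_patch16_224", pretrained=True, num_classes=5)
for p in model.parameters():
    p.requires_grad = False                 # freeze the pretrained encoder
for p in model.get_classifier().parameters():
    p.requires_grad = True                  # optimize the head only

# DomainBed-style augmentations used during training.
augment = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.3, 0.3, 0.3, 0.1),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
])

class_weights = torch.tensor([0.5, 1.5, 1.0, 2.0, 2.0])  # illustrative per-grade weights
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5)
        </preformat>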
      </sec>
      <sec id="sec-2-8">
        <title>3.4. Evaluation Protocol and Results</title>
        <p>
          Initially, KG-DG is evaluated on the APTOS dataset (60%
training, 20% cross-validation, and 20% testing), achieving
superior performance, exceeding a ViT benchmark by
6% (84.65% vs. 78.40%) and significantly outperforming
existing baselines. We conducted extensive evaluations in
both multi-source and single-source domain generalization
settings using publicly available DR datasets: APTOS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
EyePACS [
          <xref ref-type="bibr" rid="ref55">55</xref>
          ], MESSIDOR, and MESSIDOR2 [
          <xref ref-type="bibr" rid="ref56">56</xref>
          ]. Each
dataset constituted a distinct domain. In multi-source
experiments, we trained models on three datasets while
testing on the fourth. In single-source setups, we trained on
a single dataset and evaluated on the remaining domains.
        </p>
        <p>Our knowledge-guided framework consistently
demonstrated superior performance, achieving a +2.1%
average accuracy improvement in multi-source domain
generalization and a notable +4.2% increase in single-source
domain generalization scenarios, particularly impactful on
imbalanced data distributions (see detailed results in Table
6).</p>
        <p>The structured knowledge-driven classifier notably
improved generalization by encapsulating domain-invariant
medical reasoning, whereas the deep learning branch
effectively modeled intricate visual patterns, validating the
effectiveness of integrating clinical expertise within deep
learning frameworks.</p>
        <p>Note. Unless otherwise stated, in all tables the
best-performing value within each column is highlighted in
bold.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <sec id="sec-3-1">
        <title>4.1. Single Domain Generalization Results</title>
        <p>
          In the SDG setting, models were trained on one dataset
and evaluated on the remaining three to simulate clinical
deployment in unseen environments. Our method was
evaluated against DRGen, ERM-ViT, SD-ViT, and SPSD-ViT
using APTOS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], EyePACS [
          <xref ref-type="bibr" rid="ref55">55</xref>
], Messidor-1 and Messidor-2
[
          <xref ref-type="bibr" rid="ref56">56</xref>
          ] as source domains respectively. As shown in Tables 2-5,
our method consistently outperformed existing baselines in
three out of four training configurations.
        </p>
        <p>For instance, when trained on APTOS, the Non-Weighted
DL+KL fusion achieved the highest average accuracy
(59.9%), outperforming all transformer baselines and
showing superior generalization to diverse domains like
MESSIDOR2. Similarly, when trained on MESSIDOR2,
the Weighted DL+KL fusion delivered a performance
of 65.5%, highlighting robustness against shifts in both
demographic and imaging characteristics. These results
validate that symbolic knowledge integration enables
effective generalization from a single domain, crucial for
low-resource clinical settings.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Multi Domain Generalization Results</title>
        <p>In the MDG setting, we trained our model on three
datasets and evaluated on the unseen fourth, as per
the DomainBed protocol. Results in Table 6 show
that our KG-DG model using Clip-ViT (ViT+KL) and
symbolic classifiers significantly improved generalization
compared to popular convolutional and transformer-based
DG methods, including ERM, IRM, Fishr, and SD-ViT.
Notably, the knowledge-guided symbolic model (KL
only) achieved the best average accuracy (63.67%), while
SPSD-ViT and ERM-ViT with strong augmentations reached
65.5%. Despite having fewer parameters, our model’s
performance indicates effective utilization of symbolic
lesion features and their generalization power across
domain shifts. In particular, the KL model exceeded both
standard ViT and ResNet baselines across most target
domains, demonstrating the critical role of encoded clinical
knowledge in cross-domain settings.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Evaluation</title>
      <sec id="sec-4-1">
        <title>5.1. Benchmark Setup</title>
        <p>
          To rigorously evaluate the generalization capability of the
proposed KG-DG framework, we conducted experiments on
four publicly available diabetic retinopathy (DR) fundus
image datasets: APTOS [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], EyePACS, Messidor-1, and
Messidor-2. Each dataset represents a distinct clinical
domain, differing significantly in patient demographics,
imaging devices, and image acquisition protocols. Following
the DomainBed benchmark protocol established by
Gulrajani et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], we implemented two experimental
scenarios: Single-Domain Generalization (SDG), wherein
the model is trained on a single domain and evaluated on the
remaining three domains, and Multi-Domain Generalization
(MDG), where training is performed on three domains with
evaluation conducted on a separate unseen domain.
        </p>
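        <p>The two protocols amount to complementary domain splits over the four datasets, as in the sketch below; training and evaluation code is omitted and only the split logic is shown.</p>
        <preformat>
DOMAINS = ["APTOS", "EyePACS", "Messidor-1", "Messidor-2"]

def sdg_runs():
    # Single-Domain Generalization: train on one domain, test on the rest.
    for src in DOMAINS:
        yield [src], [d for d in DOMAINS if d != src]

def mdg_runs():
    # Multi-Domain Generalization: train on three, test on the held-out one.
    for held_out in DOMAINS:
        yield [d for d in DOMAINS if d != held_out], [held_out]

for sources, targets in mdg_runs():
    print(f"train on {sources} -> test on {targets}")
        </preformat>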
        <p>
For preprocessing, all images were uniformly resized
to 224 × 224 pixels and subjected to data augmentations
including center cropping, horizontal flipping, color
jitter, and grayscale conversion.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Baselines</title>
        <p>
          We evaluated our KG-DG framework against several
competitive baseline methods representative of both
convolutional neural network (CNN)-based and
transformer-based domain generalization strategies.
For convolutional architectures, we included Empirical Risk
Minimization (ERM) with ResNet-50 [
          <xref ref-type="bibr" rid="ref67">67</xref>
          ], a strong baseline
under fair evaluation standards [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Additionally, we
compared against Invariant Risk Minimization (IRM) [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ],
Group Distributionally Robust Optimization (GroupDRO)
[
          <xref ref-type="bibr" rid="ref60">60</xref>
          ], Fishr [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ], and Adaptive Risk Minimization (ARM) [
          <xref ref-type="bibr" rid="ref58">58</xref>
          ],
each employing distinct strategies to enforce robustness
and domain invariance.
        </p>
        <p>
          Transformer-based models considered included ERM-ViT
with DeiT-Small [
          <xref ref-type="bibr" rid="ref68">68</xref>
          ], CvT-13 [
          <xref ref-type="bibr" rid="ref53">53</xref>
          ], and T2T-ViT [
          <xref ref-type="bibr" rid="ref69">69</xref>
          ]. We
further included state-of-the-art transformer-based domain
generalization models, SD-ViT [
          <xref ref-type="bibr" rid="ref65">65</xref>
          ] and SPSD-ViT [
          <xref ref-type="bibr" rid="ref66">66</xref>
          ],
which utilize semantic alignment and pseudo-labeling to
enhance robustness. Lastly, we compared against DRGen
[
          <xref ref-type="bibr" rid="ref70">70</xref>
          ], a DR-specific DG method leveraging adversarial and
contrastive learning.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>5.3. Ablation Study</title>
        <p>Ablation Study I: APTOS-Trained Domain Generalization. To understand the individual and combined contributions of neural and symbolic components in our framework, we conducted a focused ablation study using the APTOS dataset as the source domain. Table 8 reports the accuracy performance on three unseen target domains (EyePACS, Messidor-1 and Messidor-2) when models were trained solely on APTOS.</p>
        <p>The neural-only baseline using Vision Transformer
(ViT) achieves a modest average accuracy of 53.9%,
indicating limited generalization under domain shift. The
symbolic-only model, based on knowledge-driven lesion
features (KL), improves the average accuracy to 56.6%,
highlighting the value of structured clinical priors. The
best performance is observed when combining both
neural and symbolic reasoning. In particular, the
non-weighted fusion approach yields the highest average
accuracy of 59.9%, outperforming both standalone models.
This result demonstrates the strength of the proposed
neuro-symbolic integration in improving robustness and
domain generalization in diabetic retinopathy classification.</p>
        <p>Ablation Study II: Performance of Symbolic Lesion
Biomarkers with and without Retinal Vein Features.
This experiment evaluates the discriminative capacity
of structured symbolic features extracted from retinal
images, focusing on four clinically validated lesion types:
exudates, hard hemorrhages, soft hemorrhages, and cotton
wool spots. The first group of results in Table 7 includes
only lesion-based features, while the second incorporates
additional vascular information derived from retinal vein
morphology—such as tortuosity, caliber, and branching
angles.</p>
        <p>Across all classifiers, models trained solely on lesion
features consistently outperform those that include both
lesions and vein information. Gradient Boosting achieves
the highest accuracy (84.65%) and macro F1-score (84.12%),
confirming the strong discriminative value of lesion-level
biomarkers. In contrast, the addition of vein-based features
leads to performance degradation, indicating that vessel
morphology introduces domain-sensitive variability that
hampers generalization.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>This paper introduces KG-DG, an improved
knowledge-guided domain generalization framework
specifically tailored for medical imaging applications,
as exemplified by diabetic retinopathy classification.
KG-DG integrates symbolic clinical reasoning and deep
visual representations through a confidence-weighted
fusion approach, significantly enhancing robustness and
interpretability. Comprehensive experimental results
on four diverse DR datasets demonstrated that KG-DG
consistently achieved superior performance compared
to strong baselines of domain generalization methods,
achieving notable improvements in both single-source and
multi-source generalization settings, with gains of up to
5.2% in cross-domain accuracy.</p>
      <p>
        Our findings underscore the importance of embedding
structured clinical knowledge within deep learning
models, thereby significantly improving generalization
and trustworthiness in clinical settings. Future directions
include adapting the KG-DG framework to additional
medical imaging modalities, such as optical coherence
tomography and histopathology, and further integrating
dynamic symbolic reasoning via neuro-symbolic
architectures, enhancing real-time decision support
capabilities in medical AI deployments. Insights: Our
observations indicate that the integration of symbolic
clinical knowledge into traditional architectures, whether
Vision Transformers (ViTs) or domain-specific models
such as DeepXSOZ [
        <xref ref-type="bibr" rid="ref71">71</xref>
        ], consistently leads to significant
improvements in classification accuracy. Furthermore,
this knowledge imputation enhances both domain
generalization and the explainability of model behavior,
addressing critical challenges in clinical deployment.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partly funded by NSF (FDT-Biotech 2436801),
and the Helmsley Charitable Trust (2-SRA-2017-503-M-B).
Declaration on Generative AI: During the preparation of
this work, the author(s) used ChatGPT solely for grammar
and spelling checking.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Siyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhaskaran</surname>
          </string-name>
          ,
          <article-title>GraphDR: Lesion ontology guided graph convolution for diabetic retinopathy classification</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kauppi</surname>
          </string-name>
          , et al.,
          <article-title>The aptos 2019 blindness detection dataset</article-title>
          ,
          <source>Kaggle</source>
          ,
          <year>2019</year>
          . https://www.kaggle.com/c/aptos2019-blindness-detection.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          , et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DDR: A diverse dataset for diabetic retinopathy classification</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <article-title>In search of lost domain generalization</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luo</surname>
          </string-name>
          , D. Liu,
          <article-title>Neuro-symbolic generative model for medical report generation with prior knowledge</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>40</volume>
          (
          <year>2021</year>
          )
          <fpage>3436</fpage>
          -
          <lpage>3447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ozkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Boix</surname>
          </string-name>
          ,
          <article-title>On the benefits of multi-domain training for medical image analysis</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>12456</fpage>
          -
          <lpage>12466</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Diffusion-based domain augmentation for robust medical image analysis</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>12345</fpage>
          -
          <lpage>12355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <article-title>Deep coral: Correlation alignment for deep domain adaptation</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Fortifying medical image domain generalization via contrastive feature disentanglement</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>43</volume>
          (
          <year>2024</year>
          )
          <fpage>512</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mussmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pierson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Concept bottleneck models</article-title>
          ,
          <source>in: Proceedings of the 37th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Espinosa Zarlenga</surname>
          </string-name>
          , et al.,
          <article-title>Concept embedding models: Toward interpretable and accurate concept-based learning</article-title>
          ,
          <source>in: Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Muandet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Balduzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <article-title>Domain generalization via invariant feature representation</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>I-10</fpage>
          -
          <lpage>I-18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Blanchard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <article-title>Generalizing from several related classification tasks to a new unlabeled sample</article-title>
          ,
          <source>in: Proceedings of the 24th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>2178</fpage>
          -
          <lpage>2186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          , T. Xiang,
          <article-title>Domain generalization with mixstyle</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2021</year>
          . arXiv:2012.03641.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cha</surname>
          </string-name>
          , et al.,
          <article-title>Swad: Domain generalization by seeking flat minima</article-title>
          ,
          <source>in: Proceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>22405</fpage>
          -
          <lpage>22418</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kot</surname>
          </string-name>
          ,
          <article-title>Domain generalization with adversarial feature learning</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>5400</fpage>
          -
          <lpage>5409</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>Deep domain generalization via conditional invariant adversarial networks</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>647</fpage>
          -
          <lpage>663</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Learning to generalize: Meta-learning for domain generalization</article-title>
          ,
          <source>in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI)</source>
          , volume
          <volume>32</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chellappa</surname>
          </string-name>
          , Metareg:
          <article-title>Towards domain generalization using meta-regularization</article-title>
          ,
          <source>in: Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1006</fpage>
          -
          <lpage>1016</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Learning to generate novel domains for domain generalization</article-title>
          ,
          <source>in: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>561</fpage>
          -
          <lpage>578</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          , T. Xiang,
          <article-title>Deep domain-adversarial image generation for domain generalisation</article-title>
          ,
          <source>in: Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>13025</fpage>
          -
          <lpage>13032</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Deeper, broader and artier domain generalization</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          (ICCV),
          <year>2017</year>
          , pp.
          <fpage>5543</fpage>
          -
          <lpage>5551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          ,
          <article-title>Feature-critic networks for heterogeneous domain generalization</article-title>
          ,
          <source>in: Proceedings of the 36th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3915</fpage>
          -
          <lpage>3924</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Volpi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murino</surname>
          </string-name>
          ,
          <article-title>Addressing model vulnerability to distributional shifts over image transformation sets</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2019</year>
          , pp.
          <fpage>7979</fpage>
          -
          <lpage>7988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Piratla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jyothi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          ,
          <article-title>Generalizing across domains via cross-gradient training</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Heng</surname>
          </string-name>
          ,
          <article-title>Ms-net: Multi-site network for improving prostate segmentation with heterogeneous mri data</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>39</volume>
          (
          <year>2020</year>
          )
          <fpage>2713</fpage>
          -
          <lpage>2724</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Heng</surname>
          </string-name>
          ,
          <article-title>Shape-aware meta-learning for generalizing prostate mri segmentation to unseen domains</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>475</fpage>
          -
          <lpage>485</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kamnitsas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Glocker</surname>
          </string-name>
          ,
          <article-title>Domain generalization via model-agnostic learning of semantic features</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          , volume
          <volume>32</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>579</fpage>
          -
          <lpage>589</lpage>
          . arXiv:1901.10184.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tople</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Domain generalization using causal matching</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>7313</fpage>
          -
          <lpage>7324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghifary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Kleijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Balduzzi</surname>
          </string-name>
          ,
          <article-title>Domain generalization for object recognition with multi-task autoencoders</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Computer Vision</source>
          (ICCV),
          <year>2015</year>
          , pp.
          <fpage>2551</fpage>
          -
          <lpage>2559</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Yuen</surname>
          </string-name>
          ,
          <article-title>Multi-adversarial discriminative deep domain generalization for face presentation attack detection</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>10015</fpage>
          -
          <lpage>10023</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Single-side domain generalization for face anti-spoofing</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>8481</fpage>
          -
          <lpage>8490</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kot</surname>
          </string-name>
          ,
          <article-title>Domain generalization for medical imaging classification with linear-dependency regularization</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          , volume
          <volume>33</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>3118</fpage>
          -
          <lpage>3129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aslani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Murino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hamarneh</surname>
          </string-name>
          ,
          <article-title>Scanner invariant multiple sclerosis lesion segmentation from mri</article-title>
          ,
          <source>in: Proceedings of the IEEE 17th International Symposium on Biomedical Imaging (ISBI)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>781</fpage>
          -
          <lpage>785</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>S.</given-names>
            <surname>Otálora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Andrearczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Staining invariant features for improving generalization of deep convolutional neural networks in computational pathology</article-title>
          ,
          <source>Frontiers in Bioengineering and Biotechnology</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>198</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Improving the generalizability of convolutional neural network-based segmentation on cmr images</article-title>
          ,
          <source>Frontiers in Cardiovascular Medicine</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>105</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation</article-title>
          ,
          <source>IEEE Transactions on Medical Imaging</source>
          <volume>39</volume>
          (
          <year>2020</year>
          )
          <fpage>2531</fpage>
          -
          <lpage>2540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kamboj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K. S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Expert knowledge driven human-ai collaboration for medical imaging: A study on epileptic seizure onset zone identification</article-title>
          ,
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          (
          <year>2023</year>
          ). In press.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arjovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gulrajani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <article-title>Invariant risk minimization</article-title>
          ,
          <source>arXiv preprint arXiv:1907.02893</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dancette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <article-title>Fishr: Invariant gradient variances for out-of-distribution generalization</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy</article-title>
          ,
          <source>New England Journal of Medicine</source>
          <volume>350</volume>
          (
          <year>2004</year>
          )
          <fpage>48</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Ferris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-D.</given-names>
            <surname>Agardh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-P.</given-names>
            <surname>Hammes</surname>
          </string-name>
          ,
          <article-title>Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales</article-title>
          ,
          <source>Ophthalmology</source>
          <volume>110</volume>
          (
          <year>2003</year>
          )
          <fpage>1677</fpage>
          -
          <lpage>1682</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramasamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy: An update</article-title>
          ,
          <source>Indian Journal of Ophthalmology</source>
          <volume>56</volume>
          (
          <year>2008</year>
          )
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
          . doi:10.4103/0301-4738.41167, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2636123/.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <surname>American Academy of Ophthalmology</surname>
          </string-name>
          ,
          <source>Diabetic retinopathy preferred practice pattern</source>
          ,
          <year>2023</year>
          , https://www.aao.org/preferred-practice-pattern/diabetic-retinopathy-ppp.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <surname>StatPearls Publishing</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy</article-title>
          , https://www.ncbi.nlm.nih.gov/books/NBK560805/,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <surname>Early Treatment Diabetic Retinopathy Study Research Group</surname>
          </string-name>
          ,
          <article-title>Grading diabetic retinopathy and estimating its progression</article-title>
          ,
          <source>Ophthalmology</source>
          <volume>98</volume>
          (
          <year>1991</year>
          )
          <fpage>786</fpage>
          -
          <lpage>806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tripathy</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy</article-title>
          , updated August 25,
          <year>2023</year>
          , https://www.ncbi.nlm.nih.gov/books/NBK560805/.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yanoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Duker</surname>
          </string-name>
          ,
          <source>Ophthalmology</source>
          , 5th ed., Elsevier
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <surname>American Association for Pediatric Ophthalmology and Strabismus</surname>
          </string-name>
          ,
          <article-title>Proliferative diabetic retinopathy</article-title>
          , https://aapos.org/glossary/proliferative-diabetic-retinopathy,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bochkovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-Y. M.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors</article-title>
          ,
          <source>arXiv preprint arXiv:2207.02696</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Training data-efficient image transformers &amp; distillation through attention</article-title>
          ,
          <source>in: Proceedings of the 38th International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>CvT: Introducing convolutions to vision transformers</article-title>
          ,
          <source>arXiv preprint arXiv:2103.15808</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. E. H.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Tokens-to-token vit: Training vision transformers from scratch on imagenet</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <surname>Kaggle</surname>
          </string-name>
          ,
          <article-title>Diabetic retinopathy detection</article-title>
          , https://www.kaggle.com/c/diabetic-retinopathy-detection,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>E.</given-names>
            <surname>Decencière</surname>
          </string-name>
          , et al.,
          <article-title>Feedback on a publicly distributed image database: The messidor database</article-title>
          ,
          <source>Image Analysis and Stereology</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <source>The Nature of Statistical Learning Theory</source>
          , Springer Science &amp; Business Media
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Marklund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dhawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Finn</surname>
          </string-name>
          ,
          <article-title>Adaptive risk minimization: Learning to adapt to domain shift</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Seely</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H. S.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Siddharth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hannun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Synnaeve</surname>
          </string-name>
          ,
          <article-title>Gradient matching for domain generalization</article-title>
          ,
          <source>arXiv preprint arXiv:2104.09937</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [60]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sagawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. W.</given-names>
            <surname>Koh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>Distributionally robust neural networks for group shifts</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>Improve unsupervised domain adaptation with mixup training</article-title>
          ,
          <source>arXiv preprint arXiv:2001.00677</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          [62]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Saenko</surname>
          </string-name>
          ,
          <article-title>Deep coral: Correlation alignment for deep domain adaptation</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          , Springer,
          <year>2016</year>
          , pp.
          <fpage>443</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          [63]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ganin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ustinova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ajakan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Germain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Laviolette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marchand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          ,
          <article-title>Domain-adversarial training of neural networks</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <fpage>2096</fpage>
          -
          <lpage>2030</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          [64]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Deep domain generalization via conditional invariant adversarial networks</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          [65]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sultana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naseer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Self-distilled vision transformer for domain generalization</article-title>
          ,
          <source>in: Asian Conference on Computer Vision (ACCV)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          [66]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jayanga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kuruppu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Generalizing to unseen domains in diabetic retinopathy classification</article-title>
          ,
          <source>arXiv preprint arXiv:2311.01673</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          [67]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          [68]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <article-title>Training data-efficient image transformers &amp; distillation through attention</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          [69]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. E. H.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Tokens-to-token vit: Training vision transformers from scratch on imagenet</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          [70]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atwany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yaqub</surname>
          </string-name>
          ,
          <article-title>DRGen: Domain generalization in diabetic retinopathy classification</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>635</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          [71]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Shama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Venkataraman</surname>
          </string-name>
          ,
          <article-title>DeepSOZ: A robust deep model for joint temporal and spatial seizure onset localization from multichannel eeg data</article-title>
          ,
          <source>in: Medical Image Computing and Computer-Assisted Intervention (MICCAI)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>193</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>