<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-contrast Medical Image Segmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tianyi Ren</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juampablo Heras Rivera</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hitender Oswal</string-name>
          <email>hitender@uw.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yutong Pan</string-name>
          <email>ypan4@cs.washington.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agamdeep Chopra</string-name>
          <email>achopra4@uw.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacob Ruzevick</string-name>
          <email>ruzevick@uw.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehmet Kurt</string-name>
          <email>mkurtr@uw.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mechanical Engineering, University of Washington</institution>
          ,
          <addr-line>3900 E Stevens Way NE, Seattle, WA 98195</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Neurological Surgery, University of Washington</institution>
          ,
          <addr-line>1959 NE Pacific Street, Seattle, WA 98195</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Paul G. Allen School of Computer Science, University of Washington</institution>
          ,
          <addr-line>185 E Stevens Way NE, Seattle, WA 98195</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Deep learning has been successfully applied to medical image segmentation, enabling accurate identification of regions of interest such as organs and lesions. This approach works effectively across diverse datasets, including those with single-contrast, multi-contrast, and multimodal imaging data. To improve human understanding of these black-box models, there is a growing need for Explainable AI (XAI) techniques that provide model transparency and accountability. Previous research has primarily focused on post hoc pixel-level explanations, using gradient-based and perturbation-based approaches, which rely on gradients or input perturbations to explain model predictions. However, these pixel-level explanations often struggle with the complexity inherent in multi-contrast magnetic resonance imaging (MRI) segmentation tasks, and the sparsely distributed explanations have limited clinical relevance. In this study, we propose using contrast-level Shapley values to explain state-of-the-art models with respect to standard metrics used in brain tumor segmentation. Our results demonstrate that Shapley analysis provides valuable insights into the behavior of different models used for tumor segmentation. We demonstrate a bias in U-Net towards over-weighing T1-contrast and FLAIR, while Swin-UNETR provides cross-contrast understanding with a balanced Shapley distribution.</p>
      </abstract>
      <kwd-group>
        <kwd>Image Segmentation</kwd>
        <kwd>XAI</kwd>
        <kwd>Shapley Value</kwd>
        <kwd>MRI</kwd>
        <kwd>Brain Tumor</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Segmentation is a fundamental task in medical imaging, involving identifying regions of interest (ROIs)
such as organs, lesions, and tissues. By precisely outlining anatomical and pathological structures,
segmentation plays a pivotal role in computer-aided diagnosis, ultimately improving diagnostic precision
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Typically, segmentation tasks are carried out using multi-contrast MRI or multi-modal imaging
datasets, due to the necessity of identifying unique microstructural features, such as in gliomas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
that are only apparent in some MRI contrasts, but not others. Many deep learning models, including
those used for segmentation, are considered black boxes, offering limited interpretability, resulting
in a lack of transparency and accountability [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Various Explainable AI (XAI) techniques have been
developed in the literature [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to tackle this problem, primarily categorized into gradient-based and
perturbation-based methods.
      </p>
      <p>
        Gradient-based techniques, such as saliency maps [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Grad-CAM [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], visualize deep learning
predictions by identifying influential regions in input data, while perturbation-based approaches (Shapley
values [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and LIME [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) observe model behavior by systematically perturbing inputs and measuring
impact. These methods have been applied successfully to explain classification problems; however,
explaining segmentation still presents significant challenges. There is ongoing debate about whether
explanations are necessary for segmentation, as the masks themselves may serve as explanations.
Furthermore, there remains uncertainty regarding which components should be explained: when using
gradient-based approaches for models like U-Net, no consensus exists on which layer to target, nor,
in clinical applications, on which MRI contrasts to explain. Moreover, pixel-level explanations, typically
represented as discretized heatmaps, require further interpretation for grouping analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>[Figure 1: (a) Brain tumor segmentation from multi-contrast MRI (t1c, t1n, t2w, t2f) with ground truth (GT). (b) Clinical questions: which image contrast or modality conveys the most information, and which image features does the model attend to? Comparison of (1) cross-contrast understanding (proposed method) and (2) pixel-level understanding (e.g., Grad-CAM); clinical benefits: which method is more intuitive and comparative, and which result conveys more information?]</p>
      <p>
        Since in clinical practice radiologists detect lesions by analyzing differences between different MRI
contrasts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], an explainability framework that reveals deep learning model behavior with regard to how
models weigh different MRI contrasts in the segmentation process would be immediately clinically relevant.
Therefore, the main objective of this paper is to establish a framework for explaining the contributions
of different MRI contrasts in the segmentation process, with an application in brain tumor segmentation.
This method delivers intuitive quantitative model explanations and enables effective comparisons at
multiple levels: between contrasts within a subject (see Figure 4), and between model architectures for
comprehensive interpretation of model behavior (see Section 3). We perform systematic experiments to
explain how state-of-the-art models such as U-Net and the transformer-based Swin-UNETR weigh different
MRI contrasts with respect to different evaluation metrics such as Dice and HD95. We conduct statistical
analyses to provide an in-depth understanding of how and why different model architectures weigh
MRI contrasts differently, even when they achieve similar segmentation performance. In summary,
our paper is, to the best of our knowledge, the first study to propose a clinically relevant explanation
framework for brain tumor segmentation in multi-contrast MRI.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Dataset and Learning Objectives</title>
        <p>
          The training dataset is sourced from the Brain Tumor Segmentation (BraTS) Challenge 2024 GoAT
challenge [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], consisting of 1,351 subjects. For each subject, four MRI contrasts were given: Native T1-weighted
(t1n), Post-contrast T1-weighted (t1c), T2-weighted (t2w), and T2 Fluid Attenuated Inversion Recovery
(t2f). The ground truth annotations consist of three disjoint classes: Enhancing tumor (ET), Peritumoral
edematous tissue (ED), and Necrotic tumor core (NCR). The detailed preprocessing and training pipeline
can be found in our previous research [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Model Architectures and Evaluation Metrics</title>
        <p>
          Several state-of-the-art model architectures are tested in this study, including U-Net [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], SegResNet
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], UNETR [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], and Swin-UNETR [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. To evaluate the segmentation quality, we used common
metrics, including the Dice coefficient and the 95th percentile Hausdorff distance (HD95).
        </p>
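        <p>For reference, a minimal sketch of these two metrics on binary 3D masks is given below. It is illustrative only, assuming surface-based distances computed with NumPy/SciPy; the function names, smoothing constant, and voxel-spacing handling are assumptions rather than the exact implementation used in our pipeline.</p>
        <preformat>
# Illustrative sketch of Dice and HD95 on binary 3D masks (not the exact
# implementation used in the training pipeline).
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice_coefficient(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum() + 1e-8)

def _surface(mask):
    # boundary voxels: foreground voxels removed by a single erosion step
    return np.logical_and(mask, np.logical_not(binary_erosion(mask)))

def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    pred_surf, gt_surf = _surface(pred.astype(bool)), _surface(gt.astype(bool))
    # distance of every voxel to the nearest surface voxel of the other mask
    dt_gt = distance_transform_edt(np.logical_not(gt_surf), sampling=spacing)
    dt_pred = distance_transform_edt(np.logical_not(pred_surf), sampling=spacing)
    d_pred_to_gt = dt_gt[pred_surf]
    d_gt_to_pred = dt_pred[gt_surf]
    return np.percentile(np.concatenate([d_pred_to_gt, d_gt_to_pred]), 95)
        </preformat>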
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Contrast-Level Shapley Value</title>
        <p>Consider a training dataset comprised of the pairs $\{(X_i, Y_i)\}_{i=1}^{N}$, where $X_i \in \mathbb{R}^{4 \times H \times W \times D}$ represents the
four 3D MRI contrasts as a multi-channel input and $Y_i \in \mathbb{R}^{3 \times H \times W \times D}$ represents the associated one-hot
encoded segmentation mask with the three tumor labels ED, NCR, and ET, as described in Section 2.1. The
deep learning models $f(\cdot)$ were trained to predict the tumor labels $\hat{Y}_i$ given the input $X_i$:
$\hat{Y}_i = f(X_i)$.</p>
        <p>
          Derived from the Shapley value [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the contrast-level Shapley value $\phi_c(\mathcal{M})$ was then evaluated
with respect to each specific metric $\mathcal{M}$ as:
$$\phi_c(\mathcal{M}) = \sum_{S \subseteq C \setminus \{c\}} \frac{|S|!\,(|C| - |S| - 1)!}{|C|!} \Big( \mathcal{M}(S \cup \{c\}) - \mathcal{M}(S) \Big),$$
where $C$ is the set of all MRI contrasts; $|C|$ is the total number of contrasts; $S$ is a subset of MRI
contrasts excluding a given contrast $c$ ($S \subseteq C \setminus \{c\}$); $|S|$ is the number of contrasts in $S$; and $\mathcal{M}(S)$ is
the target metric evaluated on the subset $S$.</p>
        <p>The contrast-level Shapley values are examined to assess whether observed differences (group means
and variances) across folds or between models are statistically significant. Test for equal variance:
Levene's test is applied to assess homogeneity of variance even when the normality assumption cannot
be guaranteed. Test for equal mean: if the normality assumption cannot be guaranteed, the
Kruskal-Wallis test is used instead of ANOVA, and Dunn's test is applied for post-hoc analysis instead of Tukey's
test. Confidence interval of the difference: if a significant difference in means is observed, we
further generate the confidence interval of the mean difference between groups when the normality
assumption is not violated.</p>
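        <p>A minimal sketch of this computation over the four contrasts is given below. The masking strategy for excluded contrasts (zeroing the channel) and the evaluate_metric, model, and dice placeholders are illustrative assumptions rather than the exact procedure of our pipeline.</p>
        <preformat>
# Illustrative sketch of the contrast-level Shapley value defined above,
# assuming excluded contrasts are masked by zeroing their input channel.
from itertools import combinations
from math import factorial

CONTRASTS = ["t1n", "t1c", "t2w", "t2f"]

def contrast_shapley(evaluate_metric, contrasts=CONTRASTS):
    """evaluate_metric(subset) returns the metric (e.g., Dice) when only the
    contrasts in `subset` are kept as model input (the others are masked)."""
    n = len(contrasts)
    shapley = {}
    for c in contrasts:
        others = [x for x in contrasts if x != c]
        phi = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (evaluate_metric(set(subset) | {c})
                                 - evaluate_metric(set(subset)))
        shapley[c] = phi
    return shapley

def make_evaluator(model, image, gt, dice):
    """Hypothetical helper: `model`, `image` (4, H, W, D), `gt`, and `dice`
    stand in for the trained network, one subject, and the metric."""
    def evaluate(subset):
        masked = image.copy()
        for i, name in enumerate(CONTRASTS):
            if name not in subset:
                masked[i] = 0.0  # assumption: mask a contrast by zeroing it
        return dice(model(masked[None]), gt)  # add a batch dimension
    return evaluate
        </preformat>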
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>The contrast-level Shapley values are computed using four model architectures across five data folds.
We define the matrix of contrast-level Shapley values $\Phi^{m}_{\cdot,\cdot,f}(M) \in \mathbb{R}^{4 \times N_f}$ for each combination of
metric $M \in \{\text{Dice}, \text{HD95}\}$, model $m \in \{\text{U-Net}, \text{SegResNet}, \text{UNETR}, \text{Swin-UNETR}\}$, and fold
$f = 1, \ldots, 5$, where the entry $\phi^{m}_{c,i,f}(M)$ represents the Shapley value for the $i$-th subject in fold $f$,
given contrast $c$, model $m$, and metric $M$, and $N_f$ denotes the total number of subjects in fold $f$.
For a given combination $(m, M, f)$, the contrast-wise vector $\mathbf{C}^{m}_{c,f}(M)$ ($c \in \{\text{t1n}, \text{t1c}, \text{t2w}, \text{t2f}\}$)
and the subject-wise vector $\mathbf{S}^{m}_{i,f}(M)$ ($i = 1, \ldots, N_f$) are defined as follows:
$$\mathbf{C}^{m}_{c,f}(M) = \Phi^{m}_{c,\cdot,f}(M) = \big( \phi^{m}_{c,1,f}(M), \phi^{m}_{c,2,f}(M), \ldots, \phi^{m}_{c,N_f,f}(M) \big)^{\top} \in \mathbb{R}^{N_f}, \quad (3)$$
$$\mathbf{S}^{m}_{i,f}(M) = \Phi^{m}_{\cdot,i,f}(M) = \big( \phi^{m}_{\text{t1n},i,f}(M), \phi^{m}_{\text{t1c},i,f}(M), \phi^{m}_{\text{t2w},i,f}(M), \phi^{m}_{\text{t2f},i,f}(M) \big)^{\top} \in \mathbb{R}^{4}. \quad (4)$$</p>
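      <p>A minimal sketch of these two slicing operations is given below, assuming the per-fold Shapley values have already been collected into an array of shape (4, N_f) whose rows follow the contrast order t1n, t1c, t2w, t2f; the variable names are illustrative.</p>
      <preformat>
# Illustrative slicing of the Shapley matrix into the vectors of Eqs. (3)-(4);
# `phi` is an assumed numpy array of shape (4, N_f) for one (m, M, f).
import numpy as np

CONTRASTS = ["t1n", "t1c", "t2w", "t2f"]

def contrast_vector(phi, contrast):
    """C_{c,f}(M): Shapley values of one contrast across all subjects."""
    return phi[CONTRASTS.index(contrast), :]

def subject_vector(phi, subject_idx):
    """S_{i,f}(M): Shapley values of the four contrasts for one subject."""
    return phi[:, subject_idx]
      </preformat>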
      <p>In this study, we utilized four NVIDIA A40 GPUs to train our deep learning model and calculate the
Shapley value. The evaluation time for each fold and model is approximately 1–2 minutes per subject.</p>
      <sec id="sec-3-1">
        <title>3.1. Shapley-based prediction insights: a clustering analysis</title>
        <p>To analyze how segmentation performance overlaps with model weighting of MRI contrasts via
contrast-level Shapley values, we applied k-means clustering. For each model-metric pair $(m, M)$, clustering
was performed on the subject-wise vectors $\mathbf{S}^{m}_{i,f}(M)$ pooled across the five folds, i.e.,
$\bigcup_{f=1}^{5} \bigcup_{i=1}^{N_f} \{\mathbf{S}^{m}_{i,f}(M)\}$.</p>
        <p>We then use UMAP to visualize the clusters of Shapley value embeddings. Figure 2 illustrates
an example with a significant pattern: for U-Net and Swin-UNETR, the Shapley embedding clusters
differentiate subjects with higher Dice scores from those with lower Dice scores.</p>
        <p>[Figure 2: (a) UNETR; (b) SegResNet.]</p>
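        <p>A minimal sketch of this clustering and visualization step is given below, assuming the subject-wise Shapley vectors have been pooled into an array S of shape (num_subjects, 4) with matching Dice scores; the number of clusters and UMAP settings shown are illustrative assumptions, and umap-learn is assumed to be available.</p>
        <preformat>
# Illustrative k-means clustering and UMAP embedding of subject-wise
# contrast-level Shapley vectors (hyperparameters are assumptions).
import numpy as np
from sklearn.cluster import KMeans
import umap  # umap-learn
import matplotlib.pyplot as plt

def cluster_and_embed(S, dice_scores, n_clusters=2, seed=0):
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(S)
    embedding = umap.UMAP(random_state=seed).fit_transform(S)  # 2-D embedding
    plt.scatter(embedding[:, 0], embedding[:, 1], c=dice_scores, cmap="viridis", s=8)
    plt.colorbar(label="Dice")
    plt.title("UMAP of subject-wise contrast-level Shapley vectors")
    return labels, embedding
        </preformat>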
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Shapley-based model prediction consistency: a comparative analysis</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Does each model learn consistent explanations?</title>
          <p>To assess the consistency of explanations across folds for each model, we analyzed the distribution of
$\mathbf{C}^{m}_{c,f}(M)$. The group standard deviation $\sigma$ and mean $\mu$ are key factors for determining distribution
similarity, and statistical tests were applied to these quantities:
$$H_0(\sigma \mid c, m, M) : \sigma(\mathbf{C}^{m}_{c,1}(M)) = \sigma(\mathbf{C}^{m}_{c,2}(M)) = \sigma(\mathbf{C}^{m}_{c,3}(M)) = \sigma(\mathbf{C}^{m}_{c,4}(M)) = \sigma(\mathbf{C}^{m}_{c,5}(M)), \quad (5)$$
$$H_0(\mu \mid c, m, M) : \mu(\mathbf{C}^{m}_{c,1}(M)) = \mu(\mathbf{C}^{m}_{c,2}(M)) = \mu(\mathbf{C}^{m}_{c,3}(M)) = \mu(\mathbf{C}^{m}_{c,4}(M)) = \mu(\mathbf{C}^{m}_{c,5}(M)).$$
If significant differences in mean or standard deviation are found, we conclude that inconsistent
explanations are present across folds for a given combination $(c, m, M)$.</p>
          <p>Since the normality assumption for the Shapley value distribution $\mathbf{C}^{m}_{c,f}(M)$ could not be guaranteed
for some contrasts $c$, as indicated by the normality tests and non-zero skewness (Figure 3), Levene's
test, Kruskal-Wallis, and Dunn's post-hoc tests were applied.</p>
          <p>For all combinations of $(c, m, M)$, we get $p &lt; 0.01$ in all 32 Levene's tests, rejecting $H_0(\sigma \mid c, m, M)$ and
indicating unequal variances across the five folds. Similarly, all 32 Kruskal-Wallis tests yield $p &lt; 0.01$,
rejecting $H_0(\mu \mid c, m, M)$ and suggesting unequal means. These results invalidate the assumption that
"Model $m$ learns consistent explanations across all five folds using contrast $c$ for metric $M$ evaluation,"
indicating significant differences in variance and means for at least one fold pair of each $(c, m, M)$
combination.</p>
          <p>Post-hoc tests are conducted to evaluate which fold pairs $(f, f')$ show consistent explanations under the
following null hypothesis:
$$H_0(\mu \mid c, m, M, (f, f')) : \mu(\mathbf{C}^{m}_{c,f}(M)) = \mu(\mathbf{C}^{m}_{c,f'}(M)), \quad f, f' \in \{1, 2, \ldots, 5\}, \; f \neq f'. \quad (6)$$
Dunn's post-hoc tests reveal no significant differences in the t1n explanation between fold pairs 1 &amp; 5, 2 &amp;
3, 2 &amp; 4, and 4 &amp; 5 for Swin-UNETR, while significant differences exist in all other tests (Table 3). For
example, with $p = 0.038$ in the 1st column of Table 3, the null hypothesis
$\mu(\mathbf{C}^{\text{Swin-UNETR}}_{\text{t1n},1}(M)) = \mu(\mathbf{C}^{\text{Swin-UNETR}}_{\text{t1n},5}(M))$
is not rejected, indicating that "Swin-UNETR learns consistent t1n contrast-level explanations between the
1st and 5th folds."</p>
          <p>[Figure: (a) U-Net; (b) Swin-UNETR.]</p>
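          <p>A minimal sketch of these fold-consistency tests is given below, assuming the per-fold contrast-wise Shapley vectors are available as a list of one-dimensional arrays (one per fold) and that SciPy and scikit-posthocs are available; the significance level shown is the one used above.</p>
          <preformat>
# Illustrative fold-consistency tests for one (c, m, M) combination.
from scipy.stats import levene, kruskal
import scikit_posthocs as sp

def fold_consistency_tests(folds, alpha=0.01):
    """folds: list of 5 arrays, each holding C_{c,f}(M) for one fold f."""
    p_var = levene(*folds).pvalue    # H0: equal variances across folds
    p_mean = kruskal(*folds).pvalue  # H0: equal means (non-parametric)
    dunn_p = sp.posthoc_dunn(list(folds))  # pairwise fold comparisons
    return {
        "equal_variance_rejected": p_var &lt; alpha,
        "equal_mean_rejected": p_mean &lt; alpha,
        "pairwise_p_values": dunn_p,
    }
          </preformat>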
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Do diferent models learn consistent explanations?</title>
          <p>We first visualize the contrast-level Shapley values across all five folds for U-Net, $\mathbf{C}^{\text{U-Net}}_{c,f}(\text{Dice})$, and
Swin-UNETR, $\mathbf{C}^{\text{Swin-UNETR}}_{c,f}(\text{Dice})$, using violin plots in Figure 3. We observe that t1c and t2f
are the most important image contrasts, with the highest contrast-level Shapley values; this finding is
consistent with the clinical explanation that t2f suppresses the cerebrospinal fluid signal, making edema
and infiltration more visible, while t1c provides clear delineation of enhancing tumor (see Section 2.1).
We can also observe from this figure that Swin-UNETR weights t1n significantly higher than U-Net.</p>
          <p>To further investigate how model explanations differ within folds, we follow the procedure
from Section 3.2.1, with the key difference being that we compare results across multiple models while
fixing the fold, unlike the previous tests where the models were fixed:
$$H_0(\sigma \mid c, M, f) : \sigma(\mathbf{C}^{\text{U-Net}}_{c,f}(M)) = \sigma(\mathbf{C}^{\text{SegResNet}}_{c,f}(M)) = \sigma(\mathbf{C}^{\text{UNETR}}_{c,f}(M)) = \sigma(\mathbf{C}^{\text{Swin-UNETR}}_{c,f}(M)),$$
$$H_0(\mu \mid c, M, f) : \mu(\mathbf{C}^{\text{U-Net}}_{c,f}(M)) = \mu(\mathbf{C}^{\text{SegResNet}}_{c,f}(M)) = \mu(\mathbf{C}^{\text{UNETR}}_{c,f}(M)) = \mu(\mathbf{C}^{\text{Swin-UNETR}}_{c,f}(M)). \quad (7)$$
For all combinations of $(c, M, f)$, the assumption that "within each fold $f$, all models learned consistent
explanations when using contrast $c$ for metric $M$" is invalid [Levene's test ($p &lt; 0.01$) and Kruskal-Wallis
test ($p &lt; 0.01$) for all tests]. However, the post-hoc tests do not reveal generalizable patterns across the
models similar to the conclusion we presented in Table 3. To highlight performance differences, we
provide confidence intervals.</p>
          <p>Since the distributions of Shapley values are independent across models and, for each input, the
differences between Shapley values, $\phi^{m}_{c,i,f}(M) - \phi^{m'}_{c,i,f}(M)$ ($m \neq m'$), passed the normality test, we further
assess the difference between models by evaluating the confidence interval $CI_{\alpha}\big(\mu(\mathbf{C}^{(m,m')}_{c,f}(M))\big)$ at
a desired level $\alpha$, where we define
$$\mathbf{C}^{(m,m')}_{c,f}(M) = \big( \phi^{m}_{c,1,f}(M) - \phi^{m'}_{c,1,f}(M), \ldots, \phi^{m}_{c,N_f,f}(M) - \phi^{m'}_{c,N_f,f}(M) \big)^{\top}, \quad (8)$$
with $N_f$ denoting the total number of subjects in fold $f$ from Definition (3).</p>
          <p>Here, we focus on the model difference in t1n, to test the hypothesis that Swin-UNETR has a higher
contrast-level Shapley value compared to other models, indicating a more balanced Shapley value distribution
and less bias toward t1c and t2f. The confidence intervals for the mean difference in Shapley values
(Swin-UNETR minus the other models) indicate a significant positive difference at a confidence level
of 0.95, suggesting that Swin-UNETR places more attention on the t1n contrast (Figure 3).</p>
          <p>To understand how transformer-based models differ from convolutional neural networks, we analyze
cases where the Swin-UNETR model achieves a Dice score at least 20% higher than U-Net and vice
versa. Specifically, we examine cases where the Swin-UNETR model achieves a Dice score 25% higher
than U-Net (Figure 4), and U-Net achieves a Dice score 23% higher than Swin-UNETR (Figure 4).
This comparison highlights the advantages and limitations of each architecture in medical image
segmentation tasks.</p>
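          <p>A minimal sketch of the confidence-interval computation in Equation (8) is given below, assuming two aligned arrays of per-subject Shapley values for the same contrast, fold, and metric; the t-based interval is an illustrative choice consistent with the normality check described above.</p>
          <preformat>
# Illustrative confidence interval for the mean per-subject Shapley
# difference (model A minus model B) within one fold, cf. Eq. (8).
import numpy as np
from scipy import stats

def shapley_diff_ci(phi_model_a, phi_model_b, confidence=0.95):
    diff = np.asarray(phi_model_a) - np.asarray(phi_model_b)
    mean = diff.mean()
    sem = stats.sem(diff)  # standard error of the mean difference
    lower, upper = stats.t.interval(confidence, len(diff) - 1, loc=mean, scale=sem)
    return mean, (lower, upper)
          </preformat>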
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>In this study, we systematically investigated the Shapley value for model explanation in multi-contrast
medical image segmentation. Our proposed contrast-level Shapley explainability framework has three
key contributions: (1) It is the first study to use Shapley analysis to explain multi-contrast medical
image segmentation; (2) It is the first paper to analyze how different network structures weigh various
MRI contrasts when making segmentation decisions; (3) It enhances clinical relevance by providing
deeper insights into model performance with aggregate contributions of each MRI contrast in the tumor
segmentation process, which is inherently interpretable by neuroradiologists, as they detect lesions by
analyzing differences between different MRI contrasts in clinical practice.</p>
      <p>[Figure: GT, U-Net, and Swin-UNETR segmentations.]</p>
      <p>Specifically, the contrast-level Shapley value reveals the (in)consistency of each model’s explanations.
The statistics indicate that Swin-UNETR is the most robust among all tested architectures. Despite
being trained on different folds, Swin-UNETR consistently learns invariant representations across data
subsets, whereas other models show variations in their explanations across folds (Table 1).</p>
      <p>Moreover, the contrast-level Shapley value provides insights into the differences among model
architectures. As shown in Figure 3, the model explanations indicate that U-Net exhibits a bias toward
features from t1c and t2f, while Swin-UNETR distributes its explanations more evenly across contrasts.
This was further confirmed by comparing t1n Shapley values across different models, which revealed
statistically higher Shapley values for Swin-UNETR (Table 3).</p>
      <p>We also present a case in Figure 4 to demonstrate how explanations of different models could provide
key insights into model failure. As discussed before, the training data includes three different tumor subtypes
(see Section 2.1). The innermost component of the tumor (shown in red in Figure 4) is necrotic tissue in
glioblastoma and meningioma; however, in metastasis, the innermost component is defined as
any tumor component that is not enhancing (but not necrotic). This implies that in t2f images, the
necrotic core will appear dark, but the non-enhancing metastatic tumor core and edema will appear bright.</p>
      <p>Due to its dependence on the contrasts with the highest intensity differences, namely t1c and t2f, the
U-Net architecture fails to accurately capture the innermost component (NCR). This suggests a potential
bias towards t1c and t2f, as indicated by the distributions of $\mathbf{C}^{m}_{\text{t1c},f}(\text{Dice})$ and $\mathbf{C}^{m}_{\text{t2f},f}(\text{Dice})$ exhibiting a
significantly higher central tendency compared to $\mathbf{C}^{m}_{\text{t1n},f}(\text{Dice})$ and $\mathbf{C}^{m}_{\text{t2w},f}(\text{Dice})$ across all folds $f$ and
models $m \in \{\text{U-Net}, \text{SegResNet}, \text{UNETR}, \text{Swin-UNETR}\}$, as shown in Figure 2 and supported by
statistical tests in Section 3.2. This bias may contribute to confusion with edema prediction, causing
over-prediction relying on t2f (edema appears bright, as shown in Figure 4). However, Swin-UNETR
effectively learns both local and global relationships within different contrasts through its self-attention
mechanism, and was able to more accurately localize the tumor core in this challenging case.</p>
      <p>Finally, for this case, we provide a comparison between GradCAM and our proposed contrast-level
Shapley values. As seen in Figure 4, the pixel-level explanations provided by GradCAM on each MRI contrast show
model differences in terms of the pixel-level features used. The heatmap of Swin-UNETR is smoother,
while the heatmap of U-Net highlights only a few regions, but both explanations fail to capture
clinically relevant information regarding contrast-level importance. For example, for Swin-UNETR,
GradCAM exhibits higher attention to t1c compared to t2f. However, the contrast-level Shapley value reveals that
t1c negatively impacts the final Dice score, with a lower impact magnitude compared to t2f.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this study, we propose Contrast Shapley for multi-contrast glioma segmentation. This method
provides a quantitative framework for model explanation, offering insights into the fundamental
characteristics of different deep learning architectures.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Hesamian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          ,
          <article-title>Deep learning techniques for medical image segmentation: achievements and challenges</article-title>
          ,
          <source>Journal of digital imaging 32</source>
          (
          <year>2019</year>
          )
          <fpage>582</fpage>
          -
          <lpage>596</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Clinical inspired mri lesion segmentation</article-title>
          ,
          <source>arXiv preprint arXiv:2502.16032</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>
          ,
          <source>Nature machine intelligence</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>206</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. H.</given-names>
            <surname>Van der Velden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Kuijf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Gilhuijs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Viergever</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (xai) in deep learning-based medical image analysis</article-title>
          ,
          <source>Medical Image Analysis</source>
          <volume>79</volume>
          (
          <year>2022</year>
          )
          <fpage>102470</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Deep inside convolutional networks: Visualising image classification models and saliency maps</article-title>
          ,
          <source>arXiv preprint arXiv:1312.6034</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Selvaraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cogswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vedantam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Batra</surname>
          </string-name>
          ,
          <article-title>Grad-cam: Visual explanations from deep networks via gradient-based localization</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>618</fpage>
          -
          <lpage>626</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>arXiv:1705.07874</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>" why should i trust you?" explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Hasany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mériaudeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Petitjean</surname>
          </string-name>
          ,
          <article-title>Misure is all you need to explain your image segmentation</article-title>
          ,
          <source>arXiv preprint arXiv:2406.12173</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>U.</given-names>
            <surname>Baid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghodasara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bilello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Calabrese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Colak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Farahani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalpathy-Cramer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Kitamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pati</surname>
          </string-name>
          , et al.,
          <article-title>The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification</article-title>
          ,
          <source>arXiv preprint arXiv:2107.02314</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Honey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Rebala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurt</surname>
          </string-name>
          ,
          <article-title>An optimization framework for processing and transfer learning for the brain tumor segmentation</article-title>
          ,
          <source>arXiv:2402.07008</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. H.</given-names>
            <surname>Rivera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Rebala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Honey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurt</surname>
          </string-name>
          ,
          <article-title>Re-DiffiNet: Modeling discrepancy in tumor segmentation using diffusion models</article-title>
          ,
          <source>in: Medical Imaging with Deep Learning</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Çiçek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdulkadir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Lienkamp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <article-title>3d u-net: Learning dense volumetric segmentation from sparse annotation</article-title>
          ,
          <source>arXiv preprint arXiv:1606.06650</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Myronenko</surname>
          </string-name>
          ,
          <article-title>3d mri brain tumor segmentation using autoencoder regularization</article-title>
          ,
          <source>in: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hatamizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images</article-title>
          , in: International MICCAI Brainlesion Workshop, Springer,
          <year>2021</year>
          , pp.
          <fpage>272</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hatamizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Myronenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Landman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Unetr: Transformers for 3d medical image segmentation</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF winter conference on applications of computer vision</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>574</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>