Latent Diffusion Models for Privacy-preserving Medical Case-based Explanations

Filipe Campos1,2,3,∗, Liliana Petrychenko3, Luís F. Teixeira1 and Wilson Silva1,2,3

1 INESC TEC, Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
2 AI Technology for Life, Department of Information and Computing Sciences, Department of Biology, Utrecht University, Utrecht, Netherlands
3 Department of Radiology, The Netherlands Cancer Institute, Amsterdam, Netherlands

Abstract
Deep-learning techniques can improve the efficiency of medical diagnosis while rivalling the accuracy of human experts. However, the rationale behind these classifiers' decisions is largely opaque, which is dangerous in sensitive applications such as healthcare. Case-based explanations clarify the decision process of these models by presenting similar cases, drawn from previous studies of other patients. Yet, such studies may contain personally identifiable information, which makes them impossible to share without violating patients' privacy rights. Previous works have used GANs to generate anonymous case-based explanations, but with limited visual quality. We solve this issue by employing a latent diffusion model in a three-step procedure: generating a catalogue of synthetic images, removing the images that closely resemble existing patients, and using this anonymous catalogue during an explanation retrieval process. We evaluate the proposed method on the MIMIC-CXR-JPG dataset and achieve explanations that simultaneously have high visual quality, are anonymous, and retain their explanatory value.

Keywords
Privacy-preserving machine learning, medical imaging, case-based explainability, latent-diffusion models

1. Introduction

Various medical imaging techniques, such as X-ray or MRI, are essential to detect and diagnose multiple medical conditions. These imaging modalities provide clinicians with insights into patients' health, facilitating accurate and timely diagnoses. In recent years, Deep Learning techniques have shown the capability to rival human performance in diverse diagnosis tasks while being more efficient. However, this leap brings a major challenge: the inherent lack of interpretability of Deep Learning models. While these models exhibit remarkable diagnostic capabilities, their decision-making processes remain largely opaque, posing a barrier to understanding the rationale behind their predictions. This lack of interpretability raises legitimate concerns, especially in critical medical decisions, where transparency is necessary for establishing trust.

EXPLIMED - First Workshop on Explainable Artificial Intelligence for the medical domain - 19-20 October 2024, Santiago de Compostela, Spain
∗ Corresponding author.
filipe.p.campos@inesctec.pt (F. Campos); l.petrychenko@nki.nl (L. Petrychenko); luisft@fe.up.pt (L. F. Teixeira); w.j.dossantossilva@uu.nl (W. Silva)
ORCID: 0009-0006-4753-8846 (F. Campos); 0009-0008-5522-9562 (L. Petrychenko); 0000-0002-4050-7880 (L. F. Teixeira); 0000-0002-4080-9328 (W. Silva)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

One way to better understand deep-learning reasoning is through case-based explanations, which explain by example, closely mimicking the rationale of human professionals.
Using these mechanisms, a doctor would have access to a prediction and a set of image explanations for each machine-made diagnosis, allowing the professional to compare the current situation with historical examples. Ideally, these explanations should be shareable between medical professionals and patients, or even between hospitals, which is particularly important for building a diverse catalogue of explanations. Yet, this is not possible, since medical images contain personally identifiable data that may allow for the re-identification of patients; they are therefore protected under strict regulations such as the GDPR [1] and HIPAA [2]. These issues can be overcome by anonymizing said explanations, thereby protecting patients' privacy while maintaining the utility of the images.

The generation of anonymous case-based explanations using GAN architectures has been previously explored [3], yet the perceptual utility of these explanations is limited by their reduced image quality. We propose to solve this issue by synthesizing and anonymizing images using Latent Diffusion Models [4], which have emerged as a leading approach for image generation tasks in recent years. The proposed methodology builds upon an anonymization method proposed by Packhäuser et al. [5] and follows three key processes: generating a synthetic dataset using a latent diffusion model, removing images from the synthetic dataset that closely resemble patients in the training data, and retrieving explanations from the newly created anonymous synthetic dataset.

The key contributions of this work are the following:
• Generate a synthetic dataset based on the MIMIC-CXR-JPG [6] dataset using a latent diffusion model and anonymize it using a post-model approach.
• Retrieve anonymous case-based explanations with high visual quality and utility.
• Evaluate the proposed solution quantitatively and with the aid of an experienced radiologist.

2. Related Work

This work combines three distinct research areas: anonymization techniques, diffusion models and explainability. Although some works bridge two of these topics, the combination of all three remains largely unexplored.

Diffusion Models
Recently, diffusion models such as DDPM [7] have emerged as an alternative to GANs for generating high-quality images while avoiding common difficulties of GANs, such as mode collapse. The diffusion method consists of two distinct processes. During the forward process, Gaussian noise is iteratively added to a training image. In the reverse process, a deep-learning model, typically a U-Net [8], starts from pure noise, predicts the noise that was added at each step, and iteratively removes it, generating a new image. Since the diffusion process is computationally expensive to scale to higher resolutions, Latent Diffusion Models (LDMs) [4] perform it in a low-dimensional latent space rather than in the image space. During sampling, the diffusion process thus generates new latent vectors, which can be decoded back into the image space using a variational autoencoder (VAE) [9]. To better adapt to different medical imaging domains, Medfusion [10] modifies the traditional LDM architecture by changing the number of channels in the VAE [9] used to encode and decode latent vectors. While the original LDM architecture employed 4 channels, Müller-Franzes et al. found that, for medical images, using 8 channels led to fewer visual artefacts.
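To make the two processes concrete, below is a minimal PyTorch sketch of DDPM-style training, assuming a generic noise-prediction model; the schedule values, model interface, and tensor shapes are illustrative assumptions, not the exact configuration of DDPM [7] or Medfusion [10].

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

def forward_diffuse(x0, t):
    """Forward process: corrupt clean images x0 to step t in closed form."""
    noise = torch.randn_like(x0)
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * noise, noise

def training_step(model, x0):
    """Reverse-process training: the model (typically a U-Net) learns to
    predict the noise that was added at a randomly drawn step."""
    t = torch.randint(0, T, (x0.shape[0],))
    xt, noise = forward_diffuse(x0, t)
    return F.mse_loss(model(xt, t), noise)
```

At sampling time, the same noise predictor is applied iteratively, starting from pure noise, to undo the corruption one step at a time.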
Anonymization
Visual anonymization methods can generally be divided into classical and machine-learning-based. Within the classical methods, two of the most common techniques are blurring the image and K-Same [11], which overlays 𝐾 images, thereby obtaining K-Anonymity. Yet, these methods require aggressive image modifications, making the results difficult to analyse visually. Recently, the focus has shifted to generating synthetic images using machine learning. These methods can be divided into two sub-groups. Differential-privacy methods [12, 13] provide strong privacy guarantees at the cost of reduced image quality, struggling to scale beyond 32 × 32 pixel images due to their computational complexity. The other methods take a more ad-hoc approach: instead of providing strong statistical guarantees, they typically employ identity verification networks [5, 3], which empirically promote privacy.

Case-based Explanations
Case-based explanations are a post hoc mechanism that justifies a model's decisions by retrieving cases or examples from the training data. This approach is analogous to human reasoning, making the explanations easy to interpret. While certain image-retrieval scenarios employ similarity metrics such as the Euclidean distance or the Structural Similarity Index Measure (SSIM) [14], which analyse an image as a whole, case-based explanations focus on small, localized features that such measures would overlook. One way of retrieving relevant images is by comparing the features obtained by a task-specific deep-learning classifier [15]. Additionally, Montenegro et al. [3] propose a GAN architecture that generates anonymous case-based explanations by employing a privacy loss function that increases the distance between an identity embedding calculated for the generated image and the identity embeddings of the remaining training images. The main shortcoming of this work is the limited image quality of the synthetic explanations caused by said privacy loss function.

3. Method

Our methodology, summarized in Figure 1, follows the anonymization protocol defined by Packhäuser et al. [5]. Initially, we train a latent diffusion model and synthesize a dataset. Afterwards, for each image in this dataset, a retrieval model identifies which images in the training set it most closely resembles. Then, we employ an identity verification network to compare the synthetic image with its closest match and determine whether it likely belongs to the same patient. This information allows us to remove non-anonymous images and, consequently, create an anonymous catalogue from which we can retrieve case-based explanations.

Figure 1: Anonymization Pipeline. A generative model generates synthetic samples based on the real training data. From these samples, we remove those closely resembling patients in the training set based on a patient retrieval and a patient verification model. This anonymous dataset serves as our knowledge base from which to retrieve explanations.

3.1. Image Generation using Latent Diffusion Models

We train a latent diffusion model that uses the Medfusion [10] architecture for image generation. Using it, we sample synthetic images to build a synthetic dataset that will be anonymized.
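As a complement to Section 3.1, the sketch below illustrates how sampling could proceed in an LDM-style pipeline: a DDIM-like loop denoises a latent vector, which a frozen VAE decoder then maps back to image space. The latent shape (8 channels at 64 × 64, assuming a VAE downsampling factor of 8 for 512 × 512 images) and the model interfaces are assumptions for illustration, not the Medfusion API; class-label conditioning is omitted for brevity.

```python
import torch

@torch.no_grad()
def sample_latents(eps_model, alphas_bar, shape, steps=50):
    """Deterministic DDIM sampling (eta = 0) in the latent space."""
    T = alphas_bar.shape[0]
    ts = torch.linspace(T - 1, 0, steps).long()   # sub-sampled time steps
    z = torch.randn(shape)                        # start from pure noise
    for i, t in enumerate(ts):
        ab_t = alphas_bar[t]
        ab_prev = alphas_bar[ts[i + 1]] if i + 1 < steps else torch.tensor(1.0)
        eps = eps_model(z, t.expand(shape[0]))    # predicted noise
        z0 = (z - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()  # predicted clean latent
        z = ab_prev.sqrt() * z0 + (1.0 - ab_prev).sqrt() * eps
    return z

# Hypothetical usage: denoise 8-channel latents, then decode with the VAE.
# latents = sample_latents(unet, alphas_bar, shape=(16, 8, 64, 64))
# images = vae.decode(latents)
```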
3.2. Retrieval and Verification Network

We must identify whether each synthetic image closely resembles any real patient in the training set. To do so, we first use a patient retrieval network: a Siamese Neural Network (SNN) [16] based on the ResNet-50 [17] architecture, which takes two images as input and, using a contrastive loss function, learns embeddings such that embeddings of the same patient are grouped together. The verification model follows the same architecture, but instead of using a contrastive loss, it compares both input images and classifies whether or not they belong to the same patient. For both models, the training data consists of positive pairs, obtained from the patient identity information of the dataset, and randomized negative pairs.

3.3. Anonymization Pipeline

Given the synthetic dataset, the verification model, and the retrieval network, we have all the components required to obtain an anonymous dataset. The anonymization procedure we employ can be broken down into three separate steps:

1. Compute training identity embeddings: We obtain an identity embedding for each training-set image using the identity retrieval network. These embeddings are stored in a KD-tree [18], which later enables efficient nearest-neighbour queries.
2. Search nearest training image: For each synthetic sample, we retrieve the top-1 most similar training sample. In this step, we compute the identity embedding of each image and perform a lookup in the previously built KD-tree. Each real-synthetic image pair is stored in a list used in the next step.
3. Remove non-anonymous image pairs: For each real-synthetic image pair, we use the identity verification network to predict the likelihood that both images belong to the same patient. If this likelihood exceeds a predefined threshold, the synthetic image is deemed non-anonymous and removed from the synthetic dataset (see the code sketch further below).

3.4. Case-based Explanation Retrieval

We aim to retrieve explanations that are similar in task-related features, not merely structurally similar. We employ a DenseNet121 [19] classifier and use the feature vector of its last layer, which contains 1024 features, to compare images. The intuition behind this method is that the classifier represents the most important diagnostic characteristics in its feature space. During inference, an image is passed through the classifier, and we use its features to retrieve the entries with the most similar features from a catalogue of synthetic images. The catalogue size will vary depending on the application; in a real-world scenario, we want as many images as possible, to capture the most variability and obtain images that are as relevant as possible.

4. Experiments

Data
We perform experiments on the MIMIC-CXR-JPG dataset [6], consisting of 377,110 chest X-ray images in JPG format from 65,379 unique patients. We chose to predict cardiomegaly as our classification task since it is a common diagnosis, with 66,799 samples making up 29.32% of the available data, and is easily identified visually, making it ideal for case-based explanations. From the available images, we select those from the posteroanterior (PA) view (96,161), since it is the view commonly used in radiology to diagnose cardiomegaly; the anteroposterior (AP) view tends to magnify the heart [20], hindering the diagnosis. We use the recommended data splits provided alongside the dataset.
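Before detailing the remaining experimental setup, the following is a minimal sketch of the three-step anonymization procedure of Section 3.3, referenced there. The embedding arrays, image lists, and the verifier callable (assumed to return the predicted probability that two images belong to the same patient) are hypothetical interfaces, not our exact implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_non_anonymous(train_embs, train_imgs, synth_embs, synth_imgs,
                         verifier, threshold=0.5):
    """Drop synthetic images whose nearest real neighbour is verified as
    showing the same patient (steps 1-3 of Section 3.3)."""
    tree = cKDTree(np.asarray(train_embs))       # step 1: index identity embeddings
    anonymous = []
    for emb, img in zip(synth_embs, synth_imgs):
        _, idx = tree.query(emb, k=1)            # step 2: top-1 nearest real image
        p_same = verifier(img, train_imgs[idx])  # step 3: same-patient likelihood
        if p_same <= threshold:                  # keep only images deemed anonymous
            anonymous.append(img)
    return anonymous
```

With the threshold of 0.5 used in our experiments, this filter is what removes the non-anonymous samples reported in Section 5.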
Image Generation
We generate MIMIC-CXR-JPG images using the Medfusion [10] architecture; the latent embeddings are encoded using a VAE, which accepts 3 × 512 × 512 pixel images and has 8 embedding channels. The diffusion model uses DDIM [21], has label and time embeddings of size 1024, and employs a scaled linear noise scheduler with 𝑇 = 1000 that varies from β0 = 0.002 to βT = 0.02. Using a U-Net [8] architecture, we train the model for a maximum of 1001 epochs with early stopping and a patience of 30 epochs. We generate 1,000 synthetic images, with a balanced split between positive and negative cases.

Retrieval and Verification
For the retrieval phase, similarly to Packhäuser et al. [5], we train our models for 30 epochs in a first phase, during which the model backbone is frozen, and 50 epochs in a second phase, in which all parameters are unfrozen. We use a learning rate of 0.158489 with a weight decay of 1e-5 and a batch size of 32. Our verification model was trained using the Adam optimizer [22] with 3 × 256 × 256 images and a learning rate of 0.0001. We used a batch size of 32 and limited training to a maximum of 100 epochs with an early stopping criterion and a patience of 5 epochs. During the anonymization step, we considered images non-anonymous if the verification model's predicted likelihood of a synthetic image and a real image belonging to the same patient was above a threshold of 0.5.

Explanation Retrieval
For explanation retrieval, we employ a DenseNet121 [19] network trained using the Adam optimizer with a learning rate of 1e-3 for up to 10 epochs with an early stopping criterion and a patience 𝑝 = 3. To search for similar images, we use the last feature vector of the model, located before the dense layer and sized 1 × 1024.

5. Results and discussion

Table 1 shows a quantitative image-quality evaluation of the synthetic images generated by the Medfusion architecture. We report precision (P), recall (R) and the Fréchet inception distance (FID) [23], which compares the distributions of real and generated images based on the deepest layer of an Inception V3 network. To evaluate the images' utility for classification tasks, we compare a classifier trained on synthetic data with a classifier trained on real data in Table 2. The classifier trained on synthetic data still achieves acceptable performance even though it was trained on a comparatively small dataset. This indicates that the signal required to diagnose cardiomegaly is still present in the new images.

Table 1
Quantitative evaluation of the images generated using the Medfusion model.

FID      P       R
62.25    68.00   22.17

Table 2
Classification performance reported for both the MIMIC-CXR-JPG and a synthetic dataset.

Dataset          F1       ACC      Image Count
MIMIC-CXR-JPG    91.67    87.33    96,161
Synthetic        75.97    69.00    1,000

Both the verification and retrieval models showcase the ability to re-identify patients, as demonstrated by the metrics highlighted in Tables 3 and 4, respectively. For the verification model, we report the area under the receiver operating characteristic curve (AUC), precision (P), recall (R) and F1-score. For the retrieval task, we track the R-precision, a common information-retrieval metric, calculated as shown in Equation 1 using the number of relevant images retrieved (𝑟) within the first 𝑅 images, where 𝑅 is the total number of relevant images existing in the dataset. Finally, the mAP@R, shown in Equation 2, is the mean of the average precision at 𝑅 (AP@R) over all 𝑄 queries.

\[
\text{R-Precision} = \frac{r}{R} \quad (1)
\qquad
\mathrm{mAP}@R = \frac{1}{Q} \sum_{i=1}^{Q} \mathrm{AP}_i@R \quad (2)
\]
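As a reference for how Equations 1 and 2 can be computed, the sketch below evaluates both metrics from per-query binary relevance lists sorted by retrieval rank; it simply restates the definitions above and is not the evaluation code used in our experiments.

```python
import numpy as np

def r_precision(ranked_rel):
    """Equation 1: relevant hits within the first R results, where R is the
    total number of relevant items for the query."""
    rel = np.asarray(ranked_rel, dtype=bool)
    R = int(rel.sum())
    return float(rel[:R].sum() / R) if R else 0.0

def map_at_r(queries_ranked_rel):
    """Equation 2: mean over the Q queries of the average precision at R."""
    aps = []
    for ranked_rel in queries_ranked_rel:
        rel = np.asarray(ranked_rel, dtype=bool)
        R = int(rel.sum())
        hits = rel[:R]
        prec_at_k = np.cumsum(hits) / (np.arange(R) + 1)  # precision@k for k = 1..R
        aps.append(float((prec_at_k * hits).sum() / R) if R else 0.0)
    return float(np.mean(aps))
```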
Table 3
Quantitative evaluation of the patient verification model.

AUC      ACC      F1       P        R
93.72    95.32    90.92    90.77    91.07

Table 4
Quantitative evaluation of the patient retrieval model.

mAP@R    P@1      R_prec
79.32    94.27    80.87

From the original 1,000 synthetic images, we use the anonymization procedure proposed in Section 3.3 to remove all synthetic images similar to their closest real counterpart, based on the predictions made by the verification system. Using this process, 161 images were removed. In Figure 2, we show image pairs at different prediction ranges; as expected, the higher-value pairs are more similar than those within the lower ranges.

Figure 2: Different image pairs of synthetic images and their closest real counterpart in the training set (rows: Real, Synthetic), sorted by the predicted likelihood of belonging to the same patient (panels: prediction 0.0 to 0.1, 0.45 to 0.55, and 0.9 to 1.0). The images on the left are less likely to contain identifiable patient data, while images on the right are highly likely to do so.

Similar to previous studies [15], the explanation ranking is evaluated with the help of a radiologist. For this purpose, we perform two different experiments. In the first experiment, we used 5 test cases, each containing a test image we aim to diagnose and 10 synthetic catalogue images from which we retrieve explanations. The second experiment differs by using a catalogue of 5 synthetic images and 5 real images, to compare the utility of both image types. All the images were randomly sampled from the test set while ensuring a balanced class distribution. The catalogue is limited to 10 images due to the time-consuming nature of evaluating the images. To evaluate ranking performance, we use a metric commonly used in ranking tasks, the normalized Discounted Cumulative Gain (nDCG𝑝) [24] (Equation 3), where 𝑝 is the number of retrieved images, rel𝑖 is the relevance value of the image at rank 𝑖, and IDCG𝑝 is the ideal DCG𝑝 value (Equation 4). The relevance values vary from 5.5, for the most similar image, to 1.0, for the least similar, with a 0.5 step between consecutive relevance values.

\[
\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p} \quad (3)
\qquad
\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \quad (4)
\]

The results of the ranking evaluation are shown in Figure 3. Our CNN-based method, on average, outperforms its SSIM counterpart.

Figure 3: Boxplots regarding nDCG for the two retrieval approaches, CNN and SSIM, for the Top-4, Top-7 and Top-10 retrieved images (nDCG4, nDCG7 and nDCG10).

An example of the retrieval test we used to evaluate the solution is shown in Figure 4. Interestingly, the CNN-based retrieval mechanism obtained the same Top-3 images as the expert-based choices, although in a different order. Additionally, a radiologist assessed each image belonging to the test cases. In Table 5, we see that the generative model can conditionally generate images with a specific diagnosis while maintaining a class agreement similar to that of real images. We can also note that synthetic and real images are similarly relevant, supporting the claim that the generated images can be used as alternatives to real ones.

Figure 4: Example test image and the corresponding Top-4 catalogue images retrieved by each method (rows: Expert-based, CNN-based, SSIM-based). The color outline around each retrieved image indicates whether the test and catalogue images share the same class (green) or not (red). The ranking annotated on the top-right corner of each image indicates the ground truth based on expert rating.
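For completeness, a small sketch of the nDCG computation from Equations 3 and 4, assuming graded relevance scores given in retrieved order (e.g. the 5.5-to-1.0 expert scale described above); this is illustrative only, not our evaluation code.

```python
import numpy as np

def ndcg(relevances):
    """Equations 3-4: DCG of the retrieved ranking, normalized by the DCG of
    the ideal (descending-relevance) ranking."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(1, rel.size + 1) + 1)  # log2(i + 1), i = 1..p
    dcg = ((2.0 ** rel - 1.0) / discounts).sum()
    idcg = ((2.0 ** np.sort(rel)[::-1] - 1.0) / discounts).sum()
    return dcg / idcg if idcg > 0 else 0.0

# e.g. nDCG_4 for a retrieved ranking with expert relevances:
# ndcg([5.0, 5.5, 4.0, 4.5])
```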
Finally, this method still presents some limitations. There is no upper bound on the number of synthetic images that will be removed, leading to wasted computing power on generating images that are ultimately deleted. In future work, the anonymization mechanism could be integrated into the generative model to combat this issue. Another key issue is the empirical nature of this anonymization procedure, which leaves it without formal guarantees. Unfortunately, the current best method to provide anonymization with strong guarantees is Differential Privacy, which struggles to scale beyond low-resolution images, making it ineffective for generating explanations that are interpretable by humans.

Table 5
Label agreement between expert evaluation and the ground-truth label. A classification is correct if the image label and the radiologist agree on the diagnosis (Cardiomegaly or No Cardiomegaly). Cases where it is impossible to diagnose a sample confidently are deemed non-conclusive. We also report the average relevance of each type of image for the test cases with mixed image types.

Type        Correct    Incorrect    Non-conclusive    Average Relevance
Synthetic   70.67%     20.00%       9.33%             16.00 ± 4.30
Real        76.00%     20.00%       4.00%             16.50 ± 4.30

6. Conclusion

We proposed a method to generate visually anonymous case-based explanations that retain their utility and realism by leveraging latent diffusion models. This solution demonstrates that the generated images are empirically unlikely to be traced back to the original patients, allowing for the visual explanation of classifier decisions without exposing patient data. This is a step towards integrating trustworthy and interpretable classifiers in the medical domain while preserving patient privacy.

Acknowledgments

This work is financed by National Funds through the FCT - Fundação para a Ciência e a Tecnologia, I.P. (Portuguese Foundation for Science and Technology) within the project CAGING, with reference 2022.10486.PTDC (DOI 10.54499/2022.10486.PTDC).

References

[1] Council of European Union, Council regulation (EU) no 679/2016, Online at https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:02016R0679-20160504, 2016.
[2] U.S. Department of Health and Human Services, The Health Insurance Portability and Accountability Act of 1996 (HIPAA), Online at http://www.hhs.gov/hipaa/, 1996.
[3] H. Montenegro, W. Silva, J. S. Cardoso, Privacy-preserving generative adversarial network for case-based explainability in medical image analysis, IEEE Access 9 (2021) 148037–148047. doi:10.1109/ACCESS.2021.3124844.
[4] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
[5] K. Packhäuser, S. Gündel, N. Münster, C. Syben, V. Christlein, A. Maier, Deep learning-based patient re-identification is able to exploit the biometric nature of medical chest X-ray data, Scientific Reports 12 (2022) 14851. doi:10.1038/s41598-022-19045-3.
[6] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, S. Horng, MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019. doi:10.48550/arXiv.1901.07042. arXiv:1901.07042.
[7] A. Nichol, P. Dhariwal, Improved Denoising Diffusion Probabilistic Models, 2021. doi:10.48550/arXiv.2102.09672. arXiv:2102.09672.
[8] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, CoRR (2015). arXiv:1505.04597.
[9] D. P. Kingma, M. Welling, Auto-encoding variational bayes, 2022. arXiv:1312.6114.
[10] G. Müller-Franzes, J. M. Niehues, F. Khader, S. T. Arasteh, C. Haarburger, C. Kuhl, T. Wang, T. Han, T. Nolte, S. Nebelung, J. N. Kather, D. Truhn, A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis, Scientific Reports 13 (2023) 12098. doi:10.1038/s41598-023-39278-0.
[11] R. Gross, E. Airoldi, B. Malin, L. Sweeney, Integrating utility into face de-identification, in: G. Danezis, D. Martin (Eds.), Privacy Enhancing Technologies, Springer Berlin Heidelberg, Berlin, Heidelberg, 2006, pp. 227–242.
[12] T. Dockhorn, T. Cao, A. Vahdat, K. Kreis, Differentially Private Diffusion Models, Transactions on Machine Learning Research (2023). URL: https://openreview.net/forum?id=ZPpQk7FJXF.
[13] Z. Chu, J. He, D. Peng, X. Zhang, N. Zhu, Differentially private denoise diffusion probability models, IEEE Access 11 (2023) 108033–108040. doi:10.1109/ACCESS.2023.3315592.
[14] Z. Wang, A. Bovik, H. Sheikh, E. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13 (2004) 600–612.
[15] W. Silva, A. Poellinger, J. S. Cardoso, M. Reyes, Interpretability-guided content-based medical image retrieval, in: A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga, S. K. Zhou, D. Racoceanu, L. Joskowicz (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, Springer International Publishing, Cham, 2020, pp. 305–314.
[16] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, R. Shah, Signature verification using a "siamese" time delay neural network, in: Proceedings of the 6th International Conference on Neural Information Processing Systems, NIPS'93, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993, pp. 737–744.
[17] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015). arXiv:1512.03385.
[18] J. L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM 18 (1975) 509–517. doi:10.1145/361002.361007.
[19] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017, pp. 2261–2269. doi:10.1109/CVPR.2017.243.
[20] E. Puddy, C. Hill, Interpretation of the chest radiograph, Continuing Education in Anaesthesia Critical Care & Pain 7 (2007) 71–75. doi:10.1093/bjaceaccp/mkm014.
[21] J. Song, C. Meng, S. Ermon, Denoising diffusion implicit models, CoRR abs/2010.02502 (2020). arXiv:2010.02502.
[22] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980.
[23] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, 2018. arXiv:1706.08500.
[24] K. Fernandes, J. S. Cardoso, Hypothesis transfer learning based on structural model similarity, Neural Computing and Applications 31 (2019) 3417–3430.