Patch-based Intuitive Multimodal Prototypes Network (PIMPNet) for Alzheimer's Disease classification

Lisa Anita De Santi1,2,*, Jörg Schlötterer3,4, Meike Nauta5, Vincenzo Positano2 and Christin Seifert3
1 Department of Information Engineering, University of Pisa, Pisa, Italy
2 Fondazione Toscana G. Monasterio - Bioengineering Unit, Pisa, Italy
3 University of Marburg, Marburg, Germany
4 University of Mannheim, Mannheim, Germany
5 Datacation, Eindhoven, Netherlands

Abstract
Volumetric neuroimaging examinations such as structural Magnetic Resonance Imaging (sMRI) are routinely applied to support the clinical diagnosis of dementias like Alzheimer's Disease (AD). Neuroradiologists examine 3D sMRI to detect and monitor abnormalities in brain morphology due to AD, such as global and/or local brain atrophy and shape alterations of characteristic structures. There is strong research interest in developing diagnostic systems based on Deep Learning (DL) models to analyse sMRI for AD. However, the anatomical information extracted from an sMRI examination needs to be interpreted together with the patient's age to distinguish AD patterns from the regular alterations due to the normal ageing process. In this context, part-prototype neural networks integrate the computational advantages of DL in an interpretable-by-design architecture and have shown promising results in medical imaging applications. We present PIMPNet, the first interpretable multimodal model for 3D images and demographics, applied to the binary classification of AD from 3D sMRI and the patient's age. Although age prototypes do not improve predictive performance compared to the single-modality model, this work lays the foundation for future research on the model's design and the multimodal prototype training process.

Keywords
Interpretability-by-design, Prototype, Prototype-network, Multimodal Deep Learning, Alzheimer, MRI, Age

1.
Introduction

There is significant research interest in supporting Alzheimer's Disease (AD) diagnosis with Deep Learning (DL) models [1]. Existing diagnostic guidelines often integrate the clinical evaluation of the patient with structural Magnetic Resonance Imaging (sMRI) to detect pathological brain patterns like gray matter atrophy.

Late-breaking work, Demos and Doctoral Consortium, co-located with The 2nd World Conference on eXplainable Artificial Intelligence: July 17-19, 2024, Valletta, Malta
* Corresponding author.
lisa.desanti@pdh.unipi.it (L. A. De Santi); joerg.schloetterer@uni-marburg.de (J. Schlötterer); m.nauta@datacation.nl (M. Nauta); positano@ftgm.it (V. Positano); christin.seifert@uni-marburg.de (C. Seifert)
ORCID: 0000-0001-7239-4270 (L. A. De Santi); 0000-0002-3678-0390 (J. Schlötterer); 0000-0002-0558-3810 (M. Nauta); 0000-0001-6955-9572 (V. Positano); 0000-0002-6776-3868 (C. Seifert)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Brain alterations in sMRI might support the early and differential diagnosis and the prediction of the disease's progression. There are sets of common practices for analysing sMRI acquisitions, but still no universally accepted methods [2, 3, 4]. In addition, information collected from sMRI should be interpreted together with the patient's age, as there are anatomical brain changes due to the physiological ageing process [5, 6]. DL architectures can facilitate the analysis of neuroimaging data, and might be able to identify unconventional AD subtypes and extract yet-unknown image-based biomarkers [7, 8]. Prototypical-Part (PP) networks combine the advantages of DL models in an interpretable-by-design architecture, and are achieving promising results in medical imaging applications where the black-box nature of standard DL models is controversial [9].
There are currently different variants of PP networks, including PIPNet [10], originally applied to 2D images and later extended to handle 3D scans [11]. PIPNet showed appealing properties in the medical imaging domain [12], including a reduced number of part-prototypes, semantic significance of the learned prototypes, and the ability to cope with out-of-distribution data (which might be particularly useful in dementia diagnosis, where unusual neurodegeneration patterns are reported [4]). However, sMRI data should be interpreted together with patients' demographics to discern age-related image alterations from pathological ones, and existing PP models cannot be directly applied to this task. Adding non-image prototypes to the standard PP architecture is non-trivial, and no unique strategy is available. Some works learn prototypes from multiple modalities, based either on concatenation (deterministic prototypes) or on multimodal feature extraction (shifted prototypes); however, the available models cannot be applied to our task, as they are specifically designed for images and textual data [13]. We present the Patch-based Intuitive Multimodal Prototypes Network (PIMPNet), the first multimodal prototype classifier which learns 3D image part-prototypes and prototypical values from structured data, to predict a patient's cognitive level in AD from sMRI and age values.

2. Method

This section introduces the architecture (cf. Sect. 2.1 and Fig. 1) and the training process (Sect. 2.2) of PIMPNet.

2.1. Proposed Model: PIMPNet

We propose an age-prototypes layer integrated into the original PIPNet 3D model [11] to create our multimodal architecture.
In contrast to "ordinary" age-binning for the inclusion of age information, the age-prototypes layer has two advantages: (i) it can learn the age values that are important for the diagnostic task (values which might not be equally distributed and might not be easily identifiable a priori); (ii) it does not assign different age bins to two patients of similar age who fall close to a bin boundary.

PIMPNet's input layer takes the 3D image x_img ∈ R^(ch×S×R×C) and the age x_age ∈ R^1 as input, where ch, S, R, C respectively denote the number of channels, slices, rows and columns of the input image volume. Image x_img and age x_age are processed in parallel. A CNN backbone processes x_img, z = f(x_img; w_f), extracting M 3-dimensional (D × H × W) feature maps, where z_(m,d,h,w) represents the activation of image-prototype m at patch location (d, h, w). Next, a 3D max-pooling applied to every feature map extracts M image-prototype presence scores p_img ∈ [0, 1]^M, where p_(img,m) measures the presence of image prototype m in the input image. This defines the image-prototypes layer.

Figure 1: PIMPNet architecture

In parallel, we have the age-prototypes layer, constituted by N trainable scalar parameters t_(age,n), which aims to learn prototypical age values for the classification task. This layer computes age-prototype presence scores p_age ∈ [0, 1]^N, a similarity measurement between the input age and every age prototype, defining a smooth age binning¹:

    p_(age,n) = 1 / sqrt(1 + ((x_age − t_(age,n)) / t)^(2s)),   (1)

where the t_(age,n) are trainable parameters and t and s are hyper-parameters which regulate the band and the slope of the similarity function. A prototypes layer then concatenates the image and age prototype presence scores, obtaining a layer of L = M + N prototypes p ∈ [0, 1]^L, p = concat(p_img, p_age). The final classification is performed by a sparse linear positive layer w_c ∈ R^(L×K), w_c ≥ 0, which connects image and age prototypes to the K classes, acting as a scoring-sheet system. The K class output scores are the sum of the prototype presence scores weighted by the contribution of prototype l to class k, w_c(l,k), i.e., o = p w_c, where o is 1 × K and o_k = Σ_(l=1..L) p_l w_c(l,k). PIMPNet computes the output class using only the most activated age prototype, i.e., the one closest to the patient's age according to the similarity metric².

¹ The similarity function is inspired by the magnitude response of a Butterworth filter [14]. In preliminary experiments, we used an exponential similarity function as in ProtoTree: p_(age,n) = exp(−||x_age − t_(age,n)||), but as exp(−2) ≈ 0.13, a 2-year age difference would already result in little similarity, which is not in line with domain knowledge about the relevance of age for Alzheimer's Disease.

2.2. PIMPNet Training

We optimize PIMPNet's parameters by integrating the training of the age prototypes into the original PIPNet training process [10]. This includes two main stages: (1) self-supervised pre-training of the image prototypes, and (2) PIMPNet training. As in the original PIPNet [10], the first stage generates positive pairs x'_img, x''_img by applying data-augmentation transformations to x_img, selected so that humans consider the two views similar. These pairs are used to minimize the loss function λ_A L_A + λ_T L_T by updating w_f, where

    L_A = −(1 / (DHW)) Σ_((d,h,w) ∈ D×H×W) log(z'_(:,d,h,w) · z''_(:,d,h,w))

is an alignment loss which optimizes positive pairs to activate the same prototype. Together with a softmax over z_(:,d,h,w), the alignment results in near-binary encodings where an image patch corresponds to exactly one prototype. The Tanh-loss

    L_T = −(1 / M) Σ_(m=1..M) log(tanh(Σ_b p_(img,b,m)) + ε)

prevents the trivial solution in which one prototype node is activated on all image patches of every image in the dataset, and instead encourages multiple distinct prototypes to be activated per batch b. Only during training, output scores are calculated as o = log((p w_c)² + 1), acting as a regularizer for sparsity.
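The age-prototypes layer and the scoring-sheet readout of Sect. 2.1 can be sketched in plain Python. This is a minimal illustration, not the trained model: the prototype values and the toy weight matrix below are invented for the example, and the image branch is replaced by precomputed presence scores.

```python
import math

def age_similarity(x_age, t_age_n, t=4.0, s=8):
    """Eq. (1): Butterworth-style smooth age binning.
    t (band) and s (slope) are the hyper-parameters of the similarity."""
    return 1.0 / math.sqrt(1.0 + ((x_age - t_age_n) / t) ** (2 * s))

def forward(p_img, t_age, x_age, w_c):
    """Concatenate image and age presence scores and apply the sparse
    positive linear layer (scoring sheet). As in PIMPNet's inference,
    only the most activated age prototype is kept."""
    p_age = [age_similarity(x_age, t_n) for t_n in t_age]
    best = max(range(len(p_age)), key=p_age.__getitem__)
    p_age = [p if n == best else 0.0 for n, p in enumerate(p_age)]
    p = p_img + p_age  # L = M + N presence scores
    n_classes = len(w_c[0])
    return [sum(p[l] * w_c[l][k] for l in range(len(p)))
            for k in range(n_classes)]

# Toy example: M = 2 image prototypes, N = 2 age prototypes, K = 2 classes.
p_img = [0.9, 0.1]                 # image-prototype presence scores
t_age = [65.0, 80.0]               # hypothetical learned age prototypes
w_c = [[2.0, 0.0],                 # non-negative scoring-sheet weights
       [0.0, 1.5],
       [0.5, 0.0],
       [0.0, 3.0]]
scores = forward(p_img, t_age, x_age=66.0, w_c=w_c)
```

With s = 8 the similarity stays close to 1 within roughly the band t of a prototype and drops steeply outside it, which is the "smooth binning" behaviour the layer is designed for.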
The 2nd training stage includes the training of the age prototypes, the optimization of classification performance, and the fine-tuning of the image prototypes for the downstream classification task. The optimization minimizes λ_A L_A + λ_T L_T + λ_C L_C by updating w_f, t_age and w_c, where L_C is the log-likelihood classification loss.

3. Evaluation

We used the multimodal dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database³. We selected the "ADNI1 Standardized Screening Data Collection for 1.5T scans" processed with Gradwarp, B1 non-uniformity and N3 correction, obtaining 307 CN and 243 AD sMRI brain scans and the corresponding patients' ages. We report statistics on the patients' demographics of the selected ADNI cohort in Table 1. We preprocessed the sMRI data following the pre-processing pipeline applied in previous works [15]. We transformed all images to the common ICBM152 Non-Linear Symmetric 2009c standard space [16] with affine registration. We selected the grey-matter structures by applying the ICBM152 Non-Linear Symmetric 2009c brain mask and kept a margin of 3 from its first and last non-empty slices. We applied an image downsampling factor of 2 and scaled all image intensities to the range [0, 1] with min-max normalization.

Table 1
Patients' demographics of the selected ADNI cohort, further divided according to the clinical labels.

Class | N° subjects | Mean ± SD Age | Age Range
Both  | 550         | 76 ± 6        | 55-91
CN    | 307         | 76 ± 5        | 60-90
AD    | 243         | 75 ± 8        | 55-91

Table 2
Performance comparison between PIPNet trained on 3D sMRI and PIMPNet trained on 3D sMRI + Age, averaged over 5 folds.

Model                      | Acc     | Bal Acc | SENS    | SPEC    | F1
PIPNet  ResNet-18 3D       | 83 ± 04 | 83 ± 04 | 86 ± 06 | 79 ± 07 | 81 ± 05
PIPNet  ConvNeXt-Tiny 3D   | 65 ± 12 | 66 ± 09 | 56 ± 32 | 76 ± 15 | 66 ± 05
PIMPNet ResNet-18 3D       | 84 ± 04 | 83 ± 04 | 89 ± 03 | 77 ± 08 | 81 ± 05
PIMPNet ConvNeXt-Tiny 3D   | 72 ± 04 | 70 ± 04 | 86 ± 10 | 55 ± 14 | 63 ± 09

² We selected only the most activated age prototype during inference (not during the optimization process).
³ https://adni.loni.usc.edu
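The last two preprocessing steps (min-max intensity normalization and downsampling by a factor of 2) can be illustrated on a one-dimensional intensity profile; the actual pipeline applies them to 3D volumes after registration and masking, which is omitted here.

```python
def min_max_scale(vox):
    """Scale intensities to [0, 1] (min-max normalization)."""
    lo, hi = min(vox), max(vox)
    return [(v - lo) / (hi - lo) for v in vox]

def downsample(vox, factor=2):
    """Keep every `factor`-th sample (downsampling factor of 2)."""
    return vox[::factor]

# A made-up 1D intensity profile standing in for one image row.
profile = [120.0, 80.0, 200.0, 40.0, 160.0, 40.0]
scaled = min_max_scale(profile)   # -> [0.5, 0.25, 1.0, 0.0, 0.75, 0.0]
small = downsample(scaled)        # -> [0.5, 1.0, 0.75]
```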
We implemented PIMPNet using PyTorch and MONAI⁴, training our models on an Intel Core i7 5.1 GHz PC with 32 GB RAM, equipped with an NVIDIA RTX 3090 GPU with 24 GB of on-board RAM. As CNN backbones we used ResNet-18 3D pretrained on Kinetics-400 [17] and ConvNeXt-Tiny 3D pretrained on the STOIC medical dataset (Study of Thoracic CT in COVID-19) [18]. We fine-tuned PIMPNet with the Adam optimizer using the same hyperparameter settings as the original PIPNet [10]. We only reduced the batch size to 12, to adapt to our computational capabilities, and set the learning rate of the age prototypes to 0.1⁵. We arbitrarily set the number of age prototypes to 5, evenly spaced between 40 and 90 to cover the patients' age range of our dataset. For the age similarity function, we respectively set t = 4 and s = 8⁶. We performed 5-fold cross-validation with patient-wise splits; 20% of the training images were used for validation.

We evaluated the models in terms of classification performance and with functionally grounded metrics of explainability. Results are reported in Tables 2 and 3. We compared PIMPNet (sMRI + age) with PIPNet-3D (sMRI only) [11] to evaluate whether including age information improves diagnostic performance. We measured performance using Accuracy (Acc), Balanced Accuracy (Bal Acc), Sensitivity (SENS, Acc of the Cognitive Normal class), Specificity (SPEC, Acc of the Alzheimer's Disease class), and F1 score (F1). We measured the Global size (GS) of the model as the total number of prototypes, and the Local size (LS) of explanations as the number of detected prototypes in a single 3D sMRI, averaged over all images in the test set. Additionally, we report the Sparsity (Sp) of the decision layer as the percentage of zero weights in the linear classification layer [10], to assess the compactness of the prototypes-classes layer.
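The classification metrics above follow their standard definitions from confusion counts. A small sketch with invented counts (note that the paper labels SENS as the accuracy of the CN class and SPEC as that of the AD class; here `tp`/`tn` simply denote the two per-class correct counts):

```python
def metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion counts."""
    sens = tp / (tp + fn)                  # recall on the positive class
    spec = tn / (tn + fp)                  # recall on the negative class
    acc = (tp + tn) / (tp + fn + tn + fp)
    bal_acc = (sens + spec) / 2            # mean of per-class recalls
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)   # harmonic mean of P and R
    return acc, bal_acc, sens, spec, f1

# Invented confusion counts for illustration.
acc, bal_acc, sens, spec, f1 = metrics(tp=40, fn=10, tn=45, fp=5)
```

Balanced accuracy equals plain accuracy here only because the toy classes are the same size; on the slightly imbalanced ADNI cohort (307 CN vs 243 AD) the two can differ.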
We further assessed whether the prototypes are consistently located in the same brain region, and the purity of the prototypes in terms of the anatomical regions they include, based on the CerebrA atlas annotation [19]. More specifically, the Prototypes Localization Consistency (LC_p) evaluates the differences in the coordinates of the centre of the prototypical part in the input image, while the Prototype Brain Entropy (H_p), as a measure of purity, computes the Shannon entropy of the brain regions included in the prototypical part [11]. We show the learned age prototypes t_age from the five folds (denoted as Mx, where x indicates the fold) in Table 4.

⁴ https://monai.io
⁵ Using the same learning rate that the original PIPNet uses to train the image prototypes (0.05) results in irrelevant updates of the age prototypes.
⁶ We leave an extensive hyperparameter search for learning the age prototypes for future work.

Table 3
Functionally-grounded evaluation of PIPNet trained on 3D sMRI and PIMPNet trained on 3D sMRI + Age, averaged over 5 folds. ↑ and ↓: tendency for better values.

Model                    | GS ↓     | LS ↓    | Sp ↑          | LC_p ↓        | H_p ↓
ResNet-18 3D PIPNet      | 149 ± 18 | 73 ± 10 | 0.855 ± 0.018 | 0.008 ± 0.006 | 2.474 ± 0.249
ResNet-18 3D PIMPNet     | 143 ± 35 | 74 ± 20 | 0.861 ± 0.033 | 0.006 ± 0.006 | 2.424 ± 0.162
ConvNeXt-Tiny 3D PIPNet  | 4 ± 2    | 2 ± 1   | 0.997 ± 0.001 | 0.000 ± 0.000 | 1.803 ± 0.999
ConvNeXt-Tiny 3D PIMPNet | 10 ± 9   | 4 ± 4   | 0.993 ± 0.002 | 0.000 ± 0.000 | 1.543 ± 0.626

Table 4
Prototypical age values t_age,i learned in folds M1, ..., M5 with different backbones.

ResNet-18 3D:
Fold | t_age,1 | t_age,2 | t_age,3 | t_age,4 | t_age,5
M1   | 65.77   | 65.81   | 66.14   | 76.81   | 80.99
M2   | 68.46   | 69.40   | 70.38   | 77.04   | 82.38
M3   | 66.37   | 67.27   | 67.91   | 75.87   | 81.96
M4   | 66.72   | 66.72   | 67.07   | 77.07   | 79.75
M5   | 66.51   | 66.52   | 67.23   | 77.37   | 80.00

ConvNeXt-Tiny 3D:
Fold | t_age,1 | t_age,2 | t_age,3 | t_age,4 | t_age,5
M1   | 56.81   | 65.00   | 64.96   | 74.13   | 85.80
M2   | 55.75   | 58.39   | 64.96   | 74.32   | 85.59
M3   | 54.86   | 56.63   | 65.21   | 74.40   | 85.11
M4   | 58.22   | 58.59   | 66.50   | 75.88   | 89.09
M5   | 57.79   | 66.94   | 65.44   | 72.55   | 84.58

4.
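The purity metric H_p can be sketched as the Shannon entropy of the atlas-region labels covered by one prototypical part. The region names and voxel counts below are hypothetical; the actual metric uses the CerebrA atlas annotation over the 3D patch.

```python
import math
from collections import Counter

def prototype_brain_entropy(region_labels):
    """Shannon entropy (in bits) of the atlas regions covered by one
    prototypical part; 0 means the prototype lies entirely within a
    single region (maximally pure)."""
    counts = Counter(region_labels)
    total = len(region_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical voxel-wise atlas labels inside two prototype patches.
pure = prototype_brain_entropy(["hippocampus"] * 8)                    # 0.0
mixed = prototype_brain_entropy(["hippocampus"] * 4 + ["amygdala"] * 4)  # 1.0
```

This also makes the caveat from the discussion concrete: a patch of eight "background" labels is just as pure (entropy 0) as one of eight "hippocampus" labels, even though it is clinically irrelevant.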
Discussion and Conclusion

Both PIPNet and PIMPNet achieve higher classification performance with the ResNet-18 3D backbone than with the ConvNeXt-Tiny backbone. Our preliminary results also show that the proposed age-prototypes layer can learn prototypical age values; however, these do not improve classification performance compared to the baseline model. Our functionally-grounded evaluation of the prototypes shows that all models learn prototypes consistently located in the same anatomical brain regions (low LC_p values). We also observe that the models trained with the ConvNeXt-Tiny 3D backbone are more compact. This might partially explain their lower performance scores (the number of learned prototypes may not be sufficient to perform the diagnosis), but it is an interesting observation for future research, as such a highly compact model can be considered more interpretable than larger ones and can be easily evaluated by domain experts. We also observe that the image prototypes of the ConvNeXt-Tiny 3D backbone are generally purer⁷. Although purity is a desirable property for prototypes [20], because of the design of the purity metric, a prototype which only includes background, i.e., a clinically irrelevant prototype, will also have high purity⁸.

⁷ Purity is measured w.r.t. the annotation provided by the CerebrA atlas.

In summary, we proposed PIMPNet, an interpretable multimodal prototype-based classifier. The proposed architecture is the first prototype-based network which performs an interpretable classification based on the detection of prototypes learned from different data modalities (3D images and age information). We applied PIMPNet to the binary classification of Alzheimer's Disease from 3D sMRI images together with the patient's age. Although the use of age prototypes does not improve predictive performance compared to the model trained on images only, we identified several potential reasons, which define the future directions of our work.
First, as the original PIPNet training paradigm includes a pre-training stage for the image prototypes [10], we plan to include an age-prototypes pre-training step w.r.t. the log-likelihood classification loss. Second, we plan to work on the model's design: as the simple concatenation of the prototype presence scores might not properly represent the relationship between age and image prototypes for the downstream task, we plan to combine image and age prototypes using a different (but still interpretable) classifier than a scoring-sheet system.

Acknowledgments

Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. The ADNI was launched in 2003 as a public-private partnership with the primary goal to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD).

References

[1] M. A. Ebrahimighahnavieh, S. Luo, R. Chiong, Deep learning to detect Alzheimer's disease from neuroimaging: A systematic literature review, Computer Methods and Programs in Biomedicine 187 (2020). doi:10.1016/j.cmpb.2019.105242.
[2] L. De Santi, E. Pasini, M. Santarelli, D. Genovesi, V. Positano, An Explainable Convolutional Neural Network for the Early Diagnosis of Alzheimer's Disease from 18F-FDG PET, Journal of Digital Imaging 36 (2023). doi:10.1007/s10278-022-00719-3.
[3] A. Chandra, G. Dervenoulas, M. Politis, Magnetic resonance imaging in Alzheimer's disease and mild cognitive impairment, 2019. doi:10.1007/s00415-018-9016-3.
[4] P. Vemuri, C. Jack, Role of structural MRI in Alzheimer's disease, Alzheimer's Research and Therapy 2 (2010). doi:10.1186/alzrt47.
[5] L. Zhao, W. Matloff, K. Ning, H. Kim, I. D. Dinov, A. W.
Toga, Age-related differences in brain morphology and the modifiers in middle-aged and older adults, Cerebral Cortex 29 (2019) 4169-4193. doi:10.1093/cercor/bhy300.

⁸ Posterior quantitative evaluation w.r.t. the CerebrA atlas revealed that the test-set image prototypes (averaged over the 5 folds) obtained with the ConvNeXt-Tiny backbone include a higher percentage of background voxels than those obtained with ResNet-18 (76.6% vs 59.2%).

[6] R. Sivera, H. Delingette, M. Lorenzi, X. Pennec, N. Ayache, A model of brain morphological changes related to aging and Alzheimer's disease from cross-sectional assessments, NeuroImage 198 (2019) 255-270. doi:10.1016/j.neuroimage.2019.05.040.
[7] M. Böhle, F. Eitel, M. Weygandt, K. Ritter, Layer-wise relevance propagation for explaining deep neural network decisions in MRI-based Alzheimer's disease classification, Frontiers in Aging Neuroscience 10 (2019). doi:10.3389/fnagi.2019.00194.
[8] M. Khojaste-Sarakhsi, S. S. Haghighi, S. F. Ghomi, E. Marchiori, Deep learning for Alzheimer's disease diagnosis: A survey, Artificial Intelligence in Medicine 130 (2022) 102332. doi:10.1016/j.artmed.2022.102332.
[9] L. Longo, M. Brcic, F. Cabitza, J. Choi, R. Confalonieri, J. D. Ser, R. Guidotti, Y. Hayashi, F. Herrera, A. Holzinger, R. Jiang, H. Khosravi, F. Lecue, G. Malgieri, A. Páez, W. Samek, J. Schneider, T. Speith, S. Stumpf, Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, Information Fusion 106 (2024) 102301. doi:10.1016/j.inffus.2024.102301.
[10] M. Nauta, J. Schlötterer, M. van Keulen, C. Seifert, PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. doi:10.1109/CVPR52729.2023.00269.
[11] L. A. D. Santi, J. Schlötterer, M. Scheschenja, J. Wessendorf, M.
Nauta, V. Positano, C. Seifert, PIPNet3D: Interpretable detection of Alzheimer in MRI scans, 2024. arXiv:2403.18328.
[12] M. Nauta, J. H. Hegeman, J. Geerdink, J. Schlötterer, M. van Keulen, C. Seifert, Interpreting and correcting medical image classification with PIP-Net, in: Artificial Intelligence. ECAI 2023 International Workshops, 2024, pp. 198-215.
[13] Y. Ma, S. Zhao, W. Wang, Y. Li, I. King, Multimodality in meta-learning: A comprehensive survey, Knowledge-Based Systems 250 (2022). doi:10.1016/j.knosys.2022.108976.
[14] S. Butterworth, On the theory of filter amplifiers, Wireless Engineer 7 (1930) 536-541.
[15] A. W. Mulyadi, W. Jung, K. Oh, J. S. Yoon, K. H. Lee, H.-I. Suk, Estimating explainable Alzheimer's disease likelihood map via clinically-guided prototype learning, NeuroImage 273 (2023). doi:10.1016/j.neuroimage.2023.120073.
[16] V. Fonov, A. Evans, R. McKinstry, C. Almli, D. Collins, Unbiased nonlinear average age-appropriate brain templates from birth to adulthood, NeuroImage 47 (2009) S102. doi:10.1016/S1053-8119(09)70884-5. Organization for Human Brain Mapping 2009 Annual Meeting.
[17] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, M. Paluri, A closer look at spatiotemporal convolutions for action recognition, 2017. URL: http://arxiv.org/abs/1711.11248.
[18] D. Kienzle, J. Lorenz, R. Schön, K. Ludwig, R. Lienhart, COVID detection and severity prediction with 3D-ConvNeXt and custom pretrainings, 2022. URL: http://arxiv.org/abs/2206.15073.
[19] A. L. Manera, M. Dadar, V. Fonov, D. L. Collins, CerebrA, registration and manual label correction of Mindboggle-101 atlas for MNI-ICBM152 template, Scientific Data 7 (2020). doi:10.1038/s41597-020-0557-9.
[20] M. Nauta, C. Seifert, The Co-12 recipe for evaluating interpretable part-prototype image classifiers, in: Explainable Artificial Intelligence, 2023, pp. 397-420.