<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neuroimaging Data Using a Diffusion Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Basile</string-name>
          <email>andrea.basile@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Calefato</string-name>
          <email>fabio.calefato@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Lanubile</string-name>
          <email>filippo.lanubile@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giancarlo Logroscino</string-name>
          <email>giancarlo.logroscino@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulio Mallardi</string-name>
          <email>giulio.mallardi@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benedetta Tafuri</string-name>
          <email>benedetta.tafuri@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Neurodegenerative Diseases and the Aging Brain, University of Bari Aldo Moro</institution>
          ,
          <addr-line>Tricase</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Computer Science, University of Bari Aldo Moro</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dept. of Translational Biomedicine and Neuroscience (DiBraiN), University of Bari Aldo Moro</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>2</volume>
      <fpage>25</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>This preliminary study explores the use of diffusion models for brain imaging generation to address the limitations of small datasets in rare neurodegenerative conditions. Our goal is to improve model robustness by generating realistic variations in medical images. Data scarcity is the main issue for the application of deep learning techniques in neurodegeneration. In the last decade, diffusion models have emerged as a novel generative technique, widely applied for image and video generation, that can address this problem. A diffusion model, known for capturing complex data distributions, was trained on a multicenter dataset of structural Magnetic Resonance Images of healthy subjects to generate a high-quality synthetic dataset. Our results show that the Maximum Mean Discrepancy between the two distributions is 0.036, thus indicating that the two distributions are quite similar. However, other metrics such as the Frechet Inception Distance and the Multi-Scale Structural Similarity Index Measure achieve suboptimal results. Although far from model optimization, these preliminary results demonstrate that diffusion models can be a valid tool to generate high-quality brain imaging data.</p>
      </abstract>
      <kwd-group>
        <kwd>deep learning</kwd>
        <kwd>data augmentation</kwd>
        <kwd>diffusion models</kwd>
        <kwd>imaging</kwd>
        <kwd>healthcare</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Magnetic resonance imaging (MRI) represents a valuable technique for investigating brain structure and function. This imaging modality has revolutionized our understanding of various neurological conditions by providing detailed brain images. In the context of rare neurodegenerative diseases, such as Amyotrophic Lateral Sclerosis (ALS) and Frontotemporal Dementia (FTD), the low prevalence of these conditions means that the main issue is the lack of sufficiently large, high-quality neuroimaging datasets with which to conduct research studies yielding robust and generalizable results. This scarcity of data not only impacts the reliability of research in this field but also poses significant challenges for clinical diagnosis and monitoring.</p>
      <p>Recent advances in machine learning, particularly the development of diffusion models, offer an effective solution to address these limitations. Diffusion models are capable of capturing complex data distributions and generating high-quality synthetic images. By integrating these models into the neuroimaging workflow, it is possible to augment existing datasets, potentially overcoming the challenges posed by small samples and improving the quality of neuroimaging studies.</p>
      <p>This study aims to explore the use of diffusion models to augment neuroimaging data for rare neurodegenerative diseases. We hypothesize that the synthetic images generated by the diffusion model will provide high-fidelity MRIs, comparable with real neuroimaging data. Our results could represent a crucial step toward significantly improving not only the quality and reliability of neuroimaging studies but also the clinical management of conditions like ALS and FTD.</p>
      <p>Our paper is organized as follows. Section 2 reviews related work on diffusion models for image synthesis. Section 3 presents our proposed method using denoising diffusion probabilistic models (DDPM) to generate synthetic 3D MRI data. Section 4 describes the experimental setup, including datasets and preprocessing steps. Section 5 discusses the results and evaluations of our model. Finally, Section 6 concludes with a summary of the findings and future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Diffusion models have emerged as powerful tools for high-quality image synthesis across various domains, surpassing traditional generative models such as GANs. We explore key advancements in the application of diffusion models to general image synthesis, medical imaging, and neuroimaging. Ho et al. [<xref ref-type="bibr" rid="ref1">1</xref>] introduce diffusion probabilistic models for high-quality image synthesis, demonstrating that these models outperform traditional generative techniques by linking diffusion processes with denoising score matching. Their method achieves impressive results on standard benchmarks such as CIFAR10. Building on this foundation, Rombach et al. [<xref ref-type="bibr" rid="ref2">2</xref>] propose Latent Diffusion Models (LDMs), which apply diffusion processes in the latent space of pre-trained autoencoders. This approach significantly reduces computational requirements while maintaining high image quality, and it is versatile enough to handle text-to-image synthesis and super-resolution tasks through efficient training and cross-attention mechanisms.
      </p>
      <p>Transitioning to specific applications in the medical domain, Khader et al. [3] propose the integration of denoising diffusion probabilistic models with VQ-GANs to generate realistic and diverse 3D medical images. This approach proves to be effective for data augmentation and for enhancing segmentation tasks across various medical imaging datasets, outperforming traditional GAN approaches. Similarly, Dorjsembe et al. [4] present Med-DDPM, a conditional diffusion model designed to generate realistic 3D brain MRI images conditioned on segmentation masks, effectively addressing data scarcity and privacy concerns in medical imaging. Their model not only enhances visual fidelity over existing methods but also improves tumor segmentation accuracy, showcasing its utility for data augmentation and anonymization purposes.</p>
      <p>In the context of neurological research, Dhinagar et al. [5] introduce conditional diffusion models tailored to generate synthetic brain MRI images for Alzheimer’s disease research. By creating counterfactual images that highlight disease-specific changes, their approach enhances classifier performance and interpretability, supporting the visualization and detection of Alzheimer’s effects, which is beneficial for clinical diagnostics and neuroscience studies. Extending this specialized application, Pinaya et al. [6] use LDMs to generate high-resolution synthetic 3D brain MRI images conditioned on demographic and anatomical variables such as age and brain volume. Their method not only achieves superior quality and stability compared to GANs but also produces a large, publicly available dataset of synthetic brain images, facilitating further research in the field.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>We use a denoising diffusion probabilistic model (DDPM) to generate 3D MRI data. Figure 1 shows the proposed model, which takes preprocessed T1-weighted MRIs and generates synthetic images similar to the inputs. The details of the proposed framework are presented in the next section.</p>
      <sec id="sec-3-1">
        <title>3.1. Diffusion for MRI Data Synthesis in Rare Neurodegenerative Diseases</title>
        <p>Inspired by non-equilibrium thermodynamics, diffusion models offer a promising alternative to Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) for high-quality data synthesis. Recent studies, such as those by Dhinagar et al. [5] and Pinaya et al. [6], have effectively applied diffusion models to neuroimaging for Alzheimer’s research and high-resolution synthetic brain generation. Building on these advancements, our study adapts the denoising diffusion probabilistic model (DDPM) to generate realistic 3D brain MRIs tailored specifically for handling limited datasets in rare neurodegenerative conditions.</p>
        <p>In contrast to GANs, diffusion models provide better training stability and produce high-fidelity results for audio and graphics [7]. VAEs, instead, tend to have lower image quality than DDPMs, mainly due to their limited expressiveness or imperfect loss criteria. DDPMs, on the other hand, excel in synthesizing more complex and diverse image modalities with less risk of mode collapse. However, DDPMs present a unique challenge: they often alter the original data distribution of input images due to the introduced random noise, potentially neglecting the structural consistency inherent in the input data. Despite this, DDPMs offer greater flexibility and potential for generating high-quality images than VAEs, especially when capturing complex details and maintaining diversity in the generated images [8].</p>
        <p>
          Ho et al. [<xref ref-type="bibr" rid="ref1">1</xref>], the authors of the first DDPM, defined a diffusion probabilistic model as a parameterized Markov chain trained using variational inference to produce samples that match the data after finite time. Transitions of this chain are learned to reverse a diffusion process, which is a Markov chain that gradually adds noise to the data in the opposite direction of sampling until the signal is destroyed. Following the principles outlined in previous studies on diffusion models, we apply conditional Gaussian sampling to preserve key structural elements during the image generation process. This approach provides a stable foundation for generating clinically viable synthetic brain images.
        </p>
        <p>Our approach leverages a customized DDPM to handle 3D neuroimaging data, introducing specific network architecture and noise schedule adjustments to preserve anatomical accuracy in MRI synthesis. Not previously employed in medical image synthesis, these modifications ensure that the generated images retain the structural consistency critical for clinical relevance. In the context of brain MRI synthesis for rare neurological conditions, we begin with a complex distribution of brain structures q(x_0), where x_0 represents an 84×128×84 3D T1-weighted MRI volume. The forward diffusion process, q(x_{1:T} | x_0), progressively adds Gaussian noise to the image over T timesteps, defined by:
q(x_t | x_{t-1}) = N(x_t; √(1 − β_t) x_{t-1}, β_t I)   (1)
where β_t is a noise schedule that typically increases with t, carefully chosen to balance the trade-off between training stability and sampling speed. This process transforms the intricate distribution of brain images into a simple Gaussian distribution q(x_T), effectively destroying all structural information in the original image.</p>
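Equation (1) composes across timesteps into a closed form that jumps directly from x_0 to x_t. Below is a minimal NumPy sketch of the forward process, assuming a toy 1-D signal in place of the 84×128×84 volume (the schedule endpoints mirror those reported in Section 3.2, but the code is illustrative only):

```python
import numpy as np

# Forward diffusion (Eq. 1) in closed form: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps.
# A 1-D signal stands in for the 3D MRI volume; the betas are illustrative.
rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(0.0005, 0.0195, T)      # noise schedule beta_t
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)             # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating over steps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))  # stand-in "image"
x_mid = q_sample(x0, 500)                   # partially noised
x_end = q_sample(x0, T - 1)                 # nearly pure Gaussian noise
```

By the last timestep, alpha_bars[-1] is vanishingly small, so x_T is essentially pure noise, matching the "signal destroyed" description above.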
        <p>The core of the DDPM lies in learning the reverse process, p_θ(x_{t-1} | x_t), parameterized by a neural network ε_θ(x_t, t), which estimates the noise added at each step. This network is typically a U-Net architecture, modified to handle 3D volumetric data and conditioned on the timestep t. The training objective is to minimize the difference between the predicted and actual noise:
L = E_{t, x_0, ε} [‖ε − ε_θ(x_t, t)‖²]   (2)
where x_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε, and ᾱ_t = ∏_{s=1}^{t} (1 − β_s). This formulation allows for efficient training using a single network to approximate the reverse process for all timesteps.</p>
        <p>During inference, we sample x_T ∼ N(0, I) and iteratively apply the learned denoising process:
x_{t-1} = (1/√(1 − β_t)) (x_t − (β_t/√(1 − ᾱ_t)) ε_θ(x_t, t)) + σ_t z   (3)
where z ∼ N(0, I) and σ_t z is a slight noise term added to prevent mode collapse.</p>
        <p>The application of DDPMs to MRI synthesis for rare brain diseases offers several advantages, particularly in addressing the challenge of small datasets. Rare neurological conditions often result in limited available MRI data, making it challenging to train robust models or conduct comprehensive studies. Several techniques can be employed to overcome this limitation, such as transfer learning, few-shot learning, and standard data augmentation.</p>
        <p>When applied to DDPMs for rare brain disease MRI synthesis, these techniques may significantly
enhance the model’s ability to generate diverse, high-quality samples despite limited data availability.
DDPMs’ progressive denoising process allows for interpretable intermediate results, providing insights
into the model’s decision-making process—a crucial feature when dealing with rare diseases where
every aspect of the generation process needs to be scrutinized.</p>
        <p>However, applying DDPMs to 3D MRI synthesis for rare brain diseases also presents unique challenges. The high dimensionality of the volumetric data (84×128×84 in this case) combined with the scarcity of samples requires careful architectural design and regularization strategies to prevent poor performance. In addition, ensuring the preservation of disease-specific anatomical details while maintaining overall brain structure consistency requires sophisticated loss functions and potentially incorporating domain-specific knowledge from neurologists and radiologists.</p>
        <p>In conclusion, while DDPMs show great promise for MRI data synthesis in the context of rare brain
diseases, their successful application requires a careful balance between leveraging their generative
power and addressing the specific challenges posed by limited data availability.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Configuration and Training</title>
        <p>We implemented a modified version of UNet adapted for 3D data using MONAI [9], an open-source framework based on PyTorch and specialized in deep learning for healthcare imaging. Our diffusion model used a 3D UNet architecture with spatial dimensions of 3, 1 input and 1 output channel, and a channel structure of [128, 128, 256]. Attention mechanisms were applied at the deepest level, with 256 head channels. The model processed tensors of size 84 × 128 × 84 (depth × height × width) through 2 residual blocks at each level.</p>
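As a rough sketch, the architecture described above could be instantiated through MONAI's diffusion U-Net. Note that the class path and argument names vary across MONAI releases (the diffusion networks originated in the separate MONAI Generative Models package), so this fragment is indicative rather than the exact configuration used:

```python
# Indicative configuration only: class path and argument names differ across
# MONAI versions; consult the installed release's API before use.
from monai.networks.nets import DiffusionModelUNet

model = DiffusionModelUNet(
    spatial_dims=3,                          # 3D volumes
    in_channels=1,                           # single-channel T1-weighted MRI
    out_channels=1,
    num_channels=(128, 128, 256),            # channel structure from the paper
    attention_levels=(False, False, True),   # attention at the deepest level
    num_head_channels=(0, 0, 256),           # 256 head channels
    num_res_blocks=2,                        # 2 residual blocks per level
)
```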
        <p>We utilized the DDPMScheduler, a class from the MONAI repository that defines the methodology for adding noise to an image, configured as follows:
scheduler = DDPMScheduler(num_train_timesteps=1000, schedule="scaled_linear_beta",
beta_start=0.0005, beta_end=0.0195)</p>
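The schedule produced by this configuration can be sketched numerically. We assume the common convention for "scaled linear" schedules of interpolating linearly in √β and then squaring; MONAI's internals may differ in detail:

```python
import numpy as np

# Sketch of a scaled-linear beta schedule: linear in sqrt(beta), then squared.
# This follows the common DDPM convention; MONAI's exact implementation may differ.
def scaled_linear_betas(beta_start=0.0005, beta_end=0.0195, T=1000):
    return np.linspace(beta_start ** 0.5, beta_end ** 0.5, T) ** 2

betas = scaled_linear_betas()
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)   # used by both the forward and reverse passes
```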
        <p>The "scaled_linear_beta" schedule determines how the noise is added and removed during the diffusion process. This schedule begins by defining a linear schedule for the β_t values, starting from beta_start and ending at beta_end over num_train_timesteps. The linear β_t schedule is then scaled to improve numerical stability, particularly for longer diffusion processes.</p>
        <p>(MONAI: Medical Open Network for AI, https://monai.io/. DDPMScheduler: https://github.com/Project-MONAI/MONAI/blob/dev/monai/networks/schedulers/ddpm.py)</p>
        <p>For each timestep t, the scheduler computes α_t = 1 − β_t and ᾱ_t = ∏_{s=1}^{t} α_s. During the forward process, noise is gradually added to the image according to these β_t and ᾱ_t values. In the reverse process (image generation), the scheduler uses these values to guide the gradual denoising of random noise into a coherent brain MRI.</p>
        <p>The training process employed a scaled linear beta schedule with 1000 timesteps, beta values ranging from 0.0005 to 0.0195, and 1000 inference steps. We optimized the model using the Adam optimizer with a learning rate of 5 × 10⁻⁵ over 400 epochs. To accelerate training, we leveraged the ‘Accelerate’ library for efficient multi-GPU processing.</p>
        <p>The training was carried out on LEONARDO, a pre-exascale Tier-0 EuroHPC supercomputer at CINECA (https://www.cineca.it/en/). Specifically, we used the Booster Module partition, which consists of BullSequana X2135 “Da Vinci” GPU Blades. Each node is equipped with a 32-core Intel Xeon Platinum 8358 CPU, 512 GB of RAM, and 4 NVIDIA custom Ampere A100 GPUs with 64 GB HBM2e memory each, connected via NVLink 3.0. This configuration allowed us to efficiently train our 3D diffusion model on large-scale medical imaging data, leveraging state-of-the-art hardware and software optimizations.</p>
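The training loop reduces to: pick a random timestep, noise a clean volume via the forward process, and regress the injected noise. The sketch below substitutes a one-parameter linear model and plain gradient descent for the 3D U-Net and the Adam optimizer actually used, purely to make the objective concrete:

```python
import numpy as np

# Toy DDPM training loop: minimize MSE between injected and predicted noise.
# A scalar-weight model (eps_hat = w * x_t) stands in for the 3D U-Net.
rng = np.random.default_rng(2)
T = 1000
betas = np.linspace(0.0005, 0.0195, T)
alpha_bars = np.cumprod(1.0 - betas)

w = 0.0
lr = 1e-2
for step in range(2000):
    x0 = rng.standard_normal(64)                  # stand-in training "image"
    t = int(rng.integers(T))                      # random timestep
    eps = rng.standard_normal(64)                 # injected noise
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps
    eps_hat = w * x_t                             # toy noise prediction
    grad = 2.0 * np.mean((eps_hat - eps) * x_t)   # d(MSE)/dw
    w -= lr * grad
```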
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setting</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset Details</title>
        <p>We initially focused on building a model based on images of healthy subjects to establish its capabilities
to capture normal brain structures and their variability. This approach represents the first step for
future adaptations to rare neurodegenerative diseases, such as Frontotemporal Dementia (FTD), where
datasets are typically much smaller and more specialized.</p>
        <p>Through our collaboration with the Center for Neurodegenerative Diseases and the Aging Brain, University of Bari Aldo Moro at Pia Foundation of Cult and Religion “Card. G. Panico” (CMND), we obtained a diverse dataset of T1-weighted healthy brain MRI scans from multiple public sources. This dataset was curated to provide a robust foundation for our initial model development, focusing primarily on healthy subjects. The dataset comprises images from the following sources:
• ADNI (Alzheimer’s Disease Neuroimaging Initiative): A longitudinal multicenter study designed to develop clinical, imaging, genetic, and biochemical biomarkers for the early detection and tracking of Alzheimer’s disease.
• NIFD (Neuroimaging in Frontotemporal Dementia): A dataset focused on frontotemporal dementia, providing valuable insights into brain structure changes associated with this condition.
• OASIS (Open Access Series of Imaging Studies): A project aimed at making neuroimaging datasets freely available to the scientific community. We utilized data from three OASIS projects:
– OASIS-1: A cross-sectional collection of young, middle-aged, nondemented, and demented older adults.
– OASIS-2: A longitudinal collection of older adults with and without dementia.
– OASIS-3: A compilation of MRI and PET imaging and related clinical data for normal aging and Alzheimer’s Disease.
• PPMI (Parkinson’s Progression Markers Initiative): A landmark study to identify biomarkers of Parkinson’s disease progression.</p>
        <p>All datasets are publicly accessible for research and were used in full compliance with ethical guidelines and data privacy standards. All T1-weighted MRI scans from these datasets underwent consistent preprocessing, as detailed in the subsequent subsection, to ensure the uniformity of the images. The final curated dataset consisted of 1,017 preprocessed images. We allocated 80% of these images (approximately 814) for model training, with the remaining 20% (approximately 203) reserved for validation. This split allows for robust model training while retaining a substantial portion for performance evaluation.</p>
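The 80/20 split described above can be sketched as follows (the file names are hypothetical placeholders; note that int() truncation yields 813/204 rather than the rounded 814/203 quoted in the text):

```python
import random

# Reproducible 80/20 train-validation split over the 1,017 preprocessed scans.
random.seed(42)
scans = [f"sub-{i:04d}_T1w.nii.gz" for i in range(1017)]   # hypothetical names
random.shuffle(scans)
n_train = int(0.8 * len(scans))
train, val = scans[:n_train], scans[n_train:]
print(len(train), len(val))                 # -> 813 204
```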
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Preprocessing and Input Representation</title>
        <p>To ensure consistency in input orientations and intensities across our datasets, all images underwent a standardized preprocessing pipeline using AssemblyNet [10]. This comprehensive preprocessing involved the following steps [11]:
1. Denoising (Manjón et al. [12]): This step reduced random variations in image intensity, enhancing the signal-to-noise ratio and improving overall image quality.
2. Inhomogeneity correction (Tustison et al. [13]): Also known as bias field correction, this process addressed variations in image intensity caused by magnetic field inhomogeneities, ensuring uniform intensity across the entire brain volume.
3. Affine registration to MNI space (Avants et al. [14]): Images were spatially aligned to a standard Montreal Neurological Institute (MNI) template. This transformation mapped each brain to a common coordinate system (181 × 217 × 181 voxels at 1 × 1 × 1 mm³ resolution), facilitating inter-subject comparisons.
4. Fine inhomogeneity correction using SPM (Ashburner and Friston [15]).
5. Tissue-based intensity normalization (Manjón et al. [16]): This step adjusted image intensities based on specific tissue types (e.g., gray matter, white matter), standardizing intensity ranges across different scans and scanners.
6. Brain extraction (Manjón et al. [17]): Non-brain tissues (e.g., skull, scalp) were removed from the images, isolating the brain for subsequent analysis.</p>
        <p>Following these steps, we further refined the images by centralizing and normalizing intensities within the brain mask while setting the background to zero. This process ensured that all brain regions were scaled consistently across the dataset.</p>
        <p>Finally, to accommodate GPU memory constraints and optimize computational efficiency during model training, we resized the images from their original dimensions (181 × 217 × 181) to 84 × 128 × 84. This adjustment preserved essential structural information while reducing computational load, which is critical given the high memory demands of the diffusion model settings. The memory-intensive nature of these settings limits the model’s ability to process full-resolution images on available hardware.</p>
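The final resizing step can be sketched with simple nearest-neighbor downsampling; an actual pipeline would more likely use trilinear interpolation (e.g., MONAI's Resize transform), so this is illustrative only:

```python
import numpy as np

# Nearest-neighbor resize of a 181x217x181 MNI-space volume to 84x128x84.
def resize_nn(vol, out_shape):
    idx = [np.linspace(0, s - 1, o).round().astype(int)
           for s, o in zip(vol.shape, out_shape)]
    return vol[np.ix_(*idx)]

vol = np.zeros((181, 217, 181), dtype=np.float32)  # placeholder volume
small = resize_nn(vol, (84, 128, 84))
print(small.shape)                          # -> (84, 128, 84)
```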
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <p>In this section, we present the results of our diffusion model.</p>
        <sec id="sec-5-1-1">
          <title>5.1. Model optimization</title>
          <p>In the context of diffusion models, the Mean Squared Error (MSE) serves as a crucial loss function during the training process, as it quantifies the difference between the predicted noise and the actual noise added to the data at each timestep of the diffusion process. More specifically:
MSE = (1/N) ∑_{i=1}^{N} (ε_i − ε̂_i)²   (4)
where ε_i represents the actual noise added to the image, and ε̂_i is the noise predicted by the model. A lower MSE indicates that the model has become more adept at predicting the noise added during the forward diffusion process, which is essential for effective image generation during the reverse diffusion process. The training of our diffusion model took about six hours and covered 400 epochs. At this point, we observed convergence in our primary loss metric (MSE). The final MSE value achieved was 0.0002, as illustrated in Figure 2.</p>
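Equation (4) is a few lines of code; a minimal sketch on toy arrays:

```python
import numpy as np

# Eq. (4): mean squared error between actual and predicted noise.
def mse(eps, eps_hat):
    eps, eps_hat = np.asarray(eps), np.asarray(eps_hat)
    return float(np.mean((eps - eps_hat) ** 2))

perfect = mse([0.1, -0.3, 0.2], [0.1, -0.3, 0.2])   # identical -> zero loss
off = mse([0.0, 0.0], [0.5, -0.5])
print(perfect, off)                         # -> 0.0 0.25
```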
        </sec>
        <sec id="sec-5-1-2">
          <title>5.2. Evaluation metrics</title>
          <p>Frechet Inception Distance (FID) [18] calculates the distance between two distributions of feature vectors.
This metric was explicitly applied to assess the quality of synthetic images compared to real ones. In
order to compute the distance, it is necessary to load a pre-trained model (for example, RadImageNet
for 2D and MedicalNet for 3D images), which will extract feature vectors from the images and then
compute the statistics like mean and variance used to compute the Frechet distance. A lower value of
FID means that the two distributions are similar.</p>
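To make the idea concrete, the sketch below computes a simplified one-dimensional Fréchet distance between two Gaussian samples; the real FID operates on multivariate deep-feature statistics (e.g., from the pre-trained networks mentioned above), so this is a didactic reduction only:

```python
import numpy as np

# 1-D Frechet distance between Gaussian fits of two samples:
# d^2 = (mu1 - mu2)^2 + s1^2 + s2^2 - 2*s1*s2.
# Real FID uses multivariate statistics of features from a pretrained network.
def frechet_1d(a, b):
    mu1, mu2 = np.mean(a), np.mean(b)
    s1, s2 = np.std(a), np.std(b)
    return float((mu1 - mu2) ** 2 + s1 ** 2 + s2 ** 2 - 2.0 * s1 * s2)

rng = np.random.default_rng(0)
close = frechet_1d(rng.normal(0, 1, 5000), rng.normal(0, 1, 5000))
far = frechet_1d(rng.normal(0, 1, 5000), rng.normal(3, 1, 5000))
```

As expected, matched distributions give a distance near zero, while a shifted distribution gives a much larger value.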
          <p>Unbiased Maximum Mean Discrepancy (MMD) [19] is a kernel-based method to measure the similarity between two distributions. It is a non-negative metric where a smaller value indicates a closer match between the two distributions. Multi-Scale Structural Similarity Index Measure (MS-SSIM) [20] is a similarity metric usually used in image generation contexts to measure the structural similarity of data within the same dataset. This index is a value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. We evaluated the metrics over 86 images from both the real images dataset and the synthetic one, achieving the following results:
                           MMD      FID      MS-SSIM
medical-3D-DDPM (Ours)     0.036    19.39    0.58
real images                —        —        0.74</p>
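For illustration, an unbiased MMD² estimate with an RBF kernel can be written in a few lines; here we compare 1-D samples, whereas the paper applies MMD to image data, so the kernel bandwidth and inputs are illustrative assumptions:

```python
import numpy as np

# Unbiased squared MMD with an RBF kernel k(a, b) = exp(-gamma * (a - b)^2).
def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def mmd2_unbiased(x, y, gamma=0.5):
    m, n = len(x), len(y)
    kxx, kyy, kxy = rbf(x, x, gamma), rbf(y, y, gamma), rbf(x, y, gamma)
    np.fill_diagonal(kxx, 0.0)              # drop self-similarity terms
    np.fill_diagonal(kyy, 0.0)
    return (kxx.sum() / (m * (m - 1)) + kyy.sum() / (n * (n - 1))
            - 2.0 * kxy.mean())

rng = np.random.default_rng(3)
close = mmd2_unbiased(rng.normal(0, 1, 500), rng.normal(0, 1, 500))
apart = mmd2_unbiased(rng.normal(0, 1, 500), rng.normal(2, 1, 500))
```

Matched samples give an estimate near zero (the unbiased estimator can be slightly negative), while a shifted distribution gives a clearly larger value.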
          <p>The MMD shows promising preliminary results. The value is very close to 0, indicating that the two distributions are quite similar. However, the FID is higher, suggesting that the features extracted from the real and synthetic datasets are somewhat different. Nevertheless, the result is promising, given that this is a preliminary study, as depicted in Figure 3.</p>
          <p>Lastly, the MS-SSIM computed on the synthetic dataset is lower than that of the real dataset, indicating that our model generates sufficiently similar brains. In contrast, the structural similarity in the real dataset is higher, suggesting that the brains within it are approximately 16% more similar to each other than those generated by our model.</p>
          <p>Expert neuroradiologists from CMND reviewed a selection of generated images qualitatively,
validating their anatomical plausibility and structural consistency, which are crucial for clinical applications.
The combination of quantitative metrics and expert validation emphasizes our model’s utility and areas
for further refinement.</p>
          <p>(Figure panels: (a) Real Image 1, (b) Real Image 2; (d) Synthetic Image 1, (e) Synthetic Image 2, (f) Synthetic Image 3.)</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>Our work demonstrates the potential of diffusion models in generating synthetic 3D T1-weighted MRI scans of healthy brains. Although the current results are promising, there are several avenues for future research and improvement.</p>
      <p>One of the key directions for future research is improving the resolution of the generated MRI images. We acknowledge that the current resolution is insufficient for detecting small-scale neurodegenerative alterations. To achieve this, we plan to implement a latent diffusion model that operates within a compressed latent space. This approach is expected to reduce computational complexity and improve the model’s ability to capture both global structures and fine details, leading to higher fidelity in the generated images. Furthermore, this will enable the application of the model to more clinically relevant scenarios where high-resolution imaging is critical.</p>
      <p>To broaden our model’s applicability to rare neurological conditions, we will employ transfer learning
techniques. This approach addresses the challenge of limited data availability for rare brain diseases.
We plan to fine-tune our pre-trained model on small, condition-specific datasets and explore few-shot
and zero-shot learning methods. Collaborating with clinical partners will ensure the generated images’
relevance for rare disorders. This adaptation aims to create valuable tools for research and training in
rare neurological conditions.</p>
      <p>To integrate our model into healthcare environments, we propose developing an automated Machine
Learning Operations (MLOps) pipeline. This pipeline will implement CI/CD practices for medical
imaging AI models, including automated data validation, performance monitoring, and rigorous
security measures compliant with healthcare regulations. We aim to ensure seamless integration with
hospital information systems (HIS). This MLOps pipeline will maintain model accuracy, currency, and
deployability in clinical settings, potentially accelerating the adoption of AI-generated synthetic MRI in
medical research and practice.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This study has been carried out with the co-financing of the Ministry of University and Research in the
framework of PNC ”DARE - Digital lifelong prevention project” (PNC0000002 – CUP B53C22006450001).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          ,
          <article-title>Denoising diffusion probabilistic models</article-title>
          ,
          <source>in: Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS '20)</source>
          , Curran Associates, Inc., Red Hook, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>6840</fpage>
          -
          <lpage>6851</lpage>
          . URL: https://dl.acm.org/doi/10.5555/3495724.3496298.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2112.10752. arXiv:2112.10752.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>