Denoising Diffusion Probabilistic Models for DBT data augmentation: preliminary results

Lorenzo D'Errico1, Lorenzo Pergamo2, Daniel Riccio1 and Mariacarla Staffa2,∗
1 University of Naples Federico II
2 University of Naples Parthenope

Abstract
Recent strides in computer vision have led to promising breakthroughs in the realm of image generation. Notably, diffusion probabilistic models such as DALL-E 2, Imagen, and Stable Diffusion have demonstrated the ability to create lifelike images based on textual prompts. Yet, their potential application in the medical domain, where intricate three-dimensional image volumes are commonplace, remains largely untapped. Synthetic imagery presents a compelling avenue for privacy-preserving artificial intelligence and holds immense potential for enriching datasets with limited samples. This study assesses the effectiveness of diffusion probabilistic models in synthesizing high-fidelity medical imaging data, with a particular focus on Digital Breast Tomosynthesis (DBT) images.

Keywords
Denoising Diffusion Probabilistic Models (DDPMs), Digital Breast Tomosynthesis (DBT), Generative Models

1. Introduction

The success of deep learning across various pattern recognition tasks has ignited widespread excitement and elevated expectations regarding its potential impact on healthcare [1]. Concurrently, Digital Breast Tomosynthesis (DBT) has emerged as a transformative technology in breast cancer screening and diagnosis. Since its clinical debut in 2011, radiologists specializing in breast disease diagnosis nationwide have increasingly adopted this innovative approach for both screening and diagnostic purposes, with its adoption steadily rising [2]. The convergence of DBT and AI presents significant promise, offering opportunities for heightened precision, efficiency, and overall advancements in breast cancer screening and diagnosis. As the healthcare landscape evolves, the fusion of DBT and AI holds the potential to revolutionize breast cancer detection and management [3]. Integrating deep learning algorithms with DBT data could lead to more accurate and timely identification of abnormalities, thereby improving patient outcomes.

Despite the optimism surrounding this new era of machine learning, the development and implementation of AI tools in clinical settings encounter numerous challenges [2]. Deep artificial neural networks require substantial training data to learn effectively, a requirement that often proves costly and labor-intensive to fulfill. This challenge is particularly pertinent for DBT, which is a relatively novel breast cancer screening modality. Data augmentation offers a solution by artificially expanding the training set through label-preserving transformations [4]. This study aims to leverage Denoising Diffusion Probabilistic Models (DDPMs), a class of generative models, to generate synthetic samples for DBT. DDPMs have garnered significant attention across various domains for their ability to produce synthetic data of exceptional quality. These models function by iteratively introducing noise to an input signal, such as an image, text, or audio, and then learning the denoising process in order to generate novel samples. In the realm of image synthesis, DDPMs have demonstrated success in generating authentic, high-quality images, bolstered by competitive log-likelihoods that attest to their effectiveness in diverse generative tasks [5]. The overarching objective is to address the scarcity and imbalance of existing datasets, thereby enhancing the quality of deep learning algorithms, particularly those related to segmentation and detection.

The proposed methodology uses synthetic DBT samples as a form of data augmentation to mitigate the constraints associated with current dataset availability. This augmentation strategy is expected to contribute to the refinement of deep learning algorithms, ultimately driving advancements in segmentation and detection algorithms within the context of DBT. The study is structured as follows: Section 2 gives an overview of DBT technology as well as of DDPMs; Section 3 delves into the results obtained; finally, Section 4 discusses future perspectives.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author. † These authors contributed equally.
lorenzo.derrico@unina.it (L. D'Errico); lorenzo.pergamo001@studenti.uniparthenope.it (L. Pergamo); daniel.riccio@unina.it (D. Riccio); mariacarla.staffa@uniparthenope.it (M. Staffa)
ORCID: 0000-0001-8044-8224 (L. D'Errico); 0000-0002-5844-0602 (D. Riccio); 0000-0001-7656-8370 (M. Staffa)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073).

2. Material and Methods

2.1. Digital Breast Tomosynthesis

Full-field digital mammography (FFDM) has traditionally been the primary breast cancer screening method, but its effectiveness is hindered by inherent limitations. Visualizing complex breast structures in two dimensions often leads to obscured tumor margins and inaccurate lesion characterization due to overlapping tissue [6]. Standard imaging projections may not capture the full extent of irregular or multifocal tumors [7], further complicating accurate diagnosis.
Variations in breast composition, positioning artifacts, and tissue compression during imaging introduce variability into tumor size estimation and localization. In contrast, Digital Breast Tomosynthesis (DBT) (see Fig. 1), approved by the FDA in 2011, revolutionizes breast imaging by acquiring a series of low-dose X-ray images from multiple angles and reconstructing them into a 3D dataset [8]. This enables radiologists to navigate breast tissue in three dimensions, overcoming the limitations of 2D mammography [9]. DBT enhances lesion visualization, improves diagnostic accuracy, and outperforms FFDM in detecting invasive cancers and architectural distortions [10] (see Fig. 2). Advanced reconstruction algorithms and image processing techniques further enhance DBT's diagnostic utility, allowing for the detection of smaller lesions with greater confidence [12]. DBT reduces false positives, minimizes unnecessary recalls, and optimizes patient outcomes by providing clearer, more detailed images. The integration of quantitative imaging biomarkers and machine learning algorithms augments DBT's diagnostic capabilities, ushering in personalized breast cancer screening and management. In conclusion, DBT represents a transformative advancement in breast cancer imaging, promising unparalleled diagnostic accuracy and improved patient outcomes. As research and technology progress, DBT is poised to revolutionize breast healthcare delivery by facilitating early detection and treatment of breast cancer.

Figure 1: Digital Breast Tomosynthesis procedure.

Figure 2: Comparison between a 2D mammogram and a 3D one. In Digital Breast Tomosynthesis (right), tumors are detected, unlike in mammography (left), where tissue overlap obstructs the view of the specialist doctor [11].

2.2. Denoising Diffusion Probabilistic Models

Diffusion models represent an advanced category of generative models renowned for their efficacy in capturing intricate data distributions. Despite being a recent addition to the generative learning field, they have proven valuable across diverse applications. The three dominant generative frameworks are Generative Adversarial Networks (GANs) [13], Variational Autoencoders (VAEs) [14], and normalizing flows [15]. These probabilistic generative models are adept at capturing intricate data distributions, establishing themselves as formidable tools in various applications.

A Denoising Diffusion Probabilistic Model (DDPM) is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time (see Fig. 3). DDPMs are composed of two opposite processes: the forward and the reverse diffusion process. In the forward diffusion process, Gaussian noise is gradually and iteratively added to the images of the training set, perturbing them so that they deviate from their original distribution and align more closely with a normal distribution. The reverse diffusion process aims to systematically invert the forward procedure; the reversal is conducted gradually and iteratively to counteract the perturbation applied in the forward process, starting where the forward process concludes. The advantage of starting from a normal distribution is that sampling points from this uncomplicated distribution is straightforward; the aim is then to learn how to revert those points to the original data distribution. The challenge, however, is that an infinite array of trajectories originate from any point in this ostensibly simple space, with only a fraction leading back to the data distribution. Within DDPMs, this is addressed by referencing the incremental steps of the forward diffusion process: the probability density function (PDF) of the corrupted images varies only slightly at each forward step, so in the reverse process a deep learning model is employed at each step to predict the PDF parameters of the corresponding forward step. After training, any point in the simple space can be selected, and the model can be applied iteratively to navigate back to the data subspace. In reverse diffusion, denoising is thus performed systematically in small steps, commencing from a noisy image. This method of training and generating new samples is more stable than Generative Adversarial Networks (GANs) and surpasses prior approaches such as Variational Autoencoders (VAEs) and normalizing flows.

Diffusion models, as outlined in the literature [16], are a category of latent variable models represented by the equation

\[ p_\theta(x_0) := \int p_\theta(x_{0:T}) \, dx_{1:T}, \]

where $x_1, \ldots, x_T$ are latent variables of the same dimensionality as the data $x_0 \sim q(x_0)$. The joint distribution $p_\theta(x_{0:T})$ is denoted as the reverse process, constituting a Markov chain with learned Gaussian transitions that starts at $p(x_T) = \mathcal{N}(x_T; 0, I)$:

\[ p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \]

\[ p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big). \]

Regarding the structural design of the model, the dimensions of the input and output must align. To achieve this, Ho et al. [16] employed a U-Net architecture, which ensures compatibility in size between the input and output of the model. With respect to the typical U-Net architecture, the conventional double convolution at each level was replaced with residual blocks as employed in ResNet models. In the DDPM implementation, a Wide ResNet block was employed, following Zagoruyko et al. [17]; in the adaptation by Phil Wang, however, the standard convolutional layer was replaced with a weight-standardized version, recognized for its improved performance in conjunction with group normalization, as outlined by Kolesnikov et al. [18]. Moreover, in order to share parameters across time steps, sinusoidal position embeddings are incorporated, drawing inspiration from the Transformer model [19]. This integration allows the neural network to discern the relevant time step (noise level) for each image within a batch; the SinusoidalPositionEmbeddings module has been used in this work. Finally, an attention module from the Transformer architecture [19, 20] is introduced.

Figure 3: DDPM Architecture.

3. Experimental Results

This section presents a detailed analysis of the experimental setup employed for training and evaluating the DDPM. The results are presented in a structured manner, showcasing the model's ability to capture and simulate the intricate features present in authentic DBT images. Additionally, we explore the impact of key hyperparameters on the synthesis process.

3.1. Description of Dataset

The dataset comprises patient records from individuals who underwent Digital Breast Tomosynthesis (DBT) examinations at the Duke Health system between January 1, 2014, and January 30, 2018. The acquisition process involved cross-referencing information from radiology reports, pathology reports, and DBT data obtained from the Picture Archiving and Communication Systems (PACS) at Duke. These studies encompassed a total of 13,954 unique patients, each with at least one craniocaudal (CC) and mediolateral oblique (MLO) view available for either the left or right breast (see Fig. 4).

Figure 4: Example slices from the dataset.

The dataset is organized into three sets: a training set comprising 1.42 TB, a validation set comprising 84.71 GB, and a test set comprising 135.14 GB. The images are in DICOM format and were read using the torchio library. To make them consistent and ready for analysis, all volumes were reshaped to dimensions of 64x64 pixels with 8 slices. The dimensions were reduced through an iterative process aimed at preserving the highest possible image quality during downsampling. This change not only simplifies analysis but also enhances computational efficiency.

3.2. Hardware

The principal objective of the present research is the generation of synthetic DBT samples through the application of a Denoising Diffusion Probabilistic Model. To optimize the computational procedures inherent in this complex task, the code was parallelized across four Tesla V100 Graphics Processing Units (GPUs). The parallelization strategy was applied specifically at the data level: the dataset was partitioned and processed concurrently across all GPUs. This approach played a pivotal role in improving both the efficiency and the speed of model training and synthetic sample generation, thereby significantly augmenting the overall efficacy of the research.

3.3. Model deployed

Hyperparameters

• Batch Size. A batch size of 16 was allocated to each of the four GPUs. This selection was grounded in the quest for an optimal compromise among critical factors such as training speed, output quality, and, notably, memory utilization. Careful consideration of these factors was essential to achieving a balance that facilitated expeditious model training while preserving high-quality outcomes and efficiently managing computational memory. This batch size emerged as the most effective compromise, aligning with the overarching objectives of the experiment and contributing to the overall success of the research.

• Learning Rate. The smallest learning rate tested, 1e-4, conferred superior outcomes. This underscores the model's susceptibility to nuanced parameter adjustments, wherein small updates within the parameter space correlated with enhanced performance. The adaptive characteristics of the Adam optimizer, which dynamically adjusts learning rates based on historical gradients, likely played a pivotal role in the efficacy of this minimal learning rate.

Loss function. In this work, the Mean Squared Error (MSE) was chosen as the primary loss function, computed as the disparity between the noise added to the images and the corresponding noise predictions generated by the U-Net model. However, subsequent empirical investigations, coupled with insights gleaned from alternative implementations, indicate that the Mean Absolute Error (MAE) may yield superior outcomes. These findings prompt a reevaluation of the chosen loss function and a thorough exploration of the implications of adopting MAE within the framework of the study's objectives. Such a revision stands to enhance the efficacy and fidelity of the model's predictive capabilities.
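The training signal just described, an MSE between the noise injected in the forward process and the network's prediction, follows from the closed-form forward step of Ho et al. [16], $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$. The following NumPy sketch illustrates that objective; it is not the code used in this work, and the linear $\beta$-schedule bounds, the toy 8x64x64 volume, and the zero prediction standing in for the U-Net are assumptions made for demonstration only.

```python
import numpy as np

# Linear beta schedule (bounds assumed, as commonly used for DDPMs).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def sinusoidal_embedding(t, dim=32):
    """Transformer-style sinusoidal embedding of the time step t."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def q_sample(x0, t, eps):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def mse_loss(eps, eps_pred):
    """Noise-prediction loss between injected and predicted noise."""
    return np.mean((eps - eps_pred) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 64, 64))   # toy volume: 8 slices of 64x64
t = rng.integers(0, T)                  # random diffusion step
eps = rng.standard_normal(x0.shape)     # noise injected at step t
x_t = q_sample(x0, t, eps)

# A real model would be a U-Net conditioned on sinusoidal_embedding(t);
# a zero prediction stands in here, so the loss is roughly E[eps^2] ~ 1.
loss = mse_loss(eps, np.zeros_like(eps))
```

Replacing `mse_loss` with a mean absolute error, `np.mean(np.abs(eps - eps_pred))`, is the one-line change corresponding to the MAE alternative discussed above.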
Figure 5: Example of Generated DBT 1.

3.4. Sample quality

While the current quality of our samples may not meet the stringent standards required for certain applications (see Fig. 5), it is essential to recognize the promising aspects of the results. Despite the imperfections, the dataset produced presents a valuable foundation upon which improvements can be built. Additionally, examination of the histogram (see Fig. 6) shows that the generated samples exhibit lower contrast than the desired standards. This contrast deficiency is a significant aspect that requires attention to ensure that the generated images meet the quality thresholds necessary for clinical applications. Contrary to the histogram of the generated images, the histogram of the real images displays a distribution of pixel intensities characterized by the accumulation of values around two distinct modes. This suggests the presence of two dominant intensity regions within the image; the separation and distribution of these modes can significantly influence the visual characteristics of the image.

Figure 6: Pixel distribution.

4. Discussion and Conclusion

Through meticulous experimentation and analysis, the study demonstrated the capability of DDPMs to produce artificial DBT images that closely emulate the intricacies of real-world cases. The promising results obtained pave the way for future investigations and applications of DDPMs in the realm of medical imaging. The potential for refining and expanding upon these generative models opens avenues for further research, contributing to the ongoing evolution of DBT technology. Future applications may involve conditioning the sampling procedure, allowing deliberate manipulation of the generated samples; in this context, the approach is denoted as guided diffusion [21, 22]. Moreover, latent diffusion models could be introduced. In these models, an initial step projects the input into a more compact latent space, where the diffusion process is subsequently applied. To elaborate, Rombach et al. [23] proposed the use of an encoder network, denoted as $g(x_t) = z_t$, to encode the input into a latent representation $z_t$. This strategic choice alleviates the computational demands of training diffusion models by conducting processing in a lower-dimensional space. Following the encoding, a conventional diffusion model, specifically a U-Net, is employed to generate new data; the resultant data are then upsampled through a decoder network.

5. Acknowledgments

The work was supported by the PNRR MUR project PE0000013-FAIR.

References

[1] H.-P. Chan, R. K. Samala, L. M. Hadjiiski, C. Zhou, Deep learning in medical image analysis, in: Advances in Experimental Medicine and Biology, Springer International Publishing, 2020.
[2] S. Destounis, Role of digital breast tomosynthesis in screening and diagnostic breast imaging (2018).
[3] F. D. Marco, A. A. Citarella, L. D. Biasi, L. D'Errico, R. Francese, G. Mettivier, M. Staffa, G. Tortora, AI-based solutions for the analysis of biomedical images and signals, in: Proceedings of the Conference on Artificial Intelligence (Ital-IA 2023), Pisa, Italy, May 29-30, 2023, volume 3486 of CEUR Workshop Proceedings, 2023, pp. 171–176.
[4] L. Taylor, G. Nitschke, Improving deep learning with generic data augmentation, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, pp. 1542–1547.
[5] A. Q. Nichol, P. Dhariwal, Improved denoising diffusion probabilistic models, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 8162–8171.
[6] D. Förnvik, S. Zackrisson, O. Ljungberg, T. Svahn, P. Timberg, A. Tingberg, I. Andersson, Breast tomosynthesis: accuracy of tumor measurement compared with digital mammography and ultrasonography, Acta Radiologica 51 (2010) 240–247.
[7] B. D. Fornage, O. Toubas, M. Morel, Clinical, mammographic, and sonographic determination of preoperative breast cancer size, Cancer 60 (1987) 765–771.
[8] L. T. Niklason, B. T. Christian, L. E. Niklason, D. B. Kopans, D. E. Castleberry, B. Opsahl-Ong, C. E. Landberg, P. J. Slanetz, A. A. Giardino, R. Moore, et al., Digital tomosynthesis in breast imaging, Radiology 205 (1997) 399–406.
[9] N. P. Tirada, G. Li, D. Dreizin, L. Robinson, G. R. Khorjekar, S. A. Dromi, T. S. Ernst, Digital breast tomosynthesis: physics, artifacts, and quality control considerations, RadioGraphics 39 (2019) 413–426.
[10] I. Andersson, D. M. Ikeda, S. Zackrisson, M. Ruschin, T. Svahn, P. Timberg, A. Tingberg, Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and BIRADS classification in a population of cancers with subtle mammographic findings, European Radiology 18 (2008) 2817–2825.
[11] M. Staffa, L. D'Errico, R. Ricciardi, P. Barra, E. Antignani, S. Minelli, G. Mettivier, How to increase and balance current DBT datasets via an evolutionary GAN: preliminary results, in: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, pp. 913–920.
[12] R. Moore, D. Kopans, E. Rafferty, D. Georgian-Smith, R. Hitt, E. Yeh, Initial callback rates for conventional and digital breast tomosynthesis mammography comparison in the screening setting, in: Radiological Society of North America 93rd Scientific Assembly and Annual Meeting, Nov. 2007.
[13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, Commun. ACM 63 (2020) 139–144.
[14] L. Pinheiro Cinelli, M. Araújo Marins, E. A. Barros da Silva, S. Lima Netto, Variational autoencoder (2021) 111–149.
[15] A. Kazerouni, E. K. Aghdam, M. Heidari, R. Azad, M. Fayyaz, I. Hacihaliloglu, D. Merhof, Diffusion models in medical imaging: a comprehensive survey, Medical Image Analysis 88 (2023) 102846.
[16] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, arXiv preprint arXiv:2006.11239 (2020).
[17] S. Zagoruyko, N. Komodakis, Wide residual networks, arXiv preprint arXiv:1605.07146 (2016).
[18] A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, N. Houlsby, Big Transfer (BiT): general visual representation learning (2019).
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need (2017).
[20] M. Staffa, P. Barra, Multi-monitor system for adaptive image saliency detection based on attentive mechanisms, in: H. Degen, S. Ntoa (Eds.), Artificial Intelligence in HCI - 4th International Conference, AI-HCI 2023, Held as Part of the 25th HCI International Conference, HCII, volume 14051 of Lecture Notes in Computer Science, Springer, 2023, pp. 607–617.
[21] A. Q. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, M. Chen, GLIDE: towards photorealistic image generation and editing with text-guided diffusion models, in: ICML, volume 162 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 16784–16804.
[22] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, M. Norouzi, Photorealistic text-to-image diffusion models with deep language understanding, CoRR abs/2205.11487 (2022).
[23] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: CVPR, IEEE, 2022, pp. 10674–10685.