=Paper=
{{Paper
|id=Vol-3762/525
|storemode=property
|title=Denoising Diffusion Probabilistic Models for DBT data augmentation: preliminary results
|pdfUrl=https://ceur-ws.org/Vol-3762/525.pdf
|volume=Vol-3762
|authors=Lorenzo D'Errico,Lorenzo Pergamo,Daniel Riccio,Mariacarla Staffa
|dblpUrl=https://dblp.org/rec/conf/ital-ia/DErricoPRS24
}}
==Denoising Diffusion Probabilistic Models for DBT data augmentation: preliminary results==
Lorenzo D’Errico¹, Lorenzo Pergamo², Daniel Riccio¹ and Mariacarla Staffa²,*
¹ University of Naples Federico II
² University of Naples Parthenope
Abstract
Recent strides in computer vision have led to promising breakthroughs in the realm of image generation. Notably, diffusion
probabilistic models such as DALL-E 2, Imagen, and Stable Diffusion have demonstrated the ability to create lifelike images
based on textual prompts. Yet, their potential application in the medical domain, where intricate three-dimensional image
volumes are commonplace, remains largely untapped. Synthetic imagery presents a compelling avenue in the realm of
privacy-preserving artificial intelligence and holds immense potential for enriching datasets with limited samples. This study
seeks to assess the effectiveness of diffusion probabilistic models in synthesizing high-fidelity medical imaging data, with a
particular focus on Digital Breast Tomosynthesis (DBT) images.
Keywords
Denoising Diffusion Probabilistic Models (DDPMs), Breast Tomosynthesis (DBT), Generative Models
1. Introduction
The success of deep learning across various pattern recognition tasks has ignited widespread excitement and elevated expectations regarding its potential impact on healthcare [1]. Concurrently, Digital Breast Tomosynthesis (DBT) has emerged as a transformative technology in breast cancer screening and diagnosis. Since its clinical debut in 2011, radiologists specializing in breast disease diagnosis nationwide have increasingly adopted this approach for both screening and diagnostic purposes [2]. The convergence of DBT and AI presents significant promise, offering opportunities for heightened precision, efficiency, and overall advancements in breast cancer screening and diagnosis. As the healthcare landscape evolves, the fusion of DBT and AI holds the potential to revolutionize breast cancer detection and management [3]. Integrating deep learning algorithms with DBT data could lead to more accurate and timely identification of abnormalities, thereby improving patient outcomes.
Despite the optimism surrounding this new era of machine learning, the development and implementation of AI tools in clinical settings encounter numerous challenges [2]. Deep artificial neural networks necessitate substantial training data to learn effectively, a requirement that often proves costly and labor-intensive to fulfill. This challenge is particularly pertinent for DBT, which is a relatively novel breast cancer screening modality. Data augmentation offers a solution by artificially expanding the training set through label-preserving transformations [4].
This study aims to leverage Denoising Diffusion Probabilistic Models (DDPMs), a class of generative models, to generate synthetic DBT samples. DDPMs have garnered significant attention across various domains for their ability to produce synthetic data of exceptional quality. These models function by iteratively adding noise to an input signal, such as an image, text, or audio, and then learning the denoising process to generate novel samples. In image synthesis, DDPMs have demonstrated success in generating authentic and high-quality images, bolstered by competitive log-likelihoods that attest to their effectiveness in diverse generative tasks [5]. The overarching objective is to address the scarcity and imbalance in existing datasets, thereby enhancing the quality of deep learning algorithms, particularly those related to segmentation and detection. The proposed methodology entails using synthetic DBT samples as a form of data augmentation to mitigate the constraints associated with current dataset availability. This augmentation strategy is expected to contribute to the refinement of deep learning algorithms, ultimately driving advancements in segmentation and detection within the context of DBT. The study is structured as follows: Section 2 gives an overview of DBT technology and of DDPMs; Section 3 presents the results obtained; and Section 4 discusses future perspectives.
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
Email: lorenzo.derrico@unina.it (L. D’Errico); lorenzo.pergamo001@studenti.uniparthenope.it (L. Pergamo); daniel.riccio@unina.it (D. Riccio); mariacarla.staffa@uniparthenope.it (M. Staffa)
ORCID: 0000-0001-8044-8224 (L. D’Errico); 0000-0002-5844-0602 (D. Riccio); 0000-0001-7656-8370 (M. Staffa)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
2. Material and Methods
2.1. Digital Breast Tomosynthesis
Full-field digital mammography (FFDM) has traditionally been the primary breast cancer screening method, but its effectiveness is hindered by inherent limitations. Visualizing complex breast structures in two dimensions often leads to obscured tumor margins and inaccurate lesion characterization due to overlapping tissue [6]. Standard imaging projections may not capture the full extent of irregular or multifocal tumors [7], further complicating accurate diagnosis. Variations in breast composition, positioning artifacts, and tissue compression during imaging introduce variability into tumor size estimation and localization. In contrast, Digital Breast Tomosynthesis (DBT) (see Fig. 1), approved by the FDA in 2011, revolutionizes breast imaging by acquiring a series of low-dose X-ray images from multiple angles and reconstructing them into a 3D dataset [8]. This enables radiologists to navigate breast tissue in three dimensions, overcoming the limitations of 2D mammography [9]. DBT enhances lesion visualization, improves diagnostic accuracy, and outperforms FFDM in detecting invasive cancers and architectural distortions [10] (see Fig. 2). Advanced reconstruction algorithms and image processing techniques further enhance DBT’s diagnostic utility, allowing for the detection of smaller lesions with greater confidence [12]. DBT reduces false positives, minimizes unnecessary recalls, and optimizes patient outcomes by providing clearer, more detailed images. Integration of quantitative imaging biomarkers and machine learning algorithms augments DBT’s diagnostic capabilities, ushering in personalized breast cancer screening and management. In conclusion, DBT represents a transformative advancement in breast cancer imaging, promising unparalleled diagnostic accuracy and improved patient outcomes. As research and technology progress, DBT is poised to revolutionize breast healthcare delivery by facilitating early detection and treatment of breast cancer.
Figure 1: Digital Breast Tomosynthesis procedure.
Figure 2: Comparison between a 2D mammogram and a 3D one. In Digital Breast Tomosynthesis (right), tumors are detected, unlike in mammography (left), where tissue overlap obstructs the view of the specialist doctor [11].
2.2. Denoising Diffusion Probabilistic Models
Diffusion models represent an advanced category of generative models renowned for their efficacy in capturing intricate data distributions. Despite being a recent addition to the generative learning field, they have proven valuable across diverse applications. The three dominant generative frameworks are identified as Generative Adversarial Networks (GANs) [13], Variational Autoencoders (VAEs) [14], and normalizing flows [15]. These models, falling under the category of probabilistic generative models, are adept at capturing intricate data distributions, establishing themselves as a formidable tool in various applications.
A Denoising Diffusion Probabilistic Model (DDPM) is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time (see Fig. 3). A DDPM is composed of two opposite processes: the forward and the reverse diffusion process. In the forward diffusion process, Gaussian noise is gradually and iteratively added to the images of the training set, so that they move away from their original distribution and towards a normal distribution. In the reverse diffusion process, the objective is to invert the forward diffusion procedure, again gradually and iteratively, in order to undo the perturbation applied to the images. The reverse process starts where the forward process ends; the advantage of starting from a normal distribution is that sampling points from this simple distribution is well understood. The primary aim is to learn how to return to the original data distribution. The challenge is that infinitely many trajectories can originate from a point in this ostensibly simple space, and only a fraction of them lead to the data distribution. In a DDPM, this is addressed by referring to the incremental steps taken in the forward diffusion process: the probability density function (PDF) of the corrupted images changes only slightly at each step. Consequently, in the reverse process, a deep learning model is employed at each step to predict the PDF parameters of the forward process. After training, any point in the simple space can be selected, and the model can be applied iteratively to navigate back to the data subspace. In reverse diffusion, denoising is thus performed in small steps, starting from a noisy image. This method of training and generating new samples is more stable than Generative Adversarial Networks (GANs) and surpasses prior approaches such as Variational Autoencoders (VAEs) and normalizing flows.
Diffusion models, as outlined in the literature [16], are a category of latent variable models of the form
$$p_\theta(x_0) := \int p_\theta(x_{0:T}) \, dx_{1:T},$$
where $x_1, \dots, x_T$ are latent variables of the same dimensionality as the data $x_0 \sim q(x_0)$. The joint distribution $p_\theta(x_{0:T})$ is called the reverse process; it is a Markov chain with learned Gaussian transitions starting at $p(x_T) = \mathcal{N}(x_T; 0, I)$:
$$p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)).$$
Regarding the structural design of the model, the dimensions of the input and of the output of the model must match. To achieve this, Ho et al. [16] used a U-Net architecture, which ensures that input and output have the same size. Compared with the typical U-Net architecture, the conventional double convolution at each level was replaced with residual blocks as employed in ResNet models. In the DDPM implementation, a Wide ResNet block was employed as per Zagoruyko et al. [17]. In the adaptation by Phil Wang, however, the standard convolutional layer was replaced with a weight-standardized version, recognized for its improved performance in conjunction with group normalization, as outlined by Kolesnikov et al. [18]. Moreover, because the network parameters are shared across time steps, sinusoidal position embeddings are incorporated, drawing inspiration from the Transformer model [19]. These embeddings allow the neural network to discern the relevant time step (noise level) for each image within a batch. The SinusoidalPositionEmbeddings module has been used in this work. Finally, an attention module is introduced from the Transformer architecture [19, 20].
Figure 3: DDPM Architecture.
3. Experimental Results
This section presents a detailed analysis of the experimental setup employed for training and evaluating the DDPM. The results are presented in a structured manner, showcasing the model’s ability to capture and simulate the intricate features present in authentic DBT images. Additionally, we explore the impact of key hyperparameters on the synthesis process.
3.1. Description of Dataset
The dataset comprises patient records from individuals who underwent Digital Breast Tomosynthesis (DBT) examinations at the Duke Health system between January 1, 2014, and January 30, 2018. The acquisition process involved cross-referencing information from radiology reports, pathology reports, and DBT data obtained from the Picture Archiving and Communication Systems (PACS) at Duke. These studies encompassed a total of 13,954 unique patients, each with at least one craniocaudal (CC) and mediolateral oblique (MLO) view available for either the left or right breast (see Fig. 4).
The dataset is organized into three sets: a training set of 1.42 TB, a validation set of 84.71 GB, and a test set of 135.14 GB. The images are in DICOM format and were read using the torchio library. To make them consistent and ready for analysis, all images were resized to 64x64 pixels with 8 slices. The image dimensions were reduced through an iterative process aimed at preserving as much of the visual quality as possible during size reduction. This change not only facilitates analysis but also enhances computational efficiency.
Figure 4: Example slices from the dataset.
3.2. Hardware
The principal objective of the present research is the generation of synthetic Digital Breast Tomosynthesis (DBT) samples through a Denoising Diffusion Probabilistic Model. To speed up this computationally demanding task, the code was parallelized on four Tesla V100 Graphics Processing Units (GPUs). The parallelization was applied at the data level only: the dataset was partitioned and processed concurrently across all GPUs. This approach increased the efficiency and speed of both model training and synthetic sample generation, thereby improving the overall efficacy of the research.
3.3. Model deployed
Hyperparameters
• Batch Size. A batch size of 16 was allocated to each of the four Graphics Processing Units (GPUs) employed. This choice was a compromise among training speed, output quality, and, notably, memory utilization: it allowed fast model training and preserved high-quality outcomes while keeping the use of memory resources efficient. This batch size emerged as the most effective compromise, in line with the objectives of the experiment.
• Learning Rate. The smallest learning rate considered, 1e-4, gave the best outcomes. This indicates that the model is sensitive to small parameter adjustments, with small updates in the parameter space correlating with better performance. The adaptive behavior of the Adam optimizer, which dynamically adjusts learning rates based on historical gradients, likely contributed to the effectiveness of this small learning rate.
Loss function. The Mean Squared Error (MSE) was chosen as the primary loss function in this work; it measures the discrepancy between the noise added to the images and the noise predicted by the U-Net model. However, subsequent empirical investigations, together with insights from alternative implementations, indicate that the Mean Absolute Error (MAE) may yield better outcomes. These findings prompt a reevaluation of the chosen loss function: adopting MAE could improve the fidelity of the model’s predictions, and it warrants thorough investigation and validation within the scope of this study.
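The noise-prediction objective discussed above can be illustrated with a minimal, dependency-free sketch. It noises a toy image with the closed-form forward step $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$ and compares the true noise with a predicted one under both MSE and MAE. The linear schedule, its endpoints (1e-4 to 0.02) and the 1000 steps are assumptions borrowed from Ho et al. [16], not settings confirmed for this work, and `toy_noise_predictor` is a hypothetical placeholder for the trained U-Net.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0, t, alpha_bars, noise):
    """Closed-form forward noising of x0 to step t (0-based index)."""
    a_bar = alpha_bars[t]
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
            for x, e in zip(x0, noise)]


def mse_loss(pred, target):
    """Mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def mae_loss(pred, target):
    """Mean absolute error between predicted and true noise."""
    return sum(abs(p - q) for p, q in zip(pred, target)) / len(target)


def toy_noise_predictor(x_t, t):
    """Hypothetical stand-in for the U-Net: always predicts zero noise."""
    return [0.0 for _ in x_t]


random.seed(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.1, -0.4, 0.8, 0.3]  # flattened toy "image" scaled to [-1, 1]
t = 500
noise = [random.gauss(0.0, 1.0) for _ in x0]
x_t = q_sample(x0, t, alpha_bars, noise)
pred = toy_noise_predictor(x_t, t)
print("MSE:", mse_loss(pred, noise))
print("MAE:", mae_loss(pred, noise))
```

MSE penalizes large per-pixel errors quadratically, whereas MAE grows linearly and is therefore less dominated by a few badly predicted pixels; this is one plausible reason the two losses could behave differently during training.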
Figure 5: Example of Generated DBT 1.
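Samples such as the one in Fig. 5 are obtained by running the reverse process of Section 2.2 from pure Gaussian noise. The loop below is a minimal sketch of that ancestral sampling procedure in plain Python, not the code used in this work. It assumes a linear noise schedule, a fixed reverse variance $\sigma_t^2 = \beta_t$ (one of the options described by Ho et al. [16]), and only 50 steps to keep the run short; `toy_noise_predictor` is a hypothetical placeholder for the trained U-Net, so the output is not a meaningful image.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alpha_bars) for an assumed linear schedule."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_noise_predictor(x_t, t):
    """Hypothetical placeholder for the trained U-Net eps_theta(x_t, t)."""
    return [0.0 for _ in x_t]


def p_sample_loop(predict_noise, num_values, betas, alpha_bars, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and denoise step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(num_values)]
    for t in reversed(range(len(betas))):
        beta, a_bar = betas[t], alpha_bars[t]
        eps = predict_noise(x, t)
        coef = beta / math.sqrt(1.0 - a_bar)
        # Mean of p_theta(x_{t-1} | x_t) written in terms of the predicted noise.
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(beta)  # assumed fixed variance sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added at the final step
    return x


rng = random.Random(0)
betas, alpha_bars = make_schedule(50)
# One flattened volume of 64 x 64 pixels with 8 slices, as in Section 3.1.
sample = p_sample_loop(toy_noise_predictor, 64 * 64 * 8, betas, alpha_bars, rng)
print(len(sample))
```

In a real pipeline the flat list would be a tensor of shape 8 x 64 x 64, and the predictor would be the trained network evaluated on the GPU.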
3.4. Sample quality
While the current quality of our samples may not meet the stringent standards required for certain applications (see Fig. 5), the results show promising aspects. Despite the imperfections, the dataset produced presents a valuable foundation upon which improvements can be built. Additionally, the histogram (see Fig. 6) shows that the generated samples exhibit lower contrast than desired. This contrast deficiency requires attention to ensure that the generated images meet the quality thresholds necessary for clinical applications. In contrast to the histogram of the generated images, the histogram of the real images displays a distribution of pixel intensities accumulated around two distinct modes. This suggests the presence of two dominant intensity regions within the image. The separation and distribution of these modes can significantly influence the visual characteristics of the image.
Figure 6: Pixel distribution.
4. Discussion and Conclusion
Through meticulous experimentation and analysis, the study demonstrated the capability of DDPM to produce artificial DBT images that closely emulate the intricacies of real-world cases. The promising results obtained pave the way for future investigations and applications of DDPM in the realm of medical imaging. The potential for refining and expanding upon these generative models opens avenues for further research, contributing to the ongoing evolution of DBT technology. Future applications may involve conditioning the sampling procedure, allowing for deliberate manipulation of the generated samples; this approach is known as guided diffusion [21, 22]. Moreover, latent diffusion models could be introduced. In these models, an initial step projects the input into a more compact latent space, where the diffusion process is subsequently applied. Specifically, Rombach et al. [23] proposed an encoder network $g$, with $g(x_t) = z_t$, to encode the input into a latent representation $z_t$. This choice aims to alleviate the computational demands of training diffusion models by working in a lower-dimensional space. Following this encoding, a conventional diffusion model, specifically a U-Net, is employed to generate new data. The resulting data are then upsampled through a decoder network.
5. Acknowledgments
The work was supported by the PNRR MUR project PE0000013-FAIR.
References
[1] H.-P. Chan, R. K. Samala, L. M. Hadjiiski, C. Zhou, Deep learning in medical image analysis, in: Advances in Experimental Medicine and Biology, Springer International Publishing, 2020.
[2] S. Destounis, Role of digital breast tomosynthesis in screening and diagnostic breast imaging (2018).
[3] F. D. Marco, A. A. Citarella, L. D. Biasi, L. D’Errico, R. Francese, G. Mettivier, M. Staffa, G. Tortora, Ai-based solutions for the analysis of biomedical images and signals, in: Proceedings of the Conference on Artificial Intelligence (Ital IA 2023), Pisa, Italy, May 29-30, 2023, volume 3486 of CEUR Workshop Proceedings, 2023, pp. 171–176.
[4] L. Taylor, G. Nitschke, Improving deep learning with generic data augmentation, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, pp. 1542–1547.
[5] A. Q. Nichol, P. Dhariwal, Improved denoising diffusion probabilistic models, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 8162–8171.
[6] D. Förnvik, S. Zackrisson, O. Ljungberg, T. Svahn, P. Timberg, A. Tingberg, I. Andersson, Breast tomosynthesis: accuracy of tumor measurement compared with digital mammography and ultrasonography, Acta radiologica 51 (2010) 240–247.
[7] B. D. Fornage, O. Toubas, M. Morel, Clinical, mammographic, and sonographic determination of preoperative breast cancer size, Cancer 60 (1987) 765–771.
[8] L. T. Niklason, B. T. Christian, L. E. Niklason, D. B. Kopans, D. E. Castleberry, B. Opsahl-Ong, C. E. Landberg, P. J. Slanetz, A. A. Giardino, R. Moore, et al., Digital tomosynthesis in breast imaging, Radiology 205 (1997) 399–406.
[9] N. P. Tirada, G. Li, D. Dreizin, L. Robinson, G. R. Khorjekar, S. A. Dromi, T. S. Ernst, Digital breast tomosynthesis: Physics, artifacts, and quality control considerations, Radiographics: a review publication of the Radiological Society of North America, Inc 39 2 (2019) 413–426.
[10] I. Andersson, D. M. Ikeda, S. Zackrisson, M. Ruschin, T. Svahn, P. Timberg, A. Tingberg, Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and birads classification in a population of cancers with subtle mammographic findings, European radiology 18 (2008) 2817–2825.
[11] M. Staffa, L. D’Errico, R. Ricciardi, P. Barra, E. Antignani, S. Minelli, G. Mettivier, How to increase and balance current dbt datasets via an evolutionary gan: preliminary results, in: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, pp. 913–920.
[12] R. Moore, D. Kopans, E. Rafferty, D. Georgian-Smith, R. Hitt, E. Yeh, Initial callback rates for conventional and digital breast tomosynthesis mammography comparison in the screening setting, in: Radiological Society of North America 93rd Scientific Assembly and Annual Meeting, Nov, 2007.
[13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, Commun. ACM 63 (2020) 139–144.
[14] L. Pinheiro Cinelli, M. Araújo Marins, E. A. Barros da Silva, S. Lima Netto, Variational autoencoder (2021) 111–149.
[15] A. Kazerouni, E. K. Aghdam, M. Heidari, R. Azad, M. Fayyaz, I. Hacihaliloglu, D. Merhof, Diffusion models in medical imaging: A comprehensive survey, Medical Image Analysis 88 (2023) 102846.
[16] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, arXiv preprint arxiv:2006.11239 (2020).
[17] S. Zagoruyko, N. Komodakis, Wide residual networks, 2016. URL: http://arxiv.org/abs/1605.07146, cite arxiv:1605.07146.
[18] A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, N. Houlsby, Big transfer (BiT): General visual representation learning (2019).
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need (2017).
[20] M. Staffa, P. Barra, Multi-monitor system for adaptive image saliency detection based on attentive mechanisms, in: H. Degen, S. Ntoa (Eds.), Artificial Intelligence in HCI - 4th International Conference, AI-HCI 2023, Held as Part of the 25th HCI International Conference, HCII, volume 14051 of Lecture Notes in Computer Science, Springer, 2023, pp. 607–617.
[21] A. Q. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, M. Chen, Glide: Towards photorealistic image generation and editing with text-guided diffusion models, in: ICML, volume 162 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 16784–16804.
[22] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, M. Norouzi, Photorealistic text-to-image diffusion models with deep language understanding, CoRR abs/2205.11487 (2022).
[23] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: CVPR, IEEE, 2022, pp. 10674–10685.