
Ital-IA

1613-0073

Models for DBT data augmentation: preliminary results

Lorenzo D'Errico

0 2

Lorenzo Pergamo

1 2

Daniel Riccio

daniel.riccio@unina.it 0 2

Mariacarla Staffa

mariacarla.staffa@uniparthenope.it 1 2

Denoising Diffusion Probabilistic Models (DDPMs)

Breast Tomosynthesis (DBT)

Generative Models

0 University of Naples Federico II; 1 University of Naples Parthenope

2024

4 29 30

Recent strides in computer vision have led to promising breakthroughs in image generation. Notably, diffusion probabilistic models such as DALL-E 2, Imagen, and Stable Diffusion have demonstrated the ability to create lifelike images from textual prompts. Yet their potential application in the medical domain, where intricate three-dimensional image volumes are commonplace, remains largely untapped. Synthetic imagery is a compelling avenue for privacy-preserving artificial intelligence and holds considerable potential for enriching datasets with limited samples. This study assesses the effectiveness of diffusion probabilistic models in synthesizing high-fidelity medical imaging data, with a particular focus on Digital Breast Tomosynthesis (DBT) images.

CEUR ceur-ws.org

1. Introduction

The success of deep learning across various pattern recognition [...] elevated expectations regarding its potential impact on [...] diagnosis. As the healthcare landscape evolves, the [...] thereby improving patient outcomes. Despite the [...] settings encounter numerous challenges [2]. Deep artificial neural networks necessitate substantial training [...] to generate novel samples. In the realm of image [...] the objective is to address the scarcity and imbalance in existing [...] and detection. The proposed methodology entails using [...] detection algorithms within the context of DBT.

The study is structured as follows: Section 2 gives an overview of DBT technology and of DDPMs; Section 3 presents the results obtained; and Section 4 discusses future perspectives.

2. Material and Methods

2.1. Digital Breast Tomosynthesis

Full-field digital mammography (FFDM) has traditionally been the primary breast cancer screening method, but its effectiveness is hindered by inherent limitations. Visualizing complex breast structures in two dimensions often leads to obscured tumor margins and inaccurate lesion characterization due to overlapping tissue [6]. Standard imaging projections may not capture the full extent of irregular or multifocal tumors [7], further complicating accurate diagnosis. Variations in breast composition, positioning artifacts, and tissue compression during imaging introduce variability into tumor size estimation and localization.

In contrast, Digital Breast Tomosynthesis (DBT) (see Fig.1), approved by the FDA in 2011, revolutionizes breast imaging by acquiring a series of low-dose X-ray images from multiple angles and reconstructing them into a 3D dataset [8]. This enables radiologists to navigate breast tissue in three dimensions, overcoming the limitations of 2D mammography [9]. DBT enhances lesion visualization, improves diagnostic accuracy, and outperforms FFDM in detecting invasive cancers and architectural distortions [10] (see Fig.2). Advanced reconstruction algorithms and image processing techniques further enhance DBT's diagnostic utility, allowing for the detection of smaller lesions with greater confidence [12]. DBT reduces false positives, minimizes unnecessary recalls, and optimizes patient outcomes by providing clearer, more detailed images. Integration of quantitative imaging biomarkers and machine learning algorithms augments DBT's diagnostic capabilities, ushering in personalized breast cancer screening and management. In conclusion, DBT represents a transformative advancement in breast cancer imaging, promising unparalleled diagnostic accuracy and improved patient outcomes. As research and technology progress, DBT is poised to revolutionize breast healthcare delivery by facilitating early detection and treatment of breast cancer.

Figure 1: Digital Breast Tomosynthesis procedure.

Figure 2: Comparison between a 2D mammogram and a 3D one. In Digital Breast Tomosynthesis (right), tumors are detected, unlike in mammography (left) where tissue overlap obstructs the view of the specialist doctor [11].

2.2. Denoising Diffusion Probabilistic Models

Diffusion models represent an advanced category of generative models renowned for their efficacy in capturing intricate data distributions. Despite being a recent addition to the generative learning field, they have proven valuable across diverse applications. The three dominant generative frameworks are identified as Generative Adversarial Networks (GANs) [13], Variational Autoencoders (VAEs) [14], and normalizing flows [15]. These models all belong to the category of probabilistic generative models and are widely used to capture intricate data distributions.

A Denoising Diffusion Probabilistic Model (DDPM) is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time (see Fig. 3). A DDPM is composed of two opposite processes: the forward and the reverse diffusion process. In the forward diffusion process, Gaussian noise is gradually and iteratively added to the images of the training set, so that they move away from their original distribution and progressively approach a normal distribution. In the reverse diffusion process, the objective is to systematically invert the preceding forward diffusion procedure. The reversal is conducted gradually and iteratively to counteract the perturbation applied to the images in the forward process.

The reverse process starts where the forward process ends. The advantage of starting from a normal distribution is that sampling points from this simple distribution is straightforward. The primary aim is then to find a way back to the original data distribution. The challenge is that infinitely many trajectories originate from a point in this ostensibly simple space, and only a fraction of them lead to the data distribution. In a DDPM, this is addressed by referring to the incremental steps taken in the forward diffusion process. The probability density function (PDF) of the corrupted images in the forward process changes only slightly at each step.

Consequently, in the reverse process, a deep-learning model is employed at each step to predict the PDF parameters of the forward process. Once the model is trained, any point in the simple space can be selected, and the model can be applied iteratively to navigate back to the data subspace. In reverse diffusion, denoising is performed in small steps, starting from a noisy image. This method of training and generating new samples is more stable than that of Generative Adversarial Networks (GANs) and surpasses prior approaches such as Variational Autoencoders (VAEs) and normalizing flows. Diffusion models, as outlined in the literature [16], are a category of latent variable models represented by the equation

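$$p_\theta(x_0) := \int p_\theta(x_{0:T}) \, dx_{1:T},$$

where $x_1, \dots, x_T$ are latent variables of the same dimensionality as the data $x_0 \sim q(x_0)$. The joint distribution $p_\theta(x_{0:T})$ is denoted as the reverse process, constituting a Markov chain with learned Gaussian transitions that initiate at $p(x_T) = \mathcal{N}(x_T; 0, I)$:

$$p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t), \qquad p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\big).$$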
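The forward and reverse processes described in this section can be illustrated with a minimal NumPy sketch. It assumes the linear noise schedule and the fixed reverse variance $\sigma_t^2 = \beta_t$ of Ho et al. [16]; the schedule values, the function names, and the toy 8x64x64 volume are illustrative assumptions, not the configuration reported in this work. In practice the noise estimate passed to the reverse step would come from the trained network.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T (linear schedule; values are assumptions)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    """Forward process in closed form:
    x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I). Returns (x_t, eps)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def p_sample_step(x_t, t, eps_pred, betas, alpha_bar, rng):
    """One reverse step x_t -> x_{t-1}, given a noise estimate eps_pred.
    Uses the fixed variance sigma_t^2 = beta_t; no noise is added at t == 0."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_t)
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

T = 1000                                   # number of diffusion steps (assumption)
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)        # cumulative product of (1 - beta_t)
rng = np.random.default_rng(0)

x0 = rng.uniform(-1.0, 1.0, size=(8, 64, 64))   # toy stand-in for a DBT volume
x_T, _ = q_sample(x0, T - 1, alpha_bar, rng)    # almost pure Gaussian noise

# With the true noise in place of a network prediction, the last reverse
# step (t == 0) recovers x0 exactly.
x_1, eps_1 = q_sample(x0, 0, alpha_bar, rng)
x0_rec = p_sample_step(x_1, 0, eps_1, betas, alpha_bar, rng)
```

Time steps are 0-indexed in the code, so index `t` corresponds to step $t+1$ in the equations above.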

Regarding the structural design of the model, the dimensions of its input and output must match. To achieve this, Ho et al. [16] used a U-Net architecture, which ensures that the input and output of the model have the same size. Starting from the typical U-Net architecture, the conventional double convolution at each level was replaced with residual blocks as employed in ResNet models. In the DDPM implementation, a Wide ResNet block was employed as per Zagoruyko et al. [17]. However, in the adaptation by Phil Wang, the standard convolutional layer was replaced with a weight-standardized version, which is known to perform better in conjunction with group normalization, as outlined by Kolesnikov et al. [18]. Moreover, because the network parameters are shared across time steps, sinusoidal position embeddings are incorporated, drawing inspiration from the Transformer model [19]. These embeddings allow the neural network to identify the relevant time step (noise level) of each image within a batch. The SinusoidalPositionEmbeddings module has been used in this work. Finally, an attention module is introduced from the Transformer architecture [19, 20].
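As an illustration of how a time step can be encoded for the network, the following is a minimal NumPy sketch of Transformer-style sinusoidal embeddings [19]. It is not the SinusoidalPositionEmbeddings module used in this work; the function name, the base constant 10000, and the embedding size are assumptions that follow common open-source implementations.

```python
import numpy as np

def sinusoidal_position_embeddings(timesteps, dim):
    """Map integer time steps to vectors of size `dim` (dim even, dim >= 4).
    The first half holds sines, the second half cosines, with geometrically
    spaced frequencies between 1 and 1/10000."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / (half - 1))
    args = np.asarray(timesteps, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

# One embedding per image in a batch; each row identifies that image's noise level.
emb = sinusoidal_position_embeddings([0, 10, 999], dim=32)
```

In a U-Net of this kind, such a vector is typically passed through a small MLP and added to the feature maps of each residual block, so that the same weights can serve every noise level.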

3. Experimental Results

This section presents a detailed analysis of the experimental setup employed for training and evaluating the DDPM. The results are presented in a structured manner, showcasing the model's ability to capture and simulate the intricate features present in authentic DBT images. Additionally, we explore the impact of key hyperparameters on the synthesis process.

3.1. Description of Dataset

The dataset comprises patient records from individuals who underwent Digital Breast Tomosynthesis (DBT) examinations at the Duke Health system between January 1, 2014, and January 30, 2018. The acquisition process involved cross-referencing information from radiology reports, pathology reports, and DBT data obtained from the Picture Archiving and Communication Systems (PACS) at Duke. These studies encompassed a total of 13,954 unique patients, each with at least one craniocaudal (CC) and mediolateral oblique (MLO) view available for either the left or right breast (see Fig. 4).

The dataset is organized into three sets: a training set of 1.42 TB, a validation set of 84.71 GB, and a test set of 135.14 GB. The images are stored in DICOM format and were read using the torchio library. To make them consistent and ready for analysis, all volumes were resized to 64x64 pixels with 8 slices. The size reduction was carried out iteratively, with the aim of preserving as much image quality as possible. Besides providing a uniform input size, this reduction also enhances computational efficiency.
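The exact resizing procedure is not detailed here, so the following NumPy sketch only illustrates one plausible reading of an iterative reduction to 64x64 pixels with 8 slices: repeated 2x2 in-plane averaging followed by the selection of evenly spaced slices. The function names and the toy input shape are hypothetical, the sketch assumes in-plane sizes that are power-of-two multiples of the target, and it does not reproduce the torchio-based pipeline used in this work.

```python
import numpy as np

def halve_in_plane(volume):
    """Average each 2x2 in-plane block of a (slices, H, W) volume; H and W must be even."""
    s, h, w = volume.shape
    return volume.reshape(s, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def downsample_volume(volume, size=64, n_slices=8):
    """Halve the in-plane resolution until it reaches `size`, then keep
    `n_slices` evenly spaced slices (hypothetical procedure)."""
    while volume.shape[1] > size and volume.shape[2] > size:
        volume = halve_in_plane(volume)
    idx = np.linspace(0, volume.shape[0] - 1, n_slices).round().astype(int)
    return volume[idx]

rng = np.random.default_rng(0)
toy_volume = rng.uniform(size=(20, 256, 256))   # synthetic stand-in for a DBT volume
small = downsample_volume(toy_volume)           # shape (8, 64, 64)
```

Averaging in small steps acts as a simple low-pass filter before each reduction, which limits aliasing compared with directly subsampling every fourth pixel.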

3.2. Hardware

The objective of this research is to generate synthetic Digital Breast Tomosynthesis (DBT) samples with a Denoising Diffusion Probabilistic Model. To speed up this computationally demanding task, the code was parallelized on four Tesla V100 Graphics Processing Units (GPUs). The parallelization was applied at the data level only: the dataset was partitioned and processed concurrently across all GPUs. This approach improved the speed and efficiency of both model training and synthetic sample generation.

3.3. Model deployed

Loss function. The Mean Squared Error (MSE) was chosen as the primary loss function. It measures the discrepancy between the noise added to the images and the noise predicted by the U-Net model. However, subsequent empirical investigations, together with insights from alternative implementations, indicate that the Mean Absolute Error (MAE) may yield better results. These findings call for a re-evaluation of the chosen loss function and a thorough study of the implications of adopting MAE for the objectives of this work. Such a revision could improve the fidelity of the model's predictions and remains to be investigated and validated.

Hyperparameters.

• Batch Size. A batch size of 16 was assigned to each of the four GPUs. This value emerged as the most effective compromise among training speed, output quality, and, notably, memory utilization.
• Learning Rate. The smallest learning rate tested, 1e-4, gave the best results. This suggests that the model is sensitive to the size of the parameter updates, with small updates correlating with better performance. The Adam optimizer, which adapts the step size of each parameter based on historical gradients, likely contributed to the effectiveness of this small learning rate.
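The two candidate losses discussed in Section 3.3 differ only in how the noise-prediction error is aggregated. The sketch below compares them on a toy batch; the function name and the constant prediction offset are illustrative assumptions, and the effective batch size of 64 assumes that each of the four data-parallel replicas processes its own 16 samples before gradients are combined.

```python
import numpy as np

def noise_prediction_loss(eps_true, eps_pred, kind="mse"):
    """Discrepancy between the noise added in the forward process and the
    noise predicted by the network. kind: 'mse' (L2) or 'mae' (L1)."""
    diff = eps_pred - eps_true
    if kind == "mse":
        return float(np.mean(diff ** 2))
    if kind == "mae":
        return float(np.mean(np.abs(diff)))
    raise ValueError(f"unknown loss kind: {kind}")

per_gpu_batch = 16
n_gpus = 4
effective_batch = per_gpu_batch * n_gpus   # samples contributing to one update

rng = np.random.default_rng(0)
eps_true = rng.standard_normal((per_gpu_batch, 8, 64, 64))
eps_pred = eps_true + 0.1                  # toy prediction with a constant error

mse = noise_prediction_loss(eps_true, eps_pred, "mse")
mae = noise_prediction_loss(eps_true, eps_pred, "mae")
```

Because MAE grows linearly with the error while MSE grows quadratically, MAE penalizes occasional large residuals less heavily, which is one possible reason for the better behavior reported by alternative implementations.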

3.4. Sample quality

The quality of the generated samples does not yet meet the stringent standards required for certain applications (see Fig.5), but the results are promising and provide a valuable foundation on which improvements can be built. The intensity histogram (see Fig.6) shows that the generated samples have lower contrast than desired. This deficiency must be addressed before the generated images can meet the quality thresholds required for clinical applications. Unlike the histogram of the generated images, the histogram of the real images is bimodal: pixel intensities accumulate around two distinct modes, which suggests the presence of two dominant intensity regions within the image. The separation and distribution of these modes can significantly influence the visual characteristics of the image.
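The histogram comparison described above can be quantified with simple statistics. The following sketch computes a normalized intensity histogram and the RMS contrast (standard deviation of the intensities) for two synthetic arrays: a bimodal one standing in for a real image and a unimodal one standing in for a generated image. Both arrays are toy data chosen for illustration, not samples or measurements from this study.

```python
import numpy as np

def intensity_histogram(image, bins=64):
    """Normalized histogram of pixel intensities assumed to lie in [0, 1]."""
    counts, edges = np.histogram(image, bins=bins, range=(0.0, 1.0), density=True)
    return counts, edges

def rms_contrast(image):
    """RMS contrast: standard deviation of the pixel intensities."""
    return float(np.std(image))

rng = np.random.default_rng(0)
# Toy "real-like" intensities: two modes, e.g. dark background and brighter tissue.
real_like = np.clip(
    np.concatenate([rng.normal(0.2, 0.05, 5000), rng.normal(0.8, 0.05, 5000)]), 0.0, 1.0
)
# Toy "generated-like" intensities: a single mode around mid-gray.
generated_like = np.clip(rng.normal(0.5, 0.1, 10000), 0.0, 1.0)

real_counts, real_edges = intensity_histogram(real_like)
gen_counts, gen_edges = intensity_histogram(generated_like)
contrast_real = rms_contrast(real_like)
contrast_gen = rms_contrast(generated_like)
```

On such data the bimodal array has the larger RMS contrast, which mirrors the qualitative observation that a single-mode histogram corresponds to flatter, lower-contrast images.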

4. Discussion and Conclusion

Through experimentation and analysis, the study demonstrated the capability of DDPMs to produce artificial DBT images that emulate the intricacies of real-world cases. The promising results obtained pave the way for future investigations and applications of DDPMs in medical imaging. The potential for refining and expanding these generative models opens avenues for further research, contributing to the ongoing evolution of DBT technology. Future work may involve conditioning the sampling procedure, allowing for the deliberate manipulation of the generated samples. This approach is also known as guided diffusion [21, 22]. Moreover, latent diffusion models could be introduced. In these models, an initial step projects the input into a more compact latent space, where the diffusion process is subsequently applied. Specifically, Rombach et al. [23] proposed the use of an encoder network $\mathcal{E}$, with $\mathcal{E}(x) = z$, to encode the input $x$ into a latent representation $z$. This choice aims to reduce the computational demands of training diffusion models by performing the processing in a lower-dimensional space. Following this encoding, a conventional diffusion model, specifically a U-Net, is employed to generate new data. The resulting data are then upsampled through a decoder network.
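To give a sense of the dimensionality reduction that motivates latent diffusion, the following toy sketch replaces the learned encoder and decoder of Rombach et al. [23] with fixed average pooling and nearest-neighbor upsampling. These stand-ins, the function names, and the downsampling factor of 4 are assumptions made purely for illustration; a real latent diffusion model learns both networks.

```python
import numpy as np

def encode(x, f=4):
    """Stand-in encoder: in-plane average pooling by a factor f (not a learned network)."""
    s, h, w = x.shape
    return x.reshape(s, h // f, f, w // f, f).mean(axis=(2, 4))

def decode(z, f=4):
    """Stand-in decoder: nearest-neighbor upsampling by a factor f."""
    return z.repeat(f, axis=1).repeat(f, axis=2)

rng = np.random.default_rng(0)
x = rng.uniform(size=(8, 64, 64))   # toy volume with the shape used in this work
z = encode(x)                       # latent representation, shape (8, 16, 16)
x_rec = decode(z)                   # back to the input resolution
reduction = x.size / z.size         # factor by which the diffusion input shrinks
```

With a factor of 4 along each in-plane axis, the diffusion model would operate on 16 times fewer values per volume, which is the source of the computational savings mentioned above.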

5. Acknowledgments

The work was supported by the PNRR MUR project PE0000013-FAIR.

References

[1] H.-P. Chan, R. K. Samala, L. M. Hadjiiski, C. Zhou, Deep learning in medical image analysis, in: Advances in Experimental Medicine and Biology, Springer International Publishing, 2020.
[2] S. Destounis, Role of digital breast tomosynthesis in screening and diagnostic breast imaging (2018).
[3] F. D. Marco, A. A. Citarella, L. D. Biasi, L. D'Errico, R. Francese, G. Mettivier, M. Staffa, G. Tortora, AI-based solutions for the analysis of biomedical images and signals, in: Proceedings of the Conference on Artificial Intelligence (Ital IA 2023), Pisa, Italy, May 29-30, 2023, volume 3486 of CEUR Workshop Proceedings, 2023, pp. 171–176.
[4] L. Taylor, G. Nitschke, Improving deep learning with generic data augmentation, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, pp. 1542–1547.
[5] A. Q. Nichol, P. Dhariwal, Improved denoising diffusion probabilistic models, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 8162–8171.
[6] D. Förnvik, S. Zackrisson, O. Ljungberg, T. Svahn, P. Timberg, A. Tingberg, I. Andersson, Breast tomosynthesis: accuracy of tumor measurement compared with digital mammography and ultrasonography, Acta radiologica 51 (2010) 240–247.
[7] B. D. Fornage, O. Toubas, M. Morel, Clinical, mammographic, and sonographic determination of preoperative breast cancer size, Cancer 60 (1987) 765–771.
[8] L. T. Niklason, B. T. Christian, L. E. Niklason, D. B. Kopans, D. E. Castleberry, B. Opsahl-Ong, C. E. Landberg, P. J. Slanetz, A. A. Giardino, R. Moore, et al., Digital tomosynthesis in breast imaging, Radiology 205 (1997) 399–406.
[9] N. P. Tirada, G. Li, D. Dreizin, L. Robinson, G. R. Khorjekar, S. A. Dromi, T. S. Ernst, Digital breast tomosynthesis: Physics, artifacts, and quality control considerations, Radiographics: a review publication of the Radiological Society of North America, Inc 39 2 (2019) 413–426.
[10] I. Andersson, D. M. Ikeda, S. Zackrisson, M. Ruschin, T. Svahn, P. Timberg, A. Tingberg, Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and birads classification in a population of cancers with subtle mammographic findings, European radiology 18 (2008) 2817–2825.
[11] M. Staffa, L. D'Errico, R. Ricciardi, P. Barra, E. Antignani, S. Minelli, G. Mettivier, How to increase and balance current dbt datasets via an evolutionary gan: preliminary results, in: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, pp. 913–920.
[12] R. Moore, D. Kopans, E. Rafferty, D. Georgian-Smith, R. Hitt, E. Yeh, Initial callback rates for conventional and digital breast tomosynthesis mammography comparison in the screening setting, in: Radiological Society of North America 93rd Scientific Assembly and Annual Meeting, Nov, 2007.
[13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, Commun. ACM 63 (2020) 139–144.
[14] L. Pinheiro Cinelli, M. Araújo Marins, E. A. Barros da Silva, S. Lima Netto, Variational autoencoder (2021) 111–149.
[15] A. Kazerouni, E. K. Aghdam, M. Heidari, R. Azad, M. Fayyaz, I. Hacihaliloglu, D. Merhof, Diffusion models in medical imaging: A comprehensive survey, Medical Image Analysis 88 (2023) 102846.
[16] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, arXiv preprint arxiv:2006.11239 (2020).
[17] S. Zagoruyko, N. Komodakis, Wide residual networks, 2016. URL: http://arxiv.org/abs/1605.07146, cite arxiv:1605.07146.
[18] A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, N. Houlsby, Big transfer (BiT): General visual representation learning (2019).
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need (2017).
[20] M. Staffa, P. Barra, Multi-monitor system for adaptive image saliency detection based on attentive mechanisms, in: H. Degen, S. Ntoa (Eds.), Artificial Intelligence in HCI - 4th International Conference, AI-HCI 2023, Held as Part of the 25th HCI International Conference, HCII, volume 14051 of Lecture Notes in Computer Science, Springer, 2023, pp. 607–617.
[21] A. Q. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, M. Chen, Glide: Towards photorealistic image generation and editing with text-guided diffusion models, in: ICML, volume 162 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 16784–16804.
[22] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, M. Norouzi, Photorealistic text-to-image diffusion models with deep language understanding, CoRR abs/2205.11487 (2022).
[23] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: CVPR, IEEE, 2022, pp. 10674–10685.