<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Ital-IA</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Models for DBT data augmentation: preliminary results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo D'Errico</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Pergamo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Riccio</string-name>
          <email>daniel.riccio@unina.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariacarla Staffa</string-name>
          <email>mariacarla.staffa@uniparthenope.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Naples Federico II</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Naples Parthenope</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>4</volume>
      <fpage>29</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Recent advances in computer vision have led to promising breakthroughs in image generation. Notably, diffusion probabilistic models such as DALL-E 2, Imagen, and Stable Diffusion have demonstrated the ability to create lifelike images from textual prompts. Yet their potential in the medical domain, where intricate three-dimensional image volumes are commonplace, remains largely untapped. Synthetic imagery is a compelling avenue for privacy-preserving artificial intelligence and holds considerable potential for enriching datasets with limited samples. This study assesses the effectiveness of diffusion probabilistic models in synthesizing high-fidelity medical imaging data, with a particular focus on Digital Breast Tomosynthesis (DBT) images.</p>
      </abstract>
      <kwd-group>
        <kwd>Denoising Diffusion Probabilistic Models (DDPMs)</kwd>
        <kwd>Digital Breast Tomosynthesis (DBT)</kwd>
        <kwd>Generative Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>The success of deep learning across various pattern recognition [...] elevated expectations regarding its potential impact on [...] diagnosis. As the healthcare landscape evolves, the fu[...] thereby improving patient outcomes. Despite the opti[...] settings encounter numerous challenges [2]. Deep artificial neural networks necessitate substantial training [...] to generate novel samples. In the realm of image synthesis [...]tive is to address the scarcity and imbalance in existing [...] and detection. The proposed methodology entails using [...] detection algorithms within the context of DBT.</p>
      <p>The study is structured as follows: Section 2 gives an overview of DBT technology and of DDPMs; Section 3 presents the results obtained; finally, Section 4 discusses future perspectives.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Material and Methods</title>
      <sec id="sec-3-1">
        <title>2.1. Digital Breast Tomosynthesis</title>
        <p>Full-field digital mammography (FFDM) has traditionally been the primary breast cancer screening method, but its effectiveness is hindered by inherent limitations. Visualizing complex breast structures in two dimensions often leads to obscured tumor margins and inaccurate lesion characterization due to overlapping tissue [6]. Standard imaging projections may not capture the full extent of irregular or multifocal tumors [7], further complicating accurate diagnosis. Variations in breast composition, positioning artifacts, and tissue compression during imaging introduce variability into tumor size estimation and localization. In contrast, Digital Breast Tomosynthesis (DBT) (see Fig. 1), approved by the FDA in 2011, acquires a series of low-dose X-ray images from multiple angles and reconstructs them into a 3D dataset [8]. This enables radiologists to navigate breast tissue in three dimensions, overcoming the limitations of 2D mammography [9]. DBT enhances lesion visualization, improves diagnostic accuracy, and outperforms FFDM in detecting invasive cancers and architectural distortions [10] (see Fig. 2). Advanced reconstruction algorithms and image processing techniques further enhance DBT's diagnostic utility, allowing for the detection of smaller lesions with greater confidence [12]. DBT reduces false positives, minimizes unnecessary recalls, and optimizes patient outcomes by providing clearer, more detailed images. Integration of quantitative imaging biomarkers and machine learning algorithms augments DBT's diagnostic capabilities, ushering in personalized breast cancer screening and management. In conclusion, DBT represents a transformative advancement in breast cancer imaging, promising improved diagnostic accuracy and patient outcomes. As research and technology progress, DBT is poised to revolutionize breast healthcare delivery by facilitating early detection and treatment of breast cancer.</p>
        <fig id="fig1">
          <label>Figure 1</label>
          <caption><p>Digital Breast Tomosynthesis procedure.</p></caption>
        </fig>
        <fig id="fig2">
          <label>Figure 2</label>
          <caption><p>Comparison between a 2D mammogram and a 3D one. In Digital Breast Tomosynthesis (right), tumors are detected, unlike in mammography (left) where tissue overlap obstructs the view of the specialist doctor [11].</p></caption>
        </fig>
      </sec>
      <sec id="sec-3-3">
        <title>2.2. Denoising Diffusion Probabilistic Models</title>
        <sec id="sec-3-3-1">
          <p>Diffusion models represent an advanced category of generative models renowned for their efficacy in capturing intricate data distributions. Despite being a recent addition to the generative learning field, they have proven valuable across diverse applications. The three dominant generative frameworks are identified as Generative Adversarial Networks (GANs) [13], Variational Autoencoders (VAEs) [14], and normalizing flows [15]. These models, falling under the category of probabilistic generative models, are adept at capturing intricate data distributions, establishing themselves as a formidable tool in various applications. A Denoising Diffusion Probabilistic Model (DDPM) is a parameterized Markov chain trained using variational inference to produce samples matching the data after finite time (see Fig. 3). A DDPM is composed of two opposite processes: the forward and the reverse diffusion process. In the forward diffusion process, Gaussian noise is gradually and iteratively added to the images in the training set, so that they move away from their original distribution and approach a normal distribution. In the reverse diffusion process, the objective is to invert the forward diffusion procedure, again gradually and iteratively, to undo the perturbation applied to the images. The reverse process starts where the forward process ends; the advantage of starting from a normal distribution is that sampling points from this simple distribution is straightforward. The primary aim is to find a way back to the original data distribution. The challenge is that infinitely many trajectories originate from a point in this simple space, and only a fraction of them lead to the data distribution. In a DDPM, the right trajectory is found by referring to the incremental steps taken in the forward diffusion process: the probability density function (PDF) of the corrupted images changes only slightly at each forward step.</p>
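          <p>The forward process described above admits a closed-form sample at any step, which the following sketch illustrates in plain Python. The linear noise schedule from 1e-4 to 0.02 over 1000 steps follows Ho et al. [16] and is an assumption here, since this paper does not state its schedule; the names linear_beta_schedule, cumulative_alphas and q_sample are illustrative, and a practical implementation would operate on tensors rather than lists.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form for a flat list of pixels.

    Returns the noisy pixels and the Gaussian noise that was mixed in.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5] * 16  # hypothetical flattened image patch
x_early, _ = q_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, alpha_bars, rng)   # almost pure noise
```
]]></preformat>
          <p>At an early step the sample stays close to the input, whereas at the last step almost no signal remains, because the cumulative product of (1 - beta) is close to zero.</p>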
          <p>Consequently, in the reverse process, a deep-learning model is employed at each step to predict the PDF parameters of the forward process. Once the model is trained, any point in the simple space can be selected, and the model can be applied iteratively to navigate back to the data subspace. In reverse diffusion, denoising is performed in small steps, starting from a noisy image. This method of training and generating new samples is more stable than Generative Adversarial Networks (GANs) and surpasses prior approaches such as Variational Autoencoders (VAEs) and normalizing flows. Diffusion models, as outlined in the literature [16], are a category of latent variable models of the form</p>
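          <disp-formula><tex-math>p_\theta(x_0) := \int p_\theta(x_{0:T}) \, dx_{1:T},</tex-math></disp-formula>
          <p>where <inline-formula><tex-math>x_1, \dots, x_T</tex-math></inline-formula> are latent variables of the same dimensionality as the data <inline-formula><tex-math>x_0 \sim q(x_0)</tex-math></inline-formula>. The joint distribution <inline-formula><tex-math>p_\theta(x_{0:T})</tex-math></inline-formula> is denoted as the reverse process, constituting a Markov chain with learned Gaussian transitions that initiate at <inline-formula><tex-math>p(x_T) = \mathcal{N}(x_T; 0, I)</tex-math></inline-formula>:</p>
          <disp-formula><tex-math>p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t),</tex-math></disp-formula>
          <disp-formula><tex-math>p_\theta(x_{t-1} \mid x_t) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)).</tex-math></disp-formula>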
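          <p>As an illustration of one learned Gaussian transition of the reverse process, the sketch below performs a single sampling step using the noise-prediction parameterization and the fixed variance beta_t of Ho et al. [16]. Both choices are assumptions, since this paper does not say how the mean and variance are parameterized; reverse_step and the 10-step toy schedule are hypothetical, and the predicted noise would normally come from the trained U-Net.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def reverse_step(xt, predicted_noise, t, betas, alpha_bars, rng):
    """One ancestral sampling step x_t -> x_{t-1} with variance beta_t."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    coef = beta_t / math.sqrt(1.0 - alpha_bars[t])
    # Mean of the Gaussian transition, written in terms of the predicted noise
    mean = [(x - coef * e) / math.sqrt(alpha_t)
            for x, e in zip(xt, predicted_noise)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma_t = math.sqrt(beta_t)
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]


# Toy 10-step linear schedule (assumed values, for illustration only)
betas = [1e-4 + i * (0.02 - 1e-4) / 9 for i in range(10)]
alpha_bars, running = [], 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)

rng = random.Random(0)
x0 = [0.2, -0.4, 0.9]
true_noise = [rng.gauss(0.0, 1.0) for _ in x0]
# Noisy sample after the first forward step (index 0)
x1 = [math.sqrt(alpha_bars[0]) * x + math.sqrt(1.0 - alpha_bars[0]) * e
      for x, e in zip(x0, true_noise)]
# With a perfect noise prediction, the last reverse step returns x0
recovered = reverse_step(x1, true_noise, 0, betas, alpha_bars, rng)
```
]]></preformat>
          <p>With a perfect noise estimate the final step reproduces the clean input up to rounding; in practice the estimate is imperfect, and the step is applied repeatedly from the last index down to 0.</p>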
          <p>Regarding the architecture, the input and output of the model must have the same dimensions. To this end, Ho et al. [16] used a U-Net. Relative to the typical U-Net, the conventional double convolution at each level was replaced with residual blocks as employed in ResNet models. In the DDPM implementation, a Wide ResNet block was employed as per Zagoruyko et al. [17]. However, in the adaptation by Phil Wang, the standard convolutional layer was replaced with a weight-standardized version, recognized for its improved performance in conjunction with group normalization, as outlined by Kolesnikov et al. [18]. Moreover, because the network parameters are shared across time steps, sinusoidal position embeddings are incorporated, drawing inspiration from the Transformer model [19]. They allow the neural network to identify the time step (noise level) at which it is operating for each image within a batch. The SinusoidalPositionEmbeddings module has been used in this work. Finally, an attention module from the Transformer architecture is introduced [19, 20].</p>
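          <p>To make the time-step encoding concrete, the following plain-Python sketch computes a Transformer-style sinusoidal embedding [19] for a single time step. It is an illustrative stand-in for the SinusoidalPositionEmbeddings module rather than its actual code; the function name sinusoidal_embedding, the base 10000 and the sine-then-cosine layout are assumptions.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def sinusoidal_embedding(timestep, dim):
    """Map an integer timestep to a dim-sized vector of sines and cosines."""
    if dim % 2 != 0 or dim < 4:
        raise ValueError("dim must be an even number >= 4")
    half = dim // 2
    # Frequencies decay geometrically from 1 down to 1/10000
    scale = math.log(10000.0) / (half - 1)
    freqs = [math.exp(-scale * i) for i in range(half)]
    sines = [math.sin(timestep * f) for f in freqs]
    cosines = [math.cos(timestep * f) for f in freqs]
    return sines + cosines


emb_t0 = sinusoidal_embedding(0, 8)
emb_t5 = sinusoidal_embedding(5, 8)
```
]]></preformat>
          <p>Each time step yields a distinct vector with entries between -1 and 1, which a network can add to or concatenate with its feature maps to condition on the noise level.</p>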
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Experimental Results</title>
      <p>This section presents a detailed analysis of the experimental setup employed for training and evaluating the DDPM. The results are presented in a structured manner, showcasing the model's ability to capture and simulate the intricate features present in authentic DBT images. Additionally, we explore the impact of key hyperparameters on the synthesis process.</p>
      <sec id="sec-4-6">
        <title>3.1. Description of Dataset</title>
        <p>The dataset comprises patient records from individuals who underwent Digital Breast Tomosynthesis (DBT) examinations at the Duke Health system between January 1, 2014, and January 30, 2018. The acquisition process involved cross-referencing information from radiology reports, pathology reports, and DBT data obtained from the Picture Archiving and Communication Systems (PACS) at Duke. These studies encompassed a total of 13,954 unique patients, each with at least one craniocaudal (CC) and mediolateral oblique (MLO) view available for either the left or right breast (see Fig. 4).</p>
        <p>The dataset is organized into three sets: a training set of 1.42 TB, a validation set of 84.71 GB, and a test set of 135.14 GB. The images are in DICOM format and were read with the torchio library. For consistency, all volumes were resized to 64x64 pixels with 8 slices. The size reduction was carried out iteratively, with the aim of preserving as much image quality as possible. Besides standardizing the input for analysis, the reduction also improves computational efficiency.</p>
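        <p>The text does not specify the iterative reduction procedure. One plausible reading, sketched below purely as an assumption, is to halve the in-plane resolution repeatedly by averaging 2x2 blocks until the target size is reached. The functions halve_slice and downsample_volume and the toy volume are hypothetical, and slice selection is left out.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def halve_slice(image):
    """Downsample a 2D slice (list of rows) by averaging each 2x2 block."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def downsample_volume(volume, target_size):
    """Halve every slice repeatedly until its height reaches target_size."""
    while len(volume[0]) > target_size:
        volume = [halve_slice(s) for s in volume]
    return volume


# Hypothetical toy volume: 8 slices of 256x256, one constant value per slice
volume = [[[float(k)] * 256 for _ in range(256)] for k in range(8)]
small = downsample_volume(volume, 64)  # 256 -> 128 -> 64 in two passes
```
]]></preformat>
        <p>Reducing the size in several small steps rather than one large step lets each pass average neighbouring pixels, which limits aliasing; whether the authors used this or another interpolation scheme is not stated.</p>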
      </sec>
      <sec id="sec-4-9">
        <title>3.2. Hardware</title>
        <p>The principal objective of the present research is the generation of synthetic Digital Breast Tomosynthesis (DBT) samples with a Denoising Diffusion Probabilistic Model. To speed up this computationally demanding task, the code was parallelized on four Tesla V100 Graphics Processing Units (GPUs). The parallelization was applied at the data level: the dataset was partitioned and processed concurrently across all GPUs. This approach improved the efficiency and speed of both model training and synthetic sample generation.</p>
      </sec>
      <sec id="sec-4-10">
        <title>3.3. Model deployed</title>
        <p>Hyperparameters</p>
        <list list-type="bullet">
          <list-item><p>Batch Size. A batch size of 16 was allocated to each of the four GPUs. This choice was a compromise among training speed, output quality and, notably, memory utilization: it allowed fast model training and preserved output quality while keeping memory usage manageable. This batch size emerged as the most effective compromise for the objectives of the experiment.</p></list-item>
          <list-item><p>Learning Rate. The smallest learning rate tried, 1e-4, gave the best outcomes. This suggests that the model is sensitive to the step size, with small updates in parameter space correlating with better performance. The adaptive behavior of the Adam optimizer, which adjusts its step sizes based on past gradients, likely contributed to the effectiveness of this small learning rate.</p></list-item>
        </list>
        <p>Loss function. In this work, the Mean Squared Error (MSE) was chosen as the primary loss function; it measures the discrepancy between the noise added to the images and the noise predicted by the U-Net model. However, subsequent empirical investigations, together with insights from alternative implementations, indicate that the Mean Absolute Error (MAE) may yield better results. These findings call for a re-evaluation of the loss function and a thorough study of the implications of adopting MAE within the objectives of this study. Such a change could improve the fidelity of the model's predictions and therefore warrants further investigation and validation.</p>
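        <p>The difference between the two candidate losses can be seen on a toy example. The sketch below compares MSE and MAE on a hypothetical noise prediction with one large error; the numbers are invented for illustration and say nothing about which loss works better on DBT data.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def mse_loss(true_noise, predicted_noise):
    """Mean squared error between added and predicted noise."""
    diffs = [(t - p) ** 2 for t, p in zip(true_noise, predicted_noise)]
    return sum(diffs) / len(diffs)


def mae_loss(true_noise, predicted_noise):
    """Mean absolute error between added and predicted noise."""
    diffs = [abs(t - p) for t, p in zip(true_noise, predicted_noise)]
    return sum(diffs) / len(diffs)


true_noise = [0.0, 1.0, -1.0, 0.5]
predicted_noise = [0.0, 1.0, -1.0, 4.5]  # hypothetical: one large error of 4.0
mse = mse_loss(true_noise, predicted_noise)  # the error enters squared
mae = mae_loss(true_noise, predicted_noise)  # the error enters linearly
```
]]></preformat>
        <p>Because MSE squares each residual, a single outlying pixel dominates the loss, whereas MAE weights it linearly; this is one commonly cited reason why MAE can behave differently during training.</p>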
        <sec id="sec-4-10-1">
          <title>3.4. Sample quality</title>
          <p>While the current quality of our samples may not meet the stringent standards required for certain applications (see Fig. 5), the results have promising aspects. Despite the imperfections, the samples produced are a valuable foundation on which improvements can be built. The histogram (see Fig. 6) shows that the generated samples have lower contrast than desired. This contrast deficiency must be addressed before the generated images can meet the quality thresholds needed for clinical applications. In contrast to the histogram of the generated images, the histogram of the real images shows pixel intensities accumulating around two distinct modes, which suggests two dominant intensity regions within the image. The separation and distribution of these modes can significantly influence the visual characteristics of the image.</p>
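          <p>A histogram comparison of this kind can be sketched as follows. The pixel values are hypothetical 8-bit intensities chosen to mimic a bimodal real image and a low-contrast generated one; they are not taken from the dataset, and the functions intensity_histogram and rms_contrast are illustrative names.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def intensity_histogram(pixels, num_bins=8, max_value=255):
    """Count pixels per equally wide intensity bin over [0, max_value]."""
    counts = [0] * num_bins
    for value in pixels:
        index = min(int(value * num_bins / (max_value + 1)), num_bins - 1)
        counts[index] += 1
    return counts


def rms_contrast(pixels):
    """Standard deviation of the intensities (population form)."""
    mean = sum(pixels) / len(pixels)
    return (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5


# Hypothetical 8-bit pixel values, not taken from the dataset
real_pixels = [10, 20, 15, 25, 230, 240, 235, 245]           # two modes
generated_pixels = [110, 120, 125, 130, 135, 140, 115, 128]  # one cluster
real_hist = intensity_histogram(real_pixels)
generated_hist = intensity_histogram(generated_pixels)
```
]]></preformat>
          <p>In this toy case the real histogram fills only the two outermost bins while the generated one is concentrated in the middle, and the standard deviation of the intensities gives a single-number summary of the contrast gap.</p>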
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Discussion and Conclusion</title>
      <p>Through experimentation and analysis, the study demonstrated the capability of DDPMs to produce artificial DBT images that closely emulate the intricacies of real-world cases. The promising results obtained pave the way for future investigations and applications of DDPMs in medical imaging. The potential for refining and expanding upon these generative models opens avenues for further research, contributing to the ongoing evolution of DBT technology. Future work may involve conditioning the sampling procedure, allowing deliberate control over the generated samples; this approach is known as guided diffusion [21], [22]. Moreover, latent diffusion models could be introduced. In these models, the input is first projected into a more compact latent space, where the diffusion process is then applied. Specifically, Rombach et al. [23] proposed the use of an encoder network to encode the input into a latent representation. This choice reduces the computational demands of training diffusion models by operating in a lower-dimensional space. After encoding, a conventional diffusion model, specifically a U-Net, is used to generate new data, which are then upsampled through a decoder network.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Acknowledgments</title>
      <p>The work was supported by the PNRR MUR project PE0000013-FAIR.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1"><label>[1]</label><mixed-citation>H.-P. Chan, R. K. Samala, L. M. Hadjiiski, C. Zhou, Deep learning in medical image analysis, in: Advances in Experimental Medicine and Biology, Springer International Publishing, 2020.</mixed-citation></ref>
      <ref id="ref2"><label>[2]</label><mixed-citation>S. Destounis, Role of digital breast tomosynthesis in screening and diagnostic breast imaging (2018).</mixed-citation></ref>
      <ref id="ref3"><label>[3]</label><mixed-citation>F. D. Marco, A. A. Citarella, L. D. Biasi, L. D'Errico, R. Francese, G. Mettivier, M. Staffa, G. Tortora, AI-based solutions for the analysis of biomedical images and signals, in: Proceedings of the Conference on Artificial Intelligence (Ital IA 2023), Pisa, Italy, May 29-30, 2023, volume 3486 of CEUR Workshop Proceedings, 2023, pp. 171–176.</mixed-citation></ref>
      <ref id="ref4"><label>[4]</label><mixed-citation>L. Taylor, G. Nitschke, Improving deep learning with generic data augmentation, in: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, pp. 1542–1547.</mixed-citation></ref>
      <ref id="ref5"><label>[5]</label><mixed-citation>A. Q. Nichol, P. Dhariwal, Improved denoising diffusion probabilistic models, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 8162–8171.</mixed-citation></ref>
      <ref id="ref6"><label>[6]</label><mixed-citation>D. Förnvik, S. Zackrisson, O. Ljungberg, T. Svahn, P. Timberg, A. Tingberg, I. Andersson, Breast tomosynthesis: accuracy of tumor measurement compared with digital mammography and ultrasonography, Acta radiologica 51 (2010) 240–247.</mixed-citation></ref>
      <ref id="ref7"><label>[7]</label><mixed-citation>B. D. Fornage, O. Toubas, M. Morel, Clinical, mammographic, and sonographic determination of preoperative breast cancer size, Cancer 60 (1987) 765–771.</mixed-citation></ref>
      <ref id="ref8"><label>[8]</label><mixed-citation>L. T. Niklason, B. T. Christian, L. E. Niklason, D. B. Kopans, D. E. Castleberry, B. Opsahl-Ong, C. E. Landberg, P. J. Slanetz, A. A. Giardino, R. Moore, et al., Digital tomosynthesis in breast imaging, Radiology 205 (1997) 399–406.</mixed-citation></ref>
      <ref id="ref9"><label>[9]</label><mixed-citation>N. P. Tirada, G. Li, D. Dreizin, L. Robinson, G. R. Khorjekar, S. A. Dromi, T. S. Ernst, Digital breast tomosynthesis: Physics, artifacts, and quality control considerations, Radiographics: a review publication of the Radiological Society of North America, Inc 39 (2) (2019) 413–426.</mixed-citation></ref>
      <ref id="ref10"><label>[10]</label><mixed-citation>I. Andersson, D. M. Ikeda, S. Zackrisson, M. Ruschin, T. Svahn, P. Timberg, A. Tingberg, Breast tomosynthesis and digital mammography: a comparison of breast cancer visibility and BIRADS classification in a population of cancers with subtle mammographic findings, European radiology 18 (2008) 2817–2825.</mixed-citation></ref>
      <ref id="ref11"><label>[11]</label><mixed-citation>M. Staffa, L. D'Errico, R. Ricciardi, P. Barra, E. Antignani, S. Minelli, G. Mettivier, How to increase and balance current DBT datasets via an evolutionary GAN: preliminary results, in: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2022, pp. 913–920.</mixed-citation></ref>
      <ref id="ref12"><label>[12]</label><mixed-citation>R. Moore, D. Kopans, E. Rafferty, D. Georgian-Smith, R. Hitt, E. Yeh, Initial callback rates for conventional and digital breast tomosynthesis mammography comparison in the screening setting, in: Radiological Society of North America 93rd Scientific Assembly and Annual Meeting, Nov, 2007.</mixed-citation></ref>
      <ref id="ref13"><label>[13]</label><mixed-citation>I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, Commun. ACM 63 (2020) 139–144.</mixed-citation></ref>
      <ref id="ref14"><label>[14]</label><mixed-citation>L. Pinheiro Cinelli, M. Araújo Marins, E. A. Barros da Silva, S. Lima Netto, Variational autoencoder (2021) 111–149.</mixed-citation></ref>
      <ref id="ref15"><label>[15]</label><mixed-citation>A. Kazerouni, E. K. Aghdam, M. Heidari, R. Azad, M. Fayyaz, I. Hacihaliloglu, D. Merhof, Diffusion models in medical imaging: A comprehensive survey, Medical Image Analysis 88 (2023) 102846.</mixed-citation></ref>
      <ref id="ref16"><label>[16]</label><mixed-citation>J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, arXiv preprint arXiv:2006.11239 (2020).</mixed-citation></ref>
      <ref id="ref17"><label>[17]</label><mixed-citation>S. Zagoruyko, N. Komodakis, Wide residual networks, 2016. URL: http://arxiv.org/abs/1605.07146, arXiv:1605.07146.</mixed-citation></ref>
      <ref id="ref18"><label>[18]</label><mixed-citation>A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, N. Houlsby, Big transfer (BiT): General visual representation learning (2019).</mixed-citation></ref>
      <ref id="ref19"><label>[19]</label><mixed-citation>A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need (2017).</mixed-citation></ref>
      <ref id="ref20"><label>[20]</label><mixed-citation>M. Staffa, P. Barra, Multi-monitor system for adaptive image saliency detection based on attentive mechanisms, in: H. Degen, S. Ntoa (Eds.), Artificial Intelligence in HCI - 4th International Conference, AI-HCI 2023, Held as Part of the 25th HCI International Conference, HCII, volume 14051 of Lecture Notes in Computer Science, Springer, 2023, pp. 607–617.</mixed-citation></ref>
      <ref id="ref21"><label>[21]</label><mixed-citation>A. Q. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, M. Chen, Glide: Towards photorealistic image generation and editing with text-guided diffusion models, in: ICML, volume 162 of Proceedings of Machine Learning Research, PMLR, 2022, pp. 16784–16804.</mixed-citation></ref>
      <ref id="ref22"><label>[22]</label><mixed-citation>C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, M. Norouzi, Photorealistic text-to-image diffusion models with deep language understanding, CoRR abs/2205.11487 (2022).</mixed-citation></ref>
      <ref id="ref23"><label>[23]</label><mixed-citation>R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in: CVPR, IEEE, 2022, pp. 10674–10685.</mixed-citation></ref>
    </ref-list>
  </back>
</article>