<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Localization-Guided Semantic Segmentation Framework for Breast Ultrasound Lesions Using BUS_UCLM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Faycal Arioua</string-name>
          <email>f.arioua@univ-boumerdes.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faycal Touazi</string-name>
          <email>f.touazi@univ-boumerdes.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Djamel Gaceb</string-name>
          <email>d.gaceb@univ-boumerdes.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tayeb Benzenati</string-name>
          <email>t.benzenati@univ-boumerdes.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ibtissem Telkhoukh</string-name>
          <email>i.telkhoukh@univ-boumerdes.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nour Elhouda Magaz</string-name>
          <email>nh.magaz@univ-boumerdes.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIMOSE Laboratory, Computer Science Department, University M'hamed Bougara</institution>
          ,
          <addr-line>Independence Avenue, 35000 Boumerdes</addr-line>
          ,
          <country country="DZ">Algeria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Breast cancer tumor detection and segmentation in ultrasound imaging is crucial for improving early diagnosis and guiding treatment decisions. This paper presents a two-stage deep learning framework that integrates lesion localization-guided semantic segmentation to enhance tumor detection in breast ultrasound images. In the first stage, YOLOv11-n-nano, a lightweight detection model with approximately 2 million parameters, is employed to localize suspicious regions of interest. In the second stage, advanced segmentation architectures, including U-Net++, U-Net3+, Attention U-Net, TransUNet, and DeepLabV3, are applied to refine the segmentation of the localized areas. The proposed framework is evaluated on the publicly available BUS_UCLM dataset. Our approach achieves state-of-the-art performance, with DeepLabV3 obtaining a Dice score of 94.31% and an Intersection over Union (IoU) of 92.10%. These results highlight the effectiveness of localization-guided segmentation in reducing false positives and improving segmentation accuracy in breast ultrasound imaging.</p>
      </abstract>
      <kwd-group>
        <kwd>Breast tumor segmentation</kwd>
        <kwd>Ultrasound Images</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>YOLOv11-n</kwd>
        <kwd>DeepLabV3</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        According to the World Health Organization, breast cancer is the most commonly diagnosed cancer
among women worldwide, with over 2.3 million new cases in 2020 and accounting for approximately
685,000 deaths annually [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The diagnosis and treatment of breast cancer rely on multiple steps,
including clinical examinations and imaging techniques, to assess tumor presence, stage, and biological
characteristics. Among imaging modalities, ultrasound plays a central role in breast cancer diagnosis,
especially in young women and in patients with dense breast tissue, where mammography may be less
effective [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Breast ultrasound imaging offers several advantages: it is non-invasive, cost-effective, portable,
and free of ionizing radiation. However, ultrasound suffers from inherent limitations such as speckle
noise, low contrast, operator dependency, and variability in acquisition protocols [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These factors
make accurate interpretation a challenging task, often requiring considerable radiologist expertise.
Consequently, the integration of Artificial Intelligence (AI), particularly deep learning, into ultrasound
analysis has emerged as a promising direction for automating lesion detection, classification, and
segmentation.
      </p>
      <p>
        In recent years, deep learning techniques, especially convolutional neural networks (CNNs), have
demonstrated remarkable success in medical image analysis tasks including classification, detection, and
segmentation [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>
        ]. Several studies have focused on the diagnosis of breast lesions in mammography
and MRI; however, fewer studies have systematically addressed breast tumor segmentation in
ultrasound images. This task remains difficult due to factors such as heterogeneous tissue appearance,
irregular tumor boundaries, small lesion sizes, and class imbalance between benign, malignant, and
normal samples [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>
        To address these challenges, incorporating a lesion localization step prior to segmentation has been
proposed as an effective strategy. By guiding the segmentation model toward regions of interest, this
two-stage approach reduces false positives and improves delineation of lesion boundaries [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This
design also reflects the clinical workflow, where radiologists typically identify suspicious regions before
performing detailed analysis. Furthermore, focusing segmentation on localized regions can alleviate
the effects of class imbalance and improve computational efficiency during both training and inference
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Ensemble and hybrid deep learning strategies can further enhance robustness and generalization
across diverse imaging conditions [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <p>In this study, we present a two-stage deep learning pipeline for breast lesion analysis in ultrasound
images. The first stage localizes suspicious regions using a YOLO-based model, while the second stage
performs precise semantic segmentation with advanced architectures. We evaluate our framework on
the BUS_UCLM dataset, a publicly available breast ultrasound lesion segmentation dataset containing
683 images across benign, malignant, and normal categories. We integrate five state-of-the-art
CNN- and transformer-based segmentation models: U-Net++, U-Net3+, TransUNet, Attention U-Net, and
DeepLabV3, and provide a comprehensive comparison of their performance.</p>
      <p>The structure of this paper is as follows: Section 2 reviews related literature on breast tumor analysis
in ultrasound imaging, with emphasis on deep learning-based methodologies. Section 3 introduces
the proposed methodology, including the two-stage localization-segmentation framework, dataset
preprocessing, and network architectures. Section 4 introduces the evaluation metrics and loss functions
employed to assess the performance of the different models. Section 5 presents the experimental results
and comparisons across different models. Finally, Section 6 concludes the paper with key findings,
limitations, and future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Several studies have explored deep learning approaches for breast cancer detection and segmentation
in ultrasound imaging.</p>
      <p>
        Cho et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] proposed a multi-stage segmentation framework combining classification (BTEC-Net,
based on DenseNet and ResNet) and segmentation (RFS-UNet with residual and spatial attention).
Evaluated on the BUSI and UDIAT datasets, their approach significantly reduced false positives and
improved IoU (77%) and Dice (85%), outperforming conventional models such as UNet and PSPNet.
      </p>
      <p>Raza et al. [14] introduced DeepBreastCancerNet, a 24-layer CNN with inception modules, enhanced
through transfer learning from nine pre-trained networks. Using the BUSI dataset (780 images) and an
additional set of 250 images, the model achieved remarkable accuracy (99.35% and 99.63%), surpassing
ResNet-50 (98.06%) and GoogLeNet (98.71%).</p>
      <p>Mukasheva et al. [15] compared five UNet variants (UNet, Attention UNet, UNet++, DenseInception
UNet, Residual UNet) on 780 ultrasound images. DenseInception UNet achieved the highest Dice (0.976)
after augmentation, while Attention UNet underperformed, highlighting the influence of noise and
preprocessing strategies.</p>
      <p>Khaledyan et al. [16] studied UNet-based architectures with preprocessing (CLAHE, augmentation)
on the BUSI dataset. Their proposed Sharp Attention UNet achieved Dice 0.93 and accuracy 97.9%,
outperforming UNet, Sharp UNet, and Attention UNet.</p>
      <p>MohammadiNasab et al. [17] presented a self-supervised multi-task approach (DATTR2U-Net) for
Automated Breast Ultrasound (ABUS). Using inpainting and denoising as pretext tasks, their model
improved segmentation robustness. On TDSC-ABUS, it outperformed Faster R-CNN and YOLO, though
challenges remain for detecting very small lesions.</p>
      <p>Madhu et al. [18] proposed UCapsNet, a two-stage model combining U-Net for segmentation and
Capsule Networks for classification. On BUSI, it achieved outstanding results (segmentation Dice 99.07%,
classification accuracy 99.22%), surpassing pre-trained CNNs like VGG-19 and ResNet-50.</p>
      <p>Almajalid et al. [19] used an enhanced U-Net with preprocessing (denoising, contrast adjustment)
and heavy augmentation on 221 images, achieving a Dice score of 82.5%, outperforming other automatic
methods.</p>
      <p>Vallez et al. [20] introduced the BUS-UCLM dataset (683 annotated ultrasound images). In baseline
tests, Mask R-CNN performed best with Dice 77.09% and IoU 65.46%.</p>
      <p>Zhang et al. [21] proposed a dual-branch DenseNet-UNet combining classification and segmentation
on 1600 images, reaching AUC 0.991 and Dice 89.8%, and showing good generalization on external data.</p>
      <p>Vakanski et al. [22] integrated visual attention maps into a modified UNet for 510 images, achieving
Dice 90.5%, demonstrating the benefit of medical prior knowledge in guiding segmentation. Pan et al.
[23] introduce a diffusion-based framework for synthesizing breast ultrasound images, integrating text
prompts and a shape-aware mask generator to enhance realism and diversity. It effectively addresses
data scarcity, with downstream evaluations showing strong performance boosts. On the BUS_UCLM
dataset, segmentation achieves a peak DSC of 83.5% (Attention UNet at a 25% synthetic ratio). Vallez et
al. [24] evaluate deep learning approaches for breast ultrasound lesion detection and classification
using the BUS_UCLM dataset. Among the models tested, SK-UNet achieved the highest performance
for multi-class segmentation, with an IoU of 79.2% and a Dice of 87.3%.</p>
      <p>Table 1 summarizes these related works.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Summary of related work on breast ultrasound lesion analysis.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Study</th><th>Dataset</th><th>Task</th><th>Model</th><th>Results</th></tr>
          </thead>
          <tbody>
            <tr><td>Cho et al. [13]</td><td>BUSI, UDIAT</td><td>Segmentation</td><td>RFS-UNet</td><td>IoU = 0.77; Dice = 0.85</td></tr>
            <tr><td>Raza et al. [14]</td><td>BUSI</td><td>Classification</td><td>DeepBreastCancerNet</td><td>Acc = 99.35%</td></tr>
            <tr><td>Mukasheva et al. [15]</td><td>BUSI</td><td>Segmentation</td><td>DenseInception UNet</td><td>Dice = 0.976</td></tr>
            <tr><td>Khaledyan et al. [16]</td><td>BUSI</td><td>Segmentation</td><td>Sharp Attention UNet</td><td>Dice = 0.93; Acc = 97.9%</td></tr>
            <tr><td>MohammadiNasab et al. [17]</td><td>TDSC-ABUS</td><td>Segmentation</td><td>DATTR2U-Net</td><td></td></tr>
            <tr><td>Madhu et al. [18]</td><td>BUSI</td><td>Segmentation + Classification</td><td>U-Net + Capsule Network</td><td>Dice = 99.07%; Acc = 99.22%</td></tr>
            <tr><td>Almajalid et al. [19]</td><td>221 images</td><td>Segmentation</td><td>U-Net</td><td>Dice = 82.5%</td></tr>
            <tr><td>Vallez et al. [20]</td><td>BUS-UCLM</td><td>Segmentation</td><td>Mask R-CNN</td><td>Dice = 77.09%; IoU = 65.46%</td></tr>
            <tr><td>Zhang et al. [21]</td><td>1600 images</td><td>Segmentation + Classification</td><td>U-Net + DenseNet</td><td>AUC = 0.991; Dice = 89.8%</td></tr>
            <tr><td>Vakanski et al. [22]</td><td>BUSI</td><td>Segmentation with attention</td><td>U-Net-SA; U-Net-SA-C</td><td>Dice = 90.5%; Jaccard = 83.8%</td></tr>
            <tr><td>Pan et al. [23]</td><td>BUS-UCLM</td><td>Segmentation (synthetic data)</td><td>Diffusion-based framework</td><td>Dice = 83.5%</td></tr>
            <tr><td>Vallez et al. [24]</td><td>BUS-UCLM and other datasets</td><td>Segmentation</td><td>SK-UNet</td><td>Dice = 87.3%; IoU = 79.2%</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Approach</title>
      <p>This section presents two deep learning strategies developed for the automatic segmentation of breast
lesions in ultrasound images. The goal is to enhance diagnostic accuracy and robustness by leveraging
both direct and localization-guided segmentation approaches.</p>
      <p>- Approach 1: Direct semantic segmentation using deep neural networks such as U-Net++,
U-Net3+, Attention U-Net, TransUNet, and DeepLabV3.</p>
      <p>- Approach 2: A localization-guided two-stage segmentation, where YOLOv11-n is used to
localize regions of interest, which are subsequently segmented using the same set of models.</p>
      <p>Both approaches were implemented and evaluated on the BUS_UCLM dataset (Breast Ultrasound
Lesion Segmentation Dataset) [20].
3.1. Approach 1: Direct Semantic Segmentation
This approach applies semantic segmentation directly to full breast ultrasound images, aiming to
produce pixel-wise boundaries of tumoral regions. The objective is to generate precise binary masks
that distinguish between lesion and non-lesion areas without relying on any prior localization step.</p>
      <p>To evaluate this strategy, we implemented and compared five state-of-the-art segmentation models
widely used in medical imaging: U-Net++ [25], U-Net3+ [26], Attention U-Net [27], TransUNet [28],
and DeepLabV3 [29]. Each model was trained on the BUS_UCLM dataset and evaluated using standard
performance metrics. The results obtained serve as a baseline for comparison with localization-guided
segmentation strategies (see Figure 1).
3.2. Approach 2: Localization-Guided Semantic Segmentation
In the second approach, we introduce a two-stage pipeline that refines segmentation by concentrating on
the region of interest (ROI) of the lesion area identified in the original image. This strategy aims to
reduce false positives and improve precision.</p>
      <p>- Lesion localization: The YOLOv11-n-nano (YOLOv11-n) model, with around 2 million
parameters, is a lightweight detector that delivers accurate results with low computational cost and fast
inference. YOLOv11-n is applied to the full ultrasound image to localize bounding boxes around
suspicious lesions.
- Targeted Segmentation: Detected regions are cropped, resized, and passed to a segmentation
model (same as those used in Approach 1) for semantic mask prediction. The resulting mask is
then reintegrated into the original image coordinate space.</p>
      <p>This guided strategy helps the model concentrate on lesion-relevant pixels, potentially improving
performance in noisy or low-contrast ultrasound contexts (see Figure 1).</p>
      <p>The localization-guided inference process is summarized in Algorithm 1.</p>
      <p>Algorithm 1: Localization-Guided Segmentation Pipeline using YOLOv11-n
Input: preprocessed dataset D (ultrasound images I, ground-truth masks M), a YOLOv11-n detector f_det, and a segmentation model f_seg
Output: segmentation metrics (Dice, IoU, etc.)
1 foreach image I in D do
2     B = f_det(I) // detect region(s) of interest
3     if no region is detected then
4         predict an empty mask;
5     else
6         I_ROI = crop I using the predicted bounding box B;
7         m = f_seg(I_ROI) // apply the segmentation model to obtain a binary lesion mask
8         reproject m into the original image dimensions;
9         compute [Dice, IoU, Precision, Sensitivity, Recall](m, M);
10 return [Dice, IoU, Precision, Sensitivity, Recall];
3.3. BUS_UCLM Dataset
The dataset we selected for our approaches, BUS_UCLM (Breast Ultrasound Lesion Segmentation
Dataset) [20], contains a total of 683 images acquired from 38 patients, divided into three
categories: benign (174), malignant (90), and normal (419).</p>
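      <p>As a companion to Algorithm 1, the crop-and-reproject logic can be sketched in Python. This is a minimal illustrative sketch, not the authors' implementation: detect and segment are hypothetical callables standing in for the trained YOLOv11-n detector and the segmentation network.</p>

```python
import numpy as np

def localization_guided_segmentation(image, detect, segment):
    """Two-stage pipeline: detect an ROI, segment the crop, reproject the mask.

    detect(image) -> (x1, y1, x2, y2) bounding box, or None if no lesion found
    segment(crop) -> binary mask with the same shape as the crop
    """
    h, w = image.shape[:2]
    box = detect(image)
    if box is None:
        # No localization: predict an empty mask (Algorithm 1, line 4)
        return np.zeros((h, w), dtype=np.uint8)
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]          # crop the detected region
    roi_mask = segment(crop)            # binary lesion mask on the crop
    full_mask = np.zeros((h, w), dtype=np.uint8)
    full_mask[y1:y2, x1:x2] = roi_mask  # reproject into original coordinates
    return full_mask
```

The returned full-resolution mask can then be scored against the ground-truth mask with the usual Dice and IoU metrics.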
      <p>
        We split the dataset into 80% for the training set and 20% for the test set.
3.4. Preprocessing of the BUS_UCLM Dataset
Data preprocessing is an essential step to ensure the quality and consistency of the dataset, while also
improving model performance and reducing the risk of overfitting. For the BUS_UCLM dataset, which
consists of breast ultrasound images with corresponding lesion annotations, we applied the following
preprocessing steps:
• Resizing: All images were resized to a fixed resolution of 224 × 224 pixels, which is a common
requirement for convolutional neural network (CNN) architectures.
• Normalization: Pixel intensity values were normalized to the range [0, 1] in order to stabilize
training and ensure consistency across images.
• Tensor Conversion: Images were converted into tensors, enabling efficient processing by deep
learning models during training and inference.
• Cropping Regions of Interest (ROI): In the second approach, we applied cropping to focus on
tumor-specific regions of interest. Using the provided ground-truth masks, the regions containing
lesions were extracted and cropped from the original ultrasound images.
• Data Augmentation: To improve robustness and reduce overfitting, data augmentation
techniques were applied. Specifically:
– Random Horizontal Flip: Applied to both the image and its corresponding mask with a
probability of 50%.
– Random Rotation: Images and masks were rotated by a random angle in the range
[−10°, +10°]. This augmentation was mainly used in the second approach to enhance
model generalization on the cropped ROIs.
      </p>
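      <p>The preprocessing steps above can be sketched as follows. This is an illustrative NumPy-only version under simplifying assumptions: resize_nn is a hypothetical nearest-neighbor resize standing in for a library call, and the random rotation step is omitted for brevity.</p>

```python
import numpy as np

def resize_nn(img, size=224):
    """Nearest-neighbor resize to size x size (stand-in for a library resize)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    return img[rows][:, cols]

def preprocess(image, mask, rng):
    """Resize to 224x224, normalize the image to [0, 1], jointly flip image and mask."""
    image = resize_nn(image).astype(np.float32) / 255.0  # normalization to [0, 1]
    mask = resize_nn(mask)                               # same spatial transform for the mask
    if rng.random() < 0.5:                               # random horizontal flip, p = 0.5
        image, mask = image[:, ::-1], mask[:, ::-1]
    return image, mask
```

Applying the same geometric transform to the image and its mask is the key invariant here; intensity normalization is applied to the image only.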
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation Metrics and Loss Functions</title>
      <p>To assess the effectiveness of our proposed approach for breast cancer diagnosis, we employed a set of
standard evaluation metrics, including accuracy, F1-score, sensitivity, and precision, which are defined
below. These metrics were used to quantify the performance of both the detection and segmentation
components of the pipeline. In addition, appropriate loss functions were selected to guide model
optimization during training. In the following, TP, TN, FP, and FN denote the counts of true positives,
true negatives, false positives, and false negatives for binary classification problems.
4.1. Classification Metrics
- Accuracy: Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision: Precision = TP / (TP + FP)
- Recall (also called Sensitivity or True Positive Rate): Recall = TP / (TP + FN)</p>
      <p>4.2. Segmentation Metrics
- Intersection over Union (IoU): IoU(A, B) = |A ∩ B| / |A ∪ B| = TP / (TP + FP + FN)
- Dice coefficient (F1 score): Dice(A, B) = 2|A ∩ B| / (|A| + |B|) = 2TP / (2TP + FP + FN)
- Relation between Dice and IoU: Dice = 2·IoU / (1 + IoU) and IoU = Dice / (2 − Dice)</p>
      <p>4.3. Segmentation Loss Functions
- Binary Cross-Entropy (BCE): L_BCE = −(1/N) Σ_{i=1..N} [y_i log(p_i) + (1 − y_i) log(1 − p_i)],
where y_i is the ground-truth label and p_i the predicted probability.
- Dice Loss (derived from the Dice coefficient): L_Dice = 1 − (2 Σ_i p_i y_i) / (Σ_i p_i + Σ_i y_i)
- Focal Loss (to address class imbalance): L_Focal = − Σ_{i=1..N} (1 − p_i)^γ · y_i log(p_i),
where γ is the focusing parameter and p_i is the predicted probability.</p>
      <p>4.4. YOLO Evaluation Metrics
- Average Precision (AP): AP = ∫₀¹ Precision(r) dr, i.e., the area under the precision-recall curve.
- Mean Average Precision (mAP): mAP = (1/N) Σ_{i=1..N} AP_i, the mean of AP over classes; mAP@50
uses an IoU threshold of 0.50, while mAP@50-95 averages AP over IoU thresholds from 0.50 to 0.95.</p>
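      <p>To make the definitions concrete, the segmentation metrics and losses above can be transcribed directly into NumPy. This is a plain illustration of the formulas, not the authors' training code; in practice these losses would be implemented in a framework with automatic differentiation.</p>

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice and IoU for binary masks, following the set definitions above."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    if union == 0:                     # both masks empty: perfect agreement
        return 1.0, 1.0
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return float(dice), float(iou)

def bce_loss(y, p, eps=1e-7):
    """Binary cross-entropy over flattened pixel probabilities."""
    p = np.clip(p, eps, 1 - eps)       # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def focal_loss(y, p, gamma=2.0, eps=1e-7):
    """Focal loss as defined above: (1 - p)^gamma down-weights easy examples."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.sum((1 - p) ** gamma * y * np.log(p)))
```

Note that the Dice/IoU relation Dice = 2·IoU / (1 + IoU) holds exactly for any pair of binary masks, which makes it a convenient sanity check on a metric implementation.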
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>5.1. Approach 1: Direct Semantic Segmentation
In the first strategy, segmentation models were directly applied to all ultrasound images of the
BUS_UCLM dataset. We evaluated five architectures: UNet++, UNet3+, Attention U-Net, TransUNet,
and DeepLabV3.</p>
      <p>Table 3 summarizes the obtained results across different metrics (Loss, Dice, IoU, Recall, and Precision).
As illustrated, DeepLabV3 significantly outperformed other models, achieving the highest Dice (84.49%)
and IoU (73.57%), confirming its strong capability for tumor delineation.</p>
      <p>The results demonstrate that DeepLabV3 clearly outperforms the other segmentation models in
terms of Dice and IoU, which confirms its ability to capture fine tumor boundaries more effectively.
UNet++ also shows competitive performance with balanced precision and recall, making it a reliable
alternative when computational efficiency is considered. In contrast, UNet3+ and Attention U-Net
perform moderately, with Attention U-Net providing slightly higher recall but lower precision, indicating
a tendency to over-segment. TransUNet achieved the weakest results, suggesting that transformer-based
representations may require larger training datasets or additional fine-tuning to handle ultrasound-specific
noise and variability. Overall, the direct segmentation approach proves effective, but the
variability in performance across models highlights the importance of architecture choice for breast
ultrasound analysis.</p>
      <p>Figure 3 presents visual examples of segmentation masks generated by the different models,
highlighting the superior precision of DeepLabV3 compared to the other architectures.
5.2. Approach 2: Localization-Guided Semantic Segmentation
The second approach was designed as a two-step process: (1) lesion detection using YOLOv11, followed
by (2) focused segmentation on detected regions with the same five architectures.</p>
      <p>YOLOv11 achieved competitive detection performance with an mAP@50 of 81.8% and mAP@50-95
of 65%, ensuring robust region proposals for the subsequent segmentation stage.</p>
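        <p>The mAP figures above follow the AP definition in Section 4.4. As a generic illustration (not the paper's evaluation code), AP can be computed from a precision-recall curve with all-point interpolation:</p>

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve with all-point interpolation.

    recall and precision are arrays sorted by increasing recall.
    """
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Make precision monotonically non-increasing (the precision envelope)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum precision over the recall increments
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

mAP@50 averages this quantity over classes at an IoU matching threshold of 0.50; mAP@50-95 additionally averages it over IoU thresholds from 0.50 to 0.95.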
      <sec id="sec-5-1">
        <table-wrap id="tbl2">
          <label>Table</label>
          <caption>
            <p>YOLOv11-n detection performance on BUS_UCLM.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Metric</th><th>Value</th></tr>
            </thead>
            <tbody>
              <tr><td>mAP@50</td><td>81.8%</td></tr>
              <tr><td>mAP@50-95</td><td>65%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>DeepLabV3 obtained the best overall results, achieving the highest Dice (94.31%) and IoU (92.10%),
confirming its robustness for accurate lesion delineation.</p>
        <p>TransUNet also performed competitively, recording the lowest loss (0.3061) and the highest recall
(94.93%), which suggests a strong sensitivity in detecting lesion areas. In contrast, Attention U-Net
achieved the best precision (94.67%), reflecting its ability to reduce false positives while maintaining
high segmentation accuracy. Both UNet++ and UNet3+ yielded stable and reliable performances, with
Dice scores of 94.27% and 94.08% respectively, further confirming their effectiveness in this task.</p>
        <p>Figure 5 illustrates segmentation masks generated by the two-stage approach. Compared to the direct
segmentation, the contours are more accurate, and false positives are reduced, confirming the benefit of
integrating a detection step before segmentation.
5.3. Discussion
Table 6 summarizes the performance of the two segmentation strategies. The direct segmentation
approach (Approach 1) provides reasonable results, with DeepLabV3 achieving the highest Dice (84.49%)
and IoU (73.57%), demonstrating its capability to capture tumor boundaries more accurately than the
other architectures. UNet++ also shows competitive performance, while TransUNet and Attention
U-Net are slightly weaker, either in precision or recall.</p>
        <p>In contrast, the two-stage, YOLOv11-guided segmentation approach (Approach 2) substantially
improves performance across all models. By first localizing lesion regions, the models can focus
their learning on relevant areas, resulting in consistently higher Dice and IoU scores (all above 91%).
DeepLabV3 again achieves the best overall performance (Dice 94.31%, IoU 92.10%), confirming its
robustness for precise tumor delineation. Notably, TransUNet attains the highest recall (94.93%),
indicating strong sensitivity to lesion regions, while Attention U-Net reaches the highest precision
(94.67%), reflecting efective reduction of false positives.</p>
        <p>Overall, the results highlight the benefits of integrating a detection step prior to segmentation. The
two-stage framework not only enhances mask accuracy but also reduces over-segmentation and false
positives, making it a reliable strategy for breast ultrasound analysis.
5.4. Comparison with Related Work
The results clearly show that our approach substantially outperforms previous methods. While the
models from Vallez et al. [20] reach only 68–77% in Dice and around 56–65% in IoU, all of our models
exceed 94% in Dice and 91% in IoU, reflecting far more precise and reliable segmentation. Pan et al. [23]
report a Dice of 83.5% with Attention UNet at a 25% synthetic ratio.</p>
        <p>Among our models, DeepLabV3 slightly stands out with the best overall scores, but all others
(UNet++, UNet3+, Attention UNet, and TransUNet) perform very closely, confirming the robustness of
the proposed approach. In summary, our method brings a significant advancement in breast ultrasound
image segmentation by effectively combining detection and segmentation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>Breast tumor segmentation in ultrasound imaging is essential for enhancing early diagnosis and
improving patient management. In this study, we introduced a two-stage deep learning framework that
integrates lesion localization with semantic segmentation to better delineate tumors in breast ultrasound
images. The pipeline uses YOLOv11-n-nano, a lightweight detector of around 2 million parameters,
for efficient and robust lesion localization, followed by leading segmentation architectures—U-Net3+,
TransUNet, U-Net++, DeepLabV3, and Attention U-Net—to refine tumor boundaries. Evaluations on the
BUS_UCLM dataset demonstrate that our approach significantly outperforms conventional methods.</p>
      <p>The localization-guided segmentation strategy boosts precision by focusing the segmentation models
on tumor-relevant areas, reducing false positives and emphasizing critical structures. Among the models
tested, DeepLabv3 achieved the highest performance, with a Dice score of 94.31% and an IoU of 92.10%,
while TransUNet’s strong results underscore the potential of hybrid convolutional-transformer designs.
The proposed two-stage framework significantly enhances breast ultrasound lesion segmentation,
achieving a Dice score of 94.31% and reducing diagnostic variability by minimizing manual delineations.</p>
      <p>Looking ahead, future work will aim to enhance the clinical adaptability and generalization of this
framework through several directions:
• Extending the pipeline to process full ultrasound sequences or volumes, allowing the model to
capture contextual continuity between consecutive frames.
• Integrating multimodal ultrasound data—such as Doppler or elastography—to enrich lesion
characterization and improve segmentation accuracy.
• Validating the model across additional ultrasound datasets and acquisition protocols to ensure
robustness and broad applicability.
• Collaborating with clinicians for user-centered evaluations, assessing the tool’s utility in routine
diagnostic workflows.
• Optimizing the pipeline for real-time inference and deploying it on edge devices to support
point-of-care ultrasound applications.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for grammar and spelling
checking and for paraphrasing and rewording. After using this tool/service, the author(s) reviewed and
edited the content as needed and take(s) full responsibility for the publication’s content.</p>
      <p>
ultrasound images for breast cancer diagnosis, Journal of King Saud University-Computer and
Information Sciences 34 (2022) 10273–10292.
[14] A. Raza, N. Ullah, J. A. Khan, M. Assam, A. Guzzo, H. Aljuaid, DeepBreastCancerNet: A novel
deep learning model for breast cancer detection using ultrasound images, Applied Sciences 13
(2023) 2082.
[15] A. Mukasheva, D. Koishiyeva, Z. Suimenbayeva, S. Rakhmetulayeva, A. Bolshibayeva, G. Sadikova,
Comparison evaluation of unet-based models with noise augmentation for breast cancer
segmentation on ultrasound images., Eastern-European Journal of Enterprise Technologies 125
(2023).
[16] D. Khaledyan, T. J. Marini, T. M. Baran, A. O’Connell, K. Parker, Enhancing breast ultrasound
segmentation through fine-tuning and optimization techniques: Sharp attention unet, Plos one 18
(2023) e0289195.
[17] P. MohammadiNasab, A. Khakbaz, H. Behnam, E. Kozegar, M. Soryani, A multi-task self-supervised
approach for mass detection in automated breast ultrasound using double attention recurrent
residual u-net, Computers in Biology and Medicine 188 (2025) 109829.
[18] G. Madhu, A. M. Bonasi, S. Kautish, A. S. Almazyad, A. W. Mohamed, F. Werner, M. Hosseinzadeh,
M. Shokouhifar, Ucapsnet: A two-stage deep learning model using u-net and capsule network for
breast cancer segmentation and classification in ultrasound imaging, Cancers 16 (2024).
[19] R. Almajalid, J. Shan, Y. Du, M. Zhang, Development of a deep-learning-based method for breast
ultrasound image segmentation, in: 2018 17th IEEE International Conference on Machine Learning
and Applications (ICMLA), IEEE, 2018, pp. 1103–1108.
[20] N. Vallez, G. Bueno, O. Deniz, M. A. Rienda, C. Pastor, Bus-uclm: Breast ultrasound lesion
segmentation dataset, Scientific Data 12 (2025) 242.
[21] S. Zhang, M. Liao, J. Wang, Y. Zhu, Y. Zhang, J. Zhang, R. Zheng, L. Lv, D. Zhu, H. Chen, W. Wang,
Fully automatic tumor segmentation of breast ultrasound images with deep learning, Journal
of Applied Clinical Medical Physics 24 (2023) e13863. URL: https://doi.org/10.1002/acm2.13863.
doi:10.1002/acm2.13863.
[22] A. Vakanski, S. G. Lee, N. Baker, Attention-based segmentation of breast tumors in ultrasound
images using saliency maps, Sensors 20 (2020) 2612. doi:10.3390/s20092612.
[23] H. Pan, H. Lin, Z. Feng, C. Lin, J. Mo, C. Zhang, Z. Wu, Y. Wang, Q. Zheng, Breast ultrasound tumor
generation via mask generator and text-guided network: A clinically controllable framework with
downstream evaluation, arXiv preprint arXiv:2507.07721 (2025).
[24] N. Vallez, I. Mateos-Aparicio-Ruiz, M. A. Rienda, O. Deniz, G. Bueno, Comparative analysis of
deep learning methods for breast ultrasound lesion detection and classification, Physica Medica
134 (2025) 104993.
[25] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, Unet++: A nested u-net architecture for
medical image segmentation, in: N. Navab, J. Hornegger, W. Wells, A. Frangi (Eds.), Deep Learning
in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2018,
pp. 3–11.
[26] H. Huang, et al., Unet 3+: A full-scale connected u-net for medical image segmentation, in: ICASSP
2020 – IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2020, pp.
1055–1059.
[27] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh,
N. Y. Hammerla, B. Kainz, et al., Attention u-net: Learning where to look for the pancreas, arXiv
preprint arXiv:1804.03999 (2018).
[28] J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, Y. Zhou, Transunet: Transformers
make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306 (2021).
[29] Y. Tang, D. Tan, H. Li, M. Zhu, X. Li, X. Wang, J. Wang, Z. Wang, C. Gao, J. Wang, et al.,
Rtc_tonguenet: An improved tongue image segmentation model based on deeplabv3, Digital
Health 10 (2024) 20552076241242773.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>World Health Organization</collab>
          ,
          <article-title>Breast cancer</article-title>
          ,
          <source>WHO Fact Sheets</source>
          (
          <year>2021</year>
          ). URL: https://www.who.int/news-room/fact-sheets/detail/breast-cancer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.-R.</given-names>
            <surname>Wuni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Botwe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Akudjedu</surname>
          </string-name>
          ,
          <article-title>Impact of artificial intelligence on clinical radiography practice: futuristic prospects in a low resource setting</article-title>
          ,
          <source>Radiography</source>
          <volume>27</volume>
          (
          <year>2021</year>
          )
          <fpage>S69</fpage>
          -
          <lpage>S73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Tarekegn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giacobini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Michalak</surname>
          </string-name>
          ,
          <article-title>A review of methods for imbalanced multi-label classification</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>118</volume>
          (
          <year>2021</year>
          )
          <fpage>107965</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Touazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaceb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chirane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hrzallah</surname>
          </string-name>
          ,
          <article-title>Two-stage approach for semantic image segmentation of breast cancer: Deep learning and mass detection in mammographic images</article-title>
          , in: IDDM,
          <year>2023</year>
          , pp.
          <fpage>62</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Touazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaceb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Boudissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Assas</surname>
          </string-name>
          ,
          <article-title>Enhancing breast mass cancer detection through hybrid vit-based image segmentation model</article-title>
          ,
          <source>in: International Conference on Computing Systems and Applications</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>126</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Medical image segmentation: A comprehensive review of deep learning-based methods</article-title>
          ,
          <source>Tomography</source>
          <volume>11</volume>
          (
          <year>2025</year>
          )
          <fpage>52</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Touazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaceb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Belkadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Loubar</surname>
          </string-name>
          ,
          <article-title>A self-supervised learning approach for detecting brca mutations in breast cancer histopathological images</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Shakhovska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Izonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chrétien</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 7th International Conference on Informatics &amp; Data-Driven Medicine, Birmingham, United Kingdom, November 14-16</source>
          ,
          <year>2024</year>
          , volume
          <volume>3892</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rahimpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Saint Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Frouin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Akl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Orlhac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Malhaire</surname>
          </string-name>
          ,
          <article-title>Visual ensemble selection of deep convolutional neural networks for 3d segmentation of breast tumors on dynamic contrast enhanced mri</article-title>
          ,
          <source>European Radiology</source>
          <volume>33</volume>
          (
          <year>2023</year>
          )
          <fpage>959</fpage>
          -
          <lpage>969</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saikia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Deb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mallik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Maulik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Lesion detection in women breast's dynamic contrast-enhanced magnetic resonance imaging using deep learning</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>22555</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tajbakhsh</surname>
          </string-name>
          , et al.,
          <article-title>Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation</article-title>
          ,
          <source>Medical Image Analysis</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>101693</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Khaled</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Touazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaceb</surname>
          </string-name>
          ,
          <article-title>Improving breast cancer diagnosis in mammograms with progressive transfer learning and ensemble deep learning</article-title>
          ,
          <source>Arabian Journal for Science and Engineering</source>
          <volume>50</volume>
          (
          <year>2025</year>
          )
          <fpage>7697</fpage>
          -
          <lpage>7720</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Laribi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gaceb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Touazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rezoug</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sahad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Reggai</surname>
          </string-name>
          ,
          <article-title>Ensemble deep learning of cnn vs vision transformers for brain lesion classification on mri images</article-title>
          ,
          <source>in: Proceedings of the International Conference on Intelligent Data and Digital Media (IDDM), volume 3892 of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>203</fpage>
          -
          <lpage>219</lpage>
          . URL: https://ceur-ws.org/Vol-3892/paper15.pdf. Open access.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Baek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <article-title>Deep learning-based multi-stage segmentation method using ultrasound images for breast cancer diagnosis</article-title>
          ,
          <source>Journal of King Saud University-Computer and Information Sciences</source>
          <volume>34</volume>
          (
          <year>2022</year>
          )
          <fpage>10273</fpage>
          -
          <lpage>10292</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>