=Paper= {{Paper |id=Vol-1213/paper6 |storemode=property |title=SVM-based CBIR of breast masses on mammograms |pdfUrl=https://ceur-ws.org/Vol-1213/paper6.pdf |volume=Vol-1213 |dblpUrl=https://dblp.org/rec/conf/ecai/TsochatzidisZSP14 }} ==SVM-based CBIR of breast masses on mammograms== https://ceur-ws.org/Vol-1213/paper6.pdf

SVM-based CBIR of Breast
Masses on Mammograms

Lazaros Tsochatzidis, Konstantinos Zagoris,
Michalis Savelonas, and Ioannis Pratikakis

Visual Computing Group,
Dept. of Electrical and Computer Engineering,
Democritus University of Thrace,
67100 Xanthi, Greece
{ltsochat,kzagoris,ipratika}@ee.duth.gr
msavelonas@gmail.com
http://vc.ee.duth.gr

Abstract. Mammography is currently the dominant imaging modality
for the early detection of breast cancer. However, its robustness in dis-
tinguishing malignancy is relatively low, resulting in a large number of
unnecessary biopsies. A computer-aided diagnosis (CAD) scheme, ca-
pable of visually justifying its results, is expected to aid the decision
made by radiologists. Content-based image retrieval (CBIR) is a
promising paradigm in this direction. Facing this challenge, we
introduce a CBIR scheme that utilizes the extracted features as input
to a support vector machine (SVM) ensemble. The final features used
for CBIR comprise the participation value of each SVM. The retrieval
performance of the proposed scheme has been evaluated quantitatively
on the basis of the standard measures. In the experiments, a set of 90
mammograms is used, derived from a widely adopted digital database for
screening mammography. The experimental results show the improved
performance of the proposed scheme.

Keywords: Content-Based Image Retrieval, Mammography, Support
Vector Machines

1 Introduction

The use of content-based image retrieval (CBIR) schemes for computer-aided
diagnosis (CAD) has been intensively investigated in the last decade [6]. Such an
approach, which facilitates searching for visually similar medical images, provides
radiologists with a visual aid and increases their confidence in incorporating
CAD-cued results in their decision making [8].
Only a limited number of works are devoted to CBIR-based CAD for
breast masses in mammograms, although mammographic CAD is one of the most mature
and widely adopted CAD types [8]. An early attempt towards CBIR for breast
masses in mammograms was the work of Alto et al. [1], who investigated the
discriminant capability of compactness, fractional concavity, spiculation index
and Haralick’s textural features. In a recent work, Wang et al. [5] tested the
relationship between CAD performance and the similarity level between the
region of interest (ROI) of the query and the ROIs resulting as outputs of CBIR.
All the above works evaluated retrieval performance on the basis of discrim-
inant capability between benign and malignant cases. It can be argued that a
CBIR scheme is expected to retrieve cases on the basis of visual similarity, since
by its own nature it cannot take into account accompanying clinical data. In a
real clinical setting, the results of CBIR could be jointly assessed with all such
data in the context of an integrated CAD scheme. Finally, it can be observed that
all CBIR methods presented were based on simple similarity measures, which
cannot optimally exploit the distribution of mammogram ROIs in the feature
space.

[Figure 1: block diagram. OFFLINE: Dataset -> Feature Calculation -> Calculate the
Participation Value from each Trained SVM (Circumscribed, Spiculated,
Microlobulated) -> Database. ONLINE: Query Image -> Feature Calculation ->
Calculate the Participation Value from each Trained SVM -> Matching Process ->
Display Results.]

Fig. 1. The proposed system architecture.

In this work, we present a novel CBIR scheme, which utilizes a support vector
machine (SVM) ensemble. The corresponding SVMs are capable of optimally
exploiting the distribution of input samples in the feature space on the basis of
breast imaging-reporting and data system (BI-RADS) classifications of breast
masses [2], as performed by an expert radiologist. Then, based on each SVM’s
participation value, each feature set is mapped to a new feature vector. The
dataset is formed from mammograms of the Digital Database for Screening
Mammography (DDSM), which is widely adopted by the medical community.
The remainder of the paper is organized as follows: Section 2 describes the
architecture of the proposed system. The experimental evaluation of the pro-
posed CBIR scheme is presented in Section 3. Finally, conclusions and future
perspectives of this work are discussed in Section 4.

2 Proposed CBIR scheme

Fig. 1 shows the proposed system architecture. At this point, it is worth noting
that a mass detection and segmentation stage is applied prior to the proposed
pipeline. Since those two stages are out of the scope of the proposed system, there
will be no further discussion about the methodology used. For our experiments,
the segmentation information is taken from the ground truth corpora.
Initially, the following features are extracted from the mass boundaries:
1. Solidity: $F_{solidity} = A/H$, where $A$ and $H$ denote the areas of the shape and
its corresponding convex hull, respectively.
2. Compactness: $F_{compactness} = 1 - 4\pi A/P^2$, where $A$ and $P$ denote the area
and the perimeter of the shape, respectively.
3. Discrete Fourier Transformation (DFT) coefficients of the Normalized Radial
Length (NRL) function. The Radial Length Function corresponds to the distance
of each contour point $t$ from the mass centroid $(x_c, y_c)$:
$r(t) = \sqrt{(x(t) - x_c)^2 + (y(t) - y_c)^2}$.
This function is sampled to a fixed number of points (256 in this work) and is
normalized before its DFT computation.
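
The three boundary features listed above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the centroid is taken as the mean of the contour points, resampling uses nearest-index lookup, normalization divides by the maximum radial length, and the number of retained coefficients (`n_coeffs`) is a hypothetical parameter. None of these choices is specified in the text.

```python
import cmath
import math

def polygon_area(pts):
    """Shoelace area of a closed polygon given as [(x, y), ...]."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def perimeter(pts):
    """Length of the closed contour."""
    n = len(pts)
    return sum(math.hypot(pts[(i + 1) % n][0] - pts[i][0],
                          pts[(i + 1) % n][1] - pts[i][1]) for i in range(n))

def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(points):
        chain = []
        for p in points:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))

def solidity(pts):
    """F_solidity = A / H (shape area over convex hull area)."""
    return polygon_area(pts) / polygon_area(convex_hull(pts))

def compactness(pts):
    """F_compactness = 1 - 4*pi*A / P^2."""
    return 1.0 - 4.0 * math.pi * polygon_area(pts) / perimeter(pts) ** 2

def nrl_dft_magnitudes(pts, n_samples=256, n_coeffs=8):
    """Magnitudes of the first DFT coefficients of the normalized radial length."""
    # Assumption: centroid approximated by the mean of the contour points.
    xc = sum(p[0] for p in pts) / len(pts)
    yc = sum(p[1] for p in pts) / len(pts)
    radial = [math.hypot(x - xc, y - yc) for x, y in pts]
    # Assumption: resample to n_samples points by nearest-index lookup.
    sampled = [radial[(k * len(radial)) // n_samples] for k in range(n_samples)]
    # Assumption: normalize by the maximum radial length.
    peak = max(sampled)
    nrl = [r / peak for r in sampled]
    return [abs(sum(nrl[t] * cmath.exp(-2j * math.pi * k * t / n_samples)
                    for t in range(n_samples))) / n_samples
            for k in range(n_coeffs)]
```

For a convex contour the solidity is 1 and drops as concavities appear, while a perfectly circular contour yields a constant radial length, so only the zero-frequency DFT coefficient is non-zero.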
Thereafter, the above feature set is supplied to three different trained SVMs
that correspond to three classes of breast masses, based on BI-RADS [2], namely
spiculated, micro-lobulated and circumscribed.
Support Vector Machines (SVMs) [3] are based on statistical learning
theory and have been successfully applied to several classification problems
because of their discriminant ability and the fact that they do not require large
training sets.
Instead of utilizing the sign of the SVM decision function, we propose to
normalize it, based on [7], in order to calculate a participation value of each
feature vector to each trained SVM, which corresponds to each BI-RADS class.
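The normalized decision function is calculated by the following equation:

$$
R(x) =
\begin{cases}
\max\left\{ \dfrac{1}{1 + \frac{1}{3} e^{f(x)}},\ \dfrac{1}{1 + \frac{1}{3} e^{-f(x)}} \right\} & \text{if } f(x) > 0 \\[2ex]
1 - \max\left\{ \dfrac{1}{1 + \frac{1}{3} e^{f(x)}},\ \dfrac{1}{1 + \frac{1}{3} e^{-f(x)}} \right\} & \text{if } f(x) < 0
\end{cases}
\qquad (1)
$$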

where f(x) denotes the SVM decision function. The output of Eq. 1 represents
the membership value of the data x to the corresponding class and ranges
in the interval [0,1]. Finally, the outputs of the SVMs form the new
three-element feature vector, used in the remainder of the retrieval process.
Subsequently, the Euclidean distance between the query and each indexed sample is
computed, leading to a ranked list of similar objects.
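
The normalization of Eq. 1 and the subsequent ranking step can be sketched as follows. This is an illustration under stated assumptions: the factor 1/3 is read as multiplying the exponential, the case f(x) = 0 (not covered by Eq. 1) is handled by the lower branch, and the clamp on f is a numerical guard that is not part of the paper. The function names and the dictionary layout of the index are hypothetical.

```python
import math

def participation(f):
    """Participation value R(x) computed from an SVM decision value f = f(x)."""
    # Numerical guard (assumption, not in the paper): avoid overflow in exp.
    f = max(-500.0, min(500.0, f))
    m = max(1.0 / (1.0 + math.exp(f) / 3.0),
            1.0 / (1.0 + math.exp(-f) / 3.0))
    # Assumption: f == 0 is treated like the f < 0 branch.
    return m if f > 0 else 1.0 - m

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_similarity(query_vec, indexed):
    """Return index keys sorted by ascending distance to the query.

    indexed maps an item id to its three-element participation vector,
    one element per BI-RADS margin class.
    """
    return sorted(indexed, key=lambda key: euclidean(query_vec, indexed[key]))
```

A query would be described by `[participation(f_c), participation(f_m), participation(f_s)]`, where the three decision values come from the circumscribed, micro-lobulated and spiculated SVMs, and then passed to `rank_by_similarity` together with the stored vectors.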

3 Experimental evaluation
In this study, 90 regions of interest (ROIs) were used, extracted from various
mammograms of the DDSM that contain masses. Each case is accompanied by
ground-truth delineations and additional information, such as the biopsy-proven
pathology of the lesion, its shape and margin types, the overall breast density
and the assessment of an expert radiologist [4] based on the BI-RADS standards.
The margin types that were taken into consideration in this work are circum-
scribed, micro-lobulated and spiculated, as they are highly correlated with the
mass’s pathology. For each margin type, masses of various shapes (oval, round,
lobulated and irregular - Fig. 2) were included. For each selected ROI, the
contour of the depicted mass was delineated by an expert radiologist, who
indicated the exact position of its margin.

Fig. 2. Example ROIs for each margin type: A circumscribed (left), a micro-lobulated
(center) and a spiculated (right) mass.

Two performance evaluation metrics are employed, which measure the system’s
ability to retrieve masses of similar margin type to the query. The first one
is the Precision at Top 5 Retrieved items (P@5), which measures how successfully
the algorithm places relevant results in the first 5 positions of the ranked list.
The second metric is the Mean Average Precision (MAP), a typical measure of
the performance of information retrieval systems, defined as the average of the
precision values obtained after each relevant item is retrieved, averaged over
all queries.
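
The two metrics can be computed from a ranked list of margin-type labels as in the sketch below. The function names and the list-of-labels representation are illustrative choices, not taken from the paper; an item counts as relevant when its margin type equals that of the query.

```python
def precision_at_k(ranked_labels, query_label, k=5):
    """Fraction of the first k retrieved items sharing the query's label."""
    top = ranked_labels[:k]
    return sum(1 for label in top if label == query_label) / k

def average_precision(ranked_labels, query_label):
    """Mean of the precision values measured at each relevant item."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(queries):
    """queries: list of (ranked_labels, query_label) pairs, one per query."""
    return sum(average_precision(r, q) for r, q in queries) / len(queries)
```

For the ranking `["S", "C", "S", "S", "M", "S"]` and a spiculated query `"S"`, three of the first five items are relevant, so P@5 is 3/5, and the average precision is (1/1 + 2/3 + 3/4 + 4/6) / 4 = 37/48.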
The BI-RADS SVMs used a radial basis function (RBF) kernel; they were
trained on 2/3 of the dataset and evaluated on the remaining portion.
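
A possible way to prepare the data for the three per-class SVMs is sketched below. The paper does not state how the 2/3 portion was selected or how the training labels were encoded, so the seeded random shuffle and the one-vs-rest labelling (+1 for the target margin type, -1 otherwise) are assumptions, and both function names are hypothetical.

```python
import random

def split_two_thirds(items, seed=0):
    """Shuffle with a fixed seed and return (train, test) with a 2/3 : 1/3 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def one_vs_rest_labels(margin_types, positive):
    """Binary targets for the SVM dedicated to the `positive` margin type."""
    return [1 if margin == positive else -1 for margin in margin_types]
```

With 90 ROIs this yields 60 training and 30 evaluation samples; each of the three SVMs would then be trained on the same 60 feature vectors with its own binary targets.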
The proposed method was evaluated against a typical, unsupervised,
state-of-the-art retrieval system employing the Euclidean distance, calculated
directly from the features instead of the participation values from each SVM.
Comparative results are presented in Table 1 and show that the proposed method
outperformed the typical unsupervised Euclidean-based retrieval system on average
(P@5: 0.78 vs. 0.75; MAP: 0.8 vs. 0.743), with the largest gain for spiculated
masses.

Table 1. Experimental Results

                  Unsupervised CBIR     SVM-based CBIR
Classes           P@5       MAP         P@5       MAP
Circumscribed     0.9       0.915       0.9       0.92
Microlobulated    0.71      0.723       0.71      0.763
Spiculated        0.654     0.608       0.745     0.73
Average           0.75      0.743       0.78      0.8

4 Conclusions
This work introduced a CBIR scheme, which utilizes a support vector machine
(SVM) ensemble. The retrieval performance of the proposed scheme has been
evaluated on the basis of BI-RADS classifications of breast masses. The
dataset comprises 90 cases from the DDSM, which is widely adopted by
the medical community. The experimental results lead to the conclusion that the
proposed CBIR scheme outperforms standard Euclidean-based retrieval, while
greatly reducing the feature vector dimension and, consequently, the
computational cost.
Future perspectives of this work include: 1) the integration of the proposed
CBIR scheme within the context of a mammographic CAD system, which will
also consider accompanying clinical and textual data, 2) the development of a
similar CBIR scheme to facilitate CAD of breast microcalcifications.

Acknowledgements
This work is funded by the Hellenic Republic, Ministry of Education and Re-
ligious Affairs, General Secretariat of Research and Technology (GSRT), and
particularly the National Programme “SYNERGASIA 2011” (11SYN 10 1546)
in the National Strategic Reference Framework (NSRF) 2007-2013.

References
1. Alto, H., Rangayyan, R.M., Desautels, J.E.L.: Content-based retrieval and analysis
of mammographic masses. J. Electronic Imaging 14(2), 023016 (2005)
2. Berg, W.A., Campassi, C., Langenberg, P., Sexton, M.J.: Breast imaging reporting
and data system: inter- and intraobserver variability in feature analysis and final
assessment. Am. J. Roentgenol. 174(6), 1769–1777 (2000)
3. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297
(1995)
4. Heath, M., Bowyer, K., Kopans, D., Kegelmeyer Jr, P., Moore, R., Chang, K., Mun-
ishkumaran, S.: Current status of the digital database for screening mammography.
In: Digital mammography, pp. 457–460. Springer (1998)
5. Wang, X., Park, S., Zheng, B.: Assessment of performance and reliability of
computer-aided detection scheme using content-based image retrieval approach and
limited reference database. Journal of Digital Imaging 24(2), 352–359 (2011)
6. Welter, P., Fischer, B., Günther, R.W., Deserno (Né Lehmann), T.M.: Generic in-
tegration of content-based image retrieval in computer-aided diagnosis. Comput.
Methods Prog. Biomed. 108(2), 589–599 (2012)
7. Zagoris, K., Ergina, K., Papamarkos, N.: Image retrieval systems based on compact
shape descriptor and relevance feedback information. Journal of Visual
Communication and Image Representation 22(5), 378–390 (2011)
8. Zheng, B.: Computer-aided diagnosis in mammography using content-based image
retrieval approaches: Current status and future perspectives. Algorithms 2(2), 828–
849 (2009)