=Paper= {{Paper |id=Vol-2551/paper-05 |storemode=property |title=Lung Nodule Classification Using Convolutional Autoencoder and Clustering Augmented Learning Method(CALM) |pdfUrl=https://ceur-ws.org/Vol-2551/paper-05.pdf |volume=Vol-2551 |authors=Soumya Suvra Ghosal,Indranil Sarkar,Issmail El Hallaoui |dblpUrl=https://dblp.org/rec/conf/wsdm/GhosalSH20 }} ==Lung Nodule Classification Using Convolutional Autoencoder and Clustering Augmented Learning Method(CALM)== https://ceur-ws.org/Vol-2551/paper-05.pdf

Lung nodule classification using Convolutional Autoencoder
and Clustering Augmented Learning Method(CALM)
Soumya Suvra Ghosal Indranil Sarkar Issmail El Hallaoui
soumyasuvraghosal@gmail.com indranil.sarkar.nitdgp@gmail.com issmail.elhallaoui@gerad.ca
NIT Durgapur NIT Durgapur Ecole Polytechnique de Montreal
Durgapur, India Durgapur, India Montreal, Canada
ABSTRACT
Early detection of lung cancer can sharply reduce the lung cancer mortality rate; lung cancer accounts for more than 17% of total cancer-related deaths. Radiologists encounter a large number of cases daily for initial diagnosis. Computer-Aided Diagnosis (CAD) systems can assist radiologists by offering a second opinion and making the whole process faster. However, one drawback of CAD systems is the large amount of data needed to train them, which can be expensive to obtain in the medical field.
In this paper, we propose using a generative adversarial network (GAN) as a potential data augmentation strategy to generate more training data to improve CAD systems. We also propose a convolutional autoencoder deep learning framework to support unsupervised learning of image features for lung nodules from unlabeled data. The paper also introduces the Clustering Augmented Learning Method (CALM) classifier, which is based on the concept of simultaneous heterogeneous clustering and classification to learn deep feature representations of the features obtained from the convolutional autoencoder.
The classification model within CALM consists of a Feedforward Neural Net (FNN) architecture. To improve the accuracy of the classification model, CALM iterates between clustering and learning to form robust clusters, thereby leveraging the learning process of the FNN.
Computational experiments using the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset resulted in an overall accuracy of 95.3% with a precision of 94.9%.

CCS CONCEPTS
• Computing Methodologies → Machine learning; Feature Selection; • Information systems → Information systems applications; Data mining; • Applied Computing → Health informatics.

KEYWORDS
Convolutional Autoencoder Neural Network, Lung Nodule, Generative Adversarial Networks, Deep Features

ACKNOWLEDGEMENT
This work was presented at the first Health Search and Data Mining Workshop [5].

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

1 INTRODUCTION
The use of computer tools and basic machine learning to facilitate and enhance medical analysis and diagnosis is a promising area. The study of the correlation between gene expression profiles and disease states or stages of cells plays an important role in biological and clinical applications. Gene expression profiles can be obtained from multiple tissue samples, and diseased tissue can be compared with normal tissue. One main challenge in this regard is to determine the difference between cancerous gene expression in tumor cells and the gene expression in normal, non-cancerous tissues. Many machine learning classification techniques and algorithms have been proposed to address this problem. Hence intelligent healthcare systems are an important research direction to assist doctors in harnessing medical big data.
Among all types of cancer, lung cancer is harder to detect in its early stages, as there is only a dime-sized lesion growth, known as a nodule, inside the lung. By the time it can be detected, it is already too late for the patient. Also, these small lesions are only detectable by a CT scan.
It is especially difficult to identify, from a large number of pulmonary CT images, the images containing nodules that should be analyzed to assist early lung cancer diagnosis. At present, the image analysis methods for assisting radiologists in identifying pulmonary nodules consist of four steps: 1) region of interest (ROI) definition, 2) segmentation, 3) hand-crafted features and 4) categorization. In particular, a radiologist has to spend a lot of time checking each image to mark the nodule accurately, which is critical for diagnosis and is a research hotspot in intelligent healthcare.
For example, it has been proposed to extract texture features for nodule analysis, but it is hard to find effective texture feature parameters. Previously, nodules were analyzed by the morphological method through shape, size, boundary, etc. However, it is difficult for this analytical approach to provide accurate descriptive information, because even an experienced radiologist usually gives a vague description based on personal experience and understanding. Therefore, it is a challenging issue to effectively extract features for representing the nodules.
Recently, CAD systems have taken advantage of the popular Convolutional Neural Network (CNN), producing state-of-the-art detection results, with 95% sensitivity at only 10 false positives per scan. However, a CNN requires a large amount of training data to learn effectively; in the medical field, obtaining the required data is often costly, time-consuming, or simply not feasible. To deal with these issues, data augmentation is often used to better train these CAD systems.
In [3], the authors addressed these challenges by training a deep learning architecture based on the Convolutional Autoencoder Neural Network (CANN) for the classification of pulmonary nodules. Inspired by the results obtained, we also use a similar architecture for extracting deep features from CT images. Besides, we present a
new way to improve lung nodule detection in existing systems by augmenting training datasets with generated images of nodules. To create these images, we propose the use of a type of Generative Adversarial Network (GAN). The augmentation of data would help in more accurate supervised fine-tuning of the proposed model. Overall, the proposed method utilizes both the original and generated images for unsupervised feature learning and some amount of data for fine-tuning. Computational experiments show that the proposed method is effective at extracting image features via a data-driven approach, and achieves faster labeling for medical data. Specifically, the main contributions of this paper are:
• Application of GANs to augment the training data for computer-aided lung nodule detection systems and address the issue of the insufficiency of training data.
• Image features can be extracted directly from the raw image. Such an end-to-end approach does not use an image segmentation method to find the nodules, avoiding loss of important information which might affect classification results.
• The unsupervised data-driven approach can be extended to other data sets and related applications.
• Devising a classification approach in which data is clustered based on their inherent characteristics. In the process of learning the best clustering solution, the parameters of the classification model are optimized, thereby substantially improving the learning process.

2 RELATED WORKS
In the past, several methods have been proposed to detect and classify lung cancer in CT images using different algorithms. Aliferis et al. [2] used recursive feature elimination with single variable association filtering approaches to select a small subset of the gene expressions as a reduced feature set. For better classification, Ramaswamy [13] applied recursive feature elimination using SVM to similarly find a small number of genes. Wang et al. [18] showed that if a correlation-based feature selector is combined with a classification approach, it can obtain good classification results with high confidence. Sharma et al. [15] proposed finding an informative subset of gene expressions using feature selection methods, in a manner similar to a “Divide & Conquer” approach: informative genes are found within smaller subsets, and these are then combined to form the overall subset. Nanni et al. [11] proposed a method that combines different feature reduction approaches, useful for gene microarray classification. In Zinovev et al. [21], the authors used decision trees to classify lung nodules using the LIDC dataset. The features they used are lobulation, texture, spiculation, etc. These are used to create a 63-dimensional feature vector for classification of 914 instances. The authors obtained an overall accuracy of 68.66%. Kuruvilla et al. [10] used six distinct parameters, including skewness and the fifth and sixth central moments, which are extracted from segmented single slices containing 2 lung images, along with the features mentioned in [1], and trained a feed-forward backpropagation neural network. There has also been a renewed interest in the field of deep learning, and the latest research in the area of medical imaging using deep learning shows some good results. One such paper is that of Suk et al. [17], in which the authors propose a novel latent and shared feature representation of neuro-imaging data of the brain using a Deep Boltzmann Machine (DBM) for diagnosis. The method achieved a maximal diagnostic accuracy of 95.52%. In Riccardi et al. [14], the authors proposed a new algorithm based on 3D radial transforms, which can automatically detect nodules with an overall accuracy of 71%. Kumar et al. [9] proposed to use deep features extracted from an autoencoder along with a binary decision tree as a classifier to build their proposed system for lung cancer classification. Wu et al. [19] proposed deep feature learning for deformable registration of brain MR images. They demonstrate that a general approach can be built to improve image registration by using deep features. Fakoor et al. [6] proposed a method to enhance cancer diagnosis and classification from gene expression data using unsupervised and deep learning methods. Their model used PCA (Principal Component Analysis) for dimensionality reduction when the initial raw feature space has very high dimensionality. Chuquicusma et al. [4] showed that GANs are able to generate realistic fake images that fool even experienced radiologists.
Maayan et al. [7] used GANs to augment liver lesion images to improve multiclass CNN classification. They reported an increase to 85.7% sensitivity and 92.4% specificity, which is higher than recent state-of-the-art liver classification methods. Zhu et al. [20] showed that Generative Adversarial Networks (GANs) can be used to complement and complete the training data manifold, which helps to find better margins between classes. They used GANs to augment the emotion categories that were lacking in face data and achieved a 5% to 10% increase in the accuracy of emotion classification.
In this paper, we propose a convolutional autoencoder unsupervised learning algorithm for learning lung CT features and the CALM classifier for pulmonary nodule classification. To tackle the scarcity of labeled medical images, we use a type of Generative Adversarial Network (GAN) to add augmented data to the training set.

3 PRELIMINARIES
3.1 Generative Adversarial Networks(GAN)
Generative Adversarial Networks (GANs) are a type of neural network in which two competing networks, the generator and the discriminator, are adversarially trained against one another. The discriminator is trained to differentiate between real data and generated data, while the generator attempts to fool the discriminator by generating synthetic data. More specifically, the generator $G$ samples from a previously known distribution $z \sim P_z(z)$ (usually a Gaussian) and generates data $G(z)$ by putting $z$ through a function $G$. The discriminator $D$ takes in data $x$ and produces a probability that $x$ is a sample from the real data distribution $P_{data}(x)$. The loss function that the discriminator $D$ maximizes and the generator $G$ minimizes is

$$L = \min_G \max_D \; \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]$$

While this original GAN is useful for a multitude of tasks, the Jensen-Shannon divergence as loss function inherently struggles to learn probability distributions between low-dimensional manifolds in a higher-dimensional space. Wasserstein GANs (WGANs) attempt to solve this problem by using an approximation of the Earth-Mover
distance as the loss function, which enables more stable GAN training. The discriminator is now replaced with a critic, as its output is no longer a probability; rather, it is a 1-Lipschitz function that tries to maximize the difference in score between the real data and the generated data. A differentiable function is 1-Lipschitz if and only if the norm of its gradient is at most 1 everywhere. The authors of the WGAN paper enforce that the critic is 1-Lipschitz by weight-clipping, which may lead to optimization difficulties. The new loss function is as follows:

$$L = \min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P_{data}(x)}[D(x)] - \mathbb{E}_{z \sim P_z(z)}[D(G(z))]$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions.

3.2 Autoencoder
An autoencoder takes an input $x \in \mathbb{R}^{d}$ and first maps it to a latent representation $h \in \mathbb{R}^{d'}$ using a deterministic function of type $h = f_\theta(x) = \sigma(Wx + b)$ with parameters $\theta = \{W, b\}$. This “code” is then used to reconstruct the input by a reverse mapping of $f$: $y = f_{\theta'}(h) = \sigma(W'h + b')$ with $\theta' = \{W', b'\}$. The two parameter sets are usually constrained to be of the form $W' = W^T$, using the same weights for encoding the input and decoding the latent representation. Each training pattern $x_i$ is then mapped onto its code $h_i$ and its reconstruction $y_i$. The parameters are optimized by minimizing an appropriate cost function over the training set $D_n = \{(x_0, t_0), \dots, (x_n, t_n)\}$.

3.3 Denoising Autoencoders(DAE)
Without any additional constraints, conventional autoencoders can simply learn the identity mapping. This problem can be circumvented by using a probabilistic RBM (Restricted Boltzmann Machine) approach, sparse coding, or denoising autoencoders that try to reconstruct noisy inputs. The latter perform as well as or even better than RBMs. Training involves the reconstruction of a clean input from a partially destroyed one. Input $x$ becomes corrupted input $\bar{x}$ by adding a variable amount $v$ of noise distributed according to the characteristics of the input image. Common choices include binomial noise (switching pixels on or off) for black and white images or uncorrelated Gaussian noise for color images. Parameter $v$ represents the percentage of permissible corruption. The autoencoder is trained to denoise the inputs by first finding the latent representation $h = f_\theta(\bar{x}) = \sigma(W\bar{x} + b)$, from which it reconstructs the original input $y = f_{\theta'}(h) = \sigma(W'h + b')$.

3.4 Convolutional Neural Networks
CNNs are hierarchical models whose convolutional layers alternate with subsampling layers, reminiscent of simple and complex cells in the primary visual cortex. The network architecture consists of three basic building blocks to be stacked and composed as needed, i.e., the convolution layer, the max-pooling layer, and the classification layer.

3.5 Convolutional Auto Encoder(CAE)
A fully connected autoencoder ignores the 2-D image structure. This is not only a problem when dealing with realistically sized inputs but also introduces redundancy in the parameters, forcing each feature to be global. However, the trend in vision and object recognition adopted by most successful models is to discover localized features that repeat themselves all over the input. CAEs differ from conventional AEs in that their weights are shared among all locations of the input, preserving spatial locality. The reconstruction is hence due to a linear combination of basic image patches based on the latent code. The CAE combines the local convolution connection with the autoencoder, which is a simple operation that adds a reconstruction of the input to the convolution operation. The procedure of the convolutional conversion from input feature maps to output is called the convolutional encoder. The output values are then reconstructed through the inverse convolutional operation, which is called the convolutional decoder. Moreover, the parameters of the encode and decode operations are calculated through standard autoencoder unsupervised greedy training.

Figure 1: Convolutional Autoencoder

The input feature maps $x \in \mathbb{R}^{n \times l \times l}$ are obtained from the input layer or the previous layer. They consist of $n$ feature maps, each of size $l \times l$ pixels. The convolutional autoencoder operation uses $m$ convolutional kernels, and the output layer produces $m$ feature maps. When the input feature maps come from a previous layer, $n$ is the number of output feature maps of that layer. The size of each convolutional kernel is $d \times d$, where $d \le l$. $\theta = \{W, \hat{W}, b, \hat{b}\}$ denotes the parameters of the convolutional autoencoder layer that need to be learned: $b \in \mathbb{R}^{m}$ and $W = \{w_j, j = 1, 2, \dots, m\}$ are the parameters of the convolutional encoder, where $w_j \in \mathbb{R}^{n \times l \times l}$ is treated as a vector $w_j \in \mathbb{R}^{nl^2}$, and $\hat{W} = \{\hat{w}_j, j = 1, 2, \dots, m\}$ and $\hat{b}$ are the parameters of the convolutional decoder, where $\hat{w}_j \in \mathbb{R}^{nl^2}$.
First, the input image is encoded: each time, a patch $x_i$, $i = 1, 2, \dots, p$, of $d \times d$ pixels is selected from the input image, and the weight $w_j$ of convolutional kernel $j$ is used for the convolutional calculation. Finally, the neuron value $o_{ij}$, $j = 1, 2, \dots, m$, is calculated at the output layer:

$$o_{ij} = f(x_i) = \sigma(W_j x_i + b)$$

where $\sigma$ is a nonlinear activation function; three common choices are the sigmoid function, the hyperbolic tangent function, and the rectified linear function (ReLU). We use ReLU in this paper.
Then the $o_{ij}$ output by the convolutional encoder is decoded, so that $x_i$ is reconstructed from $o_{ij}$ to generate $\hat{x}_i$:

$$\hat{x}_i = f'(o_{ij}) = \phi(\hat{W}_i o_{ij} + \hat{b})$$

$\hat{x}_i$ is generated after each convolutional encode and decode. $p$ patches of dimension $d \times d$ are obtained from the reconstruction operation. We use the mean square error between the original patch of the input image $x_i$ $(i = 1, 2, \dots, p)$ and the reconstructed patch $\hat{x}_i$ $(i = 1, 2, \dots, p)$ as the cost function. The cost function and the reconstruction error are described as:

$$J_{CAE}(\theta) = \frac{1}{p} \sum_{i=1}^{p} L_{CAE}[x_i, \hat{x}_i]$$

$$L_{CAE}[x_i, \hat{x}_i] = \|x_i - \hat{x}_i\|^2 = \|x_i - \phi(\sigma(x_i))\|^2$$
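The patch-wise encode, decode and cost computation described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes a single input channel (n = 1), kernels flattened to vectors of length d·d, a ReLU encoder, an identity decoder activation ϕ, and random untrained weights. The helper names `extract_patches` and `cae_cost` are hypothetical.

```python
import numpy as np

def extract_patches(image, d):
    """Return every d x d patch of a square 2-D image, flattened to length d*d."""
    l = image.shape[0]
    patches = [
        image[r:r + d, c:c + d].ravel()
        for r in range(l - d + 1)
        for c in range(l - d + 1)
    ]
    return np.stack(patches)                     # shape (p, d*d)

def relu(a):
    return np.maximum(a, 0.0)

def cae_cost(patches, W, b, W_hat, b_hat):
    """Mean over the p patches of the squared reconstruction error."""
    codes = relu(patches @ W.T + b)              # encoder outputs o_ij, shape (p, m)
    recon = codes @ W_hat.T + b_hat              # decoder (identity phi assumed), shape (p, d*d)
    per_patch = np.sum((patches - recon) ** 2, axis=1)   # ||x_i - x_hat_i||^2
    return per_patch.mean()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # toy stand-in for a CT patch
d, m = 3, 4                                      # kernel size and kernel count (illustrative)
X = extract_patches(image, d)                    # p = (8 - 3 + 1)^2 = 36 patches
W = rng.normal(scale=0.1, size=(m, d * d))       # encoder kernels, one row per kernel
b = np.zeros(m)
W_hat = rng.normal(scale=0.1, size=(d * d, m))   # decoder weights
b_hat = np.zeros(d * d)
cost = cae_cost(X, W, b, W_hat, b_hat)
```

In an actual training loop, an optimizer such as SGD would repeatedly adjust `W`, `b`, `W_hat` and `b_hat` to reduce `cost`; that step is omitted here.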
Through stochastic gradient descent (SGD), the weights are updated to minimize the reconstruction error, and the convolutional autoencoder layer is optimized. Finally, the trained parameters are used to output the feature maps, which are transmitted to the next layer.

4 METHODOLOGY
For our model, we use WGAN with gradient penalty (WGAN-GP), a version of WGAN that replaces weight-clipping with a gradient penalty on the critic, constraining the norm of the gradient of the critic’s output with respect to its input. This allows for more stable GAN training. The optimal WGAN or WGAN-GP critic will contain straight lines with gradient norm 1 connecting coupled points between $P_{data}$ and $P_{G(z)}$; since enforcing the unit gradient norm constraint everywhere is intractable, it is only enforced along these straight lines. The new critic loss, which the critic minimizes, is as follows:

$$L = \mathbb{E}_{z \sim P_z(z)}[D(G(z))] - \mathbb{E}_{x \sim P_{data}(x)}[D(x)] + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}(\hat{x})}\left[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2\right]$$

where $\lambda$ is the weight given to the gradient penalty and $\hat{x} \sim P_{\hat{x}}(\hat{x})$ are random samples drawn uniformly along straight lines between pairs of points sampled from the real data distribution $P_{data}$ and the generated data distribution $P_{G(z)}$. We hypothesize that generated data can improve lung nodule detection sensitivity, allowing for better training of CAD systems with existing data. We can use the generator to produce new training data to augment the existing training data.

Since the workload for labeling ROIs is high and pulmonary nodules are difficult to recognize, the CT images are divided into small patch areas for training the network. Each patch taken from the CT image is input to the Convolutional Autoencoder (CAE) for the purpose of learning the feature representation, which is used for classification. The parameters of the convolution layers in the CNN are determined by autoencoder unsupervised learning, and some data is used for fine-tuning the parameters of the CAE and training the classifier.
A patch taken from the original CT image can be represented as $x \in X$, $X \subset \mathbb{R}^{m \times d \times d}$, where $m$ represents the number of input channels and $d \times d$ represents the input image size. The labeled data are represented as $y \in Y$, $Y \subset \mathbb{R}^{n}$, where $n$ represents the number of output classes. Through the proposed model, we expect to deduce from training the hypothesis function $f: X \to Y$ and the set of parameters $\theta$.

In the proposed model, the hypothesis function $f$, based on a deep learning architecture, consists of multiple layers and is not a direct mapping from $X$ to $Y$. Specifically, the first layer $L_1$ receives the input image $x$, and the middle layers comprise three convolution layers and three pooling layers.

Figure 2: Block Diagram of Proposed Model

Algorithm 1: Unsupervised Training of CAE
1 Given dataset U and the number N of convolution and pooling layers, all weight matrices and bias vectors are randomly initialized
2 i ← 1
3 if i == 1 then
4     The input of Ci is U
5 else
6     The input of Ci is the output of Pi−1
7 Greedy layer-wise training of Ci
8 Find the parameters of Ci by the cost function
9 The output of Ci is the input to Pi
10 Max pooling operator
11 if i < N then
12     i ← i + 1; goto line 3

The convolutional autoencoder has the following architecture:
• Input: 40 × 40 patch image from CT image
• C1: Convolution kernel of size 5 × 5, number of kernels is 50, nonlinear function is ReLU.
• P1: Max pooling is used, the size of the pooling area is 2 × 2 with stride 2.
• C2: Convolution kernel of size 3 × 3, number of kernels is 50, nonlinear function is ReLU.
• P2: Max pooling is used, the size of the pooling area is 2 × 2 with stride 2.
• C3: Convolution kernel of size 3 × 3, number of kernels is 50, nonlinear function is ReLU.
• P3: Max pooling is used, the size of the pooling area is 2 × 2 with stride 2.
The convolutional autoencoder is trained in an unsupervised manner, as explained in Algorithm 1, and the parameters are optimized through SGD. A mini-batch size of 100 samples and 150 iterations for each batch are used.
The output from the last pooling layer is fed as input to the CALM
classifier, which is explained in Section 5.

5 CLUSTERING AUGMENTED LEARNING METHOD (CALM)
5.1 Proposed Approach
Input augmentation We consider a matrix of input data $D$ and a set of cluster centers $C$. Since in this case study a nodule is either malignant or not, we set $|C| = 2$. In this paper, we use clustering to augment input data $x \in D$ for better learning. To augment the input data, we add a new set of features representing whether or not an input example belongs to each cluster. To distinguish input examples, we introduce an additional index $h \in \{1, \dots, |D|\}$ representing the number of an input example ($x_1$ is the first input example of $D$). We also define a vector $c_h$ composed of $c_{hl}$, $l \in C$, for each example $x_h \in D$. It is a one-hot representation containing zeros except at the index of the cluster the example belongs to (e.g. $c_1 = [0, 1]$ means that the first input example $x_1$ belongs to the 2nd cluster out of 2 clusters). Finally, we augment input examples by concatenating the vector $x_h$ with the vector $c_h$ for each $h \in \{1, \dots, |D|\}$.
As a result, the clustering process aggregates data having similar characteristics, resulting in better learning by the FNN model. We include the following additional constraints:

$$c_{hl} = 1 \qquad (1)$$
$$c_{ha} = 0, \quad \forall a \in \{1, \dots, |C|\},\ l \neq a \qquad (2)$$

5.2 Clustering Problem
We have a distance/dissimilarity measure $d_{il}$ between input examples $i \in D$ and cluster centers $l \in C$. The clustering problem aims to assign each input example to a cluster such that the total distance between the elements of a cluster and its center is minimized. We introduce a new set of binary variables $c_{il}$ that is equal to 1 if input example $i \in D$ belongs to the cluster whose center is $l \in C$, and 0 otherwise. The clustering problem is formulated as follows:

$$\min \sum_{i \in D} \sum_{l \in C} d_{il} \, c_{il} \qquad (3)$$
$$\text{s.t.} \quad \sum_{l \in C} c_{il} = 1, \ \forall i \in D \quad \text{and} \quad c_{il} \in \{0, 1\}, \ \forall i \in D, \forall l \in C \qquad (4)$$

The objective function (3) minimizes the total distance between a cluster center and its elements. Constraints (4) ensure that each element is assigned to exactly one cluster and that the decision variables are binary.
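For a fixed set of centers and a fixed distance matrix, problem (3)-(4) decouples across input examples, so assigning each example to its nearest center is optimal. The following NumPy sketch illustrates that assignment and the input augmentation of Section 5.1. The function names and the toy distances are hypothetical and chosen only for illustration.

```python
import numpy as np

def assign_clusters(dist):
    """Solve (3)-(4) for a fixed distance matrix of shape (|D|, |C|).

    Each row of the result is the one-hot vector c_i of the nearest center.
    """
    nearest = np.argmin(dist, axis=1)
    c = np.zeros_like(dist, dtype=int)
    c[np.arange(dist.shape[0]), nearest] = 1
    return c

def augment_inputs(x, c):
    """Concatenate each input example x_h with its one-hot cluster vector c_h."""
    return np.hstack([x, c])

# Toy distances d_il for three examples and |C| = 2 centers.
dist = np.array([[0.2, 0.9],
                 [0.7, 0.1],
                 [0.4, 0.5]])
c = assign_clusters(dist)                 # rows: [1, 0], [0, 1], [1, 0]
x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x_aug = augment_inputs(x, c)              # each row is [x_h, c_h]
total_distance = float((dist * c).sum())  # objective (3): 0.2 + 0.1 + 0.4
```

Each row of `c` sums to 1, which is exactly constraint (4).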
In this paper, we also propose a novel dissimilarity measure based on the weights of the trained FNN model. It uses the average of the weights linked to each neuron of the input layer. Assuming that the original input (without the new clustering feature) has $d$ dimensions ($x_h = [x_{h1}, \dots, x_{hd}]$, $h \in \{1, \dots, |D|\}$) and that the weight linking node $n$ of the input layer to node $j \in \{1, \dots, n_1\}$ of the following layer is $w_{nj}$, the distance measure is formulated as follows:

$$d_{il} = \sum_{n \in \{1, \dots, d\}} \underset{j \in \{1, \dots, n_1\}}{\operatorname{avg}} \, w_{nj} \, |x_{in} - x_{ln}|$$

Thus the distance measure computes the distance between two examples based on how much each input feature contributes to the resulting prediction. Therefore, the resulting clusters contain examples with similar potential to improve the classification results.
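A minimal NumPy sketch of this weight-based dissimilarity follows. It assumes the first-layer weights are stored as a matrix `W1` of shape (d, n_1), with row n holding the weights leaving input node n. The text does not say whether absolute values of the weights are taken before averaging, so the sketch averages the raw weights as written. The function names are hypothetical.

```python
import numpy as np

def feature_importance(W1):
    """Average, over the n_1 nodes of the next layer, of the weights leaving each input node."""
    return W1.mean(axis=1)                                 # shape (d,)

def dissimilarity(X, centers, W1):
    """d_il = sum_n avg_j(w_nj) * |x_in - x_ln| for every example i and center l."""
    importance = feature_importance(W1)                    # shape (d,)
    diff = np.abs(X[:, None, :] - centers[None, :, :])     # shape (|D|, |C|, d)
    return diff @ importance                               # shape (|D|, |C|)

W1 = np.array([[1.0, 3.0],      # input feature 1: average weight 2
               [0.0, 0.0]])     # input feature 2: average weight 0, so it is ignored
X = np.array([[0.0, 0.0],
              [1.0, 5.0]])
centers = np.array([[0.0, 10.0],
                    [2.0, 0.0]])
D = dissimilarity(X, centers, W1)
```

In this toy case the second feature has zero average weight, so the large gap between 0 and 10 in that coordinate does not contribute to any distance.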

Figure 3: Architecture of Clustering Augmented Learning Method(CALM) Classifier

Cluster centers To determine the cluster centers, CALM consists of a clustering model and a Feed-Forward Neural Net (FNN) with a softmax output to classify the lung nodules. For the clustering model, we propose to use a Random Forest classifier to determine the cluster centers. After the FNN is trained using a state-of-the-art solver on data belonging to a single cluster $l \in \{1, \dots, |C|\}$, a Random Forest classifier is used to find the best cluster center. Hence we repeat $|C|$ instances of training the FNN to find the $|C|$ centers. For any instance $l$ of the model, we use the one-hot encoded vector of $l$ as the label for all input samples in that cluster to train the Random Forest classifier in a supervised manner. In simple words, while predicting the center of the 2nd cluster (for example), we use [0, 1] as the label for all input samples in that cluster, since $|C|$ is 2. We propose that the input sample with the lowest error in predicting its cluster label be taken as the center of that cluster in the subsequent iteration of the proposed approach. In this way, the center is the input sample that is the most fitting representative of that cluster.

5.3 Proposed Algorithm
As in Fig., we propose an approach (Algorithm 2) in which we iteratively train the FNN classifier, use its weights for input data clustering (thus changing the input vector), train the FNN classifier again using the new input data, and so on until a stopping criterion is attained. The stopping criterion is triggered if the cluster assignment remains the same for 10 consecutive iterations, i.e., the clustering problem converges.
The configuration of the proposed model is given as:
A) Classification Model: FC1 → Leaky ReLU → FC2 → Leaky ReLU → FC3 → Softmax. Dimension of FC1: 128. Dimension of FC2: 32. Dimension of FC3: 2.
B) Optimizer: ADAM, learning rate 0.001, momentum rate 0.9, weight decay (L2 regularization): 1e-4.

6 DATASET
The Lung Image Database Consortium (LIDC) has made a database publicly available that contains thoracic CT images of 1010 lung cancer patients, and each scan has been annotated by up to 4
radiologists on semantic characteristics and malignancy. The ratings were obtained by performing biopsy, surgical resection, progression, or review of the radiological images to show 2 years of nodule state, at two levels: a first diagnosis at the patient level and a second diagnosis at the second level. The LIDC database of thoracic CT studies for 1010 patients was acquired over a long period with various scanners.

Algorithm 2: Clustering-augmented learning method
Step 0: Data obtained after extracting information using the Convolutional Autoencoder (CAE) acts as input to CALM.
Step 1: Initialization of the cluster centers u_1, ..., u_|C| randomly. Clustering of the output data obtained from the Convolutional Autoencoder (CAE) and augmenting each data sample with its one-hot encoded cluster label.
Step 2: Training the FNN classifier & clustering model
    foreach l ∈ {1, ..., |C|} do
        Train the FNN model on data belonging to cluster l to learn classification.
        For supervised training of the random forest classifier we use the one-hot encoded representation of clusters as labels.
        Running the clustering model gives the cluster center u_l.
Step 3: Clustering
    Update dissimilarity matrix using W*
    if stopping criterion is attained then Stop.
    else go to Step 2.

We excluded nodules with outliers in the x, y or z dimensions. Outliers are defined as values more than 1.5 times the interquartile range above the third quartile. We also excluded scans with slice thickness greater than 2.5 mm. This left 666 CT scans for training and 86 CT scans for evaluation. To reduce noise in our training data, we also excluded nodules annotated by fewer than 3 radiologists.
The LIDC dataset also provides information and coordinates for each nodule. We chose an input size of 40 × 40, since that is large enough to fully contain the largest nodules. Classic data augmentation was performed on the positive examples: translations of up to 10 pixels in the XY plane are added to the positive training set. Negative data is defined as inputs that did not contain nodules agreed on by any radiologists. The final input data has 5422 labeled images of size 40px × 40px. For comparison, the size of a whole CT scan is 512px × 512px × N slices, where N is the number of slices, ranging from 65 to 764 for different CT scans. The training and evaluation sets are randomly partitioned in the proportion 8:2. Precisely, there are [0.8 × 5422] = 4338 initial positive training examples, and since we want our initial training data to be balanced, we also take 4338 initial negative training examples out of a practically infinite number available. In total, the initial data consists of 8676 (4338 positive + 4338 negative) training examples and 2168 (1084 positive + 1084 negative) validation examples.
In this paper, to improve lung nodule detection in existing CADe systems, we augment the training datasets with generated images obtained using a Generative Adversarial Network (GAN). We used an augmentation rate of 50% when using GANs. Since the original number of training samples is 4338 positive + 4338 negative = 8676, the number of augmented data added is [4338 × 0.5] = 2169 positive and [4338 × 0.5] = 2169 negative samples. Since negative training volumes are easy to obtain, the WGAN-GP is trained on all of the positive training examples so that it will generate positive data.

Table 1: Performance of models

Model                            Accuracy   Precision   Recall   F1      AUC
GAN+CAE+CALM (Proposed Model)    95.3%      94.9%       95%      95%     0.97
GAN+CAE+NN                       94.2%      94.6%       93.5%    93.5%   0.93
GAN+CAE+LR                       90.3%      92%         92%      92%     0.91
GAN+CAE+SVM                      90.1%      92%         92%      92%     0.90
AE[9]                            77%        76%         77%      77%     0.83
CNN                              89%        88%         90%      89%     0.95

Where GAN represents Generative Adversarial Network, CAE represents Convolutional Autoencoder, CALM represents Clustering Augmented Learning Method, NN represents Neural Network, LR represents Logistic Regression, SVM represents linear kernel Support Vector Machine.

Figure 4: Plot of Intra-Cluster Variance vs Iterations

7 RESULT
Using a convolutional neural network to learn lung nodule image features is similar to common image feature learning. Both CNN and conventional learning use the labeled dataset and learn the network parameters between each layer, from the input layer to the output layer, by means of forward and backward propagation. We compare the classification performance of the proposed model, the autoencoder (AE) [9], and the convolutional neural network (CNN) on the same dataset. Results are shown in Table 1 and the Receiver Operating Characteristic (ROC) curve is shown in Fig. 6. To justify

Table 2: Comparison with Literature

Model Accuracy
Proposed model 95.3%
Kuruvilla and Gunavathi[10] 93.3%
Nascimento et al. [12] 92.78%
Krewer et al. [8] 90.91%
da Silva [16] 82.3%
Kumar et al. [9] 77%

the contribution of the CALM classifier, we also compare the results
by using traditional classifiers such as logistic regression and linear
kernel support vector machine on the features obtained from the
last pooling layer of the convolutional autoencoder. Moreover, Fig. 4
shows how the intra-cluster variance decreases after approximately
75 iterations and then stabilizes. To measure intra-cluster variance,
we used Euclidean distance in this case study. Similarly, it is evident
from Fig. 5 that testing loss starts decreasing after 80 epochs and,
gradually, as the clustering solution converges, the accuracy begins
to improve. This observation bolsters our initial assumption that
clustering data based on inherent characteristics would improve the
learning process of FNN.

Figure 5: Training and Testing Loss

The accuracy, precision, recall, F1, and AUC of the proposed
method are 95.3%, 94.9%, 95%, 95% and 0.97 respectively. For the AE
(Autoencoder) method, we train the neural net in an unsupervised
manner and test on the same dataset for classification. We use 1024
neurons in the fully connected layer in the AE method. We have also
compared the proposed model with accuracy obtained by previous
literature. Comparison is shown in Table 2.

Figure 6: ROC on classification

8 CONCLUSION
In this paper, we present a novel approach to assist in CT image
analysis. Approaches based on segmentation and handcrafted features
are time-consuming and labor-intensive, while the data-driven
approach is available to avoid the loss of important information in
nodule segmentation. Methods based on Convolutional Neural Network
(CNN) suffer from the scarcity of labeled data in the medical
domain. To overcome that issue, in this paper, we propose the use of
Generative Adversarial Networks to augment training data. We leverage
Convolutional Autoencoder architecture for feature learning, in
which the network is initially trained in an unsupervised manner with
a large amount of data and later on the classifier is fine-tuned using
a supervised approach. Referring to the result and the comparison
table, our proposed system outperforms the literature mentioned in the
related works section. In the future, we will work on amalgamating
domain knowledge and data-driven feature learning.

REFERENCES
[1] A. A. Abdullah and S. M. Shaharum. 2012. Lung cancer cell classification method
using artificial neural network. Information Engineering Letters 2, 1 (2012),
49–59.
[2] C Aliferis, I Tsamardinos, P Massion, P Fanananpazir, D Hardin, A Statnikov, N
Fananapazir, and D Hardin. 2003. Machine learning models for classification of
lung cancer and selection of genomic markers using array gene expression data.
In FLAIRS.
[3] Min Chen, Xiaobo Shi, Yin Zhang, Di Wu, and Guizani Mohsen. 2017. Deep
Feature Learning for Medical Image Analysis with Convolutional Autoencoder
Neural Network. IEEE Transactions on Big Data (2017).
[4] M. J. M. Chuquicusma, S. Hussein, J. Burt, and U. Bagci. 2018. How to fool
radiologists with generative adversarial networks? A visual Turing test for lung
cancer diagnosis. In 2018 IEEE 15th International Symposium on Biomedical
Imaging (ISBI 2018). IEEE, 240–244.
[5] Carsten Eickhoff, Yubin Kim, and Ryen White. 2020. Overview of the Health
Search and Data Mining (HSDM 2020) Workshop. In Proceedings of the Thirteenth
ACM International Conference on Web Search and Data Mining (WSDM
’20). ACM, New York, NY, USA. https://doi.org/10.1145/3336191.3371879
[6] R. Fakoor, F. Ladhak, A. Nazi, and M. Huber. 2013. Using deep learning to
enhance cancer diagnosis and classification. In ICML Workshop on the Role of
Machine Learning in Transforming Healthcare (WHEALTH). ICML, 4493–4498.
[7] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan.
2018. GAN-based synthetic medical image augmentation for increased CNN
performance in liver lesion classification. arXiv:1803.01229 (2018).
[8] H. Krewer, B. Geiger, and L. O. Hall. 2013. Effect of texture features in computer
aided diagnosis of pulmonary nodules in low-dose computed tomography. In
IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE,
3887–3891.
[9] D Kumar, A Wong, and D.A Clausi. 2015. Lung Nodule Classification using deep
features in CT images. In 12th IEEE Conference on Computer and Robot Vision
(CRV). IEEE, 133–138.
[10] J Kuruvilla and K Gunavathi. 2014. Lung cancer classification using neural
networks for CT images. Computer methods and programs in biomedicine 113, 1
(2014), 202–209.

[11] L. Nanni, S. Brahnam, and A. Lumini. 2012. Combining multiple approaches for
gene microarray classification. Bioinformatics (2012), 1151–1157.
[12] L. B. Nascimento, A. C. de Paiva, and A. C. Silva. 2012. Lung nodules
classification in CT images using Shannon and Simpson diversity indices and SVM. In
Machine Learning and Data Mining in Pattern Recognition. 454–466.
[13] S Ramaswamy, P Tamayo, R Rifkin, S Mukherjee, C Yeang, M Angelo, C Ladd,
M Reich, E Latulippe, J.P Mesirov, T Poggio, W Gerald, M Loda, E.S Lander,
and T.R Golub. 2001. Multiclass cancer diagnosis using tumor gene expression
signatures. In National Academy of Sciences of the United States of America.
[14] A. Riccardi, T. S. Petkov, G. Ferri, M. Masotti, and R. Campanini. 2011.
Computer-aided detection of lung nodules via 3D fast radial transform, scale space
representation, and Zernike MIP classification. Medical physics 38, 4 (2011), 1962–1971.
[15] A. Sharma, S. Imoto, and S Miyano. 2012. A top-r feature selection algorithm for
microarray gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinformatics
9 (2012), 754–764.
[16] G. L. F. da Silva, A. C. Silva, A. C. de Paiva, and M. Gattass. [n.d.]. Lung nodules
classification in CT images using Shannon and Simpson diversity indices and
SVM. In Machine Learning and Data Mining in Pattern Recognition.
[17] H. I. Suk, S. W. Lee, D. Shen, and A. D. N. Initiative. 2014. Hierarchical feature
representation and multimodal fusion with deep learning for AD/MCI diagnosis.
NeuroImage 101 (2014), 569–582.
[18] Y Wang, I.V Tetko, M.A Hall, E Frank, A Facius, K.F. X Mayer, and H.W Mewes.
2005. Gene selection from microarray data for cancer classification-a machine
learning approach. Comput. Biol. Chem. 29, 1 (2005), 37–46.
[19] G. Wu, M. Kim, Q. Wang, Y. Gao, S. Liao, and D. Shen. 2013. Unsupervised deep
feature learning for deformable registration of MR brain images. In Medical Image
Computing and Computer-Assisted Intervention–MICCAI 2013. SPRINGER, 649–656.
[20] X. Zhu, Y. Liu, J. Li, T. Wan, and Z. Qin. 2018. Emotion classification with data
augmentation using generative adversarial networks. In Pacific-Asia Conference
on Knowledge Discovery and Data Mining. SPRINGER, 349–360.
[21] D. Zinovev, J. Feigenbaum, J. Furst, and D Raicu. 2011. Probabilistic lung nodule
classification with belief decision trees. In Engineering in Medicine and Biology
Society, EMBC. IEEE, 4493–4498.