<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lung nodule classification using Convolutional Autoencoder and Clustering Augmented Learning Method (CALM)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soumya Suvra Ghosal</string-name>
          <email>soumyasuvraghosal@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Indranil Sarkar</string-name>
          <email>indranil.sarkar.nitdgp@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Issmail El Hallaoui</string-name>
          <email>issmail.elhallaoui@gerad.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ecole Polytechnique de Montreal</institution>
          ,
          <addr-line>Montreal</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NIT Durgapur</institution>
          ,
          <addr-line>Durgapur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Early detection of lung cancer can lead to a sharp decrease in the lung cancer mortality rate, which accounts for more than 17% of total cancer-related deaths. A large number of cases are encountered by radiologists daily for initial diagnosis. Computer-Aided Diagnosis (CAD) systems can assist radiologists by offering a second opinion and making the whole process faster. However, one drawback of CAD systems is the large amount of data needed to train them, which can be expensive to obtain in the medical field. In this paper, we propose using a generative adversarial network (GAN) as a data augmentation strategy to generate more training data and improve CAD systems. We also propose a convolutional autoencoder deep learning framework to support unsupervised learning of lung nodule image features from unlabeled data. The paper also introduces the Clustering Augmented Learning Method (CALM) classifier, which is based on the concept of simultaneous heterogeneous clustering and classification and learns from the deep feature representations obtained from the convolutional autoencoder. The classification model within CALM is a Feedforward Neural Net (FNN) architecture. To improve the accuracy of the classification model, CALM iterates between clustering and learning to form robust clusters, thereby improving the learning process of the FNN. Computational experiments using the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset resulted in an overall accuracy of 95.3% with a precision of 94.9%.</p>
      </abstract>
      <kwd-group>
        <kwd>Convolutional Autoencoder Neural Network</kwd>
        <kwd>Lung Nodule</kwd>
        <kwd>Generative Adversarial Networks</kwd>
        <kwd>Deep Features</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing Methodologies → Machine learning; Feature
Selection; • Information systems → Information systems applications;
Data mining; • Applied Computing → Health informatics.</p>
    </sec>
    <sec id="sec-2">
      <title>ACKNOWLEDGEMENT</title>
      <p>
        This work was presented at the first Health Search and Data Mining
Workshop [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>INTRODUCTION</title>
      <p>The use of computer tools and basic machine learning to facilitate and enhance medical analysis and diagnosis is a promising area. The study of the correlation between gene expression profiles and disease states or stages of cells plays an important role in biological and clinical applications. Gene expression profiles can be obtained from multiple tissue samples by comparing diseased tissue with normal tissue. One main challenge in this regard is to determine the difference between cancerous gene expression in tumor cells and the gene expression in normal, non-cancerous tissues. Many machine learning classification techniques and algorithms have been proposed to address this problem. Hence intelligent healthcare systems are an important research direction to assist doctors in harnessing medical big data.</p>
      <p>Among all types of cancer, lung cancer is harder to detect in its early stages, as the only sign is a dime-sized lesion growth, known as a nodule, inside the lung. By the time it can be detected, it is often already too late for the patient. Moreover, these small lesions are detectable only by a CT scan.</p>
      <p>It is especially difficult to identify, from a large number of pulmonary CT images, the images containing nodules that should be analyzed to assist early lung cancer diagnosis. At present, the image analysis methods for assisting radiologists in identifying pulmonary nodules consist of four steps: 1) region of interest (ROI) definition, 2) segmentation, 3) hand-crafted feature extraction, and 4) categorization. In particular, a radiologist has to spend a lot of time checking each image to accurately mark the nodules, which is critical for diagnosis and is a research hotspot in intelligent healthcare.</p>
      <p>For example, it has been proposed to extract texture features for nodule analysis, but it is hard to find effective texture feature parameters. Previously, nodules were analyzed by morphological methods through shape, size, boundary, etc. However, this analytical approach has difficulty providing accurate descriptive information, because even an experienced radiologist usually gives a vague description based on personal experience and understanding. Therefore, effectively extracting features to represent the nodules is a challenging issue.</p>
      <p>Recently, CAD systems have taken advantage of the popular Convolutional Neural Network (CNN), producing state-of-the-art detection results, with 95% sensitivity at only 10 false positives per scan. However, CNNs require a large amount of training data to learn effectively; in the medical field, obtaining the required data is often costly, time-consuming, or simply not feasible. To deal with these issues, data augmentation is often used to better train these CAD systems.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the authors addressed these challenges by training a deep learning architecture based on the Convolutional Autoencoder Neural Network (CANN) for the classification of pulmonary nodules. Inspired by their results, we use a similar architecture for extracting deep features from CT images. In addition, we present a new way to improve lung nodule detection in existing systems by augmenting training datasets with generated images of nodules. To create these images, we propose the use of a type of Generative Adversarial Network (GAN). The augmented data allows more accurate supervised fine-tuning of the proposed model. Overall, the proposed method utilizes both the original and generated images for unsupervised feature learning and some labeled data for fine-tuning. Computational experiments show that the proposed method effectively extracts image features via a data-driven approach and achieves faster labeling for medical data. Specifically, the main contributions of this paper are:
• Application of GANs to augment the training data for computer-aided lung nodule detection systems and address the issue of insufficient training data.
• Image features can be extracted directly from the raw image. Such an end-to-end approach does not use an image segmentation method to find the nodules, avoiding loss of important information which might affect classification results.
• The unsupervised data-driven approach can be extended to other data sets and related applications.
• A classification approach in which data is clustered based on its inherent characteristics. In the process of learning the best clustering solution, the parameters of the classification model are optimized, thereby substantially improving the learning process.
      </p>
    </sec>
    <sec id="sec-4">
      <title>RELATED WORKS</title>
      <p>
        In the past, several methods have been proposed to detect and
classify lung cancer in CT images using different algorithms. Aliferis
et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used recursive feature elimination with single variable
association filtering approaches to select a small subset of the gene
expressions as a reduced feature set. For better classification
Ramaswamy [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] applied recursive feature elimination using SVM to similarly find a small number of genes. Wang et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] proved that combining a correlation-based feature selector with a classification approach can yield good classification results with high confidence. Sharma et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] proposed to find an informative subset of gene expressions using feature selection methods, in a “Divide &amp; Conquer” fashion: informative genes are found within each subset and then combined to form the overall subset. Nanni et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a method that combines
different feature reduction approaches, useful for gene microarray
classification. In Zinovev et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], the authors used decision trees
to classify lung nodules using the LIDC dataset. The features they used, such as lobulation, texture, and spiculation, form a 63-dimensional feature vector for the classification of 914
instances. The authors obtained an overall accuracy of 68.66%. Kuruvilla
et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used six distinct parameters, including skewness and the fifth &amp; sixth central moments, extracted from segmented single slices containing the 2 lung images, along with the features mentioned in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and trained a feed-forward backpropagation neural
network. There has also been renewed interest in the field of deep learning, and the latest research in medical imaging using deep learning shows promising results. One such paper is by Suk et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] in which the authors propose a novel latent and shared feature representation of neuro-imaging data of the brain using a Deep Boltzmann Machine (DBM) for diagnosis. The method achieved a maximal diagnostic accuracy of 95.52%. In Riccardi et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] the authors proposed a new algorithm, based on 3D radial transforms, which can automatically detect nodules with an overall accuracy of 71%. Kumar et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed to use deep features extracted from
an autoencoder along with a binary decision tree as a classifier to
build their proposed system for lung cancer classification. Wu et al.
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] proposed deep feature learning for deformable registration of
brain MR images. They demonstrate that a general approach can be
built to improve image registration by using deep features. Fakoor
et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a method to enhance cancer diagnosis and
classification from gene expression data using unsupervised and deep
learning methods. Their model used PCA (Principal Component
Analysis) to reduce the very high dimensionality of the initial raw feature space. Chuquicusma
et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] showed that GANs are able to generate realistic fake images that fool even experienced radiologists. Maayan et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] used GANs to augment liver lesion images to improve multiclass CNN classification, reporting an increase to 85.7% sensitivity and 92.4% specificity, much higher than recent state-of-the-art liver classification methods.
Zhu et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] showed that Generative Adversarial Networks (GANs) can be used to complement and complete the training data manifold and to find better margins between classes. They used GANs to augment emotion categories that were lacking in face data, achieving a 5% to 10% increase in the accuracy of emotion classification.
      </p>
      <p>In this paper, we propose a convolutional autoencoder unsupervised learning algorithm for lung CT feature learning and the CALM classifier for pulmonary nodule classification. To tackle the scarcity of labeled medical images, we use a type of Generative Adversarial Network (GAN) to augment the training set. GANs are neural networks in which two competing networks, a generator and a discriminator, are trained adversarially against one another. The discriminator is trained to differentiate between real and generated data, while the generator attempts to fool the discriminator by generating synthetic data. More specifically, the generator G samples from a known prior distribution z ∼ Pz(z) (usually a Gaussian) and generates data G(z) by passing z through the function G. The discriminator D takes in data x and produces the probability that x is a sample from the real data distribution Pdata(x). The loss function that the discriminator D maximizes and the generator G minimizes is
L = minG maxD Ex∼Pdata(x)[log D(x)] + Ez∼Pz(z)[log(1 − D(G(z)))]
While this original GAN is useful for a multitude of tasks, its Jensen-Shannon divergence loss inherently struggles to learn probability distributions supported on low-dimensional manifolds in a higher-dimensional space. Wasserstein GANs (WGANs) attempt to solve this problem by using an approximation of the Earth-Mover distance as the loss function, which enables more stable GAN training. The discriminator is replaced with a critic, as its output is no longer a probability; instead, the critic is a 1-Lipschitz function that tries to maximize the difference in score between the real data and the generated data. A function is 1-Lipschitz if and only if the norm of its gradient is at most 1 everywhere. The authors of the WGAN paper enforce the 1-Lipschitz constraint on the critic by weight clipping, which may lead to optimization difficulties. The new loss function is
L = minG maxD∈D Ex∼Pdata(x)[D(x)] − Ez∼Pz(z)[D(G(z))]
where D is the set of 1-Lipschitz functions.</p>
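As a toy numerical illustration of the GAN value function above (not the authors' implementation; the discriminator, generator, and distributions below are invented for the sketch), the two expectation terms can be estimated from samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, w=2.0, b=0.0):
    # Toy sigmoid discriminator: estimated probability that x is real.
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def G(z, shift=3.0):
    # Toy generator: maps latent noise z to fake samples.
    return z + shift

x_real = rng.normal(loc=4.0, scale=1.0, size=1000)  # samples from P_data
z = rng.normal(size=1000)                           # z ~ P_z (Gaussian)

# Value function: E_x[log D(x)] + E_z[log(1 - D(G(z)))]
# The discriminator maximizes it; the generator minimizes it.
value = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
```

In practice both networks are trained by alternating gradient steps on this value; the sketch only evaluates it once for fixed toy functions.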
    </sec>
    <sec id="sec-5">
      <title>Autoencoder</title>
      <p>An autoencoder takes an input x ∈ Rd and first maps it to a latent representation h ∈ Rd′ using a deterministic function of the type h = fθ(x) = σ(Wx + b) with parameters θ = {W, b}. This “code” is then used to reconstruct the input by a reverse mapping y = fθ′(h) = σ(W′h + b′) with θ′ = {W′, b′}. The two parameter sets are usually constrained to be of the form W′ = WT, using the same weights for encoding the input and decoding the latent representation. Each training pattern xi is then mapped onto its code hi and its reconstruction yi. The parameters are optimized by minimizing an appropriate cost function over the training set Dn = {(x0, t0), ..., (xn, tn)}.</p>
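The encode/decode mappings above can be sketched in a few lines of numpy (a minimal illustration with made-up dimensions, using tied weights W′ = Wᵀ as described):

```python
import numpy as np

rng = np.random.default_rng(1)
d, d_prime = 16, 4               # input and latent dimensions (illustrative)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

W = rng.normal(scale=0.1, size=(d_prime, d))  # encoder weights (theta = {W, b})
b = np.zeros(d_prime)                         # encoder bias
b_prime = np.zeros(d)                         # decoder bias

x = rng.random(d)                 # one training pattern x in R^d

# Encode: h = f_theta(x) = sigma(W x + b)
h = sigmoid(W @ x + b)

# Decode with tied weights W' = W^T: y = sigma(W' h + b')
y = sigmoid(W.T @ h + b_prime)

# Per-pattern reconstruction cost, minimized over the training set
cost = np.sum((x - y) ** 2)
```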
    </sec>
    <sec id="sec-6">
      <title>Denoising Autoencoders(DAE)</title>
      <p>Without any additional constraints, conventional autoencoders learn the identity mapping. This problem can be circumvented by using a probabilistic RBM (Restricted Boltzmann Machine) approach, sparse coding, or denoising autoencoders, which try to reconstruct noisy inputs. The latter perform as well as or even better than RBMs. Training involves the reconstruction of a clean input from a partially destroyed one. The input x becomes the corrupted input x̃ by adding a variable amount v of noise distributed according to the characteristics of the input image. Common choices include binomial noise (switching pixels on or off) for black-and-white images or uncorrelated Gaussian noise for color images. The parameter v represents the percentage of permissible corruption. The autoencoder is trained to denoise the inputs by first finding the latent representation h = fθ(x̃) = σ(Wx̃ + b), from which it reconstructs the original input y = fθ′(h) = σ(W′h + b′).</p>
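The corruption step for black-and-white images can be sketched as follows (a toy illustration of binomial masking noise with an assumed corruption level v; not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

def corrupt(x, v, rng):
    # Binomial (masking) noise: switch a fraction ~v of pixels off.
    keep = rng.random(x.shape) >= v      # keep each pixel with prob 1 - v
    return x * keep

x = np.ones((8, 8))                      # toy black-and-white image, all pixels on
x_tilde = corrupt(x, v=0.25, rng=rng)    # corrupted input fed to the DAE

# The DAE is then trained so that y = sigma(W' sigma(W x_tilde + b) + b')
# reconstructs the clean x, not x_tilde.
fraction_destroyed = 1.0 - x_tilde.mean()
```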
    </sec>
    <sec id="sec-7">
      <title>Convolutional Neural Networks</title>
      <p>CNNs are hierarchical models whose convolutional layers alternate with subsampling layers, reminiscent of the simple and complex cells in the primary visual cortex. The network architecture consists of three basic building blocks to be stacked and composed as needed, i.e., the convolution layer, the max-pooling layer, and the classification layer.</p>
    </sec>
    <sec id="sec-8">
      <title>Convolutional Auto Encoder(CAE)</title>
      <p>A fully connected autoencoder ignores the 2-D image structure. This is a problem not only when dealing with realistically sized inputs, but also because it introduces redundancy in the parameters, forcing each feature to be global. However, the trend in vision and object recognition adopted by the most successful models is to discover localized features that repeat themselves all over the input. CAEs differ from conventional AEs in that their weights are shared across all locations in the input, preserving spatial locality. The reconstruction is hence a linear combination of basic image patches based on the latent code. A CAE combines the local convolutional connection with the autoencoder: a reconstruction step is simply added to the convolution operation. The convolutional transformation from input feature maps to output feature maps is called the convolutional encoder. The output values are then reconstructed through the inverse convolutional operation, called the convolutional decoder. The parameters of the encode and decode operations are learned through standard autoencoder unsupervised greedy training.</p>
      <p>The input feature maps x ∈ Rn×l×l are obtained from the input layer or the previous layer. They contain n feature maps, and the size of each feature map is l × l pixels. The convolutional autoencoder operation includes m convolutional kernels, and the output layer outputs m feature maps. When the input feature maps come from a previous layer, n represents the number of output feature maps of that layer. The size of each convolutional kernel is d × d, where d ≤ l. θ = {W, Wˆ, b, bˆ} represents the parameters of the convolutional autoencoder layer to be learned: b ∈ Rm and W = {wj, j = 1, 2, ..., m} are the parameters of the convolutional encoder, where each wj ∈ Rn×d×d can be flattened into a vector wj ∈ Rnd²; Wˆ = {wˆj, j = 1, 2, ..., m} and bˆ are the parameters of the convolutional decoder, where wˆj ∈ Rnd².</p>
      <p>First, the input image is encoded: each time, a d × d pixel patch xi, i = 1, 2, ..., p is selected from the input image, and the weight wj of convolutional kernel j is used for the convolutional calculation. The neuron value oij, j = 1, 2, ..., m is then computed at the output layer.</p>
      <p>oij = f(xi) = σ(Wj xi + b)
where σ is a nonlinear activation function, commonly one of three functions, i.e., the sigmoid function, the hyperbolic tangent function, or the rectified linear function (ReLU). We use ReLU in this paper.</p>
      <p>The output oij is then passed to the convolutional decoder, which reconstructs xi from oij to generate xˆi:</p>
      <p>xˆi = f′(oij) = ϕ(Wˆj oij + bˆ)
xˆi is generated after each convolutional encode and decode step, giving p reconstructed patches of dimension d × d. We use the mean square error between the original patches xi, (i = 1, 2, ..., p) of the input image and the reconstructed patches xˆi, (i = 1, 2, ..., p) as the cost function. The cost function and reconstruction error are defined as:
JCAE(θ) = (1/p) Σi=1..p LCAE[xi, xˆi]
LCAE[xi, xˆi] = ||xi − xˆi||² = ||xi − ϕ(σ(xi))||²</p>
      <p>Through stochastic gradient descent (SGD), the weights and the reconstruction error are minimized, and the convolutional autoencoder layer is optimized. Finally, the trained parameters are used to output the feature maps, which are transmitted to the next layer.</p>
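As a rough illustration of the per-patch cost JCAE above, here is a toy single-kernel version in numpy (the scalar activation per patch, the dimensions, and the random weights are invented for the sketch; this is not the trained network):

```python
import numpy as np

rng = np.random.default_rng(3)
d, p = 5, 10                 # patch size d x d and number of patches (invented)
relu = lambda a: np.maximum(a, 0.0)

w = rng.normal(scale=0.1, size=(d, d))       # encoder kernel weights w_j
w_hat = rng.normal(scale=0.1, size=(d, d))   # decoder weights w_hat_j
b = b_hat = 0.0

patches = [rng.random((d, d)) for _ in range(p)]   # patches x_i from the image

def encode(x):
    # o_ij = sigma(W_j x_i + b): one scalar activation per patch in this toy
    return relu(np.sum(w * x) + b)

def decode(o):
    # x_hat_i = phi(W_hat_j o_ij + b_hat): linear reconstruction of the patch
    return w_hat * o + b_hat

# J_CAE(theta) = (1/p) * sum_i ||x_i - x_hat_i||^2
recon = [decode(encode(x)) for x in patches]
J_CAE = np.mean([np.sum((x - xh) ** 2) for x, xh in zip(patches, recon)])
```

SGD would then update w, w_hat, b, and b_hat to reduce J_CAE.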
    </sec>
    <sec id="sec-9">
      <title>METHODOLOGY</title>
      <p>For our model, we use WGAN with gradient penalty (WGAN-GP), a version of WGAN that replaces weight clipping with a gradient penalty on the critic, constraining the gradient norm of the critic’s output with respect to its input. This allows for more stable GAN training. The optimal WGAN or WGAN-GP critic contains straight lines with gradient norm 1 connecting coupled points between Pdata and PG(z); since enforcing the unit gradient norm constraint everywhere is intractable, it is only enforced along these straight lines. The new loss function is as follows:</p>
      <p>L=minG maxD ∈λDEExˆz∈∼PPxˆz((xˆz))[[(l|o|∇gxDˆD(G(x(ˆz)|)|]2−−E1x)2∼]Pdata (x )[D(x )] +
Where λ is the weight given to the gradient penalty. xˆ ∼ P (xˆ) are
random samples that have uniform distribution along straight lines
between pairs of points sampled from the real data distribution Pdat a
and the generated data distribution PG(z). We hypothesize that
generated data can improve lung nodule detection sensitivity, allowing
for better training of CAD systems with existing data. We can use
the generator to produce new training data to augment the existing
training data.</p>
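To make the gradient-penalty term concrete, here is a minimal numpy sketch written in the equivalent critic-minimization form. It uses a hypothetical linear critic D(x) = w·x, whose gradient ∇x̂D(x̂) = w is known analytically, so no automatic differentiation is needed; the data, dimensions, and λ are invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
dim, n, lam = 8, 100, 10.0        # data dim, batch size, penalty weight (lambda)

w = rng.normal(size=dim)          # linear critic D(x) = w @ x, so grad_x D = w
D = lambda x: x @ w

x_real = rng.normal(loc=1.0, size=(n, dim))    # samples from P_data
x_fake = rng.normal(loc=-1.0, size=(n, dim))   # generated samples G(z)

# x_hat: uniform samples along straight lines between real/fake pairs
eps = rng.random((n, 1))
x_hat = eps * x_real + (1.0 - eps) * x_fake

# Gradient penalty: lambda * E[(||grad_{x_hat} D(x_hat)||_2 - 1)^2];
# for a linear critic the gradient is w at every x_hat.
grad_norm = np.linalg.norm(w)
penalty = lam * (grad_norm - 1.0) ** 2

# Critic objective in minimization form: E[D(G(z))] - E[D(x)] + penalty
critic_loss = D(x_fake).mean() - D(x_real).mean() + penalty
```

A real critic is a deep network, so the gradient at each x̂ would be obtained by backpropagation rather than analytically.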
      <p>Since the workload of labeling ROIs is high and pulmonary nodules are difficult to recognize, the CT images are divided into small patch areas for training the network. Each patch divided from a CT image is input to the Convolutional Autoencoder (CAE) to learn the feature representation, which is then used for classification. The parameters of the convolution layers in the CNN are determined by autoencoder unsupervised learning, and some labeled data is used for fine-tuning the parameters of the CAE and training the classifier.</p>
      <p>A patch divided from the original CT image can be represented as x ∈ X, X ⊂ Rm×d×d, where m represents the number of input channels and d × d represents the input image size. The labeled data are represented as y ∈ Y, Y ⊂ Rn, where n represents the number of output classes. Through the proposed model, we aim to deduce from training the hypothesis function f: X → Y and the set of parameters θ.</p>
      <p>In the proposed model, the hypothesis function f based on deep
learning architecture consists of multiple layers, which is not a direct
mapping from X to Y. Specifically, the first layer L1 receives the
input image x and the middle layer has three convolution layers and
three pooling layers.</p>
      <p>Algorithm 1: Unsupervised Training of CAE
1 Given dataset U and the number N of convolution and pooling layers; all weight matrices and bias vectors are randomly initialized
2 i ← 1
3 if i == 1 then
4   the input of Ci is U
5 else
6   the input of Ci is the output of Pi−1
7 Greedy layer-wise training of Ci
8 Find the parameters of Ci by minimizing the cost function
9 The output of Ci is the input to Pi
10 Apply the max-pooling operator
11 if i &lt; N then
12   i ← i + 1; goto line 3</p>
      <sec id="sec-9-1">
        <title>Network architecture and training</title>
        <p>The convolutional autoencoder has the following architecture:
• Input: 40 × 40 patch image from the CT image
• C1: convolution kernel of size 5 × 5, number of kernels 50, nonlinear function ReLU.
• P1: max pooling, pooling area of size 2 × 2 with stride 2.
• C2: convolution kernel of size 3 × 3, number of kernels 50, nonlinear function ReLU.
• P2: max pooling, pooling area of size 2 × 2 with stride 2.
• C3: convolution kernel of size 3 × 3, number of kernels 50, nonlinear function ReLU.
• P3: max pooling, pooling area of size 2 × 2 with stride 2.</p>
        <p>The convolutional autoencoder is trained in an unsupervised manner,
which is explained in Algorithm 1 and the parameters are optimized
through SGD. A mini-batch size of 100 samples and 150 iterations
for each batch is used.</p>
        <p>The output from the last pooling layer is fed as input to the CALM classifier, which is explained below.</p>
        <p>Input augmentation. We consider a matrix of input data D and a set of cluster centers C. Since in this case study a nodule is either malignant or not, we keep |C| at 2. In this paper, we use clustering to augment the input data x ∈ D for better learning. To augment the input data, we add a new set of features representing whether an input example belongs to a cluster or not. To distinguish input examples, we introduce an additional index h ∈ {1, . . . , |D|} representing the number of an input example (x1 is the first input example of D). We also define a vector ch composed of chl, l ∈ C, for each example xh ∈ D. It is a one-hot representation containing zeros except at the index of the cluster the example belongs to (e.g., c1 = [0, 1] means that the first input example x1 belongs to the 2nd cluster out of 2 clusters). Finally, we augment the input examples by concatenating the vector xh with the vector ch for each h ∈ {1, . . . , |D|}.</p>
        <p>Cluster centers. To determine the cluster centers, CALM consists of a clustering model and a Feed-Forward Neural Net (FNN) with a softmax output to classify the lung nodules. For the clustering model, we propose to use a Random Forest classifier to determine the cluster centers. After the FNN is trained using a state-of-the-art solver on the data belonging to a single cluster l ∈ {1, . . . , |C|}, a Random Forest classifier is used to find the best cluster center. Hence we repeat |C| instances of training the FNN to find the |C| centers. For any instance l of the model, we use the one-hot encoded vector of l as the label for all input samples in that cluster to train the Random Forest classifier in a supervised manner. In simple words, while predicting the center of the 2nd cluster (for example), we use [0, 1] as the label for all input samples in that cluster, since |C| is 2. We propose that the input sample with the lowest error in predicting its cluster label be considered the center of that cluster in the subsequent iteration of the proposed approach. In this manner, the center is the input sample that is the most fitting representative of its cluster. As a result, the clustering process aggregates data with similar characteristics, resulting in better learning by the FNN model. We include the following additional constraint: ∀a ∈ {1, . . . , |C|}, a ≠ l.</p>
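The input-augmentation step (concatenating each xh with its one-hot cluster vector ch) can be sketched as follows, with made-up dimensions and cluster labels:

```python
import numpy as np

rng = np.random.default_rng(5)
num_examples, num_features, C = 6, 4, 2   # |D|, feature dim, |C| (illustrative)

X = rng.random((num_examples, num_features))      # input examples x_h
labels = np.array([0, 1, 1, 0, 1, 0])             # cluster index of each x_h

# c_h is a one-hot vector: e.g. cluster 2 of 2 -> [0, 1]
one_hot = np.eye(C)[labels]

# Augment each x_h by concatenating its cluster vector c_h
augmented = np.concatenate([X, one_hot], axis=1)
```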
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Clustering Problem</title>
      <p>We have a distance/dissimilarity measure dil between input examples
i ∈ D and cluster centers l ∈ C. The clustering problem aims to
assign each input example to a cluster such that the total distance
between the elements of a cluster and its center is minimized. We
introduce a new set of binary variables cil that is equal to 1 if input
example i ∈ D belongs to the cluster whose center is l ∈ C, and 0
otherwise. The clustering problem is formulated as follows:
min Σi∈D Σl∈C dil cil (3)
s.t. Σl∈C cil = 1, ∀i ∈ D; cil ∈ {0, 1}, ∀i ∈ D, ∀l ∈ C (4)
The objective function (3) minimizes the total distance between a cluster center and its elements. Constraints (4) ensure that each element is assigned to exactly one cluster and that the decision variables are binary.</p>
      <p>In this paper, we also propose a novel dissimilarity measure based on the weights of the trained FNN model. It uses the average of the weights linked to each neuron of the input layer. Assuming that the original input (without the new clustering features) has d dimensions (xh = [xh1, . . . , xhd], h ∈ {1, . . . , |D|}) and that the weight linking node n of the input layer to node j ∈ {1, . . . , n1} of the following layer is wnj, the distance measure is formulated as follows:
dil = avg j∈{1,...,n1} Σn=1..d wnj |xin − xln|
Thus the distance measure computes the distance between two examples based on how important the contribution of each input feature is to the resulting prediction. Therefore, the resulting clusters contain examples with similar potential to improve the classification results.</p>
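A minimal sketch of the weight-based dissimilarity and the resulting cluster assignment (toy data and weights; assigning each example to its nearest center is what minimizes the total-distance objective under the one-cluster-per-example constraint):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n1 = 4, 8                       # input dim and first-hidden-layer width (toy)

X = rng.random((10, d))            # input examples (clustering features dropped)
centers = X[[0, 5]]                # current cluster centers, |C| = 2

W1 = rng.random((d, n1))           # FNN weights: input node n -> hidden node j

def dissimilarity(x, c):
    # d_il = avg over j of sum over n of w_nj * |x_n - c_n|
    return np.mean(W1.T @ np.abs(x - c))

Dmat = np.array([[dissimilarity(x, c) for c in centers] for x in X])

# Each example joins its nearest center: this minimizes the total
# distance while assigning every example to exactly one cluster.
assignment = Dmat.argmin(axis=1)
```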
    </sec>
    <sec id="sec-11">
      <title>Proposed Algorithm</title>
      <p>We propose an approach (Algorithm 2) where we iteratively train the FNN classifier, use its weights to cluster the input data (thus changing the input vector), train the FNN classifier again using the new input data, and so on until a stopping criterion is attained. The stopping criterion is triggered when the cluster assignment remains the same for 10 consecutive iterations, i.e., when the clustering problem converges.</p>
      <p>The configuration of the proposed model is given as:</p>
    </sec>
    <sec id="sec-12">
      <title>A) Classification Model : FC1 −−→ Leaky ReLU −−→ FC2 −−→</title>
      <p>Leaky ReLU −−→FC3 −−→ Softmax . Dimension of FC1: 128.</p>
      <p>Dimension of FC2: 32. Dimension of FC3: 2.</p>
      <p>B) Optimizer: ADAM with learning rate 0.001, momentum rate 0.9, and weight decay (L2 regularization) 1e-4.</p>
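The classification model configured above can be sketched as a forward pass in numpy (illustrative random weights; the input dimension is an assumption, since the CAE feature size is not restated here):

```python
import numpy as np

rng = np.random.default_rng(7)
d_in = 50                         # input dim (assumed CAE feature size)

def leaky_relu(a, slope=0.01):
    return np.where(a > 0, a, slope * a)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# FC1 (128) -> Leaky ReLU -> FC2 (32) -> Leaky ReLU -> FC3 (2) -> Softmax
W1, b1 = rng.normal(scale=0.1, size=(128, d_in)), np.zeros(128)
W2, b2 = rng.normal(scale=0.1, size=(32, 128)), np.zeros(32)
W3, b3 = rng.normal(scale=0.1, size=(2, 32)), np.zeros(2)

x = rng.random(d_in)              # one augmented feature vector
h1 = leaky_relu(W1 @ x + b1)
h2 = leaky_relu(W2 @ h1 + b2)
probs = softmax(W3 @ h2 + b3)     # two-class output (benign vs. malignant)
```

Training with ADAM and L2 weight decay would update W1..W3 and b1..b3; the sketch shows only the architecture's forward computation.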
    </sec>
    <sec id="sec-13">
      <title>DATASET</title>
      <p>The Lung Image Database Consortium (LIDC) has made publicly available a database containing thoracic CT images of 1010 lung cancer patients, and each scan has been annotated by up to 4 radiologists.</p>
      <p>Algorithm 2: Clustering-augmented learning method
Step 0: The data obtained after feature extraction by the Convolutional Autoencoder (CAE) acts as input to CALM.
Step 1: Initialize the cluster centers u1, ..., u|C| randomly. Cluster the output data obtained from the CAE and augment each data sample with its one-hot encoded cluster label.</p>
      <sec id="sec-13-1">
        <title>Step 2: Training the FNN classifier &amp; clustering model</title>
        <p>foreach l ∈ {1 . . . |C |} do</p>
        <p>Train the FNN model on data belonging to cluster l to
learn classification.</p>
        <p>For supervised training of the random forest classifier we
use one hot encoded representation of clusters as labels.</p>
        <p>Running the clustering model gives the cluster center ul .</p>
      </sec>
      <sec id="sec-13-2">
        <title>Step 3: Clustering</title>
        <p>Update dissimilarity matrix using W ∗
if stopping criterion is attained then Stop.</p>
        <p>else go to Step 2.
radiologists on semantic characteristics and malignancy.The ratings
were obtained by performing the biopsy, surgical resection,
progression or reviewing the radiological images to show 2 years of nodule
state at two levels; first at the patient level and second diagnosis at
the second level. The LIDC database of thoracic CT studies for 1010
patients was acquired over a long period with various scanners.</p>
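The augmentation of each sample with its one-hot encoded cluster label, performed in Step 1 of the algorithm, can be sketched as follows; `augment_with_cluster` is a hypothetical helper name.

```python
import numpy as np

def augment_with_cluster(features, labels, n_clusters):
    """Append a one-hot encoding of each sample's cluster label to its
    feature vector, so the classifier sees the cluster assignment."""
    one_hot = np.eye(n_clusters)[labels]   # row k of the identity = label k
    return np.hstack([features, one_hot])
```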
        <p>We excluded nodules with outliers in x, y or z dimensions.
Outliers are defined as values more than 1.5 times the interquartile range
above the third quartile. We also excluded scans with slice thickness
greater than 2.5 mm. This left 666 CT scans for training and 86 CT
scans for evaluation. To reduce noise in our training data, we also
exclude nodules by less than 3 radiologists.</p>
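The outlier rule above (more than 1.5 times the interquartile range above the third quartile) can be sketched with numpy; `iqr_upper_fence` and `keep_nodule` are illustrative names, not the authors' code.

```python
import numpy as np

def iqr_upper_fence(values):
    """Upper outlier fence: Q3 + 1.5 * IQR, matching the exclusion rule."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 + 1.5 * (q3 - q1)

def keep_nodule(dims_xyz, fences):
    # Keep a nodule only if none of its x, y, z dimensions exceeds
    # the corresponding fence.
    return all(d <= f for d, f in zip(dims_xyz, fences))
```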
        <p>The LIDC dataset also provides information and coordinates for
each nodule. We chose an input size of 40 × 40 since that is large
enough to fully contain the largest nodules. Classic data
augmentation was performed on the positive examples: translations of up
to 10 pixels in the XY plane were added to the positive training set.
Negative data is defined as inputs that do not contain any nodule agreed
on by the radiologists. The final input data consists of 5422 labeled
images of size 40px × 40px. For comparison, the size of a whole CT scan
is 512px × 512px × N slices, where N corresponds to the number of
slices and ranges over [65, 764] for different CT scans. The training
and evaluation sets are randomly partitioned in an 8:2 proportion.
Precisely, there are [0.8×5422]=4338 initial positive training
examples, and since we want our initial training data to be
balanced, we also take 4338 initial negative training examples from the
practically unlimited number available. In total, the initial training data
consists of 8676 (4338 positive + 4338 negative) training examples
and 2168 (1084 positive + 1084 negative) validation examples.</p>
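The dataset bookkeeping above can be reproduced with a short sketch; `split_counts` is a hypothetical helper, and standard rounding is assumed for [0.8×5422].

```python
def split_counts(n_positive=5422, train_frac=0.8):
    """Reproduce the 8:2 split of the positive examples and the balanced
    totals obtained by adding an equal number of negatives."""
    n_train_pos = round(train_frac * n_positive)   # [0.8 x 5422] = 4338
    n_val_pos = n_positive - n_train_pos           # 1084
    train_total = 2 * n_train_pos                  # balanced: 8676
    val_total = 2 * n_val_pos                      # balanced: 2168
    return n_train_pos, n_val_pos, train_total, val_total
```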
        <p>
          In this paper, to improve lung nodule detection in existing CADe
systems, we augment the training datasets with generated images
obtained using a Generative Adversarial Network (GAN). We used an
augmentation rate of 50% with the GAN. Since the original number of
training samples is 4338 positive + 4338 negative = 8676, the number
of augmented samples added is [4338×0.5]=2169 positive and
[4338×0.5]=2169 negative. Since negative training volumes are easy
to obtain, the WGAN-GP is trained on all of the positive training
examples so that it generates positive data. The convolutional
neural network for learning lung nodule image features is similar to
common image feature learning: both CNNs and conventional learning
use the labeled dataset and learn the network parameters between
each layer, from the input layer to the output layer, by forward and
backward propagation. We compare the classification performance of the
proposed model, an autoencoder (AE) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and a convolutional neural network (CNN)
on the same dataset. Results are shown in Table 1 and the Receiver
Operating Characteristic (ROC) curve is shown in Fig. 6. To justify
the contribution of the CALM classifier, we also compare the results
with those of traditional classifiers, such as logistic regression and a
linear-kernel support vector machine, applied to the features obtained
from the last pooling layer of the convolutional autoencoder. Moreover,
Fig. 4 shows how the intra-cluster variance decreases after approximately
75 iterations and then stabilizes. To measure intra-cluster variance,
we used the Euclidean distance in this case study. Similarly, it is evident
from Fig. 5 that the testing loss starts decreasing after 80 epochs, and
as the clustering solution converges the accuracy gradually begins
to improve. This observation bolsters our initial assumption that
clustering data based on inherent characteristics improves the
learning process of the FNN.
        </p>
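The intra-cluster variance tracked in Fig. 4 might be computed as the mean squared Euclidean distance of each sample to its assigned cluster center; this is an illustrative sketch, not necessarily the authors' exact definition.

```python
import numpy as np

def intra_cluster_variance(features, labels, centers):
    """Mean squared Euclidean distance of each sample to the center of
    the cluster it is assigned to."""
    diffs = features - centers[labels]          # per-sample offset to its center
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```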
        <p>The accuracy, precision, recall, F1, and AUC of the proposed
method are 95.3%, 94.9%, 95%, 95% and 0.97, respectively. For the AE
(autoencoder) method, we train the neural network in an unsupervised
manner and test on the same dataset for classification, using 1024
neurons in the fully connected layer. We have also compared the
proposed model with the accuracies reported in previous literature;
the comparison is shown in Table 2.</p>
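The threshold metrics reported above follow the standard confusion-matrix definitions, sketched here for reference (AUC needs the full score distribution, so it is omitted; the helper name is hypothetical).

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```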
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Abdullah</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Shaharum</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Lung cancer cell classification method using artificial neural network</article-title>
          .
          <source>Information Engineering Letters</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ),
          <fpage>49</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C</given-names>
            <surname>Aliferis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I</given-names>
            <surname>Tsamardinos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Massion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Statnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N</given-names>
            <surname>Fananapazir</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D</given-names>
            <surname>Hardin</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Machine learning models for classification of lung cancer and selection of genomic markers using array gene expression data</article-title>
          .
          <source>In FLAIRS.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Min</given-names>
            <surname>Chen</surname>
          </string-name>
          , Xiaobo Shi, Yin Zhang, Di Wu, and
          <string-name>
            <given-names>Guizani</given-names>
            <surname>Mohsen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep Feature Learning for Medical Image Analysis with Convolutional Autoencoder Neural Network</article-title>
          .
          <source>IEEE transactions on Big Data</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. J. M.</given-names>
            <surname>Chuquicusma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hussein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Burt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Bagci</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>How to fool radiologists with generative adversarial networks? a visual turing test for lung cancer diagnosis</article-title>
          .
          <source>In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI</source>
          <year>2018</year>
          ). IEEE,
          <fpage>240</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          , Yubin Kim, and
          <string-name>
            <given-names>Ryen</given-names>
            <surname>White</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Overview of the Health Search and Data Mining (HSDM</article-title>
          <year>2020</year>
          )
          <article-title>Workshop</article-title>
          .
          <source>In Proceedings of the Thirteenth ACM International Conference on Web Search and Data Mining (WSDM '20)</source>
          . ACM, New York, NY, USA. https://doi.org/10.1145/3336191.3371879
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fakoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nazi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Huber</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Using deep learning to enhance cancer diagnosis and classification</article-title>
          .
          <source>In ICML Workshop on the Role of Machine Learning in Transforming Healthcare (WHEALTH)</source>
          . ICML,
          <fpage>4493</fpage>
          -
          <lpage>4498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Frid-Adar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Diamant</surname>
          </string-name>
          , E. Klang,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amitai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldberger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Greenspan</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification</article-title>
          . arXiv:1803.01229 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Krewer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Geiger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Hall</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Effect of texture features in computer aided diagnosis of pulmonary nodules in low-dose computed tomography</article-title>
          .
          <source>In IEEE International Conference on Systems, Man, and Cybernetics</source>
          (SMC). IEEE,
          <fpage>3887</fpage>
          -
          <lpage>3891</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Wong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.A</given-names>
            <surname>Clausi</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Lung Nodule Classification using deep features in CT images</article-title>
          .
          <source>In 12th IEEE Conference on Computer and Robot Vision</source>
          (CRV). IEEE,
          <fpage>133</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J</given-names>
            <surname>Kuruvilla</surname>
          </string-name>
          and
          <string-name>
            <given-names>K</given-names>
            <surname>Gunavathi</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Lung cancer classification using neural networks for ct images</article-title>
          .
          <source>Computer methods and programs in biomedicine 113</source>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>202</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brahnam</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Lumini</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Combining multiple approaches for gene microarray classification</article-title>
          .
          <source>Bioinformatics</source>
          (
          <year>2012</year>
          ),
          <fpage>1151</fpage>
          -
          <lpage>1157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Nascimento</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. C. de Paiva</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Lung nodules classification in CT images using Shannon and Simpson diversity indices and SVM</article-title>
          .
          <source>In Machine Learning and Data Mining in Pattern Recognition</source>
          .
          <fpage>454</fpage>
          -
          <lpage>466</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S</given-names>
            <surname>Ramaswamy</surname>
          </string-name>
          ,
          <string-name>
            <surname>P Tamayo</surname>
          </string-name>
          , R Rifkin,
          <string-name>
            <given-names>S</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Yeang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Angelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C</given-names>
            <surname>Ladd</surname>
          </string-name>
          , M Reich,
          <string-name>
            <given-names>E</given-names>
            <surname>Latulippe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.P</given-names>
            <surname>Mesirov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T</given-names>
            <surname>Poggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W</given-names>
            <surname>Gerald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Loda</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.S Lander</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.R</given-names>
            <surname>Golub</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Multiclass cancer diagnosis using tumor gene expression signatures</article-title>
          .
          <source>In National Academy of Sciences of the United States of America.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Riccardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Petkov</surname>
          </string-name>
          , G. Ferri,
          <string-name>
            <given-names>M.</given-names>
            <surname>Masotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Campanini</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Computeraided detection of lung nodules via 3d fast radial transform, scale space representation, and zernike mip classification</article-title>
          .
          <source>Medical physics 38</source>
          ,
          <issue>4</issue>
          (
          <year>2011</year>
          ),
          <fpage>1962</fpage>
          -
          <lpage>1971</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Imoto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S</given-names>
            <surname>Miyano</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>A top-r feature selection algorithm for microarray gene expression data</article-title>
          .
          <source>IEEE/ACM Trans. Comput. Biol. Bioinformatics</source>
          <volume>9</volume>
          (
          <year>2012</year>
          ),
          <fpage>754</fpage>
          -
          <lpage>764</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G. L. F.</given-names>
            da
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>de Paiva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Gattass</surname>
          </string-name>
          . [n.d.].
          <article-title>Lung nodules classification in CT images using Shannon and Simpson diversity indices and SVM</article-title>
          .
          <source>In Machine Learning and Data Mining in Pattern Recognition.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H. I.</given-names>
            <surname>Suk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. D. N.</given-names>
            <surname>Initiative</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Hierarchical feature representation and multimodal fusion with deep learning for ad/mci diagnosis</article-title>
          .
          <source>NeuroImage</source>
          <volume>101</volume>
          (
          <year>2014</year>
          ),
          <fpage>569</fpage>
          -
          <lpage>582</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.V</given-names>
            <surname>Tetko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Facius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.F. X</given-names>
            <surname>Mayer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.W</given-names>
            <surname>Mewes</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Gene selection from microarray data for cancer classification-a machine learning approach</article-title>
          .
          <source>Comput. Biol. Chem</source>
          .
          <volume>29</volume>
          ,
          <issue>1</issue>
          (
          <year>2005</year>
          ),
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Unsupervised deep feature learning for deformable registration of mr brain images</article-title>
          .
          <source>In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2013</source>
          . SPRINGER,
          <fpage>649</fpage>
          -
          <lpage>656</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Emotion classification with data augmentation using generative adversarial networks</article-title>
          .
          <source>In Pacific-Asia Conference on Knowledge Discovery and Data Mining</source>
          . SPRINGER,
          <fpage>349</fpage>
          -
          <lpage>360</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zinovev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feigenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Furst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D</given-names>
            <surname>Raicu</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Probabilistic lung nodule classification with belief decision trees</article-title>
          .
          <source>In Engineering in Medicine and Biology Society</source>
          , EMBC. IEEE,
          <fpage>4493</fpage>
          -
          <lpage>4498</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>