Chest X-Ray based Lung Disease Detection using Convolutional
Neural Network
Saravanan. M a, Manoj Kumar a, Dilsha Vijay a, Gayathri. K a and Jenifar. A a
a KPR Institute of Engineering and Technology, Coimbatore, India


                 Abstract
                 Humans suffer from a variety of health problems associated with the chest. These include
                 cardiomegaly, emphysema, fibrosis, pneumothorax, infiltration, and other lung diseases.
                 Diagnosing chest conditions as early as possible is essential. In this paper, we address the
                 problem of medical data scarcity by using a set of datasets to detect and classify lung
                 diseases from chest radiograph images. We implemented a convolutional neural network (CNN)
                 to train on the images. We collected the dataset manually from different websites, covering
                 13 diseases with a set of nearly 1000 images. The system helps a person identify these
                 diseases without the help of an expert. We obtained an accuracy of 97% using this algorithm,
                 and the accuracy for each disease is recorded individually.

                 Keywords
                 Lung disease detection, deep learning, CNN, neural network, pre-processing techniques

1. Introduction
    There are various respiratory diseases that can affect the lungs. One of these is pneumonia, which kills
about 1.6 million people annually. In addition, tuberculosis, pneumothorax, and countless others are a
threat to human beings. It is estimated that lung diseases are responsible for the deaths of around 3
million people annually. Traditionally, an individual is diagnosed with lung disease through various
tests, such as a blood test and a chest X-ray examination. Pleural effusions (PE) are fluid buildups in the
pleural cavity that are frequently a sign of a more serious illness such as heart problems, pneumonia, or
colon cancer. They have also been found to be prognostic indicators, for example in acute pancreatitis.
Pneumothorax is a pleural condition in which air collects in the pleural space. Because air is less dense
than lung parenchyma, the pneumothorax region follows the shape of the lung and the lung cavity and
occupies the upper portions of the lungs. Pulmonary fibrosis is a lung condition caused by scarring and
damage to lung tissue. This thickened, rigid tissue makes it more difficult for the lungs to work properly,
and patients become increasingly breathless as the fibrosis progresses. Atelectasis occurs when the
airways, or the small air sacs at the end of them, do not expand as they should during breathing. A lung
nodule is a small irregular spot that can be discovered during a chest CT scan. These scans are performed
for a variety of purposes, including lung cancer screening and examining the lungs of patients with
symptoms. The majority of lung nodules detected on CT scans are not cancerous; they are more commonly
caused by previous infections, scar tissue, or other factors. Cardiomegaly is the enlargement of the
heart, which is usually caused by a cardiac problem. It can be caused by a number of disorders that
affect how the heart works, including high blood pressure,


CVMLH-2022: Workshop on Computer Vision and Machine Learning for Healthcare, April 22 – 24, 2022, Chennai, India.
EMAIL: saravanan.m@kpriet.ac.in (Saravanan M)
ORCID: 0000-0002-2200-9159 (Saravanan M)
            ©️ 2022 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)


diabetes, and obesity. Emphysema is a lung disease whose main symptom is shortness of breath. People who
suffer from emphysema have damaged air sacs in the lungs. Over time, the inner walls of the air sacs
weaken and rupture, leaving larger air spaces in place of many small ones. Emphysema can go undetected
for many years. Pleural thickening is a chronic condition in which scar tissue thickens the pleura, the
membrane lining the lungs.

    For doctors, classifying chest X-ray abnormalities across these many kinds of lung disease is a
time-consuming task; as a result, various algorithms have been proposed to accomplish it effectively.
Over the years, computer-aided diagnosis (CAD) tools have been developed to extract significant
information from X-rays and help doctors gain a thorough understanding of the image. However, such CAD
systems may not yet have reached a level of significance sufficient for making diagnoses from X-rays. As
a result, their role has been confined to providing visualization functionality that aids clinicians in
decision-making.

   Diagnosis and classification both depend on identifying patterns. However, finding these patterns can
be difficult when the dataset is very large. Furthermore, since collected data is rarely linear,
conventional methods cannot be used to discover patterns or build models. Many effective machine learning
algorithms have emerged recently, and deep learning techniques now achieve low error rates. The images
used to train this model were labelled atelectasis, cardiomegaly, effusion, infiltration, nodule,
pneumonia, pneumothorax, emphysema, fibrosis, pleural thickening, and no finding. This research presents
a typology of practical applications for lung disorders as well as a market analysis on the subject. The
remaining difficulties are also discussed, as well as prospective future directions.

2. Existing Work and System
   Existing systems use the k-NN algorithm and CNNs, but in most cases they work on CT scan images and
target the detection of early-stage lung disease. Existing systems have their own advantages and
disadvantages; the most important disadvantage is that they are not trained well enough to classify
real-time images. To overcome this, we use deep learning techniques to increase the accuracy of the model
so that it produces more precise output even on a real-time dataset.


Figure 1: Architecture of existing system


3. Proposed System
3.1.    Data Collection


Figure 2: Sample Dataset

   Data for the project was manually gathered from a variety of sources and cross-referenced with
publicly available information. Because the project is centered on classifying different diseases, the
datasets are sorted into folders and then trained separately. There are 900 images in all. Figure 2 shows
sample data. Figure 3 shows all the diseases and the number of images collected for each category.
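
   The per-category counts behind a plot such as Figure 3 can be sketched as follows. This is a minimal illustration that assumes one sub-folder per disease; the function name, the folder layout, and the list of image extensions are assumptions and are not taken from the paper.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; the paper does not state which formats were collected.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_images_per_class(dataset_root):
    """Return {class_name: image_count} for a dataset with one folder per class."""
    counts = Counter()
    for class_dir in sorted(Path(dataset_root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return dict(counts)
```

   The resulting dictionary can be passed to a plotting library to draw one bar per disease.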


Figure 3: Dataset Visualization


3.2.    Pre-processing
   Pre-processing improves the quality of the data so that meaningful insights can be extracted from it.
In machine learning, pre-processing refers to preparing (cleaning and organizing) raw data for building
and training machine learning models. Here the data is processed in four steps:

     • Data quality assessment
    Data collected from different sources is likely to arrive in a variety of formats. For example, if we
collect images from different websites, we need to convert every image into a single format.

     • Data cleaning
    As we have collected data from different sources, we have to remove unwanted and irrelevant data.
This helps the later stages run efficiently and without errors.

     • Data transformation
    Once the data has been cleaned, data transformation converts it into the format required by the later
stages.

     • Data reduction
    Even after cleaning and transformation, we may have more data than we need. Data reduction makes the
analysis easier and more accurate.
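
   The four steps above can be sketched on small in-memory images. This is only an illustration under stated assumptions: each record is a dictionary with a "label" and a 2D list of grayscale "pixels", cleaning means dropping unreadable images and exact duplicates, transformation means resizing to a common square size, and reduction means capping the number of images per label. None of these choices, names, or parameters is specified in the paper.

```python
from collections import Counter


def shape_of(pixels):
    """Return (height, width), or None if the image is empty or ragged."""
    if not pixels or not pixels[0]:
        return None
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        return None
    return (len(pixels), width)


def assess_quality(records):
    """Step 1: summarise which image shapes occur in the raw data."""
    return Counter(shape_of(r["pixels"]) for r in records)


def clean(records):
    """Step 2: drop unreadable images and exact duplicates."""
    seen, kept = set(), []
    for r in records:
        if shape_of(r["pixels"]) is None:
            continue
        key = (r["label"], tuple(map(tuple, r["pixels"])))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def resize_nearest(pixels, size):
    """Step 3: resample an image to size x size with nearest-neighbour lookup."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def reduce_dataset(records, max_per_label):
    """Step 4: keep at most max_per_label images for each label."""
    taken, kept = Counter(), []
    for r in records:
        if taken[r["label"]] < max_per_label:
            taken[r["label"]] += 1
            kept.append(r)
    return kept


def preprocess(records, size, max_per_label):
    """Run cleaning, transformation, and reduction in order."""
    resized = [{"label": r["label"], "pixels": resize_nearest(r["pixels"], size)}
               for r in clean(records)]
    return reduce_dataset(resized, max_per_label)
```

   Calling assess_quality before preprocess shows how many distinct shapes (and unreadable images) the raw collection contains, which is the information needed to choose the common size.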


Figure 4: Data Processing Methods

3.3.    Algorithm and Model
   A classification algorithm is a mathematical procedure that maps input data to a category using a
classifier. Classifiers come in a variety of forms; one of them is the convolutional neural network (CNN).
A convolution is a mathematical operation that combines two functions into a third by integrating the
product of the two after one of them is reflected and shifted. It is closely linked to the Laplace and
Fourier transforms. Cross-correlation works in a similar way, and it is the operation that convolutional
layers typically compute. The first layer of a CNN is crucial since it connects the input image to the
first layer's receptive fields. CNNs are among the most widely used deep learning algorithms, and they are
made up of neurons with learnable weights and biases. Each neuron receives several inputs and calculates
their weighted sum. The sum is then fed into an activation function, which generates an output. A CNN
differs from other neural networks because it includes several convolutional layers. A CNN usually has two
parts: feature extraction and classification. During the feature extraction stage, convolution is applied
to the input using a kernel, which produces a feature map. During the classification stage, the CNN
calculates the likelihood that the image belongs to a given class or label.
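
   The two stages described above can be illustrated with a small, dependency-free sketch: a kernel is slid over an image to produce a feature map, and a softmax turns class scores into likelihoods. The function names are hypothetical, and a real model would use a deep learning library with many learned kernels rather than the single hand-written one shown here.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the feature map.

    As in most deep learning libraries, the kernel is not flipped, so strictly
    speaking this computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Activation function: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def softmax(scores):
    """Convert raw class scores into likelihoods that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

   For a 3x3 image and a 2x2 kernel, conv2d_valid returns a 2x2 feature map, since the kernel fits in two positions along each axis.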


Figure 5: CNN model

   The image is first converted to grayscale. Noise removal and contrast enhancement are then applied to
produce enhanced images. The CNN divides the images into two categories, no finding and labelled diseased
lungs, and in this way identifies lung diseases. Small features of the X-rays serve as the input to the
classifier. The region of the disease that has been identified is shown in Figure 6.
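
   The enhancement steps can be sketched as below. The paper does not name the specific methods used, so the choices here are assumptions made for illustration: standard luma weights for the grayscale conversion, a 3x3 median filter for noise removal, and min-max stretching for contrast enhancement.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to grayscale using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def median_filter3(gray):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out


def stretch_contrast(gray, lo=0.0, hi=255.0):
    """Linearly rescale intensities so the darkest pixel maps to lo and the brightest to hi."""
    flat = [v for row in gray for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:
        return [[lo for _ in row] for row in gray]  # flat image: nothing to stretch
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in gray]
```

   The median filter is a common choice for removing isolated bright or dark pixels because it preserves edges better than simple averaging.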


Figure 6: Disease identified in images

   The dataset is first split into training and test sets. The Python library matplotlib is used to
visualize how the data is classified. Data pre-processing is a data mining technique for converting raw
data into a usable and practical format. The data may contain various insignificant or missing sections,
and data cleaning is carried out to deal with them. Using feature extraction methods, we can create new
features that are combinations of existing features. Training entails extracting features from the images,
which is repeated over numerous epochs. At least 10 images must be processed in each epoch. As a result,
the system can predict a disease based on the labels. The training algorithm is a CNN (convolutional
neural network), and the language used for creating the model is Python. This is a binary classification
task that requires extracting structural and physiological information from images and masks. The features
are linear and quantitative, although they can be divided into groups. After training and testing the
model, we created a user interface in vanilla JavaScript for easy access.
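
   The split into training and test sets, and the grouping of images during an epoch, can be sketched as follows. The 80/20 ratio, the per-label (stratified) split, the fixed seed, and the reading of the figure of 10 images as a batch size are all assumptions made for illustration; the paper does not state them.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (item, label) pairs so every label appears in both sets where possible."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one image per label, unless the label has a single image.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def batches(samples, batch_size=10):
    """Yield consecutive groups of at most batch_size samples for one epoch."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]
```

   Splitting each label separately matters for a small dataset, since a purely random split could leave a rare disease out of the test set entirely.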

4. Results and Comparison
    The model classifies a given input either as no finding or as a specific lung disease: cardiomegaly,
emphysema, fibrosis, pneumothorax, infiltration, nodule, effusion, pneumonia, pleural thickening, or
atelectasis. The output is the predicted lung disease. The accuracy of the CNN model is 97%. The model can
also predict lung disease from real-time data, which makes it useful for people who want to learn about a
lung disease from the X-ray image itself.
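
   Overall accuracy and the accuracy recorded for each disease can be computed from the true and predicted labels as sketched below. The function names are hypothetical, and any labels passed in are the caller's own data; no values from the paper's experiments are reproduced here.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all images whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """For each true label, the fraction of its images that were predicted correctly."""
    totals, hits = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}
```

   Reporting the per-class values alongside the overall figure shows whether a high overall accuracy hides weak performance on a disease with few images.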


Figure 7: Architecture of the Proposed System


5. Conclusion
   In this paper, we presented a method for detecting lung disease from chest X-ray images. We developed
a lung disease classification system based on deep learning algorithms and evaluated it on modest lung
image datasets. Our aim was to show that using a deep learning algorithm helps to obtain more precise
results, and we obtained a high level of accuracy. With the right feature selection strategy and a unified
methodology, lung disease can be predicted in this way.


