<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Saravanan. M a, Manoj Kumar a, Dilsha Vijay a, Gayathri. K a and Jenifar. A a</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>KPR Institute of Engineering and Technology</institution>
          ,
          <addr-line>Coimbatore</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>32</lpage>
      <abstract>
        <p>Humans suffer from a variety of chest-related health problems, including cardiomegaly, emphysema, fibrosis, pneumothorax, infiltration and other lung diseases. Diagnosing chest conditions as early as possible is essential. In this paper we address the problem of medical data scarcity by using a set of datasets to detect and classify lung diseases from chest radiograph images. We trained a convolutional neural network (CNN) on these images. The dataset was collected manually from different websites and covers 13 diseases with nearly 1000 images. The system helps a person identify individual diseases without the help of an expert. The model reached an accuracy of 97%, and the accuracy for each disease is recorded individually.</p>
      </abstract>
      <kwd-group>
        <kwd>Lung disease detection</kwd>
        <kwd>deep learning</kwd>
        <kwd>CNN</kwd>
        <kwd>neural network</kwd>
        <kwd>pre-processing techniques</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Various respiratory diseases can affect the lungs. One of these is pneumonia, which kills
about 1.6 million people annually. In addition, tuberculosis, pneumothorax and countless other conditions
are a threat to human beings. It is estimated that lung diseases are responsible for the deaths of around 3
million people annually. Traditionally, an individual is diagnosed with lung disease through tests such as
a blood test and a chest X-ray examination. Pleural effusions (PE) are fluid buildups in the
pleural cavity that are frequently a sign of a more serious illness such as heart problems, pneumonia, or
colon cancer. They have also been found to be prognostic indicators, for example in acute
pancreatitis. Pneumothorax is a pleural illness in which air collects in the pleural space. Because air is
less dense than lung parenchyma, the pneumothorax region takes on the shape of the lungs and lung
cavity, occupying the upper portions of the lungs. Pulmonary fibrosis is a lung condition caused by
scarring and damage to lung tissue. This thicker, rigid tissue makes it more difficult for the lungs to
work properly, and patients become increasingly breathless as the fibrosis progresses.
Atelectasis occurs when the airways, or the small sacs at the end of them, do not expand as they should
during breathing. A lung nodule is a small irregular spot that can be discovered during a chest CT scan.
These scans are performed for a variety of purposes, including lung cancer screening and checking the
lungs of patients with symptoms. The majority of lung nodules detected on CT scans are not cancerous. They
are more commonly caused by previous infections, scar tissue, or other factors. Cardiomegaly is a term
used to describe enlargement of the heart, which is usually caused by a cardiac problem. Cardiomegaly
can be caused by a number of disorders that affect how the heart works, including high blood pressure,
diabetes, and obesity. Emphysema is a lung disease whose main symptom is shortness of breath. People who
suffer from emphysema have damaged air sacs in the lungs. Over time, the inner walls of the air sacs
weaken and tear, resulting in larger air spaces rather than many small ones. Emphysema can go
undetected for many years. Pleural thickening is a chronic condition in which scar tissue thickens the
lining of the lungs, known as the pleura. For doctors, classifying the chest X-ray abnormalities of
these many kinds of lung disease is a time-consuming task; as a result, various algorithms have
been proposed to accomplish this work effectively. Over the years, computer-aided diagnostic (CAD) tools have
been developed to capture significant information from X-rays and help doctors gain a thorough
understanding of the image. However, such CAD systems have not yet gained significant
importance in making diagnoses from X-rays. As a result, their role has been confined to
providing visualization functionality that aids clinicians in decision-making.</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>Patterns must always be identified in order to diagnose or categorize things. However, finding these
patterns can be difficult if the dataset is very large. Furthermore, since the collected data is
rarely linear, conventional methods cannot be used to discover patterns or build models. Many
effective machine learning algorithms have emerged recently, and deep learning techniques now achieve a low
error rate. The images used to train this model cover atelectasis, cardiomegaly, effusion, infiltration,
nodule, pneumonia, pneumothorax, emphysema, fibrosis, pleural thickening, and no finding. This
paper presents a typology of practical applications for lung disorders as well as a market analysis on
the subject. The remaining difficulties and prospective future directions are also discussed.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Existing Work and System</title>
      <p>Existing systems use the K-NN algorithm and CNNs, but in most cases they work with CT scan
images and target detection of the early stages of lung diseases. These systems have their own advantages
and disadvantages, but their most important disadvantage is that they are not trained well enough to classify
real-time images. To overcome this, we use deep learning techniques to increase the accuracy of the
model so that it produces more precise output even on a real-time dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed System</title>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data Collection</title>
      <p>Data for the project was gathered manually from a variety of sources and cross-referenced with
publicly available information. Because the project is centered on classifying different diseases, the
datasets are sorted into folders and then trained separately. There are 900 images in all. Fig 1 shows the
sample data. Fig 2 shows all the diseases and the number of images taken for each category.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Pre-processing</title>
      <p>Pre-processing improves the quality of the data so that meaningful insights can be extracted from
it. In machine learning, pre-processing refers to preparing (cleaning and organizing) raw data for
building and training models. Here the data is processed in four steps:</p>
      <p>• Data quality assessment</p>
      <p>Data collected from different sources is likely to arrive in a variety of formats. For example, if we
collect images from different websites, we need to convert every image into a single format.</p>
      <p>• Data cleaning</p>
      <p>Because the data comes from different sources, we have to remove unwanted and
irrelevant data. This helps later stages run efficiently and without errors.</p>
      <p>• Data transformation</p>
      <p>Once cleaning has begun, data transformation converts the data into the
format required for use in the later stages.</p>
      <p>• Data reduction</p>
      <p>Even after cleaning and transformation, we may have more data than we
need. Data reduction makes the analysis easier and more accurate.</p>
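      <p>The cleaning and reduction steps above can be sketched as follows. This is a minimal illustration, not the code used in this work: it assumes the one-folder-per-disease layout described in Section 3.1, and the accepted extensions and the function names file_digest and clean_dataset are hypothetical. Converting the images into a single format would additionally need an imaging library and is not shown.</p>
      <preformat>
```python
import hashlib
from pathlib import Path

# Assumed set of accepted image extensions; the paper does not list them.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def clean_dataset(root: Path) -> dict[str, list[Path]]:
    """Collect unique image files per class folder under root.

    Files with other extensions are skipped (data cleaning) and
    byte-identical copies are kept only once (data reduction).
    """
    seen: set[str] = set()
    cleaned: dict[str, list[Path]] = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        kept = []
        for path in sorted(class_dir.iterdir()):
            if not path.is_file():
                continue  # ignore nested folders
            if path.suffix.lower() not in IMAGE_EXTENSIONS:
                continue  # irrelevant file, for example a stray text file
            digest = file_digest(path)
            if digest in seen:
                continue  # exact duplicate of an image already kept
            seen.add(digest)
            kept.append(path)
        cleaned[class_dir.name] = kept
    return cleaned
```
</preformat>
      <p>Calling clean_dataset on the dataset root returns a mapping from each disease folder name to its retained image paths, which also gives the per-category image counts.</p>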
      <p>A classification algorithm is a mathematical procedure that maps input data to a category using a
classifier. Classifiers come in a variety of forms; one of them is the convolutional neural network (CNN). A
convolution is a mathematical operation that combines two functions into a third by
integrating their product as one function is shifted over the other. It is closely related to the Laplace and
Fourier transforms. Convolutional layers work in a similar fashion to cross-correlation. The first layer of a
CNN is crucial since it connects the input image to the first layer's receptive fields. CNNs are the most
widely used deep learning algorithm, and they are made up of neurons with learnable biases and weights.
Each neuron receives several inputs and computes their weighted sum. The sum is then passed through an
activation function, which generates an output. A CNN differs from other neural networks because it
includes several convolutional layers. A CNN usually has two stages: feature extraction
and classification. During the feature extraction stage, a kernel is convolved with the input
to produce a feature map. During the classification stage, the CNN calculates the probability
that the image belongs to a given class or label.</p>
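      <p>To make the two stages concrete, the sketch below computes a feature map by sliding a kernel over a small image and then turns raw class scores into probabilities. It is an illustrative pure-Python example, not the network trained in this work; the function names and the toy image and kernel are assumptions chosen for the example.</p>
      <preformat>
```python
import math


def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map.

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation function: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3x3 image with a vertical edge and a 2x2 edge-detecting kernel.
image = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
kernel = [[-1, 1], [-1, 1]]
feature_map = relu(conv2d_valid(image, kernel))
```
</preformat>
      <p>Here the feature map responds only in the column where the edge lies. In a real CNN the kernel values are learned during training rather than fixed by hand.</p>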
      <p>Each image is first converted to grayscale. Noise removal and contrast enhancement are then
applied to produce enhanced images. The CNN divides the images into two categories, no finding and
labelled diseased lungs, and in this way identifies lung diseases. Small features of the X-rays serve as the
input to the classifier. The region of disease that has been recognised is shown in the
diagram.</p>
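      <p>The paper does not state which noise removal or contrast enhancement methods were used. As one possible reading, the sketch below uses a 3x3 median filter and linear contrast stretching on images stored as nested lists; both choices, the luminance weights and all function names are assumptions made for illustration.</p>
      <preformat>
```python
from statistics import median


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance (ITU-R BT.601 weights)."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def median_filter3(gray):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [gray[i + a][j + b] for a in (-1, 0, 1) for b in (-1, 0, 1)]
            out[i][j] = median(window)
    return out


def stretch_contrast(gray, low=0.0, high=255.0):
    """Linearly rescale intensities so they span the range low..high."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [[low for _ in row] for row in gray]  # flat image, nothing to stretch
    scale = (high - low) / (hi - lo)
    return [[low + (v - lo) * scale for v in row] for row in gray]


def enhance(rgb_image):
    """Apply the three steps in the order described in the text."""
    return stretch_contrast(median_filter3(to_grayscale(rgb_image)))
```
</preformat>
      <p>A median filter is a common choice for removing isolated speckle noise because it preserves edges better than simple averaging, but other filters would fit the description equally well.</p>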
      <p>The dataset is first split into training and test sets. The Python library matplotlib is used to
visualize how the data is classified. Data pre-processing is a
data mining technique for converting raw data into a usable and practical format. The
data may contain irrelevant and missing parts, and data cleaning is carried out
to deal with them. Using feature extraction methods, we can create new features that are
a quadratic combination of existing features. Training consists of extracting features from the images
and is repeated over numerous epochs. At least 10 images must be processed in each epoch. As a
result, the system can predict a disease based on the labels. The training algorithm is a CNN
(Convolutional Neural Network), and the language used to create the model is Python. This is a binary
classification concept that requires extracting structural and physiological information from images and
masks. The features are linear and quantitative, although they can be divided into groups. After
training and testing the model, we created a user interface in vanilla JavaScript for easy access.</p>
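      <p>The paper does not give the split ratio or the batching scheme. The following sketch shows one common way to do both: a per-class (stratified) split so that each disease is represented in the training and test sets, and batches of 10 images within an epoch. The test fraction, the seed, the batch size and the function names are illustrative assumptions.</p>
      <preformat>
```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (path, label) pairs class by class into train and test lists.

    Every label with at least two samples ends up in both lists.
    test_fraction and seed are illustrative values, not taken from the paper.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def batches(items, batch_size=10):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```
</preformat>
      <p>One epoch then corresponds to iterating once over batches(train), with the training list reshuffled between epochs if desired.</p>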
    </sec>
    <sec id="sec-6">
      <title>4. Results and Comparison</title>
      <p>The model classifies a given input either as no finding or as a specific lung
disease: cardiomegaly, emphysema, fibrosis, pneumothorax, infiltration, nodule, effusion,
pneumonia, pleural thickening, or atelectasis. The output is the predicted lung disease. The
accuracy of the CNN model is 97%. The model can also predict lung disease from real-time
data, which makes it useful for members of the public who want to learn about lung diseases from the X-ray
images alone.</p>
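      <p>Overall accuracy and per-disease accuracy can be computed from the true and predicted labels of the test set as sketched below. The function name accuracy_report and the toy labels are hypothetical, and the numbers in the example are not results from this work.</p>
      <preformat>
```python
from collections import Counter


def accuracy_report(true_labels, predicted_labels):
    """Return overall accuracy and the fraction correct for each true class."""
    if not true_labels:
        raise ValueError("at least one label is required")
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    per_class = {label: correct[label] / totals[label] for label in totals}
    overall = sum(correct.values()) / len(true_labels)
    return overall, per_class


# Toy example with made-up labels, only to show the call.
truth = ["pneumonia", "pneumonia", "nodule", "no finding"]
guess = ["pneumonia", "nodule", "nodule", "no finding"]
overall, per_class = accuracy_report(truth, guess)
```
</preformat>
      <p>Reporting the per-class values alongside the overall figure matters when the classes are unevenly sized, because a high overall accuracy can hide a poorly recognised disease.</p>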
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>In this paper, we present a method for detecting lung disease from lung X-ray images. We developed
a lung disease classification system based on deep learning algorithms and evaluated it on modest lung
image datasets. We aimed to show that a deep learning algorithm helps obtain more precise
results, and we obtained a high level of accuracy. With the right feature selection strategy and
a unified methodology, lung disease can be predicted.</p>
    </sec>
  </body>
  <back>
  </back>
</article>