<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ensemble of Texture Features for finding abnormalities in the Gastro-Intestinal Tract</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Syed Sadiq Ali Naqvi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shees Nadeem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Zaid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Atif Tahir</string-name>
          <email>atif.tahir@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, National University of Computer and Emerging Sciences, Karachi Campus</institution>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Endoscopy is a procedure in which a doctor uses specialized instruments to view and operate on the internal organs and vessels of the body. This paper aims to predict diseases and abnormalities in the Gastro-Intestinal Tract using multimedia data. It differs from other projects in the medical domain because it does not use medical imaging such as X-rays or CT scans. The dataset, which comprises 4000 images, is provided by the MediaEval Benchmarking Initiative for Multimedia Evaluation. The data were collected during traditional colonoscopy procedures. Techniques from the fields of multimedia content analysis (to extract information from the visual data) and machine learning (for classification) have been used. On the testing data, 94% accuracy and an MCC of 0.73 are achieved using logistic regression and an ensemble over different features.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Medical image diagnosis is one of the most challenging tasks
in computer vision. Most recent work has been done on CT scans,
X-rays, and MRI. The Medico Task of 2017 challenged its participants
to predict abnormalities in the Gastro-Intestinal (GI) tract through
endoscopic examination [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The challenge therefore uses multimedia images
instead of traditional medical images [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In-depth
analysis of GI tract images can help to detect abnormalities and
diseases in their early stages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A total of 4000 images were used for
training, and the same number was reserved for testing.
Different pre-processing techniques were applied and machine
learning models were deployed to obtain good classification results.
      </p>
    </sec>
    <sec id="sec-2">
      <title>OUR PROPOSED APPROACH</title>
      <p>Feature engineering is one of the most challenging and important parts
of any machine learning project, since discriminative features are
required for good function approximation. The task organizers
provided 6 pre-computed visual features for every image: JCD, Tamura,
Color Layout, Edge Histogram, Auto Color Correlogram and PHOG.</p>
      <p>
        Texture plays an important role in recognizing objects in an
image and has been widely used in computer vision tasks such as
face recognition. We therefore compute texture descriptors for the
images using two of the most common methods, Local Binary Pattern [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Haralick features [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This substantially
improves classifier accuracy. Using 10-fold cross-validation, we
found that some features perform much worse than others, so they
were removed from the model. The refined feature set was JCD, Edge
Histogram, Color Layout, Auto Color Correlogram, Local Binary Pattern
with radius 1, and Haralick texture features.
      </p>
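      <p>To make the texture descriptor concrete, the following is a minimal sketch of a basic Local Binary Pattern with radius 1 (8 neighbours), written in plain Python. It is an illustration only: the function names and the tiny image patch are hypothetical, and the sketch uses the basic 8-bit code rather than the rotation-invariant uniform variant of [3], so it should not be read as the exact descriptor used in our runs.</p>
      <preformat>
```python
from collections import Counter

# Offsets of the 8 neighbours at radius 1, visited clockwise from the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= centre:
            code += 2 ** bit
    return code


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes over all interior pixels."""
    rows, cols = len(image), len(image[0])
    counts = Counter(
        lbp_code(image, r, c)
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    )
    total = sum(counts.values())
    return [counts[code] / total for code in range(256)]


# Hypothetical 3x3 grayscale patch, used only for illustration.
patch = [
    [9, 1, 9],
    [1, 5, 1],
    [9, 1, 9],
]
centre_code = lbp_code(patch, 1, 1)   # corners set bits 0, 2, 4 and 6
histogram = lbp_histogram(patch)      # the feature vector for this patch
```
      </preformat>
      <p>The 256-bin histogram is what a classifier would consume as the feature vector for an image; border pixels are skipped here because they lack a full neighbourhood.</p>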
      <p>
        We then train a separate model for each feature, using logistic regression [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
kernel discriminant analysis using spectral regression [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], because
of the composite nature of the features. An ensemble technique is then
applied to the predictions: the final model uses majority voting among
the independent models trained on each feature. We also investigated
various more advanced machine learning techniques, but the best results
were obtained using logistic regression, and these are the results
reported in this paper.
      </p>
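      <p>As a small illustration of majority voting among independent per-feature models, the sketch below picks the label predicted most often. The helper name and the example labels are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption of this sketch, not something specified in our method.</p>
      <preformat>
```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical labels predicted for one image by six per-feature models.
per_feature_labels = [
    "polyps", "polyps", "ulcerative-colitis",
    "polyps", "normal-cecum", "ulcerative-colitis",
]
final_label = majority_vote(per_feature_labels)  # "polyps" (3 of 6 votes)
```
      </preformat>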
      <p>
        One interesting aspect of this competition was its emphasis on
training models with a limited amount of data. We therefore
use K-means clustering [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to build a reduced data set that
represents the whole distribution. We divide the dataset into
10 clusters and extract images from each cluster in equal ratio.
In this way, we select 732 of the 4000 images to train the models.
      </p>
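      <p>The data reduction step can be sketched as follows: cluster the feature vectors with K-means, then draw the same fraction of samples from every cluster. This is a dependency-free illustration under stated assumptions: the helper names, the toy 2-D points and the random selection inside each cluster are all hypothetical, since the selection rule within a cluster is not detailed above. Selecting 732 of 4000 images corresponds to a ratio of roughly 0.18.</p>
      <preformat>
```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment


def sample_equal_ratio(assignment, k, ratio, seed=0):
    """Draw the same fraction of sample indices from every cluster."""
    rng = random.Random(seed)
    subset = []
    for j in range(k):
        members = [i for i, a in enumerate(assignment) if a == j]
        keep = max(1, round(ratio * len(members))) if members else 0
        subset.extend(rng.sample(members, keep))
    return sorted(subset)


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(0.1 * i, 0.0) for i in range(10)] + [(10.0 + 0.1 * i, 10.0) for i in range(10)]
assignment = kmeans(points, k=2)
subset = sample_equal_ratio(assignment, k=2, ratio=0.5)  # indices of kept samples
```
      </preformat>
      <p>Sampling by ratio rather than by a fixed count keeps large clusters proportionally represented, which is one way to read "representing the whole distribution".</p>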
    </sec>
    <sec id="sec-3">
      <title>RESULTS AND ANALYSIS</title>
      <p>The logistic regression model was implemented using Python’s
scikit-learn package. Among the parameters of logistic regression, two
of the most important are “solver” and “multi_class”,
which we set to “lbfgs” and “ovr”
respectively. The “lbfgs” solver is a limited-memory variant of the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm,
an iterative method for solving unconstrained nonlinear
optimization problems. One-versus-rest (ovr), also known as one-vs-all, is
a strategy that fits one classifier per class: each
class is fitted against all the other classes. In addition to its
computational efficiency (only 8 classifiers are needed, one per class), one advantage of
this approach is its interpretability. Since each class is represented
by one and only one classifier, it is possible to gain knowledge about
the class by inspecting its corresponding classifier. This is the most
commonly used strategy for multiclass classification and is a fair
default choice.</p>
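      <p>The one-versus-rest strategy can be illustrated with a short, dependency-free sketch. Note the substitution: the experiments relied on scikit-learn with the “lbfgs” solver, whereas this sketch swaps in plain batch gradient descent so that it runs with the standard library alone. All function names, the learning rate, the epoch count and the toy data (3 classes instead of 8) are assumptions made for illustration.</p>
      <preformat>
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_binary(samples, targets, learning_rate=0.1, epochs=5000):
    """Fit one logistic regression (weights, bias) by batch gradient descent."""
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, targets):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias


def fit_one_vs_rest(samples, labels):
    """Train one binary classifier per class: that class against all others."""
    return {
        cls: fit_binary(samples, [1.0 if label == cls else 0.0 for label in labels])
        for cls in sorted(set(labels))
    }


def predict(models, x):
    """Pick the class whose own classifier is most confident."""
    def score(cls):
        weights, bias = models[cls]
        return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return max(models, key=score)


# Hypothetical, linearly separable toy data with three classes.
samples = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (0, 6), (1, 5)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
models = fit_one_vs_rest(samples, labels)   # one (weights, bias) pair per class
prediction = predict(models, (5.5, 0.5))
```
      </preformat>
      <p>Because each class owns exactly one (weights, bias) pair, inspecting models["b"] shows directly which input dimensions push a sample towards class "b", which is the interpretability argument made above.</p>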
      <p>
        We train logistic regression on each feature, resulting in 6
different models. Each model outputs 8 probabilities, one
confidence score per class. These probabilities
are summed across models, and the class with the highest total score
is chosen as the predicted label. With the proposed model,
we obtained an accuracy of 90%, an F1-score of 0.89 and an MCC
of 0.8 on the training data. On the testing data, which were
evaluated independently by the organizers, we obtained an accuracy of 94%, an
F1-score of 0.76 and an MCC of 0.73 (Table 2). The best result was obtained
with Run1, in which all 4000 images are used with an
ensemble of 6 features (JCD, Edge Histogram, Color
Layout, Auto Color Correlogram, Local Binary Pattern with radius
1 and Haralick texture features) and logistic regression
as the classifier. In summary, the following 5 runs were submitted for
abnormality detection:
      </p>
      <p>Run1: Ensemble of 6 features [JCD, Edge Histogram, Color Layout, Auto
Color Correlogram, LBP, Haralick] trained on 4000 images using
logistic regression.</p>
      <p>Run2: Same as Run1, but trained on 2000 randomly selected images.</p>
      <p>Run3: Same as Run1 with one additional feature, produced by
Kernel Discriminant Analysis (KDA, used for dimensionality
reduction) taking all 6 features as input. All
4000 images were used for this run.</p>
      <p>Run4: The model was trained only on the reduced dimensions
obtained by KDA, with Nearest Neighbour as the classifier.
The complete training data (4000 images) were used.</p>
      <p>
        Run5: K-means clustering [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] was first applied to obtain 10 clusters from
each class. From these clusters, 732 images were selected such that
uniformity across the dataset is maintained. Run1 was then repeated
on these 732 selected images.
      </p>
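      <p>The score-level fusion described in this section (adding the class probabilities produced by the per-feature models and taking the class with the highest total) can be sketched as follows. The function name and the probability values are hypothetical; the example uses 3 models and 3 classes for brevity instead of 6 models and 8 classes.</p>
      <preformat>
```python
def fuse_by_sum(probabilities_per_model, class_names):
    """Add the class probabilities from every model and return the top class."""
    totals = [sum(column) for column in zip(*probabilities_per_model)]
    best = max(range(len(totals)), key=totals.__getitem__)
    return class_names[best], totals


class_names = ["esophagitis", "normal-z-line", "polyps"]

# Hypothetical per-model class probabilities for a single image.
probabilities_per_model = [
    [0.9, 0.1, 0.0],   # model trained on feature 1
    [0.4, 0.6, 0.0],   # model trained on feature 2
    [0.4, 0.6, 0.0],   # model trained on feature 3
]
label, totals = fuse_by_sum(probabilities_per_model, class_names)
```
      </preformat>
      <p>In this toy case the summed scores favour "esophagitis", although two of the three models individually prefer "normal-z-line". Summing probabilities lets one very confident model outweigh several uncertain ones, which a pure vote over labels cannot do.</p>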
      <p>Table 1 shows the confusion matrix of the best run. The
model performs remarkably well for Normal-pylorus (all
500 true positives) and Normal-cecum (485). It also classifies
Normal-z-line quite accurately (451); however, Esophagitis is often
confused with Normal-z-line. Polyps are
classified moderately well (341), but they are
confused with Ulcerative-colitis (and vice versa) and with
Normal-cecum. Lastly, Dyed-resection-margins and Dyed-lifted-polyps are
confused with each other in some cases. The model appears to be
somewhat overfit on the Normal-cecum class.</p>
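      <p>For readers who want to derive summary scores from a confusion matrix such as Table 1, the sketch below computes per-class recall and the multiclass Matthews correlation coefficient (MCC). It is a hedged illustration: the 3x3 matrix is hypothetical and is not Table 1, the function names are ours, and the organizers' evaluation may aggregate the scores differently (for example per class), so the numbers from this formula need not match the reported ones.</p>
      <preformat>
```python
import math


def multiclass_mcc(confusion):
    """Multiclass MCC; rows are true classes, columns are predicted classes."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    true_counts = [sum(row) for row in confusion]
    pred_counts = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p ** 2 for p in pred_counts))
        * (total ** 2 - sum(t ** 2 for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


def per_class_recall(confusion):
    """Fraction of each true class that was predicted correctly."""
    return [row[k] / sum(row) for k, row in enumerate(confusion)]


# Hypothetical confusion matrix for three classes (not the paper's Table 1).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
mcc = multiclass_mcc(toy_confusion)
recall = per_class_recall(toy_confusion)
```
      </preformat>
      <p>Reading recall row by row mirrors the per-class discussion above, while MCC condenses the whole matrix into a single value between -1 and 1.</p>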
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>We present our proposed model to classify gastro-intestinal
abnormalities using endoscopic images. Training (4000 samples) and
testing (4000 samples) data were provided by the MediaEval
Benchmarking Initiative for Multimedia Evaluation. As mentioned
in the introduction, the study used multimedia content analysis,
machine learning and ensemble learning techniques for
classification. The best results were obtained with logistic regression using
an ensemble over 6 different features (including Local Binary
Pattern and Haralick texture features), which resulted in an accuracy of
94%, an F1-score of 0.76 and an MCC of 0.73 on the testing data.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
          , Konstantin Pogorelov, Pål Halvorsen, Carsten Griwodz, Thomas de Lange, Kristin Ranheim Randel, Sigrun Losada Eskeland,
          <string-name>
            <given-names>Duc-Tien</given-names>
            <surname>Dang-Nguyen</surname>
          </string-name>
          , Mathias Lux, Concetto Spampinato, “
          <article-title>Multimedia for Medicine: The Medico Task at MediaEval 2017</article-title>
          ”,
          <source>MediaEval’17</source>
          , 13-15 September
          <year>2017</year>
          , Dublin, Ireland.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato,
          <string-name>
            <given-names>Duc-Tien</given-names>
            <surname>Dang-Nguyen</surname>
          </string-name>
          , Mathias Lux, Peter Thelin Schmidt, Michael Riegler, Pål Halvorsen, “
          <article-title>Kvasir: a multi-class image dataset for computer aided gastrointestinal disease detection</article-title>
          ”,
          <source>Proceedings of ACM on Multimedia Systems Conference (MMSYS)</source>
          , pp.
          <fpage>164</fpage>
          -
          <lpage>169</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ojala</surname>
          </string-name>
          , M. Pietikäinen, and T. Mäenpää, “
          <article-title>Multiresolution gray-scale and rotation invariant texture classification with Local Binary Pattern</article-title>
          ,”
          <source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>971</fpage>
          -
          <lpage>987</lpage>
          ,
          <year>2002</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Robert M.</given-names>
            <surname>Haralick</surname>
          </string-name>
          and
          <string-name>
            <given-names>Karthikeyan</given-names>
            <surname>Shanmugam</surname>
          </string-name>
          . “
          <article-title>Textural features for image classification</article-title>
          .”
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          <issue>6</issue>
          (
          <year>1973</year>
          ):
          <fpage>610</fpage>
          -
          <lpage>621</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          , and J. Han.
          <article-title>Speed up kernel discriminant analysis</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):21–33,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Tahir</surname>
          </string-name>
          et al.
          <article-title>A robust and scalable visual category and action recognition system using kernel discriminant analysis with spectral regression</article-title>
          .
          <source>IEEE Transactions on Multimedia</source>
          ,
          <volume>15</volume>
          (
          <issue>7</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Christopher M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , Springer,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>James</given-names>
            <surname>MacQueen</surname>
          </string-name>
          ,
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          ,
          <source>Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</source>
          ,
          <year>1967</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>