<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pyramid-Focus-Augmentation: Medical Image Segmentation with Step-Wise Focus</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vajira Thambawita</string-name>
          <email>vajira@simula.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Hicks</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Halvorsen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael A. Riegler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oslo Metropolitan University</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SimulaMet</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Segmentation of findings in the gastrointestinal tract is a challenging but important task and a key building block for sufficiently good automatic decision support systems. In this work, we present our solution for the Medico 2020 task, which focused on the problem of colon polyp segmentation. We present our simple but efficient idea of using an augmentation method that applies grids in a pyramid-like manner (large to small) for segmentation. Our results show that the proposed method works as intended and can lead to results comparable with those of competing methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Segmented polyp regions in Gastrointestinal Tract (GI) images [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
can provide doctors with a detailed analysis for identifying the correct areas
in which to proceed with treatment, in contrast to other computer-aided
analyses such as classification [
        <xref ref-type="bibr" rid="ref10 ref2 ref9">2, 9, 10</xref>
        ] and detection [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which provide
less detailed information about the exact region and size of the
affected area. However, training Deep Learning (DL) models to
perform segmentation on medical data is challenging because of
the lack of medical-domain images resulting from tight privacy
restrictions, the high cost of annotating medical data using experts,
and a lower number of true positive findings compared to true
negatives. In this paper, we present our approach for our
participation in the 2020 Medico Segmentation Challenge [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], for which
we introduce a novel augmentation technique called
pyramid-focus-augmentation (PYRA). PYRA can improve the
performance of segmentation tasks when only a small dataset is
available to train DL models or when the number of positive findings is
small. Furthermore, our method can gradually focus doctors' attention on
polyp regions. The output of the method is
also adjustable: we can present a lower-resolution
grid if that is sufficient for the task at hand, which helps to save
processing time. Finally, our technique can be applied to any
segmentation task using any deep learning segmentation model.
      </p>
    </sec>
    <sec id="sec-2">
      <title>METHOD</title>
      <p>
        Our method has two main steps: data augmentation with PYRA
using pre-defined grid sizes, followed by training a DL model with
the resulting augmented data. The source code for our method can
be found in our GitHub repository: https://vlbthambawita.github.io/PYRA/. Our experiments use the development dataset [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>PYRA Data Augmentation</title>
      <p>As the first step in PYRA, we generate checkerboard grids as
illustrated in the first row of Figure 2, with sizes of g × g for g
values of 2, 4, 8, 16, 32, 64, 128 and 256. g should be selected such that
image_size % g = 0. Applying these eight grid augmentations to
the training dataset of 800 images increases the training data to
800 × 8 = 6400 images.</p>
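      <p>As an illustration of this first step, the checkerboard grids can be sketched as follows (a minimal NumPy sketch; the function name and the 0/255 value convention are our assumptions, not the released PYRA code):</p>

```python
import numpy as np

def make_checkerboard(image_size, g):
    """Checkerboard grid with g x g cells covering an image_size canvas.

    g must be chosen such that image_size % g == 0.
    """
    assert image_size % g == 0, "g must divide the image size evenly"
    cell = image_size // g                        # pixel width of one grid cell
    rows, cols = np.indices((g, g))
    board = ((rows + cols) % 2).astype(np.uint8)  # cell (i, j) is on when i+j is odd
    # Expand each g x g cell decision to full pixel resolution.
    return np.kron(board, np.ones((cell, cell), dtype=np.uint8)) * 255

# The eight grid augmentations applied to every training image.
grids = {g: make_checkerboard(256, g) for g in (2, 4, 8, 16, 32, 64, 128, 256)}
```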
      <p>For the second step, we convert the Ground Truth (GT)
segmentation masks into a grid-based representation of the GT
corresponding to the grid sizes. For example, if the grid size is 8 × 8, then the
corresponding GT is an 8 × 8 converted GT.</p>
      <p>The transformation of the ground truth masks into gridded masks
is performed as follows: (i) we divide the GT into the input grid
size, (ii) we count the true pixels of each grid cell, and (iii) if the number
of true pixels is larger than 0, we convert the whole cell into a
true cell. An example of a converted GT is depicted at the top of
Figure 1.</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup and Model training</title>
      <p>
        We have set up four experiments: Exp-1, Exp-2, Exp-3, and Exp-4
to show the performance of PYRA. Exp-1 and Exp-2 represent two
baseline experiments. Exp-1 uses only the 800 training images
without any augmentations. In Exp-2, we used general augmentations
such as Affine, Coarse Dropout, and Additive Gaussian Noise from
the library called imgaug [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Exp-3 and Exp-4 use PYRA
with the data from Exp-1 and Exp-2, respectively. The training
dataset size increased from 800 to 6400 after applying PYRA.
However, we validated our experiments using only the 200 images
reserved for testing. We used the same data loader for all experiments
to maintain a fair evaluation. The baseline experiments Exp-1 and
Exp-2 used the data loader with a grid size of 256 × 256 which
represents the original GT masks without any conversion.
      </p>
      <p>
        We have used the Unet architecture [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] as our DL model to
perform the polyp segmentation task. We trained the Unet model
with a stacked input using a polyp image and a random grid image
selected from the eight sizes. Then, the model was trained to predict
the converted GT, which was formed by converting the real GT into a
grid-based GT as described in the previous section.
      </p>
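      <p>Our reading of this stacked input can be sketched in PyTorch as follows (the 4-channel stacking convention and the function name are our assumptions, not necessarily the exact layout of the released code):</p>

```python
import random
import torch

GRID_SIZES = (2, 4, 8, 16, 32, 64, 128, 256)

def stacked_input(image, grids):
    """Stack an RGB polyp image (3, H, W) with one randomly chosen
    grid image (H, W) into a 4-channel network input."""
    g = random.choice(GRID_SIZES)
    grid = grids[g].unsqueeze(0)                 # (1, H, W) grid channel
    return torch.cat([image, grid], dim=0), g    # (4, H, W) Unet input
```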
      <p>
        The Unet model uses dropout layers with a dropout probability of 0.5.
Then, we used our Unet model as a stochastic model to perform
Monte Carlo sampling for the validation data. We kept our Unet
model in the training state to perform this sampling while
predicting the output for the validation data. In the PyTorch library,
which is used for all our implementations, we can do this simply by
keeping the model in the model.train() state. We iterated
50 times for a single input to predict the output. We calculated the
mean of these 50 predictions, which is used as the final prediction
for the competition, and Standard Deviation (std) images to assess
the model's confidence in its predictions. The whole training
process is illustrated in Figure 1 with an example image and a grid size
of 8 × 8 as input. However, we submitted the predicted mean
images for the grid size of 256 × 256, which generates predictions
with the size of the true GT (without any transformations). All
experiments used a fixed learning rate of 0.001 with the RMSprop
optimizer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which was selected in preliminary experiments.
      </p>
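      <p>The Monte Carlo sampling described above can be sketched in PyTorch as follows (model stands for any Unet with dropout layers; the function name is ours):</p>

```python
import torch

@torch.no_grad()
def mc_predict(model, x, n_samples=50):
    """Monte Carlo dropout sampling: keep the model in the training state
    so dropout stays stochastic, run n_samples forward passes, and return
    the mean (final prediction) and std (confidence) images."""
    model.train()                            # dropout remains active
    preds = torch.stack([model(x) for _ in range(n_samples)], dim=0)
    return preds.mean(dim=0), preds.std(dim=0)
```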
    </sec>
    <sec id="sec-5">
      <title>RESULTS AND DISCUSSION</title>
      <p>Table 1 summarizes the Mean Intersection over Union (mIoU) and
the Dice Coefficient (DC) for the validation dataset and the test
dataset. The final results submitted to the competition were computed from
mean images calculated by sampling 50 outputs for the same input
with a grid size of 256. Additionally, we calculated std
images for the validation dataset to show the benefits of using PYRA.
Example outputs for a given input image are illustrated in Figure 2.</p>
      <p>According to the results in Table 1, Exp-3, which uses only
pyramid-focus-augmentation, shows the best validation results with an mIoU
of 0.7693 and a DC of 0.8447, and the best test results with an mIoU
of 0.6981 and a DC of 0.7887. The advantage of our
pyramid-focus-augmentation can be identified by comparing the third row of Figure 2
with the fourth row of the same figure. We can see that our
model focuses on polyp regions step by step. The third row
of Figure 2 shows how our model predicts correct polyp cells for the
2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128 and 256 × 256 grid
sizes, respectively. When we compare this row with the last row of
std images, we can see that the model has high confidence in
the identified polyp regions. For example, it shows high confidence
(black color region) for the middle part of the polyps. In contrast,
our model shows less confidence (yellow color region) for a polyp's
outer borders.</p>
    </sec>
    <sec id="sec-6">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>In this paper, we presented a novel augmentation method called
Pyramid-focus-augmentation (PYRA), which can be used to train
segmentation DL models. Our method shows a large benefit in
the medical diagnosis use case by focusing a doctor's attention on
regions with findings step by step.</p>
      <p>Our experiments did not use post-processing to clean up output
corresponding to the input grid. In future work, we will evaluate
our approach with additional post-processing steps for smaller
grid sizes. For example, we can apply a convolution operation to the
output, using a convolutional window equal to the input grid size,
to clean the results. However, post-processing techniques will not
improve the final results when the grid size equals the input images'
resolution.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGMENT</title>
      <p>The research has benefited from the Experimental Infrastructure for
Exploration of Exascale Computing (eX3), which is financially
supported by the Research Council of Norway under contract 270053.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Akbari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohrekesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Nasr-Esfahani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M. R.</given-names>
            <surname>Soroushmehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samavi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Najarian</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Polyp Segmentation in Colonoscopy Images Using Fully Convolutional Network</article-title>
          .
          <source>In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source>
          .
          <fpage>69</fpage>
          -
          <lpage>72</lpage>
          . https://doi.org/10.1109/EMBC.2018.8512197
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Steven Alexander</given-names>
            <surname>Hicks</surname>
          </string-name>
          , Pia H. Smedsrud, Pål Halvorsen, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep Learning Based Disease Detection Using Domain Specific Transfer Learning</article-title>
          .
          <source>Proc. of MediaEval.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Swersky</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Neural networks for machine learning lecture 6a overview of mini-batch gradient descent</article-title>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Debesh</given-names>
            <surname>Jha</surname>
          </string-name>
          , Steven A. Hicks, Krister Emanuelsen, Håvard Johansen, Dag Johansen, Thomas de Lange, Michael A. Riegler, and Pål Halvorsen.
          <year>2020</year>
          .
          <article-title>Medico Multimedia Task at MediaEval 2020: Automatic Polyp Segmentation</article-title>
          .
          <source>In Proc. of the MediaEval 2020 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Debesh</given-names>
            <surname>Jha</surname>
          </string-name>
          , Pia H. Smedsrud, Michael A. Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and
          <string-name>
            <given-names>Håvard D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Kvasir-seg: A segmented polyp dataset</article-title>
          .
          <source>In International Conference on Multimedia Modeling</source>
          . Springer,
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Alexander B.</given-names>
            <surname>Jung</surname>
          </string-name>
          , Kentaro Wada, Jon Crall, Satoshi Tanaka, Jake Graving, Christoph Reinders, Sarthak Yadav, Joy Banerjee, Gábor Vecsei, Adam Kraft, Zheng Rui, Jirka Borovec, Christian Vallentin, Semen Zhydenko, Kilian Pfeifer, Ben Cook, Ismael Fernández,
          François-Michel De Rainville, Chi-Hung Weng
          , Abner Ayala-Acevedo, Raphael Meudec, Matias Laporte, and others.
          <year>2020</year>
          . imgaug. https://github.com/aleju/imgaug. (
          <year>2020</year>
          ). Online; accessed 01-Nov-
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ji Young</given-names>
            <surname>Lee</surname>
          </string-name>
          , Jinhoon Jeong, Eun Mi Song, Chunae Ha, Hyo Jeong Lee, Ja Eun Koo, Dong-Hoon Yang, Namkug Kim, and Jeong-Sik Byeon
          .
          <year>2020</year>
          .
          <article-title>Real-time detection of colon polyps during colonoscopy using deep learning: systematic validation with four independent datasets</article-title>
          .
          <source>Scientific Reports</source>
          <volume>10</volume>
          ,
          <issue>1</issue>
          (
          <year>2020</year>
          ),
          <fpage>8379</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Olaf</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Brox</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>
          .
          <source>In International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          . Springer,
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Vajira</given-names>
            <surname>Thambawita</surname>
          </string-name>
          , Debesh Jha, Hugo Lewi Hammer,
          <string-name>
            <given-names>Håvard D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          , Dag Johansen, Pål Halvorsen, and
          <string-name>
            <given-names>Michael A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning Applied to Gastrointestinal Tract Abnormality Classification</article-title>
          .
          <source>ACM Trans. Comput. Healthcare 1</source>
          ,
          <issue>3</issue>
          , Article 17 (
          <year>June 2020</year>
          ), 29 pages. https://doi.org/10.1145/3386295
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Vajira</given-names>
            <surname>Thambawita</surname>
          </string-name>
          , Debesh Jha, Michael Riegler, Pål Halvorsen, Hugo Lewi Hammer, Håvard D. Johansen, and Dag Johansen
          .
          <year>2018</year>
          .
          <article-title>The medico-task 2018: Disease detection in the gastrointestinal tract using global features and deep learning</article-title>
          .
          <source>Proc. of MediaEval</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>