<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF2020 Working Notes</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ImageCLEF2020: Laterality-Reduction Three-Dimensional CBAM-Resnet with Balanced Sampler for Multi-Binary Classification of Tuberculosis and CT Auto Reports</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Chang</surname>
            <given-names>Eric Y.</given-names>
          </name>
          <xref ref-type="aff" rid="aff1"/>
          <xref ref-type="aff" rid="aff2"/>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>RIMAG Medical Imaging Corporation</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>San Diego VA Health Care System</institution>
          ,
          <addr-line>San Diego, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of California</institution>
          ,
          <addr-line>San Diego, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <volume>1</volume>
      <issue>2</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Detection and characterization of tuberculosis and the evaluation of lesion characteristics are challenging. In an effort to provide a solution for a classification task of tuberculosis findings, we proposed a laterality-reduction 3D CBAM-Resnet with a balanced-sampler strategy. With proper usage of both provided masks, each side of the lung was cropped, masked, and rearranged so that laterality could be neglected and the dataset size doubled. A balanced sampler was also used in each batch to address the data imbalance problem. CBAM was used to add an attention mechanism to each block of the Resnet to further improve the performance of the CNN.</p>
      </abstract>
      <kwd-group>
        <kwd>Tuberculosis</kwd>
        <kwd>Convolutional Neural Network</kwd>
        <kwd>Laterality-Reduction</kwd>
        <kwd>Dataset Imbalance</kwd>
        <kwd>Attention Mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium
tuberculosis and is a leading cause of death from infectious disease worldwide. An epidemic
in many developing regions, such as Africa and Southeast Asia, it was responsible for
1.6 million deaths in 2017 alone. There are different manifestations of TB which require
different treatments, making the detection and characterization of TB disease and the
evaluation of lesion characteristics critically important tasks in the monitoring, control,
and treatment of this disease. An accurate and automated method for the classification
of TB from CT images may be especially useful in regions of the world with few
radiologists.</p>
      <p>
        The ImageCLEF 2020 Tuberculosis – CT report challenge [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] focused
on the automated CT report generation task. This year, three labels were provided for
each side of the lungs, namely the presence of TB lesions, pleurisy, and
caverns. In addition, a dataset containing chest CT scans of 403 TB patients (283 for
training and 120 for testing) was provided.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Data</title>
        <p>
          The dataset provided for the CT report task training contained a total of 283 patients,
with labeling provided for six categories. As seen in Figure 1(a), the training dataset
distribution of pathology was quite unbalanced, with “lung affected” at both sides being
the most commonly seen label, caverns being seen less, and pleurisy being the most
rarely observed condition. Seventeen sub-category combinations of the six categories are
shown in Figure 1(b), with “lung affected for both sides” (represented by [1,1,0,0,0,0])
as the sub-category with the highest count (73).
        </p>
        <p>
          By neglecting the laterality of the lungs and re-arranging the dataset, we found that the
dataset counts doubled to 576, but the categories for classification were sharply reduced
from six to three, as shown in Figure 1(c). When combining these resulting three
categories, there were only five sub-categories, as shown in Figure 1(d). The “lung affected
by lesion” category (represented by [1,0,0]) had the highest count (288), while lungs with
all three pathologies present had the fewest counts (10).
        </p>
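        <p>The label rearrangement described above can be sketched in Python; this is an illustrative snippet, not part of the original paper, and the six-label column order is an assumption:</p>
        <preformat>
```python
# Hypothetical sketch of the laterality-reduction label rearrangement:
# each patient record carries six binary labels, assumed here to be ordered
# [affected_L, affected_R, caverns_L, caverns_R, pleurisy_L, pleurisy_R].

def split_laterality(labels):
    """Turn one six-label patient record into two three-label lung records."""
    aff_l, aff_r, cav_l, cav_r, ple_l, ple_r = labels
    left = [aff_l, cav_l, ple_l]    # one training sample per lung,
    right = [aff_r, cav_r, ple_r]   # doubling the dataset size
    return left, right

# A patient with both lungs affected and a cavern on the right:
left, right = split_laterality([1, 1, 0, 1, 0, 0])
print(left, right)  # [1, 0, 0] [1, 1, 0]
```
        </preformat>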
      </sec>
      <sec id="sec-2-2">
        <title>Pre-processing</title>
        <p>
          To perform laterality-reduction, it was necessary to properly obtain images from both
sides of the lungs from the original dataset. Laterality-reduction images were obtained
according to the algorithm pipeline as shown in Figure 2. The images for the
ImageCLEF tuberculosis task were provided as NIFTI 3D datasets. Two versions of lung
segmentation masks were also provided [
          <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
          ]. The first version of segmentation
(denoted as Mask 1) provided more accurate masks, containing masks for left and right
laterality individually (values equal 1 for left and 2 for right), but in the most severe TB
cases, there was a tendency to miss large abnormal regions of lungs. On the other hand,
the second segmentation (denoted as Mask 2) provided less precise bounds, but was
more stable in terms of including lesion areas, though it covered the entire lung area
as a single mask (without separating the left and right lungs).
        </p>
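        <p>A minimal sketch of how per-lung crop boundaries might be derived from a Mask-1-style labelled volume (1 for left, 2 for right, as described above); the axis conventions and helper names are illustrative assumptions, not the paper's code:</p>
        <preformat>
```python
import numpy as np

def lung_bbox(mask, label):
    """Bounding-box slices of all voxels carrying the given lung label."""
    idx = np.argwhere(mask == label)
    lo = idx.min(axis=0)
    hi = idx.max(axis=0) + 1
    return tuple(slice(a, b) for a, b in zip(lo, hi))

# Toy 4x8x8 volume with a "left" lung (label 1) and a "right" lung (label 2)
mask = np.zeros((4, 8, 8), dtype=np.uint8)
mask[1:3, 1:4, 1:3] = 1
mask[1:3, 1:4, 5:7] = 2
left_crop = lung_bbox(mask, 1)    # slice tuple usable as volume[left_crop]
right_crop = lung_bbox(mask, 2)
```
        </preformat>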
        <p>
          In order to take advantage of both masks, a two-step mask-cropping algorithm was
proposed in this study. As shown in Figure 2, both segmentation versions were used to
generate a laterality-reduction lung segmentation. First, the original NIFTI-formatted
dataset was transformed into image data using the NiBabel package [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Then, the
reformatted images were adjusted to three different window levels, namely baseline, lung,
and soft tissue, and then normalized. For the baseline window level, the foreground was
obtained via the Otsu thresholding algorithm provided in the OpenCV package [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]; for the lung
and soft tissue windows, the levels were set to [-600, 1500] and [50, 350], respectively.
Then, images were normalized to [0, 1] using their mean and standard deviation. Afterward, each
laterality of the images was cropped according to Mask 1. For the left laterality of the
lungs, the right boundary of the lungs was found and used to crop the left side from the
images at stage 1. Similarly, the right laterality used the left boundary to obtain the right
side from the images at stage 1. Finally, all three window levels of
laterality-reduction data were saved, and the annotation file was rearranged for use in further training.
        </p>
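        <p>The windowing and normalization step can be sketched as follows; this is a hedged illustration, and interpreting [-600, 1500] and [50, 350] as [level, width] pairs is our assumption:</p>
        <preformat>
```python
import numpy as np

def apply_window(hu, level, width):
    """Clip HU values to the window [level - width/2, level + width/2]
    and rescale to [0, 1]; mean/std normalization would follow."""
    lo, hi = level - width / 2.0, level + width / 2.0
    windowed = np.clip(hu.astype(np.float32), lo, hi)
    return (windowed - lo) / (hi - lo)

volume = np.array([-1000.0, -600.0, 50.0, 300.0])    # toy HU values
lung = apply_window(volume, level=-600, width=1500)  # lung window
soft = apply_window(volume, level=50, width=350)     # soft-tissue window
```
        </preformat>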
      </sec>
      <sec id="sec-2-3">
        <title>Network Design and Training Strategy</title>
        <p>
          As shown in Figure 3, a 3D convolutional block attention module (CBAM)-Resnet was
designed to train the model for 3-class binary classification based on the PyTorch
framework. A standard 3D-resnet34 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] was used as the convolutional neural network
backbone, with three fully connected (fc) layers serving as the classifier. CBAM [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] was used to implement
channel and spatial attention mechanisms for each block of the Resnet. Sigmoid was
used as the activation function for binary classification.
        </p>
        <p>Fig. 3. Proposed laterality-reduction 3D CBAM-Resnet architecture. Each laterality of original
data is cropped and masked, then fed into a 3D Resnet for training and inference. Each block of
the Resnet is modified with convolutional block attention module (CBAM).</p>
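        <p>To make the attention mechanism concrete, here is a simplified numpy sketch of CBAM's channel and spatial attention for a 3D feature map; the learned shared MLP and 3D convolution of the real module are replaced by parameter-free stand-ins for illustration:</p>
        <preformat>
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Per-channel gate from global average- and max-pooled descriptors
    (the learned shared MLP of real CBAM is omitted in this sketch)."""
    avg = feat.mean(axis=(1, 2, 3))
    mx = feat.max(axis=(1, 2, 3))
    return sigmoid(avg + mx)              # shape (C,)

def spatial_attention(feat):
    """Per-voxel gate from channel-wise average and max maps
    (the learned 3D conv of real CBAM is omitted)."""
    avg = feat.mean(axis=0)
    mx = feat.max(axis=0)
    return sigmoid((avg + mx) / 2.0)      # shape (D, H, W)

feat = np.random.rand(8, 4, 16, 16)       # (C, D, H, W) block output
refined = feat * channel_attention(feat)[:, None, None, None]
refined = refined * spatial_attention(refined)[None]  # channel first, then spatial
```
        </preformat>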
        <p>
          To train the neural network, we used a workstation with 4 Nvidia GTX 1080 Ti video
cards, 128 GB RAM, and a 1 TB solid state drive. The training dataset was randomly
split to form a validation cohort comprised of 20% of the original dataset. During the
training process, to avoid over-fitting, image augmentation and a balanced sampler were
applied in each batch. For each batch, 12 samples fed into the network
were dynamically generated from saved metadata, with the different window levels each forming a
single channel, and were interpolated into a torch tensor of size 3×64×256×256. For
image augmentation, traditional data augmentation methods, including brightness,
shear, scale, and flip, were applied. The balanced sampler strategy was adopted during
the training process, which equalized the data sampled from all three classes for each
batch [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
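        <p>The balanced-sampler idea can be sketched as inverse-frequency sampling; this is a simplified single-label illustration of the strategy, not the implementation referenced in [9]:</p>
        <preformat>
```python
import numpy as np

def balanced_weights(labels):
    """Sampling probability per sample, inversely proportional to the
    frequency of its class, so batches are class-balanced in expectation."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    w = 1.0 / counts[labels]
    return w / w.sum()

labels = [0, 0, 0, 0, 1, 1, 2]                 # imbalanced toy dataset
p = balanced_weights(labels)
rng = np.random.default_rng(0)
batch = rng.choice(len(labels), size=12, p=p)  # indices for one batch of 12
```
        </preformat>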
        <p>
          Binary Cross-Entropy (BCE) was used as the baseline for the multi-binary
classification loss. Then, to improve the performance of the network, weighted BCE loss was
applied to let the network focus more on the “lung affected” and “caverns” categories.
Weighted focal loss was also applied in order to let the network focus further on more
difficult examples [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. All losses were realized on the PyTorch platform according to
equations (1) and (2):
          loss_focal(o, t, w) = α (1 − p_t)^γ · loss_wBCE(o, t, w)   (1)

          loss_wBCE(o, t, w) = −(1/n) Σ_i w[i] · ( t[i] · log(o[i]) + (1 − t[i]) · log(1 − o[i]) )   (2)

where p_t denotes the predicted probability of the true class. With α = 1 and γ = 0, loss_focal is identical to the weighted BCE loss; γ = 2 was used for the focal loss.
        </p>
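        <p>A numpy sketch of the weighted focal BCE of equations (1) and (2); this is an illustrative re-implementation, not the paper's PyTorch code:</p>
        <preformat>
```python
import numpy as np

def weighted_focal_bce(o, t, w, alpha=1.0, gamma=2.0):
    """o: predicted probabilities, t: binary targets, w: per-class weights.
    With alpha=1 and gamma=0 this reduces to weighted BCE."""
    o = np.clip(o, 1e-7, 1.0 - 1e-7)                  # numerical stability
    bce = -(t * np.log(o) + (1.0 - t) * np.log(1.0 - o))
    p_t = np.where(t == 1, o, 1.0 - o)                # prob. of the true class
    return float(np.mean(w * alpha * (1.0 - p_t) ** gamma * bce))

o = np.array([0.9, 0.2, 0.6])                         # sigmoid outputs
t = np.array([1.0, 0.0, 1.0])                         # targets
w = np.array([4.0, 2.0, 1.0])                         # class weights used in the paper
focal = weighted_focal_bce(o, t, w, gamma=2.0)
plain = weighted_focal_bce(o, t, np.ones(3), gamma=0.0)  # ordinary BCE
```
        </preformat>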
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Experimental Results</title>
        <p>
          Here, o denotes the calculated output, t the target, and w the per-class weights;
with γ = 0, the loss reduces to plain BCE. In this study, w was set to [4, 2, 1].
In order to find the best combination of techniques for submission, we tested various
combinations using half of the dataset with 20 epochs of training. For each epoch, half
of the dataset was randomly selected for training. The experiments were conducted with
and without balanced sampling, with and without CBAM, and with various losses (i.e.,
BCE, wBCE, wFocal). During the training, epochs with the best mean AUC value were
saved. Then, models of different experiments were evaluated using the same validation
dataset, with the results shown in Figure 4. The results with the balanced sampler
(bsmp) demonstrate that the mean AUC was significantly improved from 0.678 to 0.838.
With CBAM in (c), the mean AUC was slightly improved to 0.844. With wBCE and
wFocal as the loss instead of BCE in (d) and (e), the mean AUC improved to 0.885
and 0.892, respectively. Then, with the full dataset used for training, the model
combined with wFocal achieved the highest mean AUC score of 0.916.
        </p>
        <p>Fig. 4. Model performance comparison of different combinations of the techniques, including
without CBAM, without balanced sampler (bsmp), and with different losses (i.e., binary cross-entropy
(BCE), focal loss, weighted BCE, and weighted focal).</p>
        <p>A comparison of experiment results is also summarized in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1.</label>
          <caption>
            <p>Comparison of experiment results (√ indicates the technique was used).</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Balanced Sampler</th>
                <th>CBAM</th>
                <th>Loss</th>
                <th>Dataset Scale</th>
                <th>Loss Value</th>
                <th>Min AUC</th>
                <th>Mean AUC</th>
              </tr>
            </thead>
            <tbody>
              <tr><td/><td/><td>BCE</td><td>Half</td><td>0.458</td><td>0.55</td><td>0.678</td></tr>
              <tr><td>√</td><td/><td>BCE</td><td>Half</td><td>0.553</td><td>0.76</td><td>0.838</td></tr>
              <tr><td>√</td><td>√</td><td>BCE</td><td>Half</td><td>0.387</td><td>0.81</td><td>0.844</td></tr>
              <tr><td>√</td><td>√</td><td>wBCE</td><td>Half</td><td>0.343</td><td>0.83</td><td>0.885</td></tr>
              <tr><td>√</td><td>√</td><td>wFocal</td><td>Half</td><td>0.367</td><td>0.87</td><td>0.892</td></tr>
              <tr><td>√</td><td>√</td><td>wFocal</td><td>Full</td><td>0.302</td><td>0.90</td><td>0.916</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>Inference and Submission</title>
        <p>The provided TST dataset included 120 image files for testing. With our pre-processing
pipeline, the TST data were cropped according to the provided Mask 1 and Mask 2 to
generate 240 laterality-reduction image files. After prediction by the trained model, the
results were rearranged so that both lateralities of one patient were combined again
according to the requirement and saved as a .txt file for submission.</p>
        <p>Since the different techniques generated different results, some results
were ensembled to improve performance. Ensembling the results of weighted
binary cross-entropy loss and weighted focal loss gave the best mean AUC. Test time
augmentation was also attempted, and although it produced the best minimum AUC, it
did not have the best mean AUC. A detailed description of our submissions is as
follows:</p>
        <p>For submission ID 67838, the technique used was cbam + balanced sampler +
wBCE, number of epochs was 60, and the best model with validation mean AUC of
0.916 was saved and used. The mean AUC obtained on the TST dataset was 0.872, with
min AUC of 0.810.</p>
        <p>For submission ID 67839, the technique used was cbam + balanced sampler +
wFocal, number of epochs was 60, and the best model with validation mean AUC of 0.918
was saved and used. The mean AUC obtained on the TST dataset was 0.874, with min
AUC of 0.809.</p>
        <p>For submission ID 67920, the technique used was cbam + balanced sampler + Focal,
number of epochs was 48, and the best model with validation mean AUC of 0.907 was
saved and submitted. The mean AUC obtained on the TST dataset was 0.832, with min
AUC of 0.779.</p>
        <p>For submission ID 67921, the technique used was cbam + balanced sampler + BCE,
number of epochs was 48, and the best model with validation mean AUC of 0.86 was
saved and submitted. The mean AUC obtained on the TST dataset was 0.737, with min
AUC of 0.708.</p>
        <p>For submission ID 67950, the submitted results were a combination of submission
IDs 67838 and 67839. A mean AUC of 0.875 was achieved.
</p>
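        <p>The paper does not state the combination rule used for submission 67950; a simple probability-averaging ensemble is one plausible sketch (the scores below are hypothetical):</p>
        <preformat>
```python
import numpy as np

def ensemble(pred_a, pred_b):
    """Average the per-lesion probabilities of two models."""
    return (np.asarray(pred_a) + np.asarray(pred_b)) / 2.0

# Hypothetical per-lung scores [affected, caverns, pleurisy] from two models
p_wbce = [0.82, 0.10, 0.35]
p_wfocal = [0.78, 0.16, 0.29]
combined = ensemble(p_wbce, p_wfocal)
```
        </preformat>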
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion and Conclusion</title>
      <p>In an effort to provide a CNN solution for a multi-binary classification task of
tuberculosis findings, we proposed a laterality-reduction 3D CBAM Resnet. As severe class
imbalance exists in the dataset provided, we tried several techniques to improve the
model performance. First, with proper usage of both provided masks, each side of the
lungs was cropped, masked, and rearranged so that laterality could be neglected. By
cropping each side of the lungs, the number of tasks was reduced from six binary
classifications to three, while the size of the dataset doubled. A balanced sampler
was also used in each batch to address the data imbalance problem. CBAM was used to
add an attention mechanism in each block of the Resnet to further improve the
performance of the CNN. Modified binary focal loss was also realized in the PyTorch
framework to allow the network to focus on more difficult examples. Using all the
aforementioned techniques, we achieved a mean AUC of 0.875 in the evaluation of the test
dataset, and placed second in this competition.
</p>
    </sec>
    <sec id="sec-5">
      <title>Perspectives for Future Work</title>
      <p>
        In this study, we only tested a Resnet-based CNN architecture, given the limited
timeframe and the slow training of 3D CNNs. In the future, more CNN
architectures should be tested, such as 3D Resnet 50, 3D Resnet 101, 3D Densenet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
3D Efficientnet [
        <xref ref-type="bibr" rid="ref12">12</xref>
], etc. Moreover, even with our best-performing model, overfitting still
occurred during training. While this was mostly due to the limited training dataset,
additional image augmentation techniques, such as non-linear transformation, random
contrast adjustment, and channel shuffling, could be tested in the future to obtain even better
results. Additionally, because we did not perform k-fold cross-validation, the
training and validation datasets used in this study contained some bias in the
category distribution. In the future, at least 5-fold cross-validation will be performed, and the
results will be ensembled to form the final model.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Dicente</given-names>
            <surname>Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Tarasau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          , H.:
          <article-title>Overview of ImageCLEF tuberculosis 2020 - automatic CT-based report generation</article-title>
          .
          <source>In: CLEF2020 Working Notes. CEUR Workshop Proceedings</source>
          , Thessaloniki, Greece, CEUR-WS.org, http://ceur-ws.org (September 22-25
          ,
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Halvorsen,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            ,
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Dang-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.T.</given-names>
            ,
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Fichou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Berari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Brie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Stefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.D.</given-names>
            ,
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Volume 12260 of Proceedings of the 11th International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ).
          Thessaloniki, Greece,
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Dicente</given-names>
            <surname>Cid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Jiménez del Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.A.</given-names>
            ,
            <surname>Depeursinge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          , H.:
          <article-title>Efficient and fully automatic segmentation of the lungs in CT volumes</article-title>
          . In Goksel,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Jimenez del Toro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.A.</given-names>
            ,
            <surname>Foncubierta-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          , H., eds.
          <source>: Proceedings of the VISCERAL Anatomy Grand Challenge at the 2015 IEEE ISBI. CEUR Workshop Proceedings</source>
          , CEUR-WS.org, http://ceur-ws.org (May
          <year>2015</year>
          )
          <fpage>31</fpage>
          -
          <lpage>35</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Imageclef 2017: Supervoxels and co-occurrence for tuberculosis CT image classification</article-title>
          .
          <source>In: CLEF2017 Working Notes. CEUR Workshop Proceedings</source>
          , Dublin, Ireland, CEUR-WS.org, http://ceur-ws.org (September 11-14
          ,
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Brett</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanke</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markiewicz</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Côté</surname>
            ,
            <given-names>M.-A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            <given-names>P.</given-names>
          </string-name>
          , and Cheng C.: nipy/nibabel 2.3.3. Zenodo (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. OpenCV: Image Thresholding. https://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>He</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>J</given-names>
          </string-name>
          .:
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          . In CVPR. (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Woo</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kweon</surname>
            <given-names>I.S.:</given-names>
          </string-name>
          <article-title>CBAM: Convolutional Block Attention Module</article-title>
          . ECCV. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <source>Imbalanced Dataset Sampler</source>
          . https://github.com/ufoym/imbalanced-dataset-sampler
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Focal loss for dense object detection</article-title>
          . In: ICCV. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Huang</surname>
            <given-names>G.</given-names>
          </string-name>
          , Liu Z., van der Maaten L.,
          <string-name>
            <surname>Weinberger</surname>
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Densely Connected Convolutional Networks</article-title>
          .
          <source>CVPR</source>
          . (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tan</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks</article-title>
          . ICML. (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>