<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Evaluating Deep CNNs for Multi-Label Concept Detection in ROCOv2 Radiology Image Dataset by Team LekshmiscopeVIT</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aryan Sahni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rachit Gupta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raamigaani Venugopal Reddy</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lekshmi Kalinathan</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vellore Institute of Technology</institution>
          ,
          <addr-line>Chennai-600127</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The "Lekshmiscopevit" team presents a ResNet50-based approach for the Concept Detection Task of the ImageCLEF Medical 2025 challenge, using the Radiology Objects in Context version 2 (ROCOv2) dataset. Our experiments explored multiple deep learning architectures, including InceptionV3, DenseNet, and custom convolutional models, with and without pretrained ImageNet weights. Among these, the ResNet50 model consistently outperformed the others, achieving the highest accuracy in both the validation and the test sets. Training was carried out using 80,091 radiology images, 17,277 images used for validation, and 19,267 for testing. To assess the efect of label space complexity, we also experimented with reducing the number of predicted labels to the top most frequently occurring UMLS CUIs. This label reduction improved model performance by alleviating class imbalance and increasing generalization.</p>
      </abstract>
      <kwd-group>
        <kwd>Transfer Learning</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Concept detection</kwd>
        <kwd>ResNet50</kwd>
        <kwd>Multi-label Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Task Performed</title>
      <p>
        In the context of the ImageCLEFmedical Caption 2025 challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we have contributed to the
Concept Detection Task, which is the task of detecting clinically relevant UMLS concepts directly from
radiological images. The task represents a building block toward automatic image captioning and scene
understanding in the medical field. We created and trained Multi-Label Classification models to make
predictions for UMLS concepts related to each image in the dataset. The concepts were chosen from
a filtered portion of the UMLS 2022AB release, including those with greater frequency and specific
semantic types to maintain relevance and feasibility. Our method used the ROCOv2 dataset, which
contained a training set of 80,091 radiology images, a validation set of 17,277 images, and a test set of
19,267 images. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] The predicted concept sets were assessed with set-coverage measures, namely precision,
recall, and F1-score, reflecting the correctness and completeness of the concept sets produced by the
models. All experiments were run on only the official training data, as per the task guidelines, to
keep results comparable with other participating systems. The code and trained models can be found in the
following GitHub repository: https://github.com/C0okiegranny221/CONCEPT-DETECTION.
      </p>
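      <p>
        For clarity, the sketch below (not the official evaluation script) illustrates how such set-coverage measures can be computed for a single image from its predicted and reference concept sets; the CUIs shown are purely illustrative.
      </p>
      <preformat>
# Sketch: per-image set precision, recall and F1 between predicted and
# ground-truth UMLS concept sets.
def set_metrics(predicted, reference):
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0, 1.0, 1.0                       # both empty: perfect agreement
    overlap = len(predicted &amp; reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative CUIs for one image:
print(set_metrics({"C0040405", "C1306645"}, {"C0040405", "C0817096"}))
      </preformat>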
    </sec>
    <sec id="sec-2">
      <title>2. Main Objectives of the Experiments</title>
      <p>
        The main goal of our experiments was to create a successful deep learning pipeline for multi-label
concept detection from radiology images within the ImageCLEFmedical Caption 2025 challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Of particular interest was identifying clinically significant UMLS concepts solely from visual features,
ultimately facilitating downstream applications such as automatic image captioning and
semantic retrieval. To this end, we tested various convolutional neural network
architectures, including ResNet50, DenseNet121, InceptionV3, and custom-designed models, under both
pretrained and randomly initialised weight configurations. The custom models were lightweight CNNs
with dense connections, intended as baselines against which the more established architectures could
be compared in the test phase. The experiments were designed to identify the architecture with the best
generalisation performance on unseen medical images. A major experimental aim was to examine the
effect of label distribution on model performance by changing the number of concepts employed during
training and inference. In particular, we explored how limiting predictions to the most common UMLS concepts
affected performance metrics such as precision, recall, and F1-score. Our best results were
obtained with a ResNet50-based model, which showed higher accuracy on both the validation and test sets
than the other architectures. These results highlight architecture choice and
label-space optimisation as key factors in improving visual concept recognition in medical imaging.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Approaches Used and Progress Beyond State-of-the-Art</title>
      <p>
        Our strategy towards the concept detection task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] involved utilising deep convolutional neural
networks with effective preprocessing and label encoding techniques suited for multi-label classification.
The MultiLabelBinarizer (MLB) was used to encode the UMLS concepts in a binary matrix format,
which would allow the model to make multiple concept predictions per image. Input images were
preprocessed by Keras ImageDataGenerator, where real-time data augmentation and normalisation
were possible, improving the model’s generalisation on unseen medical images.
      </p>
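      <p>A minimal sketch of this preprocessing step is shown below; the file name train_concepts.csv, the column names ID and CUIs, the ";" separator, and the image directory are illustrative assumptions rather than the exact task files.</p>
      <preformat>
# Sketch: encode UMLS concepts with MultiLabelBinarizer and stream images
# through a Keras ImageDataGenerator (assumed file/column names).
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from tensorflow.keras.preprocessing.image import ImageDataGenerator

df = pd.read_csv("train_concepts.csv")            # assumed: columns "ID", "CUIs"
df["CUIs"] = df["CUIs"].str.split(";")            # assumed ";"-separated concept lists
df["filename"] = df["ID"] + ".jpg"                # assumed image file naming

mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(df["CUIs"])            # binary matrix: images x concepts
df[list(mlb.classes_)] = labels                   # one 0/1 column per CUI

datagen = ImageDataGenerator(rescale=1.0 / 255)   # real-time normalisation
train_gen = datagen.flow_from_dataframe(
    df,
    directory="train_images/",                    # assumed image folder
    x_col="filename",
    y_col=list(mlb.classes_),
    class_mode="raw",                             # multi-hot targets per image
    target_size=(224, 224),
    batch_size=128,
    shuffle=False,
)
      </preformat>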
      <p>
        We evaluated different CNN architectures, such as ResNet50, DenseNet121, and InceptionV3, and found
that ResNet50 models with pretrained ImageNet weights performed best overall. Initialising with
pretrained weights helped the models transfer low-level feature representations from natural images
to medical image data, leading to faster convergence and better accuracy. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] This transfer learning
approach gave a robust initialisation, enabling the network to concentrate on learning domain-specific
patterns applicable to radiology. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
      </p>
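      <p>A minimal sketch of this transfer-learning setup is given below; the 256-unit intermediate layer and the value of num_concepts are illustrative choices, not necessarily the exact configuration used.</p>
      <preformat>
# Sketch: ResNet50 backbone with ImageNet weights and a sigmoid multi-label head.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

num_concepts = 10                                  # e.g. the top-10 most frequent CUIs

base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3), pooling="avg")

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),          # assumed intermediate layer
    layers.Dense(num_concepts, activation="sigmoid"),  # independent probability per concept
])
      </preformat>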
      <p>
        Compared to conventional concept detection techniques based on handcrafted features or shallow
classifiers, our solution achieved a significant improvement by combining deep visual feature learning
with multi-label semantic prediction. The use of label frequency analysis and concept space reduction
additionally provided performance boosts by concentrating on the most informative clinical concepts.
In total, our pipeline demonstrated strong gains on the ROCOv2 validation and test sets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
demonstrating the power of combining CNN architectures with medical data-specific preprocessing and label
optimisation methodologies.
      </p>
      <p>Input images were resized to 224 x 224 x 3 before being fed into the neural network; the image set was
not shuffled and was fed in batches of 128. Our team used the Adam optimizer with the learning
rate set to 0.01. We used binary cross-entropy as the loss function and accuracy as
the metric. Early stopping was used to avoid overfitting on the training
set, monitoring the validation set with a minimum delta of 0.001 and a patience of 3, and the
best weights were restored at the end of training.</p>
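      <p>Continuing the sketches above, this training configuration could be expressed as follows; the monitored quantity, the epoch budget, and the valid_gen generator (built like train_gen) are assumptions.</p>
      <preformat>
# Sketch: Adam (lr=0.01), binary cross-entropy, accuracy metric, early stopping
# with min_delta=0.001, patience=3, and restoration of the best weights.
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer=Adam(learning_rate=0.01),
              loss="binary_crossentropy",
              metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_loss",     # assumed monitored quantity
                           min_delta=0.001,
                           patience=3,
                           restore_best_weights=True)

model.fit(train_gen,
          validation_data=valid_gen,               # assumed validation generator
          epochs=20,                               # assumed epoch budget
          callbacks=[early_stop])
      </preformat>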
    </sec>
    <sec id="sec-4">
      <title>4. Resources Used</title>
      <p>
        The experiments were carried out by leveraging a mix of on-campus GPU facilities and cloud-based
setups. Much of the training and testing of models was done on GPUs hosted by the high-performance
computing facilities of the college, which enabled the necessary computational power for processing
the massive ROCOv2 dataset [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We also leveraged online environments such as Google Colab and Kaggle
Notebooks, which enabled rapid prototyping and convenient use of pretrained models.
The models were trained on Tesla T4 GPUs with a 30 GB RAM ceiling.
      </p>
      <p>
        In order to speed up training and increase performance, we utilised pre-trained weights from ImageNet
for all deep networks, such as ResNet50, DenseNet121, and InceptionV3 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This application of
transfer learning enabled the models to take advantage of previously learnt features, thereby
reducing training time as well as the likelihood of overfitting, particularly considering the complexity
of radiology image content and the multi-label nature of the task. The combination of heterogeneous
computational environments and pre-trained models allowed us to effectively iterate on experiments,
tune hyperparameters, and test multiple architectures at scale.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Results Obtained</title>
      <p>When the ResNet50 model was trained with the output space restricted to the top 10 most commonly occurring UMLS concepts, it recorded a Jaccard
index of 0.3929 with a corresponding exact-match score of 0.1018. This outcome illustrates the benefit of exploiting
high-frequency concepts and pretrained feature extractors in a multi-label classification
setting for medical image understanding.</p>
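      <p>For reference, a minimal way of computing these two quantities (not the official challenge scorer) over binary multi-label prediction matrices is sketched below with toy data.</p>
      <preformat>
# Sketch: sample-wise Jaccard index and exact-match ratio for multi-label outputs.
import numpy as np
from sklearn.metrics import jaccard_score, accuracy_score

y_true = np.array([[1, 0, 1], [0, 1, 0]])          # toy ground-truth matrix
y_pred = np.array([[1, 0, 0], [0, 1, 0]])          # toy thresholded predictions

print(jaccard_score(y_true, y_pred, average="samples"))  # Jaccard index
print(accuracy_score(y_true, y_pred))                    # exact-match ratio
      </preformat>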
      <p>Although other models such as DenseNet121 and InceptionV3 were competitive,
they could not match the accuracy that ResNet50 attained under equivalent training conditions.
These results indicate that ResNet50 is especially well suited to the visual representation requirements
of the concept detection task, particularly when used in tandem with label frequency filtering to
reduce the complexity of the output space.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Result Analysis</title>
      <p>
        Even though they are well known for excellent feature extraction strengths, InceptionV3 and
DenseNet121 performed poorly compared to ResNet50 under the multi-label concept detection task
based on ROCOv2 radiology images [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Various architectural and optimisation-related aspects
probably contributed to this result [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. ResNet50 proved to be the best model for this task based on its
fast convergence, strong residual learning mechanism, and ability to generalise under a low-label
and domain-specific environment. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] The inferior performance of alternative architectures and the
test-validation gap are explained by architectural mismatch with task requirements, distributional shift
sensitivity, and transfer learning constraints from non-medical domains. Future research might explore
longer training schedules, medical-domain pretraining, and curriculum-based label additions
for further optimisation.
      </p>
      <sec id="sec-6-1">
        <title>6.1. Architectural Complexity vs. Task Requirements</title>
        <p>
          InceptionV3, with its deeply modular structure consisting of multiple convolutional filters of different
sizes in parallel (e.g., 1×1, 3×3, 5×5), is optimised to learn multi-scale features. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] While useful in
general natural image classification applications, this multiplicity of feature scales could have introduced
redundancy in learning features for medical images, where fine-grained domain-specific patterns
predominate and might not correspond well to general-purpose, multiresolution filters. In addition,
numerous parallel branches add computational and memory requirements, which may slow down
convergence in a short number of training epochs.
        </p>
        <p>
          DenseNet is based on dense connectivity, where every layer takes inputs from all earlier layers.
This design promotes feature reuse and prevents vanishing gradients. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] But practically, this dense
connectivity can create overfitting on the unnecessary fine-grained details in high-resolution radiology
images [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], especially when the training is conducted for a few epochs and without domain-specific
pretraining. The nature of DenseNet to retain many low-level features can also result in information
dilution in subsequent layers, which could be undesirable in an application where semantic abstraction
and concept-level recognition are more important.
        </p>
        <p>
          By contrast, ResNet50’s residual connections allow stable gradient flow and speed up convergence
through the ability to learn identity mappings when deeper transformations are not required. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
This aspect is especially beneficial for transfer learning with pretrained weights, allowing for efficient
adaptation to the target domain without excessive degradation. ResNet50 thereby balances depth,
simplicity, and transferability [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and is more suitable for medical concept detection with sparse label
space and high intra-class similarity.
        </p>
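        <p>For illustration, the sketch below shows a simplified residual block in Keras (omitting the bottleneck and batch-normalisation layers of the actual ResNet50 blocks) to make the identity shortcut discussed here concrete.</p>
        <preformat>
# Sketch: a simplified residual block with an identity shortcut.
from tensorflow.keras import layers

def residual_block(x, filters):
    # assumes the input tensor x already has `filters` channels
    shortcut = x                                   # identity mapping
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([shortcut, y])                # skip connection stabilises gradient flow
    return layers.Activation("relu")(y)
        </preformat>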
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Impact of Label Space Restriction</title>
        <p>Another important factor in the observed performance differences is the decision to restrict the output
label space to the top 10 most frequent UMLS concepts. This substantially reduced label sparsity and
class imbalance, which typically plague multi-label classification tasks. Models with higher capacity (e.g.,
DenseNet) may require larger label diversity to showcase their full representational power. Conversely,
ResNet50 took advantage of the decreased complexity of labels to allow it to converge better in the
lower-dimensional label space.</p>
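        <p>A sketch of this label-space restriction, reusing the df frame from the earlier preprocessing snippet, could look as follows; discarding images left with no remaining concept is an illustrative assumption.</p>
        <preformat>
# Sketch: keep only the 10 most frequent UMLS CUIs and filter the label lists.
from collections import Counter

counts = Counter(cui for cuis in df["CUIs"] for cui in cuis)
top_cuis = {cui for cui, _ in counts.most_common(10)}

df["CUIs"] = df["CUIs"].apply(lambda cuis: [c for c in cuis if c in top_cuis])
df = df[df["CUIs"].map(len) > 0]                   # assumed: drop images with no remaining concept
        </preformat>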
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Gap between Validation and Test Set Performance</title>
        <p>
          Although validation performance reached 0.3929, performance on the unseen test set was significantly
lower, achieving an F1 score of 0.1494 and a secondary F1 score of 0.2298. Several factors could have
contributed to this gap. First, dataset distribution shift: the ROCOv2 dataset [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] has been
reported to cover a wide range of imaging modalities and diagnostic scenarios. The validation set was
drawn from the same distribution as the training data, while the test set includes entirely unseen images,
potentially covering underrepresented modalities, resolutions, or clinical conditions. Such domain
shift can reduce generalisation. Second, the model may have overfitted to the most frequent patterns:
since it was trained and tested on the 10 most frequent concepts, it could have overfitted to those
dominant patterns. If the test set has a slightly different frequency distribution or a greater level of
label noise, the model cannot adapt and therefore achieves lower accuracy. Third, the batch
normalisation layers present in all the models used are sensitive to batch statistics. At training time, they
learn to match the training/validation batch distribution; at inference time on the test set (particularly
when conducted in small batches), a statistics mismatch causes suboptimal scaling of activations and
degrades performance.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Perspectives for Future Work</title>
      <p>
        While the existing method concentrated on utilising ResNet50 with ImageNet-pretrained weights
and restricting the label space to common concepts, there are several promising avenues for
increasing performance and extending applicability. Further research could explore
self-supervised pretraining on medical image datasets, e.g., MIMIC-CXR or CheXpert, to obtain
feature representations better aligned with the domain-specific semantics of radiology. In addition, multi-modal
learning that blends image features with metadata or caption text could enable more
context-sensitive predictions. Another novel avenue is the use of graph neural networks (GNNs)
to capture co-occurring and hierarchical semantic relationships between UMLS concepts, enabling
the model to draw on semantic dependencies in classification. Transformer-based approaches have also been
shown to perform better in multi-label classification [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In addition,
uncertainty-aware learning with Bayesian deep learning may alleviate label noise and dataset bias, particularly
in the long tail of infrequent concepts. Finally, adding visual grounding or attention maps to identify
concept-related regions within the image would make the system more interpretable for clinical users,
paving the way for hybrid AI-human diagnostic pipelines.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research was supported by the Department of Science and Technology (DST), India, under the
Fund for Improvement of S&amp;T Infrastructure in Universities and Higher Educational Institutions
(FIST) Program [Grant No. SR/FST/ET-I/2022/1079], along with a matching grant from VIT University.
The authors express their sincere gratitude to DST-FIST and the VIT management for their financial
assistance and the infrastructural support provided for this work.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4 Turbo, QuillBot, and Grammarly for
grammar and spelling checking. Further, the authors used GPT-4 Turbo for rephrasing sentences
or paragraphs to improve clarity, conciseness, or style. After using these tools/services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M. G.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eryilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <article-title>Overview of ImageCLEFmedical 2025 - Medical Concept Detection and Interpretable Caption Generation</article-title>
          , in: CLEF 2025 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, Madrid, Spain,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Koitka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nensa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <article-title>ROCOv2: Radiology Objects in Context Version 2, an Updated Multimodal Image Dataset</article-title>
          ,
          <source>Scientific Data</source>
          <volume>11</volume>
          (
          <year>2024</year>
          ). doi:10.1038/s41597-024-03496-6.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cosa-Linan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Santhanam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jannesari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Maros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ganslandt</surname>
          </string-name>
          ,
          <article-title>Transfer learning for medical image classification: a literature review</article-title>
          ,
          <source>BMC Medical Imaging</source>
          <volume>22</volume>
          (
          <year>2022</year>
          ) 69. doi:10.1186/s12880-022-00793-7.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Torrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shavlik</surname>
          </string-name>
          ,
          <article-title>Transfer learning</article-title>
          , in:
          <source>Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques</source>
          , IGI Global,
          <year>2010</year>
          , pp.
          <fpage>242</fpage>
          -
          <lpage>264</lpage>
          . doi:10.4018/978-1-60566-766-9.ch011.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Qadri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M. W.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Sharif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marinello</surname>
          </string-name>
          ,
          <article-title>Comparing Inception V3, VGG 16, VGG 19, CNN, and ResNet 50: A case study on early detection of a rice disease</article-title>
          ,
          <source>Agronomy</source>
          <volume>13</volume>
          (
          <year>2023</year>
          ) 1633. doi:10.3390/agronomy13061633.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <article-title>A survey of image classification methods and techniques for improving classification performance</article-title>
          ,
          <source>International Journal of Remote Sensing</source>
          <volume>28</volume>
          (
          <year>2007</year>
          )
          <fpage>823</fpage>
          -
          <lpage>870</lpage>
          . doi:10.1080/01431160600746456.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <article-title>An analysis of convolutional neural networks for image classification</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>132</volume>
          (
          <year>2018</year>
          )
          <fpage>377</fpage>
          -
          <lpage>384</lpage>
          . doi:10.1016/j.procs.2018.05.198.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>HCP: A flexible CNN framework for multi-label image classification</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>38</volume>
          (
          <year>2015</year>
          )
          <fpage>1901</fpage>
          -
          <lpage>1907</lpage>
          . doi:10.1109/tpami.2015.2491929.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mascarenhas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for Image Classification</article-title>
          , in:
          <source>2021 International Conference on Disruptive Technologies for Multi-Disciplinary Research and Applications (CENTCON)</source>
          , volume
          <volume>1</volume>
          ,
          <year>2021</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>99</lpage>
          . doi:10.1109/CENTCON52345.2021.9687944.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lanchantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ordonez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>General multi-label image classification with transformers</article-title>
          ,
          in:
          <source>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>16478</fpage>
          -
          <lpage>16488</lpage>
          . doi:10.1109/cvpr46437.2021.01621.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>