<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explaining the Transfer Learning Ability of Deep Neural Networks by Means of Representations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>German Magai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Soroka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HSE University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research Nuclear University MEPhI</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>The basis of transfer learning methods is the ability of deep neural networks to use knowledge from one domain to learn in another. However, another important task is the analysis and explanation of the internal representations of deep neural network models in the process of transfer learning. Some deep models are known to be better at transferring knowledge than others. In this research, we apply the Centered Kernel Alignment (CKA) method to analyze the internal representations of deep neural networks and propose a method to evaluate the ability of a neural network architecture to transfer knowledge based on the quantitative change in representations during the learning process. We introduce the Transfer Ability Score (TAs) measure to assess the ability of an architecture to transfer learning effectively. We test our approach using Vision Transformer (ViT-B/16) and CNN (ResNet, DenseNet) architectures on computer vision tasks across several datasets, including medical images. Our work is a contribution to the field of explainable AI and an attempt to explain the transfer learning process.</p>
      </abstract>
      <kwd-group>
        <kwd>Transfer learning</kwd>
        <kwd>knowledge representation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Various methods are used to evaluate the similarity of neural representations: Linear-Reg [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], SVCCA [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], PWCCA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], HSIC [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but the most common is the Centered Kernel Alignment (CKA) method. The CKA analysis in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] shows the block structure of CNNs. The paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] notes the fundamental differences between ViT and CNN in terms of the similarity of representations. Many works explore the problem of knowledge transfer [
        <xref ref-type="bibr" rid="ref10 ref11 ref7 ref8 ref9">7–11</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12–14</xref>
        ] it is argued that ViT has better transfer learning performance than CNN on medical imaging tasks. LEEP, NCE, LogMe, and OTCE [
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref18 ref19">15–19</xref>
        ] have been proposed to assess the knowledge transfer ability of a DNN.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>
        The deep neural network f<sub>θ</sub>(x) = y is a mapping from the example space X to the space of class labels Y, DNN = f<sub>L</sub> ∘ … ∘ f<sub>1</sub>, where the functions f<sub>i</sub>, 1 ≤ i ≤ L, are called layer functions and θ is a set of parameters. The design paradigms of modern DNN model architectures are divided into architectures based on convolution (CNN) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and self-attention (ViT) [21]. Due to the large number of existing DNN architectures, the question arises as to whether each one is suitable for efficient transfer learning.
      </p>
      <p>
        Let X ∈ ℝ<sup>n×d1</sup> and Y ∈ ℝ<sup>n×d2</sup> denote the two sets of neural activations of layers i and j of the DNN model, with d1 and d2 neurons respectively, in response to a batch of n examples. The measure CKA ∈ [0, 1] shows how similar the sets X and Y are to each other. CKA is based on the principle of the Hilbert-Schmidt Independence Criterion (HSIC) [22, 23]:
      </p>
      <p>
        HSIC(K, L) = 1/(n−1)<sup>2</sup> · tr(KHLH) = ‖cov(X, Y)‖<sub>F</sub><sup>2</sup>, (1)
        where tr is the matrix trace, H is the centering matrix, cov is the covariance matrix, F denotes the Frobenius norm, and n is the number of examples in a batch. Linear CKA can be calculated as follows:
        CKA(K, L) = HSIC(K, L) / √(HSIC(K, K) · HSIC(L, L)), (2)
        where K = XX<sup>T</sup> and L = YY<sup>T</sup> are the Gram matrices of the activations.
      </p>
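The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration with function names of our own choosing, not code from the study:

```python
import numpy as np

def hsic(K, L):
    # Empirical HSIC of two Gram matrices K, L (Eq. 1):
    # HSIC(K, L) = tr(KHLH) / (n - 1)^2, with centering matrix H.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def linear_cka(X, Y):
    # Linear CKA (Eq. 2) between activations X (n x d1) and Y (n x d2)
    # of two layers, recorded on the same batch of n examples.
    K, L = X @ X.T, Y @ Y.T
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))
```

As a sanity check, linear_cka(X, X) equals 1, and the value is invariant to isotropic scaling of the activations, which is what makes CKA convenient for comparing layers of different widths.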
      <p>
        We propose a Transferability score (TAs) – a measure of the ability of a DNN to transfer knowledge to a new domain. Consider the problem of transferring knowledge by a model with architecture A from a source domain D<sub>s</sub> to a target domain D<sub>t</sub>. The adaptation to D<sub>t</sub> can be interpreted via the evolution of the feature space on different layers. A slight change in the feature representations on different layers during fine-tuning on domain D<sub>t</sub> indicates that the DNN has a high ability to transfer knowledge to the new domain. In contrast, a significant change shows that the information extracted from D<sub>s</sub> is not enough to generalize knowledge to the new domain D<sub>t</sub>, or that the domains are very different and a substantial change in the learned feature representations is required. A low TAs value is an indication of less parameter change during DNN training. Let R<sub>X</sub> = {r<sub>1</sub>, r<sub>2</sub>, …, r<sub>n</sub>} be the set of representations for the model DNN<sub>X</sub> with n layers trained on D<sub>s</sub>, and R<sub>Y</sub> = {r<sub>1</sub>, r<sub>2</sub>, …, r<sub>n</sub>} the set of representations for the model DNN<sub>Y</sub> fine-tuned on D<sub>t</sub>. Let us define the CKA matrix M<sub>1</sub>, where M<sub>1</sub>(i, j) is the value of CKA(r<sub>i</sub>, r<sub>j</sub>) between the representations on layers i and j of DNN<sub>X</sub>, and the CKA matrix M<sub>2</sub>, where M<sub>2</sub>(i, j) is the value of CKA between the representations on layers i and j of DNN<sub>Y</sub>. Let us denote M′ = M<sub>1</sub> − M<sub>2</sub>; M′ shows how much the representations on different layers have changed after fine-tuning on the target domain.
      </p>
      <p>
        M′(i, j) is the i,j-th element of the matrix M′. We estimate the ability of a model with architecture A to transfer knowledge (Transferability score – TAs) from the D<sub>s</sub> domain to the D<sub>t</sub> domain via the quantitative change in the feature space after fine-tuning, and define it as TAs = ∑<sub>i,j=1</sub><sup>n</sup> |M′(i, j)| / n<sup>2</sup>, where n is the number of layers. The |M′(i, j)| values show the absolute change in the similarities of representations. The lower the Transferability score, the greater the DNN model's ability to transfer knowledge. In addition, the M′ matrix provides a visual understanding of how much the similarity of representations on different layers of the DNN has changed after fine-tuning on data in D<sub>t</sub>.
      </p>
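Given the two layer-by-layer CKA matrices, the score itself is a single normalized sum. A sketch under the definitions above (matrix names follow the text; the function name is ours):

```python
import numpy as np

def transferability_score(M1, M2):
    # TAs = sum_{i,j} |M'(i, j)| / n^2, where M' = M1 - M2 is the
    # difference of the layer-by-layer CKA matrices of the model
    # before (M1) and after (M2) fine-tuning on the target domain.
    M_prime = M1 - M2
    n = M_prime.shape[0]
    return np.abs(M_prime).sum() / n ** 2
```

Identical matrices give TAs = 0 (the representations did not move at all); larger values mean a larger shift of the feature space during adaptation.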
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We test ResNet-50 [24], ResNet-101, DenseNet-121 [25], and ViT-B/16 architecture models pretrained on ImageNet-1k [26]. We analyze the ability of various DNN models to transfer knowledge to a new target domain on several datasets: EuroSAT (ESAT) [27], PatchCamelyon (PCAM) [28], the Stanford Cars dataset [29], DTD [30], and CIFAR-10 [31]. For DNN training we used the Adam [32] stochastic optimizer with lr = 5·10<sup>−5</sup> and batch size = 32.</p>
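The training setup above can be wired up as follows. This is a hedged PyTorch sketch in which the toy linear backbone is a placeholder, not one of the ImageNet-pretrained ResNet/DenseNet/ViT models actually used:

```python
import torch
from torch import nn

# Hyperparameters as stated in Section 4: Adam, lr = 5e-5, batch size 32.
# The "backbone" below is a stand-in so the sketch stays self-contained.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

def finetune_step(images, labels):
    # One fine-tuning step on a batch from the target domain.
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper's setting the model would instead be loaded with ImageNet-1k weights and its layer activations recorded before and after fine-tuning to build the CKA matrices.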
      <p>
        The success of transfer learning depends on the similarity between D<sub>s</sub> and D<sub>t</sub>: the more similar the data, the more effective the transfer of knowledge [
        <xref ref-type="bibr" rid="ref8">8,33</xref>
        ]. The differences between the CKA matrices of the source and fine-tuned models are shown for different D<sub>t</sub> in Figure 1. ImageNet's D<sub>s</sub> partially includes information contained in DTD, CIFAR-10, and Stanford Cars, so the representations do not change as much as for PCAM and ESAT, which are very different from ImageNet. To adapt to the PCAM and ESAT domains, the DNN model needs to learn new feature representations, which is strongly reflected in the M′ matrices. It can also be seen that the ViT-B/16 architecture changes its representations less significantly than ResNet-50, which indicates that ViT-B/16 is able to extract more information from D<sub>s</sub> and it is easier for ViT to adapt to D<sub>t</sub>. This is consistent with the higher accuracy of ViT models in knowledge transfer compared to CNN models (Table 1).
      </p>
      <p>The dynamics of TAs during fine-tuning on a new dataset show that when the test accuracy stabilizes, the TA score values also stabilize (Figure 2). In ViT, we observe a slight change in representations: when trained on D<sub>s</sub>, the ViT model extracts more complete information from a large dataset and generalizes better, so when adapting to D<sub>t</sub> the adaptation of the feature space is not as significant [34], which is consistent with the lower value of TAs.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>
        In this paper, we touch upon the issue of interpreting the change in the similarity of internal feature representations during transfer learning. We have proposed a method to evaluate the ability of a DNN architecture to transfer knowledge from the source domain to the target domain based on the similarity of feature representations. Experiments were performed for several architectures on different datasets. Based on TAs, we can conclude that the ViT architecture has a better ability to transfer knowledge than CNN models, which is consistent with previous research [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7-9</xref>
        ].
      </p>
      <p>Improving our approach may be useful for choosing an optimal architecture. For future research, we propose to pay attention to the transfer of knowledge not only within the image modality, but also across modalities, for example, using features extracted from an image for an audio or text classification task.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The work of G. Magai was supported by the HSE University Basic Research Program. The work of
A. Soroka was performed in the Tensor Processors laboratory of the Mephius Full-cycle
Microelectronics Design Center (NRNU MEPhI) and IVA Technologies (HiTech).
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, I. Polosukhin, Attention
is all you need. Advances in neural information processing systems, 2017.
[22] A. Gretton, K. Fukumizu, C. Teo, L. Song, B. Schölkopf, A. Smola, A kernel statistical test of
independence. Advances in neural information processing systems, 20, 2007.
[23] D. Greenfeld, U. Shalit, Robust learning with the hilbert-schmidt independence criterion. In:</p>
      <p>International Conference on Machine Learning (pp. 3759-3768). PMLR, 2020.
[24] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. In: Proceedings
of the IEEE conference on computer vision and pattern recognition (pp. 770-778), 2016.
[25] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional
networks. In: Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 4700-4708), 2017.
[26] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, L. Fei-Fei, Imagenet: A large-scale hierarchical image
database. In: 2009 IEEE conference on computer vision and pattern recognition (pp.
248-255). IEEE, 2009.
[27] P. Helber, B. Bischke, A. Dengel, D. Borth, Eurosat: A novel dataset and deep learning
benchmark for land use and land cover classification. IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing, 12(7), 2217-2226, 2019.
[28] B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, M. Welling, Rotation equivariant CNNs for
digital pathology. In: Medical Image Computing and Computer Assisted Intervention–
MICCAI, 2018.
[29] J. Krause, M. Stark, J. Deng, L. Fei-Fei, 3d object representations for fine-grained
categorization. In: Proceedings of the IEEE international conference on computer vision
workshops, 2013.
[30] M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, A. Vedaldi, Describing textures in the wild. In:</p>
      <p>Proceedings of the IEEE conference on computer vision and pattern recognition, 2014.
[31] A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, 2009.
[32] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
[33] E. Otović, M. Njirjak, D. Jozinović, G. Mauša, A. Michelini, I. S̆tajduhar, Intra-domain and
cross-domain transfer learning for time series data — How transferable are the features?
Knowledge-Based Systems, 239, 107976, 2022.
[34] J. Kim, K. Shim, J. Kim, B. Shim, Vision Transformer-Based Feature Extraction for Generalized
Zero-Shot Learning. In: ICASSP 2023-2023 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP) (pp. 1-5). IEEE, 2023.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ballas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Kahou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chassang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gatta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Fitnets: Hints for thin deep nets</article-title>
          .
          <source>arXiv preprint arXiv:1412.6550</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gilmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yosinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sohl-Dickstein</surname>
          </string-name>
          ,
          <article-title>Svcca: Singular vector canonical correlation analysis for deep learning dynamics and interpretability</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Morcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Insights on representational similarity in neural networks with canonical correlation</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. D. K.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Kleijn</surname>
          </string-name>
          ,
          <article-title>The HSIC bottleneck: Deep learning without backpropagation</article-title>
          .
          <source>In: Proceedings of the AAAI conference on artificial intelligence</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kornblith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Norouzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Similarity of neural network representations revisited</article-title>
          .
          <source>In: International Conference on Machine Learning</source>
          (pp.
          <fpage>3519</fpage>
          -
          <lpage>3529</lpage>
          ). PMLR,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kornblith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , A. Dosovitskiy,
          <article-title>Do vision transformers see like convolutional neural networks?</article-title>
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>34</volume>
          ,
          <fpage>12116</fpage>
          -
          <lpage>12128</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yosinski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clune</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lipson</surname>
          </string-name>
          ,
          <source>How Transferable Are Features in Deep Neural Networks? In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2</source>
          , pp.
          <fpage>3320</fpage>
          -
          <lpage>3328</lpage>
          . MIT Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Redko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Morvant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Habrard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sebban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bennani</surname>
          </string-name>
          ,
          <article-title>Advances in domain adaptation theory</article-title>
          .
          <source>Elsevier</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. Zhang,</surname>
          </string-name>
          <article-title>Cross-domain visual matching via generalized similarity measure and feature learning</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          ,
          <volume>39</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1089</fpage>
          -
          <lpage>1102</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>Multi-domain semantic similarity in biomedical research</article-title>
          .
          <source>BMC bioinformatics</source>
          ,
          <volume>20</volume>
          (
          <issue>10</issue>
          ),
          <fpage>23</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kornblith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Norouzi</surname>
          </string-name>
          , G. Hinton,
          <article-title>Big self-supervised models are strong semi-supervised learners</article-title>
          . arXiv preprint arXiv:2006.10029v2,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Usman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tariq</surname>
          </string-name>
          ,
          <article-title>Analyzing transfer learning of vision transformers for interpreting chest radiography</article-title>
          .
          <source>Journal of digital imaging</source>
          ,
          <volume>35</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1445</fpage>
          -
          <lpage>1462</lpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Leveraging CNN and Vision Transformer with Transfer Learning to Diagnose Pigmented Skin Lesions</article-title>
          . Highlights in Science, Engineering and Technology,
          <volume>39</volume>
          ,
          <fpage>408</fpage>
          -
          <lpage>412</lpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ayana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dereje</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kebede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Barki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amdissa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Choe</surname>
          </string-name>
          ,
          <article-title>Vision-Transformer-Based Transfer Learning for Mammogram Classification</article-title>
          . Diagnostics,
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <fpage>178</fpage>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hassner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seeger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Archambeau</surname>
          </string-name>
          ,
          <article-title>Leep: A new measure to evaluate transferability of learned representations</article-title>
          .
          <source>In: International Conference on Machine Learning</source>
          (pp.
          <fpage>7294</fpage>
          -
          <lpage>7305</lpage>
          ). PMLR,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. V.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , T. Hassner,
          <article-title>Transferability and hardness of supervised classification tasks</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (pp.
          <fpage>1395</fpage>
          -
          <lpage>1405</lpage>
          ),
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zamir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guibas</surname>
          </string-name>
          ,
          <article-title>An information-theoretic approach to transferability in task transfer learning</article-title>
          .
          <source>In: 2019 IEEE international conference on image processing (ICIP)</source>
          (pp.
          <fpage>2309</fpage>
          -
          <lpage>2313</lpage>
          ). IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Otce: A transferability metric for cross-domain cross-task representations</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          (pp.
          <fpage>15779</fpage>
          -
          <lpage>15788</lpage>
          ),
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <article-title>Logme: Practical assessment of pre-trained models for transfer learning</article-title>
          .
          <source>In: International Conference on Machine Learning</source>
          (pp.
          <fpage>12133</fpage>
          -
          <lpage>12143</lpage>
          ). PMLR,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kuen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shahroudy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shuai</surname>
          </string-name>
          , T. Chen,
          <article-title>Recent advances in convolutional neural networks</article-title>
          .
          <source>Pattern recognition</source>
          ,
          <volume>77</volume>
          ,
          <fpage>354</fpage>
          -
          <lpage>377</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>