<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Hebbian Learning for Deep Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriele Lagani</string-name>
          <email>gabriele.lagani@phd.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Machine Learning, Hebbian Learning, Deep Neural Networks, Computer Vision</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Pisa, 56124</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa, 56127</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>19</fpage>
      <lpage>22</lpage>
      <abstract>
<p>Deep learning is becoming increasingly popular for extracting information from multimedia data for indexing and query processing. In recent contributions, we have explored a biologically inspired strategy for Deep Neural Network (DNN) training, based on the Hebbian principle from neuroscience. We studied hybrid approaches in which unsupervised Hebbian learning was used for a pre-training stage, followed by supervised fine-tuning based on Stochastic Gradient Descent (SGD). The resulting semi-supervised strategy exhibited encouraging results on computer vision datasets, motivating further interest towards applications in the domain of large-scale multimedia content-based retrieval.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Hebbian Learning</kwd>
        <kwd>Deep Neural Networks</kwd>
        <kwd>Computer Vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the past few years, Deep Neural Networks (DNNs) have emerged as a powerful technology
in the domain of computer vision [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Consequently, DNNs also started gaining popularity
in the domain of large-scale multimedia content-based retrieval, replacing handcrafted feature
extractors [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Learning algorithms for DNNs are typically based on supervised
end-to-end Stochastic Gradient Descent (SGD) training with error backpropagation (backprop). This
approach is considered biologically implausible by neuroscientists [5], who instead propose
Hebbian learning as a biologically plausible model of synaptic plasticity [6].
      </p>
      <p>Backprop-based algorithms need a large number of labeled training samples in order to
achieve high performance, and labeled samples, as opposed to unlabeled ones, are expensive to gather.</p>
      <p>The idea behind our contribution [7, 8] is to tackle the sample efficiency problem by taking
inspiration from biology and Hebbian learning. Since Hebbian approaches are mainly
unsupervised, we propose to use them for an unsupervised pre-training stage on all the
available data, in a semi-supervised setting, followed by end-to-end backprop fine-tuning on
the labeled data only. In the rest of this paper, we illustrate the proposed methodology, and we
show experimental results in computer vision. The results are promising, motivating further
interest in the application of our approach to large-scale multimedia content-based retrieval.</p>
      <p>The remainder of this paper is structured as follows: Section 2 gives background on
Hebbian learning and semi-supervised training; Section 3 delves deeper into the semi-supervised
approach based on Hebbian learning; Section 4 illustrates our experimental results and
discusses the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and related work</title>
      <p>Several variants of Hebbian learning rules were developed over the years. Some examples are:
Hebbian learning with Winner-Takes-All (WTA) competition [9], Hebbian learning for Principal
Component Analysis (PCA) [6, 10], Hebbian/anti-Hebbian learning [11]. A brief overview is
given in Section 3. However, it was only recently that Hebbian learning started gaining attention
in the context of DNN training [12, 13, 14, 15, 16].</p>
      <p>In [14], a Hebbian learning rule based on inhibitory competition was used to train a neural
network composed of fully connected layers. The approach was validated on object recognition
tasks. Instead, the Hebbian/anti-Hebbian learning rule developed in [11] was applied in [13] to
train convolutional feature extractors. The resulting features were shown to be effective for
classification. Convolutional layers were also considered in [12], where a Hebbian approach
based on WTA competition was employed.</p>
      <p>However, the previous approaches were based on relatively shallow network architectures
(2-3 layers). A further step was taken in [15, 16], where Hebbian learning rules were applied for
training a 6-layer Convolutional Neural Network (CNN).</p>
      <p>It is known that a pre-training phase allows network weights to be initialized in a region near
a good local optimum [17, 18]. Previous papers investigated the idea of enhancing neural
network training with an unsupervised learning objective [19, 20]. In [19], Variational
AutoEncoders (VAEs) were used for an unsupervised pre-training phase, so that the subsequent
supervised phase could succeed with a limited amount of labeled samples. Also, [20] relied on autoencoding architectures to
augment supervised training with unsupervised reconstruction objectives, showing that joint
optimization of supervised and unsupervised losses helped to regularize the learning process.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Hebbian learning strategies and sample efficiency</title>
      <p>Consider a single neuron with weight vector $w$ and input $x$. Call $y = w^T x$ the neuron output.
A learning rule defines a weight update as follows:
$$w_{new} = w_{old} + \Delta w \qquad (1)$$
where $w_{new}$ is the updated weight vector, $w_{old}$ is the old weight vector, and $\Delta w$ is the weight
update.</p>
      <p>The Hebbian learning rule, in its simplest form, can be expressed as $\Delta w = \eta \, y \, x$ (where
$\eta$ is the learning rate) [6]. Basically, this rule states that the weight on a given synapse is
reinforced when the input on that synapse and the output of the neuron are simultaneously
high. Therefore, connections between neurons whose activations are correlated are reinforced.
In order to prevent the weights from growing unbounded, a weight decay term is generally added.
In the context of competitive learning [9], this is obtained as follows:</p>
      <p>$$\Delta w_i = \eta \, y_i \, x - \eta \, y_i \, w_i = \eta \, y_i \, (x - w_i) \qquad (2)$$
where the subscript $i$ refers to the i'th neuron in a given network layer. Moreover, the output $y_i$
can be replaced with the result $r_i$ of a competitive nonlinearity, which allows the activity of
different neurons to be decorrelated. In the Winner-Takes-All (WTA) approach [9], at each training
step, the neuron which produces the strongest activation for a given input is called the winner.
In this case, $r_i = 1$ if the i'th neuron is the winner and $r_i = 0$ otherwise. In other words, only the
winner is allowed to perform the weight update, so that it will be more likely for the same
neuron to win again if a similar input is presented again in the future. In this way, different
neurons are induced to specialize on different patterns. In soft-WTA [21], $r_i$ is computed as
$r_i = y_i / \sum_{j=1}^{N} y_j$, where $N$ is the number of neurons in the layer. We found this
formulation to work poorly in practice, because there is no tunable parameter to cope with the
variance of the activations. For this reason, we introduced a variant of this approach that uses a
softmax operation in order to compute $r_i$:
$$r_i = \frac{e^{y_i / T}}{\sum_{j=1}^{N} e^{y_j / T}} \qquad (3)$$
where $T$ is called the temperature hyperparameter. The advantage of this formulation is that we
can tune the temperature in order to obtain the best performance on a given task, depending on
the distribution of the activations.</p>
      <p>The Hebbian Principal Component Analysis (HPCA) learning rule, in the case of nonlinear
neurons, is obtained by minimizing the so-called representation error $L(w_i)$:
$$L(w_i) = E\left[ \left( x - \sum_{j=1}^{i} f(y_j) \, w_j \right)^2 \right] \qquad (4)$$
where $f(\cdot)$ is the neuron activation function. Minimization of this objective leads to the
nonlinear HPCA rule [10]:
$$\Delta w_i = \eta \, f(y_i) \left( x - \sum_{j=1}^{i} f(y_j) \, w_j \right) \qquad (5)$$</p>
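      <p>As a concrete illustration of the above rules, the following minimal NumPy sketch (our own illustration, not the code released with the paper; function names such as soft_wta_update and hpca_update are ours) applies the softmax-competition update of Eqs. (2)-(3) and the nonlinear HPCA update of Eq. (5) to a single layer of neurons:</p>
      <preformat>
import numpy as np

def soft_wta_update(W, x, eta=0.01, T=0.5):
    """One Hebbian step with softmax competition, Eqs. (2)-(3).
    W: (num_neurons, num_inputs) weights; x: (num_inputs,) input."""
    y = W @ x                                    # outputs y_i = w_i . x
    r = np.exp(y / T) / np.sum(np.exp(y / T))    # softmax competition, Eq. (3)
    return W + eta * r[:, None] * (x[None, :] - W)   # Eq. (2) with y_i -> r_i

def hpca_update(W, x, eta=0.01, f=np.tanh):
    """One step of the nonlinear HPCA rule, Eq. (5).
    Neuron i reconstructs x from the outputs of neurons 1..i."""
    y = f(W @ x)                                 # nonlinear outputs f(y_j)
    recon = np.cumsum(y[:, None] * W, axis=0)    # cumulative sums of f(y_j) w_j
    return W + eta * y[:, None] * (x - recon)

# Toy usage: 10 neurons on 32-dimensional inputs.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(10, 32))
for _ in range(100):
    x = rng.normal(size=32)
    W = soft_wta_update(W, x)
      </preformat>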
      <p>It can be noticed that these learning rules do not require supervision, and that they are local
to each network layer, i.e. they do not require backpropagation.</p>
      <p>In order to contextualize our approach in a scenario with scarce data, let's define the labeled
set $T_L$ as a collection of elements for which the corresponding label is known. Conversely, the
unlabeled set $T_U$ is a collection of elements whose labels are unknown. The whole training set
$T$ is given by the union of $T_L$ and $T_U$. All the samples from $T$ are assumed to be drawn from
the same statistical distribution. In a sample efficiency scenario, the number of samples in $T_L$ is
typically much smaller than the total number of samples in $T$. In particular, in an $s\%$-sample
efficiency regime, the size of the labeled set is $s\%$ that of the whole training set.</p>
      <p>To tackle this scenario, we considered a semi-supervised approach in two phases. During the
first phase, latent representations are obtained from the hidden layers of a DNN, which are trained
using unsupervised Hebbian learning. This unsupervised pre-training is performed on all the
available training samples. During the second phase, a final linear classifier is placed on top of
the features extracted from the deep network layers. Classifier and deep layers are then fine-tuned
in a supervised fashion, by running an end-to-end SGD optimization procedure using only
the few labeled samples at our disposal.</p>
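      <p>The following toy sketch shows the two phases end to end in PyTorch. It is an illustrative outline under our own simplifying assumptions (random stand-in data instead of real images, a single fully connected Hebbian layer, and the softmax-competition rule of Eqs. (2)-(3)), not the paper's actual pipeline:</p>
      <preformat>
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in data (the paper uses CIFAR10): 1000 samples, 5% labeled.
X = torch.randn(1000, 64)
y = torch.randint(0, 10, (1000,))
labeled = torch.randperm(1000)[:50]

encoder = nn.Linear(64, 128, bias=False)
classifier = nn.Linear(128, 10)

# Phase 1: unsupervised Hebbian pre-training on ALL samples,
# here with the softmax-competition rule of Eqs. (2)-(3).
eta, T = 0.01, 0.5
with torch.no_grad():
    for x in X:
        r = torch.softmax((encoder.weight @ x) / T, dim=0)              # Eq. (3)
        encoder.weight += eta * r[:, None] * (x[None, :] - encoder.weight)  # Eq. (2)

# Phase 2: supervised end-to-end SGD fine-tuning on the labeled subset only.
model = nn.Sequential(encoder, nn.ReLU(), classifier)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(20):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X[labeled]), y[labeled])
    loss.backward()
    opt.step()
      </preformat>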
    </sec>
    <sec id="sec-4">
      <title>4. Results and conclusions</title>
      <p>In order to validate our method, we performed experiments on various image datasets and in
various sample efficiency regimes (code available at:
https://github.com/GabrieleLagani/HebbianPCA/tree/hebbpca). For the sake of brevity, but without
loss of generality, we present here the results on CIFAR10 [22], in sample efficiency regimes where
the amount of labeled samples was respectively 1%, 5%, 10%, and 100% of the whole training set.
Further results can be found in [7, 8].</p>
      <p>We considered a six-layer neural network as shown in Fig. 1: five deep layers plus a final
linear classifier. The various layers were interleaved with other processing stages (such as
ReLU nonlinearities, max-pooling, etc.). We first performed unsupervised pre-training with
a chosen algorithm. Then, we cut the network at a given layer, and we
attached a new classifier on top of the features extracted from that layer. Deep layers and
classifier were then fine-tuned with supervision in an end-to-end fashion, and the resulting
accuracy was evaluated. This was done for each layer, in order to evaluate the network on a
layer-by-layer basis, and for each sample efficiency regime. For the unsupervised pre-training
of the deep layers, we considered both the HPCA and the soft-WTA strategies. In addition, as a
baseline for comparison, we considered another popular unsupervised method for pre-training,
namely the Variational Auto-Encoder (VAE) [23] (also considered in [19]). Note that the VAE is
unsupervised, but still backprop-based.</p>
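      <p>The layer-by-layer evaluation protocol can be sketched as follows. Again, this is an illustrative outline with toy data and fully connected stages standing in for the pre-trained convolutional layers; the helper name finetune_from_layer is ours:</p>
      <preformat>
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the pre-trained deep layers (five stages in the paper,
# here tiny fully connected blocks) and for the labeled data.
stages = [nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(5)]
X, y = torch.randn(256, 64), torch.randint(0, 10, (256,))

def finetune_from_layer(k, epochs=20):
    """Cut the network after stage k, attach a fresh linear classifier,
    fine-tune end-to-end on the labeled data, and return training accuracy.
    Stages are deep-copied so every run restarts from the pre-trained weights."""
    trunk = copy.deepcopy(nn.Sequential(*stages[:k]))
    model = nn.Sequential(trunk, nn.Linear(64, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()
    return (model(X).argmax(dim=1) == y).float().mean().item()

# Evaluate the network on a layer-by-layer basis.
for k in range(1, 6):
    print(f"stages kept: {k}, accuracy: {finetune_from_layer(k):.3f}")
      </preformat>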
      <p>The results are shown in Tab. 1, together with 95% confidence intervals obtained from five
independent repetitions of the experiments. In summary, the results suggest that our
semi-supervised approach based on unsupervised Hebbian pre-training generally performs better
than VAE pre-training, especially in low sample efficiency regimes, in which only a small portion
of the training set (between 1% and 10%) is assumed to be labeled. In particular, the HPCA
approach appears to perform generally better than soft-WTA. Concerning the computational
cost of Hebbian learning, the approach converged in just 1-2 epochs of training, while backprop-based
approaches required 10-20 epochs, showing promise towards scaling to large-scale scenarios.</p>
      <p>In future work, we plan to investigate the combination of Hebbian approaches with alternative
semi-supervised methods, namely pseudo-labeling and consistency-based methods [24, 25],
which do not exclude unsupervised pre-training, but rather can be integrated with it. Moreover,
we are currently conducting more thorough explorations of Hebbian algorithms in the domain
of large-scale multimedia content-based retrieval, and the results are promising [26].</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the H2020 projects AI4EU (GA 825619) and AI4Media
(GA 951911).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learning for image recognition</article-title>
          ,
          <source>in: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C. H.</given-names>
            <surname>Hoi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Deep learning for content-based image retrieval: A comprehensive study</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM international conference on Multimedia</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Babenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Slesarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chigorin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lempitsky</surname>
          </string-name>
          ,
          <article-title>Neural codes for image retrieval</article-title>
          ,
          <source>in: European conference on computer vision</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>584</fpage>
          -
          <lpage>599</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] R. C. O’Reilly, Y. Munakata, Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain, MIT Press, 2000.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] S. Haykin, Neural networks and learning machines, 3rd ed., Pearson, 2009.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] G. Lagani, F. Falchi, C. Gennaro, G. Amato, Hebbian semi-supervised learning in a sample efficiency setting, Neural Networks 143 (2021) 719-731.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Lagani, F. Falchi, C. Gennaro, G. Amato, Evaluating Hebbian learning in a semi-supervised setting, in: International Conference on Machine Learning, Optimization, and Data Science, Springer, 2021, pp. 365-379.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] S. Grossberg, Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors, Biological Cybernetics 23 (1976) 121-134.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] J. Karhunen, J. Joutsensalo, Generalizations of principal component analysis, optimization problems, and neural networks, Neural Networks 8 (1995) 549-562.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] C. Pehlevan, T. Hu, D. B. Chklovskii, A Hebbian/anti-Hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data, Neural Computation 27 (2015) 1461-1495.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] A. Wadhwa, U. Madhow, Bottom-up deep learning using the Hebbian principle, 2016.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Y. Bahroun, A. Soltoggio, Online representation learning with single and multi-layer Hebbian networks for image classification, in: International Conference on Artificial Neural Networks, Springer, 2017, pp. 354-363.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] D. Krotov, J. J. Hopfield, Unsupervised learning by competing hidden units, Proceedings of the National Academy of Sciences 116 (2019) 7723-7731.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] G. Lagani, F. Falchi, C. Gennaro, G. Amato, Training convolutional neural networks with competitive Hebbian learning approaches, in: International Conference on Machine Learning, Optimization, and Data Science, Springer, 2021, pp. 25-40.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] G. Lagani, F. Falchi, C. Gennaro, G. Amato, Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks, Neural Computing and Applications (2022) 1-17.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in: Advances in Neural Information Processing Systems, 2007, pp. 153-160.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin, Exploring strategies for training deep neural networks, Journal of Machine Learning Research 10 (2009).</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] D. P. Kingma, S. Mohamed, D. Jimenez Rezende, M. Welling, Semi-supervised learning with deep generative models, Advances in Neural Information Processing Systems 27 (2014) 3581-3589.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Y. Zhang, K. Lee, H. Lee, Augmenting supervised neural networks with unsupervised objectives for large-scale image classification, in: International Conference on Machine Learning, 2016, pp. 612-621.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] S. J. Nowlan, Maximum likelihood competitive learning, in: Advances in Neural Information Processing Systems, 1990, pp. 574-582.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, 2009.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] D. P. Kingma, M. Welling, Auto-encoding variational Bayes, arXiv preprint arXiv:1312.6114 (2013).</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] A. Iscen, G. Tolias, Y. Avrithis, O. Chum, Label propagation for deep semi-supervised learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5070-5079.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] P. Sellars, A. I. Aviles-Rivero, C.-B. Schönlieb, LaplaceNet: A hybrid energy-neural model for deep semi-supervised classification, arXiv preprint arXiv:2106.04527 (2021).</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] G. Lagani, D. Bacciu, C. Gallicchio, F. Falchi, C. Gennaro, G. Amato, Deep features for CBIR with scarce data using Hebbian learning, submitted at CBMI 2022 (2022). URL: https://arxiv.org/abs/2205.08935.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>