<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SC AOA arithmetic</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>New Graph regularized Sparse Coding Improving Automatic Image Annotation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Céline RABOUY</string-name>
          <email>celine.rabouy@lsis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sébastien PARIS</string-name>
          <email>sebastien.paris@lsis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hervé GLOTIN</string-name>
          <email>glotin@univ-tln.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix-Marseille Université</institution>
          ,
          <addr-line>CNRS, ENSAM, LSIS UMR 7296, 13397 Marseille</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institut Universitaire de France</institution>
          ,
          <addr-line>75005 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université de Toulon</institution>
          ,
          <addr-line>CNRS, LSIS UMR 7296, 83957 La Garde</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <volume>82</volume>
      <issue>97</issue>
      <fpage>193</fpage>
      <lpage>204</lpage>
      <abstract>
        <p>A typical image classification pipeline for shallow architectures can be summarized in three main steps: i) a projection of local features into a high-dimensional space, ii) sparsity constraints in the encoding scheme and iii) a pooling operation to obtain a global representation invariant to common transformations. The Sparse Coding (SC) framework is one particular example of this general approach. Its main problem is that local features are encoded independently, so the correlation present in the input space is lost. In this work we propose to encode the sparse codes simultaneously to tackle this problem, with Joint Sparse Coding (JSC) inspired by Graph regularized Sparse Coding (GSC). We evaluate SC, GSC and JSC on the UIUCsports and scenes15 databases. We show that, on UIUCsports, the results obtained with SC (87.27 ± 1.33), JSC (84.17 ± 1.57) and the state of the art (88.47 ± 2.32 [23]) are outperformed by a simple fusion (95.37 ± 1.29). Several hypotheses are advanced to explain this phenomenon, which could not be generalized to scenes15.</p>
      </abstract>
      <kwd-group>
        <kwd>Scenes categorization</kwd>
        <kwd>Sparse Coding</kwd>
        <kwd>Graph regularized Sparse Coding</kwd>
        <kwd>Dictionary Learning</kwd>
        <kwd>Scale Invariant Feature Transform</kwd>
        <kwd>Spatial Pyramid Matching</kwd>
        <kwd>Joint Sparse Coding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In computer vision and signal processing, significant progress has been made
since the 2000s with general methods such as Bag of Words (BoW) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. A significant number of databases are available, for example UIUCsports [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
and the 15 scenes database (scenes15) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], where the goal is to label images into a finite number of
classes. A first approach is to evaluate a metric distance directly between two images.
Unfortunately, because of the high dimensionality of this input space, most of these distances
concentrate on a sub-manifold whatever the image class, so discrimination by direct
distances is not robust. To overcome this problem, we look for a general mapping
$\Psi_j(\cdot\,; \mu_j)$ with parameter $\mu_j$ that characterizes the class $C_j$ and satisfies:
$$\begin{gathered} \mathrm{dist}\big(\Psi_j(I_1; \mu_j), \Psi_j(I_2; \mu_j)\big) \to 0 \ \text{if } I_1 \in C_j \text{ and } I_2 \in C_j, \\ \mathrm{dist}\big(\Psi_j(I_1; \mu_j), \Psi_j(I_2; \mu_j)\big) \to \infty \ \text{if } I_1 \in C_j \text{ and } I_2 \notin C_j, \end{gathered} \tag{1}$$
where $I_1$ and $I_2$ are two images. The choice of $\Psi_j$ is a trade-off between its
representation capacity and the difficulty of optimizing $\mu_j$. In general, to
estimate/optimize $\mu_j$, we start from a local representation (patches) $x \in \mathbb{R}^d$ to
obtain the global representation $\Psi_j(\cdot\,; \mu_j)$. From the $\Psi_j$ associated with BoW and Sparse Coding
(SC) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] up to ConvNets [
        <xref ref-type="bibr" rid="ref3 ref9">3, 9</xref>
        ], these models follow three main procedures: i) projection of local features into a high-dimensional space,
ii) sparsity constraints in the representation model and iii)
a non-linear operation and pooling to obtain a global invariant representation.
      </p>
      <p>
        In this article, we focus on a new formulation of the encoding method, which
corresponds more specifically to procedure ii), inspired by SC and more generally by Graph
regularized Sparse Coding (GSC) [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. This new formulation makes it possible to encode
the test patches simultaneously, with a graph regularization as in the GSC model, which has good properties. Although
we only work with a single layer, we show that a simple fusion
considerably improves the classification accuracy and that our results are close to those of
CNNs (convolutional neural nets) [
        <xref ref-type="bibr" rid="ref18 ref6">6, 18</xref>
        ] initialized on ImageNet, as shown in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
article is divided into five parts. The first part focuses on SC models and their derivatives
(GSC especially). The second part presents our Joint Sparse Coding (JSC) model.
The third part presents dictionary learning for SC and GSC, inspired by
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The fourth part presents the results we obtained on the UIUCsports and scenes15 databases,
and in the last part, we conclude on our contribution.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>
        In this part, we focus on the encoding step, using linear coding to reconstruct
inputs. Any patch $x_i \in \mathbb{R}^d$ can be approximated by $x_i \approx D a_i$, where
$D \triangleq [d_1, \dots, d_K] \in \mathbb{R}^{d \times K}$ is a given/trained dictionary such that $\forall k = 1, \dots, K$, $\|d_k\|_2^2 = 1$
and $d_{kj} \geq 0$. A patch is a vector extracted from an image, and a dictionary is a matrix of
“words” (atoms) used to reconstruct the patches. Many encoding methods share three common
steps: i) a projection into a higher-dimensional space ($K \gg d$), ii)
sparsity constraints and iii) a non-linear operation. If $a_i$ is obtained by
Ordinary Least Squares (OLS), the solution is fully dense (all its elements are non-zero). One
way around this problem is to use an $\ell_1$-norm constraint, which corresponds
to the Lasso problem [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] or Basis Pursuit [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
      </p>
      <p>$$\mathcal{L}_{SC}(a_i \mid x_i, D) = \min_{a_i \in \mathbb{R}^K} \frac{1}{2}\|x_i - D a_i\|_2^2 + \lambda \|a_i\|_1, \tag{2}$$
with $\lambda$ the regularization parameter of the SC formulation. This parameter
controls the sparsity level, as shown in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: the larger $\lambda$, the sparser $a_i$
(the solution of eq. 2).
      </p>
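      <p>As a minimal sketch, assuming NumPy, eq. 2 with the positivity constraint $a_{ik} \geq 0$ used later in this paper can be solved by projected iterative shrinkage-thresholding (ISTA). ISTA is a swapped-in solver chosen for brevity, not the (modified) Feature Sign Search [10] the authors rely on; the function name nn_lasso_ista, the iteration count and the toy data are illustrative assumptions.</p>
      <preformat>
```python
import numpy as np


def nn_lasso_ista(x, D, lam, n_iter=500):
    """Approximately solve min_a 0.5*||x - D a||_2^2 + lam*||a||_1, s.t. a >= 0.

    Projected ISTA: a gradient step on the quadratic term, then the
    proximal operator of lam*||a||_1 restricted to a >= 0, which is
    max(a - step*lam, 0).
    """
    K = D.shape[1]
    a = np.zeros(K)
    # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = np.maximum(a - step * (grad + lam), 0.0)
    return a


rng = np.random.default_rng(0)
d, K = 16, 64
D = np.abs(rng.standard_normal((d, K)))
D /= np.linalg.norm(D, axis=0)              # unit-norm, non-negative atoms
x = D[:, :3] @ np.array([1.0, 0.5, 0.25])   # toy patch built from 3 atoms

for lam in (0.05, 0.2):
    a = nn_lasso_ista(x, D, lam)
    print(lam, np.count_nonzero(a))
```
</preformat>
      <p>Running the loop for two values of $\lambda$ illustrates the behaviour described above: a larger $\lambda$ usually leaves fewer non-zero coefficients in $a_i$.</p>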
      <p>
        In the standard SC framework, if we take two neighbouring patches $x_i$ and $x_j$ (strongly
correlated with each other), their respective sparse codes $a_i$ and $a_j$ can lose this strong
correlation; in particular, the indexes of their non-zero entries can completely mismatch. This means
that the two patches are reconstructed with different atoms, an atom being a column $d_k$ of
the dictionary. Several SC variants have been introduced to
tackle this behaviour. They can be divided into two categories:
the first adds a proximity constraint directly into the loss, while the second adds
extra terms to the regularization term. For the first category, we can
cite two approaches: Local Constrained linear Coding (LCC) [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and Local Sparse
Coding (LSC) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. In the second category, we can mention GSC [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <p>Let $X_{train} \triangleq \{x_1^{train}, \dots, x_{N_{train}}^{train}\}$ be the set of local features
sampled from the training set, where $N_{train}$ designates their number, and let $A_{train} \triangleq \{a_1^{train}, \dots, a_{N_{train}}^{train}\}$ be their pre-computed sparse codes. GSC adds a neighbourhood (graph) constraint to the regularization
term:</p>
      <p>$$\mathcal{L}_{GSC}(a_i \mid x_i, A_{train}, D, \lambda, \beta) = \min_{a_i \in \mathbb{R}^K} \|x_i - D a_i\|_2^2 + \lambda \|a_i\|_1 + \beta L_{ii} a_i^T a_i + 2 \beta a_i^T h_i, \tag{3}$$</p>
      <p>where $h_i = \sum_{j=1, j \neq i}^{N_{train}} L_{ij} a_j^{train}$, $L = [L_{ij}]_{i,j=1,\dots,N_{train}}$ is a Laplacian matrix and $\beta$ a
regularization parameter. The matrix $L$ is defined by $L = S - W$, where $W$ is a weight
matrix with $W_{ij} = \exp\{-\|x_i - x_j^{train}\|_2^2 / \sigma^2\}$ ($\sigma$ being the kernel width) if $x_j^{train} \in V(x_i)$, where $V(x_i)$ is the
neighbourhood of $x_i$ excluding $x_i$ itself, and $W_{ij} = 0$ otherwise. The matrix $S$ is diagonal with
$S_{ii} = \sum_{j=1}^{N_{train}} W_{ij}$. We propose to improve SC by encoding all the test local
patches simultaneously (for example, those of one test image). This new model is inspired
by GSC.</p>
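      <p>The graph terms of eq. 3 can be sketched as follows, assuming NumPy, a brute-force $v$-nearest-neighbour search in place of an approximate search library, and a dense $W$ for readability. The helper names gaussian_knn_laplacian and gsc_h are hypothetical; they only illustrate the definitions of $W$, $S$, $L = S - W$ and $h_i$ given above.</p>
      <preformat>
```python
import numpy as np


def gaussian_knn_laplacian(X, v, sigma):
    """Graph Laplacian L = S - W for the columns of X (shape d x N).

    W[i, j] = exp(-||x_i - x_j||^2 / sigma^2) if x_j is one of the v nearest
    neighbours of x_i (x_i itself excluded), 0 otherwise; S is the diagonal
    degree matrix S[i, i] = sum_j W[i, j].
    """
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    np.fill_diagonal(dist2, np.inf)  # exclude x_i from its own neighbourhood
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(dist2[i])[:v]
        W[i, nn] = np.exp(-dist2[i, nn] / sigma ** 2)
    S = np.diag(W.sum(axis=1))
    return S - W


def gsc_h(L, A, i):
    """h_i = sum_{j != i} L[i, j] a_j, with the codes a_j stored as columns of A."""
    row = L[i].copy()
    row[i] = 0.0
    return A @ row


rng = np.random.default_rng(1)
X = rng.random((8, 20))                     # 20 local features of dimension 8
A = np.abs(rng.standard_normal((32, 20)))   # their pre-computed codes, K = 32
L = gaussian_knn_laplacian(X, v=5, sigma=1.0)
h0 = gsc_h(L, A, 0)
print(L.shape, h0.shape)
```
</preformat>
      <p>Each row of $L$ sums to zero by construction, and only $v$ off-diagonal entries per row are non-zero, which is what makes the sparse storage discussed for JSC possible.</p>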
    </sec>
    <sec id="sec-3">
      <title>Joint Sparse Coding - JSC</title>
      <p>The principle of JSC is to encode all the local features $X_{test} = \{x_1^{test}, \dots, x_{N_{test}}^{test}\}$
jointly, to overcome the decorrelation problem. We also enforce $a_{ik} \geq 0$ in the
optimization problem. This additional constraint improves pooling performance, since it
avoids pooling simultaneously over positive and negative sparse code values, and as a consequence
halves the size of the final vector. Our model is very similar to GSC:</p>
      <p>$$\mathcal{L}_{JSC}(a_i \mid x_i, A_{test}, D, \lambda, \beta) = \min_{a_i \in \mathbb{R}^K} \|x_i - D a_i\|_2^2 + \lambda \|a_i\|_1 + \beta L_{ii} a_i^T a_i + 2 \beta a_i^T h_i, \quad \text{s.t. } a_{ik} \geq 0, \tag{4}$$</p>
      <p>where $h_i = \sum_{j=1, j \neq i}^{N_{test}} L_{ij} a_j^{test}$, $L = [L_{ij}]_{i,j=1,\dots,N_{test}}$ is a Laplacian matrix and $\beta$ a regularization
parameter. Here, $L = S - W$, where $W_{ij} = \exp\{-\|x_i - x_j^{test}\|_2^2 / \sigma^2\}$ if $x_j^{test} \in V(x_i)$, $W_{ij} = 0$
otherwise, and $S_{ii} = \sum_{j=1}^{N_{test}} W_{ij}$. Here, $A_{test} \triangleq \{a_1^{test}, \dots, a_{N_{test}}^{test}\}$ are computed and stacked
initially. In practice $N_{test} \ll N_{train}$, so we only need to store a sparse $K \times N_{test}$ matrix.</p>
      <p>Our Laplacian matrix ($N_{test} \times N_{test}$) is very sparse. Since we do not need the
full matrix, we only compute its non-zero elements, i.e. a $(v + 1) \times N_{test}$ matrix built with
the previous formulation ($v$ being the number of neighbours), whose $i$-th column is denoted by
$L_i$. To do this, we use a fast approximate nearest-neighbour search library (FLANN) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], which speeds up
the computation considerably. The solution of eq. 4 is then given by a modified
Feature Sign Search (FSS) algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], obtained by a) adding a positivity constraint on the sparse
codes and b) integrating the two rightmost terms (in $\beta$) of eq. 4 into the gradient
used by FSS. JSC is summarized in Algorithm 1.</p>
    </sec>
    <sec id="sec-4">
      <p>Algorithm 1. Joint Sparse Coding</p>
      <preformat>Inputs: D, λ, β, X_test, σ and v
for i = 1 : N_test do
    [V_i, dist_i] = v-NN search of x_i^test in X_test
    (V_i are the indexes of the neighbours of x_i in X_test)
    Compute L_i from dist_i and σ
end for
A_test = lasso(X_test, D, λ)
for i = 1 : N_test do
    a_i^test = JSC(x_i^test, A_test, D, L_i, V_i, λ, β)
end for
Output: A_test</preformat>
      <p>To illustrate the correlation problem of SC, we compare the normalized correlation computed
between two input vectors with the normalized correlation computed between their
respective output vectors. In this example, 300 different pairs extracted from UIUCsports
local features are used. The normalized correlation between
$x$ and $y$ is given by $\rho(x, y) = \frac{x^T y}{\|x\|_2 \|y\|_2} \in [0, 1]$. We also introduce the scalar value</p>
      <p>$$\nabla\rho^2 = \frac{2}{300 \times 299} \sum_{i=1}^{300} \sum_{j=1}^{i-1} \big[\rho(x_i, x_j) - \rho(a_i, a_j)\big]^2,$$</p>
      <p>which measures the average quadratic
difference between the normalized correlations of the input space and of the output space; the
lower $\nabla\rho^2$, the better. Table 1 summarizes our results, including the sparsity
level. The last column gives the output-space correlation $\rho(a_i, a_j)$ for a strong
input-space correlation $\rho(x_i, x_j) = 90\%$. We note that the correlation gain is
accompanied by a sparsity level drop: $\lambda$ increases sparsity, while $\beta$ works in the
opposite direction.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Method</th><th>Sparsity level</th><th>$\nabla\rho^2$</th><th>$\rho(a_i, a_j)$ for $\rho(x_i, x_j) = 90\%$</th></tr>
          </thead>
          <tbody>
            <tr><td>SC (0.2)</td><td>5.82%</td><td>126.75</td><td>31%</td></tr>
            <tr><td>GSC (0.4, 0.2)</td><td>9.36%</td><td>116.59</td><td>75%</td></tr>
            <tr><td>JSC (0.4, 0.2)</td><td>15.05%</td><td>81.83</td><td>63%</td></tr>
            <tr><td>GSC (0.2, 0.2)</td><td>17.66%</td><td>108.77</td><td>79%</td></tr>
            <tr><td>JSC (0.2, 0.2)</td><td>22.75%</td><td>73.35</td><td>70%</td></tr>
          </tbody>
        </table>
      </table-wrap>
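      <p>Algorithm 1 and the $\nabla\rho^2$ measure can be prototyped as below. This is a minimal sketch under assumptions that are ours, not the paper's: NumPy only, a brute-force $v$-NN search instead of FLANN, dense matrices, projected ISTA instead of the modified Feature Sign Search, and $h_i$ computed from the initial lasso codes (Algorithm 1 does not say whether updated codes are reused). All function names (knn_laplacian, nn_ista, jsc_encode, grad_rho2) are hypothetical, and the small constant guarding against all-zero codes is an illustrative choice.</p>
      <preformat>
```python
import numpy as np


def knn_laplacian(X, v, sigma):
    """Dense L = S - W over the columns of X (d x N): Gaussian weights on the
    v nearest neighbours of each x_i (x_i excluded), S = diag of row sums."""
    N = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    np.fill_diagonal(dist2, np.inf)
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(dist2[i])[:v]       # brute-force stand-in for FLANN
        W[i, nn] = np.exp(-dist2[i, nn] / sigma ** 2)
    return np.diag(W.sum(axis=1)) - W


def nn_ista(x, D, lam, beta=0.0, Lii=0.0, h=None, a0=None, n_iter=300):
    """Projected ISTA for eq. 4 (beta = 0 gives the non-negative lasso):
    min_a ||x - D a||^2 + lam ||a||_1 + beta Lii a^T a + 2 beta a^T h, a >= 0."""
    K = D.shape[1]
    h = np.zeros(K) if h is None else h
    a = np.zeros(K) if a0 is None else a0.copy()
    # 1 / Lipschitz constant of the gradient of the smooth part.
    step = 1.0 / (2.0 * (np.linalg.norm(D, 2) ** 2 + beta * Lii))
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - x) + 2.0 * beta * (Lii * a + h)
        a = np.maximum(a - step * (grad + lam), 0.0)
    return a


def jsc_encode(X, D, lam, beta, sigma, v):
    """Sketch of Algorithm 1: returns (initial lasso codes, JSC codes)."""
    N = X.shape[1]
    L = knn_laplacian(X, v, sigma)
    A0 = np.column_stack([nn_ista(X[:, i], D, lam) for i in range(N)])
    A = A0.copy()
    for i in range(N):
        row = L[i].copy()
        row[i] = 0.0
        h = A0 @ row                        # h_i from the initial codes
        A[:, i] = nn_ista(X[:, i], D, lam, beta, L[i, i], h, a0=A0[:, i])
    return A0, A


def normalized_corr(U):
    """rho(u_i, u_j) for all column pairs; an all-zero column gives rho = 0."""
    Un = U / (np.linalg.norm(U, axis=0, keepdims=True) + 1e-12)
    return Un.T @ Un


def grad_rho2(X, A):
    """Mean of [rho(x_i, x_j) - rho(a_i, a_j)]^2 over the n(n-1)/2 pairs."""
    n = X.shape[1]
    diff2 = (normalized_corr(X) - normalized_corr(A)) ** 2
    return diff2[np.triu_indices(n, k=1)].mean()


rng = np.random.default_rng(2)
d, K, N = 16, 48, 40
D = np.abs(rng.standard_normal((d, K)))
D /= np.linalg.norm(D, axis=0)
X = np.abs(rng.standard_normal((d, N)))
A_sc, A_jsc = jsc_encode(X, D, lam=0.2, beta=0.2, sigma=3.0, v=5)
print("grad_rho2, SC codes :", grad_rho2(X, A_sc))
print("grad_rho2, JSC codes:", grad_rho2(X, A_jsc))
```
</preformat>
      <p>Whether the JSC codes lower $\nabla\rho^2$ on such synthetic data depends on $\lambda$, $\beta$ and $\sigma$; the sketch reproduces the procedure, not the values of Table 1.</p>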
    </sec>
    <sec id="sec-5">
      <title>Dictionary Learning</title>
      <p>An analytical solution to update a dictionary $D \triangleq [d_1, \dots, d_K]$ off-line exists: it is
$D = (X A^T)(A A^T)^{-1}$, where $X \triangleq [x_1, \dots, x_N]$, $A \triangleq \{a_i\}, i = 1, \dots, N$ and $A \in \mathbb{R}^{K \times N}$. The
problem comes from the computation of $(A A^T)^{-1}$: this matrix is of size $K \times K$ and
the computational complexity of its inversion is in $O(K^3)$. Moreover, the matrix $A$ has
to be stored in main memory. We therefore want efficient methods (in terms of
complexity and memory) to train such dictionaries under basis constraints.
One would minimize the regularized empirical risk $R_N$:</p>
      <p>$$R_N(A, D) \triangleq \frac{1}{N} \sum_{i=1}^{N} l\big(x_i, f(a_i, D)\big) + G(A), \tag{5}$$</p>
      <p>where $f(a_i, D) = D a_i$, $l(\cdot)$ is typically a quadratic loss function and $G(\cdot)$ represents the
regularization term (for example the SC and GSC regularization terms). Eq. 5 could be
optimized iteratively by (stochastic) gradient descent. Unfortunately, the problem is
not jointly convex, only convex in each variable when the other one is fixed. Alternatively, we can minimize:</p>
      <p>$$R_N(A \mid \hat{D}) \triangleq \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \|x_i - \hat{D} a_i\|_2^2 + G(a_i), \quad \text{s.t. } a_{ik} \geq 0, \tag{6}$$</p>
      <p>and</p>
      <p>$$R_N(D \mid \hat{A}) \triangleq \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \|x_i - D \hat{a}_i\|_2^2, \quad \text{s.t. } \|d_k\|_2^2 = 1 \text{ and } d_{kj} \geq 0. \tag{7}$$</p>
      <p>
        To obtain a suboptimal solution of eq. 5, eq. 6 can be solved efficiently in
parallel via the SC/GSC procedures, while eq. 7 can be solved as a constrained linear system
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Experiments</title>
      <sec>
        <title>Metrics</title>
        <p>In this section we present results obtained with SC and GSC dictionaries when
SC, GSC and JSC are used for the encoding part. We fix the dictionary size to $K = 1024$,
and a positivity constraint is applied to the dictionary columns and to the sparse codes. The
regularization parameters of the encoding part are $\lambda = 0.2$ for SC, and $(\lambda = 0.4, \beta = 0.2)$ and $(\lambda = 0.2, \beta = 0.2)$
for GSC and JSC. For the GSC dictionary, only the setting $(\lambda = 0.2, \beta = 0.2)$ is
used. Classification is performed with a one-vs-all linear
Support Vector Machine (SVM) whose regularization parameter is fixed to $C = 0.07$, and the
classification performance is measured by the Average Overall Accuracy (AOA):</p>
        <p>$$\mathrm{AOA} = \frac{1}{M} \sum_{m=1}^{M} \left( \frac{1}{N} \sum_{i=1}^{N} \delta(\hat{y}_{i,m}, y_{i,m}) \right), \tag{8}$$</p>
        <p>
          where $N$ is the number of test images, $\delta$ the chosen loss function (mean
square error), $M$ the number of cross-validation runs, and $\hat{y}_{i,m}$ and $y_{i,m}$ the true and predicted
labels. We run our experiments on the UIUCsports database [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and the scenes15 database
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The UIUCsports database contains 1579 images from 8 different classes, the number
of images per class varying from 137 to 250. We randomly select 70 images from each
class for training and 60 for testing. The scenes15 database contains 4485 images belonging
to 15 different categories, the number of images per class varying from 200 to
400; 100 images per class are selected for training and the others for testing. In our
experiments, $M = 10$, $N_{UIUCsports} = 60 \times 8 = 480$ and $N_{scenes15} = 4485 - 15 \times 100 = 2985$.
We densely extract SIFT descriptors on $24 \times 24$ patches [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] from grey-level images, at a single scale.
The grid size is $80 \times 80$ for the UIUCsports database and $30 \times 30$ for the scenes15 database.
We apply Spatial Pyramid Matching (SPM) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], defined on $L$ levels. For
UIUCsports, $L = 2$: pooling is performed on the entire image ($1 \times 1$, first level)
and, for the second level, on a $2 \times 2$ grid with a stride of 25%. For scenes15, $L = 3$: we
use $1 \times 1$, $2 \times 2$ and $4 \times 4$ sub-regions for SPM. We apply $\mu$-pooling ($\mu = 2.5$) for
the pooling step. As a reminder, $\mu$-pooling is written as $f(v, w, \mu) = \sum_{m=1}^{c} w_m v_m^{\mu} = w^T v^{\mu}$, s.t. $\|w\|_2^2 = 1$ and $\mu \neq 0$,
where $v^{\mu} = [a_m^{\mu}]_{m=1,\dots,c}$ and $w_m$ encodes the contribution of the $m$-th image location for
specific visual words [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
      </sec>
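      <p>The AOA of eq. 8 and the $\mu$-pooling formula can be illustrated with the short sketch below. It assumes NumPy, reads $\delta$ as a 0/1 indicator of a correct prediction (the usual choice for an accuracy; the text above names mean square error instead), and uses uniform location weights normalized so that $\|w\|_2^2 = 1$ when none are given. The function names aoa and mu_pool are hypothetical.</p>
      <preformat>
```python
import numpy as np


def aoa(y_true, y_pred):
    """Eq. 8: mean over the M runs of the per-run accuracy.

    y_true, y_pred: arrays of shape (M, N) holding the labels of the N test
    images for each of the M cross-validation runs. delta is taken here as
    the 0/1 indicator of a correct prediction (an assumption).
    """
    per_run = np.mean(y_true == y_pred, axis=1)
    return per_run.mean()


def mu_pool(codes, mu=2.5, w=None):
    """mu-pooling f(v, w, mu) = sum_m w_m v_m^mu, applied per visual word.

    codes: (c, K) non-negative sparse codes of the c locations of a region.
    w: (c,) location weights with ||w||_2 = 1; uniform if not given.
    Returns a (K,) pooled vector.
    """
    c = codes.shape[0]
    if w is None:
        w = np.full(c, 1.0 / np.sqrt(c))  # uniform weights, ||w||_2^2 = 1
    return w @ codes ** mu


y_true = np.array([[0, 1, 2, 1], [2, 2, 0, 1]])
y_pred = np.array([[0, 1, 1, 1], [2, 2, 0, 1]])
print(aoa(y_true, y_pred))  # (0.75 + 1.0) / 2 = 0.875

codes = np.array([[0.0, 1.0], [2.0, 0.0]])
print(mu_pool(codes, mu=2.0))
```
</preformat>
      <p>In an SPM setting, mu_pool would be applied once per pyramid sub-region and the pooled vectors concatenated; that wiring is left out here.</p>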
      <sec id="sec-6-1">
        <p>[Table residue: AOA for each Dictionary (SC (0.2), GSC (0.2, 0.2)) × Encoding pair; values not recoverable.]</p>
        <p>… gain (up to +8 points) with the SC dictionary. This is less significant with the GSC (0.2, 0.2)
dictionary, where only small relative gains are observed. For dictionary fusion, strong relative
gains are observed for SC and the two GSC encoding models, and there is no gain for the
two JSC encoding models. The best result is obtained with the SC dictionary and SC encoding
combined with the GSC (0.2, 0.2) dictionary and SC encoding.</p>
        <p>[Figure: relative gain of AOA (%) per encoding model, for each dictionary and for dictionary fusion.]</p>
      </sec>
      <sec id="sec-6-2">
        <p>Table 5 summarizes our results: no gain is observed for this dataset. The best
results are obtained with the SC dictionary and SC encoding. The fusion results are summarized
in Tables 6 and 7, and Figures 4 and 5 illustrate these two tables respectively. We notice that the behaviour is inverted for the two
fusion cases. However, the deficits decrease with fusion, more specifically for the GSC
(0.2, 0.2) dictionary. For dictionary fusion, the most significant gain is obtained between the two models,
and the best result is obtained for the (SC + GSC) dictionary pair
associated with SC encoding.</p>
        <p>[Table residue: Dictionary (SC, GSC (0.2, 0.2), fusion) × Encoding (SC, GSC (0.4, 0.2), JSC (0.4, 0.2), GSC (0.2, 0.2), JSC (0.2, 0.2), and their fusions with SC); values not recoverable.]</p>
        <p>To go further, we plot the accuracy of a weighted arithmetic fusion. First,
the weights are the same for all classes; the curves of Figure 6 illustrate the weighted
arithmetic fusion $\mathrm{AOA}_{arith} = \mu\,\mathrm{AOA}_{SC} + (1 - \mu)\,\mathrm{AOA}_{GSC}$. For UIUCsports, we notice that when
adapted coefficients are used for the fusion, no improvement is observed, and the accuracy
decreases considerably for the other pairs. For scenes15, a very small improvement is
seen, but it does not allow us to conclude on the real benefit of the method. An
alternative would be to compute other means, such as harmonic or energy means.
The considerable gain obtained on the UIUCsports database could be explained
by two hypotheses: the heterogeneity between the images of the training and
testing sets, and the conservation of correlation between the input and output spaces. The
study conducted so far shows that the second hypothesis is the one that goes in the right
direction.</p>
        <p>[Figure 6: relative gain of AOA (%) as a function of the fusion weight (0.1 to 1), for UIUCsports and scenes15.]</p>
      </sec>
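      <p>The weighted arithmetic fusion can be sketched at the level of classifier scores, which is one common way to realize it; this is an assumption, since the section states the fusion in terms of AOA values. Assuming NumPy, two one-vs-all score matrices (for instance from the SC and GSC pipelines) and a grid of weights $\mu$, the sketch reports the accuracy of $\mu s_{SC} + (1 - \mu) s_{GSC}$ for each weight; the harmonic-mean variant mentioned above is included and requires strictly positive scores. The names fuse_scores, s_sc and s_gsc, and the synthetic scores, are hypothetical.</p>
      <preformat>
```python
import numpy as np


def fuse_scores(s_a, s_b, mu, mode="arithmetic"):
    """Fuse two (N, C) score matrices; mu weights the first model."""
    if mode == "arithmetic":
        return mu * s_a + (1.0 - mu) * s_b
    if mode == "harmonic":
        # Weighted harmonic mean; only defined for strictly positive scores.
        return 1.0 / (mu / s_a + (1.0 - mu) / s_b)
    raise ValueError(f"unknown mode: {mode}")


def accuracy(scores, y):
    """Fraction of rows whose highest score is the true class."""
    return np.mean(np.argmax(scores, axis=1) == y)


rng = np.random.default_rng(3)
N, C = 200, 8
y = rng.integers(0, C, size=N)
# Hypothetical scores: each model sees the true class plus independent noise.
s_sc = rng.random((N, C)) + 0.6 * np.eye(C)[y]
s_gsc = rng.random((N, C)) + 0.6 * np.eye(C)[y]

for mu in np.linspace(0.0, 1.0, 11):
    acc = accuracy(fuse_scores(s_sc, s_gsc, mu), y)
    print(f"mu = {mu:.1f}  accuracy = {acc:.3f}")
```
</preformat>
      <p>Per-class weights, as discussed above, would replace the scalar mu by a length-C vector broadcast over the columns of the score matrices.</p>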
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>Although the results obtained with GSC and JSC alone do not live up to our
expectations, we highlight the relevance of our proposal thanks to the fusion procedure, which
greatly improves on the state of the art for UIUCsports (88.47 ± 2.32 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]; our model: 95.37 ± 1.29), as summarized in Table 8.</p>
      <table-wrap>
        <label>Table 8</label>
        <caption>
          <p>Summary of fusion results (details in Tables 2, 3, 4, 5, 6 and 7).</p>
        </caption>
        <table>
          <thead>
            <tr><th>Database</th><th>Initial accuracy</th><th>Blinded fusion</th><th>Weighted fusion</th><th>State-of-the-Art</th></tr>
          </thead>
          <tbody>
            <tr><td>UIUCsports</td><td>87.27% ± 1.33</td><td>95.37% ± 1.29</td><td>95.37% ± 1.29</td><td>88.47% ± 2.32 [<xref ref-type="bibr" rid="ref23">23</xref>]</td></tr>
            <tr><td>scenes15</td><td>84.69% ± 0.6</td><td>84.66% ± 0.64</td><td>84.88% ± 0.55</td><td>81.04% ± 0.5 [<xref ref-type="bibr" rid="ref8">8</xref>]</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        A complete study with different pairs $(\lambda, \beta)$
for the dictionary and encoding parts is still needed to find the right setting for the UIUCsports and scenes15
databases. The nature of the images should also be considered, and the
heterogeneity level of the images could be studied [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] through the Shannon entropy.
We think that our model can be improved in three ways. The first is
to obtain better-stabilized JSC results by adding an outer loop to the JSC algorithm;
after multiple stages, some improvement can be expected. The second is a direct
extension of JSC integrating a Laplacian regularization computed from a training
set of local features: sparse codes would then be reconstructed by simultaneously
minimizing the deviation from both this training set and the local features of the image. The fusion
could also be improved by a weighted average fusion using statistics from the image codes. Finally,
it has been shown that adding orthogonality constraints during the dictionary
learning process can improve results [
        <xref ref-type="bibr" rid="ref17 ref5">5, 17</xref>
        ]; here too, a full study should be conducted
with the two sparse code encoding methods.
      </p>
      <p>Acknowledgement. We thank the Direction Générale de l’Armement (DGA) for its financial support of this research. We thank Lucian ALECU for his comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bauge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lagrange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Andén</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Mallat</surname>
          </string-name>
          .
          <article-title>Representing environmental sounds using the separable scattering transform</article-title>
          .
          <source>In ICASSP</source>
          , pages
          <fpage>8667</fpage>
          -
          <lpage>8671</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Learning deep architectures for AI</article-title>
          .
          <source>Found. Trends Mach. Learn.</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>127</lpage>
          , Jan.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K.</given-names>
            <surname>Chatfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vedaldi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Return of the devil in the details: Delving deep into convolutional nets</article-title>
          .
          <source>CoRR, abs/1405.3531</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Donoho</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Saunders</surname>
          </string-name>
          .
          <article-title>Atomic decomposition by basis pursuit</article-title>
          .
          <source>SIAM Journal on Scientific Computing</source>
          ,
          <volume>20</volume>
          :
          <fpage>33</fpage>
          -
          <lpage>61</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Cherian</surname>
          </string-name>
          .
          <article-title>Nearest neighbors using compact sparse codes</article-title>
          . In T. Jebara and E. P. Xing, editors,
          <source>Proceedings of the 31st International Conference on Machine Learning (ICML-14)</source>
          , pages
          <fpage>1053</fpage>
          -
          <lpage>1061</lpage>
          . JMLR Workshop and Conference Proceedings,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <article-title>Construction and Analysis of a Large Scale Image Ontology</article-title>
          .
          <source>Vision Sciences Society</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <article-title>Geometric ℓp-norm feature pooling for image classification</article-title>
          .
          <source>In CVPR</source>
          , pages
          <fpage>2697</fpage>
          -
          <lpage>2704</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>S.</given-names>
            <surname>Lazebnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ponce</surname>
          </string-name>
          .
          <article-title>Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</article-title>
          .
          <source>In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , volume
          <volume>2</volume>
          , CVPR '06
          , pages
          <fpage>2169</fpage>
          -
          <lpage>2178</lpage>
          , Washington, DC, USA,
          <year>2006</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kavukcuoglu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Farabet</surname>
          </string-name>
          .
          <article-title>Convolutional networks and applications in vision</article-title>
          . In ISCAS, pages
          <fpage>253</fpage>
          -
          <lpage>256</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Battle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raina</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>Efficient sparse coding algorithms</article-title>
          . In
          <source>NIPS</source>
          , pages
          <fpage>801</fpage>
          -
          <lpage>808</lpage>
          . NIPS,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>What, where and who? Classifying events by scene and object recognition</article-title>
          .
          <source>In IEEE International Conference on Computer Vision</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>In Proceedings of the International Conference on Computer Vision (ICCV '99)</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>1150</fpage>
          , Washington, DC, USA,
          <year>1999</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J.</given-names>
            <surname>Mairal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ponce</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Sapiro</surname>
          </string-name>
          .
          <article-title>Online dictionary learning for sparse coding</article-title>
          .
          <source>In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09</source>
          , pages
          <fpage>689</fpage>
          -
          <lpage>696</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>M.</given-names>
            <surname>Muja</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>Scalable nearest neighbor algorithms for high dimensional data</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>36</volume>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Pendse</surname>
          </string-name>
          .
          <article-title>A tutorial on the lasso and the “shooting algorithm”</article-title>
          .
          <source>Technical report</source>
          , P.A.I.N Group, Imaging and Analysis Group, McLean Hospital, Harvard Medical School, 8 February
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>F.</given-names>
            <surname>Perronnin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mensink</surname>
          </string-name>
          .
          <article-title>Improving the Fisher kernel for large-scale image classification</article-title>
          .
          <source>In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10</source>
          , pages
          <fpage>143</fpage>
          -
          <lpage>156</lpage>
          , Berlin, Heidelberg,
          <year>2010</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>I.</given-names>
            <surname>Ramirez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lecumberry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Sapiro</surname>
          </string-name>
          .
          <article-title>Universal priors for sparse modeling</article-title>
          .
          <source>In 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)</source>
          , pages
          <fpage>197</fpage>
          -
          <lpage>200</lpage>
          , Dec.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>O.</given-names>
            <surname>Russakovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Satheesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karpathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Berg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <article-title>ImageNet large scale visual recognition challenge</article-title>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Video Google: A text retrieval approach to object matching in videos</article-title>
          .
          <source>In Proceedings of the International Conference on Computer Vision</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>1470</fpage>
          -
          <lpage>1477</lpage>
          , Oct.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Thiagarajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Ramamurthy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Spanias</surname>
          </string-name>
          .
          <article-title>Local Sparse Coding for Image Classification and Retrieval</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <article-title>Regression shrinkage and selection via the lasso</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          , Series B,
          <volume>58</volume>
          :
          <fpage>267</fpage>
          -
          <lpage>288</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>S.</given-names>
            <surname>Tollari</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          .
          <article-title>LDA versus MMD approximation on mislabeled images for keyword dependant selection of visual features and their heterogeneity</article-title>
          .
          <source>In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)</source>
          , volume II, pages
          <fpage>413</fpage>
          -
          <lpage>416</lpage>
          , May
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          .
          <article-title>Max-margin multiple-instance dictionary learning</article-title>
          . In
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>McAllester</surname>
          </string-name>
          , editors,
          <source>Proceedings of the 30th International Conference on Machine Learning (ICML-13)</source>
          , volume
          <volume>28</volume>
          , pages
          <fpage>846</fpage>
          -
          <lpage>854</lpage>
          . JMLR Workshop and Conference Proceedings, May
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>B.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <article-title>Large-scale dictionary learning for local coordinate coding</article-title>
          .
          <source>In Proceedings of the British Machine Vision Conference</source>
          , pages
          <fpage>36.1</fpage>
          -
          <lpage>36.9</lpage>
          . BMVA Press,
          <year>2010</year>
          . doi:10.5244/C.24.36.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>M.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          .
          <article-title>Graph regularized sparse coding for image representation</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          ,
          <volume>20</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1327</fpage>
          -
          <lpage>1336</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>