<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">New Graph regularized Sparse Coding Improving Automatic Image Annotation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Céline</forename><surname>Rabouy</surname></persName>
							<email>celine.rabouy@lsis.org</email>
							<affiliation key="aff0">
								<orgName type="laboratory">LSIS UMR 7296</orgName>
								<orgName type="institution" key="instit1">Aix-Marseille Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">ENSAM</orgName>
								<address>
									<postCode>13397</postCode>
									<settlement>Marseille</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">LSIS UMR 7296</orgName>
								<orgName type="institution" key="instit1">Université de Toulon</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<address>
									<postCode>83957</postCode>
									<settlement>La Garde</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sébastien</forename><surname>Paris</surname></persName>
							<email>sebastien.paris@lsis.org</email>
							<affiliation key="aff0">
								<orgName type="laboratory">LSIS UMR 7296</orgName>
								<orgName type="institution" key="instit1">Aix-Marseille Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">ENSAM</orgName>
								<address>
									<postCode>13397</postCode>
									<settlement>Marseille</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">LSIS UMR 7296</orgName>
								<orgName type="institution" key="instit1">Université de Toulon</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<address>
									<postCode>83957</postCode>
									<settlement>La Garde</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hervé</forename><surname>Glotin</surname></persName>
							<email>glotin@univ-tln.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory">LSIS UMR 7296</orgName>
								<orgName type="institution" key="instit1">Aix-Marseille Université</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">ENSAM</orgName>
								<address>
									<postCode>13397</postCode>
									<settlement>Marseille</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">LSIS UMR 7296</orgName>
								<orgName type="institution" key="instit1">Université de Toulon</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<address>
									<postCode>83957</postCode>
									<settlement>La Garde</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Institut Universitaire de France</orgName>
								<address>
									<postCode>75005</postCode>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">New Graph regularized Sparse Coding Improving Automatic Image Annotation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">4473F52DBF5D6B943BEA48AEF27A8E06</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Scenes categorization</term>
					<term>Sparse Coding</term>
					<term>Graph regularized Sparse Coding</term>
					<term>Dictionary Learning</term>
					<term>Scale Invariant Feature Transform</term>
					<term>Spatial Pyramid Matching</term>
					<term>Joint Sparse Coding</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The typical image classification pipeline for shallow architectures can be summarized by the following three main steps: i) a projection of local features into a high-dimensional space, ii) sparsity constraints on the encoding scheme and iii) a pooling operation to obtain a global representation invariant to common transformations. The Sparse Coding (SC) framework is one particular instance of this general approach. Its main problem is that local features are encoded independently, losing the correlations of the input space. In this work we propose to encode sparse codes simultaneously to tackle this problem, with a Joint Sparse Coding (JSC) inspired by Graph regularized Sparse Coding (GSC). We experiment with SC, GSC and JSC on the UIUCsports and scenes15 databases. We will show that the results obtained for UIUCsports with SC (87.27 ± 1.33), JSC (84.17 ± 1.57) and the State-of-the-Art (88.47 ± 2.32 <ref type="bibr" target="#b22">[23]</ref>) are all surpassed by a simple fusion (95.37 ± 1.29). Several assumptions will be advanced to explain this phenomenon, which cannot yet be generalized.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the field of computer vision and signal processing, significant progress has been made since the 2000s with general methods such as Bag of Words (BoW) <ref type="bibr" target="#b18">[19]</ref>. A significant number of databases are at our disposal, for example UIUCsports <ref type="bibr" target="#b10">[11]</ref> and scenes15 <ref type="bibr" target="#b7">[8]</ref>, where the goal is to label images into a finite number of classes. A first approach could be to evaluate a metric distance between two images. Unfortunately, due to the high dimensionality of this input space, most of these distances concentrate into a sub-manifold whatever the image class, making discrimination by direct distances not robust. To overcome this problem, a solution is to find a general application Ψ j (.; µ j ) with parameter µ j which characterizes the class C j , satisfying: dist(Ψ j (I 1 ; µ j ), Ψ j (I 2 ; µ j )) → 0 if I 1 ∈ C j and I 2 ∈ C j , and dist(Ψ j (I 1 ; µ j ), Ψ j (I 2 ; µ j )) → ∞ if I 1 ∈ C j and I 2 ∉ C j ,</p><p>where I 1 and I 2 are two images. The choice of Ψ j represents a trade-off between its representation capacity and the difficulty of optimizing µ j . In general, in order to estimate/optimize µ j , we have to start from a local representation (patches) x ∈ R d to obtain the global representation Ψ j (.; µ j ). 
From Ψ j associated with BoW and Sparse Coding (SC) <ref type="bibr" target="#b20">[21]</ref>, up to ConvNets <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b8">9]</ref>, all follow the three main procedures: i) high-dimensional local feature projection, ii) sparsity constraints in the representation model and iii) a non-linearity and pooling operation to obtain a global invariant representation.</p><p>In this article, we will focus on a new formulation of the encoding method, corresponding more specifically to procedure ii), inspired by SC and more generally by Graph regularized Sparse Coding (GSC) <ref type="bibr" target="#b24">[25]</ref>. This new formulation allows test patches to be encoded simultaneously, as in the GSC model, which has good properties. Although we only work on a single layer, we will show that a simple fusion considerably improves the classification accuracy, bringing our results close to those of CNNs (convolutional neural nets) <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b17">18]</ref> initialized on ImageNet, as shown in <ref type="bibr" target="#b2">[3]</ref>. This article is divided into five parts. The first part focuses on SC models and their derivatives (GSC especially). The second part presents our Joint Sparse Coding (JSC) modeling. The third part presents the Graph regularized Sparse Coding (GSC) dictionary inspired by <ref type="bibr" target="#b12">[13]</ref>. The fourth part presents the results we obtained on the UIUCsports and scenes15 databases and, in the last part, we conclude on our contribution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Works</head><p>In this part, we focus on the encoding step using linear coding to reconstruct inputs. An approximation of any patch x ∈ R d can be given by x i = Dα i , where</p><formula xml:id="formula_1">D ≜ [d 1 , . . . , d K ] ∈ R d×K is a given/trained dictionary where ∀k = 1, . . . , K, ‖d k ‖ 2 2 = 1 and d k j ≥ 0.</formula><p>A patch is a vector extracted from an image. A dictionary is a matrix of "words" allowing the patch reconstruction. In many encoding methods, three common steps can be found: i) a projection into a higher-dimensional space (K &gt;&gt; d), ii) sparsity constraints and iii) a non-linear operation. If α * i is obtained by Ordinary Least Squares (OLS), the solution is fully dense (all elements are non-zero). One way to get around this problem is the use of the ℓ 1 -norm constraint, which corresponds to the Lasso problem <ref type="bibr" target="#b20">[21]</ref> or Basis Pursuit <ref type="bibr" target="#b3">[4]</ref>:</p><formula xml:id="formula_2">L SC (α i |x i ; D) = min α i ∈R K 1 2 ‖x i − Dα i ‖ 2 2 + λ ‖α i ‖ 1 ,<label>(2)</label></formula><p>with λ the regularization parameter associated with the SC formulation. This parameter controls the sparsity level, as shown in <ref type="bibr" target="#b14">[15]</ref>. Thus, the larger λ is, the sparser α * i (the solution of eq. 2) will be.</p><p>Usually in the SC framework, if we take two neighboring patches x i and x j (with a strong correlation between them), their respective sparse codes, α i and α j , can lose this strong correlation; in particular, the indexes of their non-zero entries can completely mismatch. This means that different atoms are involved in the reconstruction of the two patches. An atom is a column d k of the dictionary D. Several SC variations have been introduced to tackle this behaviour. 
Principles of these improvements can be divided into two categories: the first adds a proximity constraint directly into the loss, while the second adds extra terms to the regularization. To illustrate the first category, we can cite two approaches: Local Constrained Linear Coding (LCC) <ref type="bibr" target="#b23">[24]</ref> and Local Sparse Coding (LSC) <ref type="bibr" target="#b19">[20]</ref>. In the second category, we can mention GSC <ref type="bibr" target="#b24">[25]</ref>.</p><p>We define A train as the set of pre-computed sparse codes of X train : </p><formula xml:id="formula_3">L GSC (α i |x i , A train ; D, λ, β) = min α i ∈R K ‖x i − Dα i ‖ 2 2 + λ ‖α i ‖ 1 + βL ii α i T α i + 2βα i T h i ,<label>(3)</label></formula><p>where</p><formula xml:id="formula_4">h i = ∑ j≠i N train L i j α j train</formula><p>, L = L i j i, j=1,...,N train is a Laplacian matrix and β a regularization parameter. The matrix L is defined by L = S − W, where W is a weight matrix with W i, j = exp{−</p><formula xml:id="formula_5">‖x i − x j train ‖ 2 2 / σ 2 } if x j train ∈ V (x i ) (where V (x i )</formula><p>is the neighborhood of x i excluding x i itself), and W i, j = 0 otherwise. The matrix S is diagonal with</p><formula xml:id="formula_6">S i,i = ∑ j=1 N train W i, j .</formula><p>We propose to improve SC by simultaneously encoding all the test local patches (for example, those associated with a test image). This new modeling is inspired by GSC.</p></div>
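The SC encoding of eq. (2) can be sketched numerically. The block below is a minimal illustration, assuming nothing beyond NumPy: it solves the Lasso problem with ISTA, a standard proximal-gradient solver (the paper itself relies on Feature-Sign Search, not ISTA), and exhibits the remark that a larger λ yields a sparser code. The dictionary sizes and seeds are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (element-wise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam, n_iter=1000):
    """Solve eq. (2), min_a 0.5 ||x - D a||_2^2 + lam ||a||_1, with ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - (D.T @ (D @ a - x)) / L, lam / L)
    return a

# Overcomplete dictionary (K >> d) with unit-norm columns, as in the text.
rng = np.random.default_rng(0)
d, K = 8, 32
D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)             # ||d_k||_2 = 1
x = 1.5 * D[:, 3] + 0.5 * D[:, 17]         # patch built from two atoms
a_low = sparse_code(x, D, lam=0.05)
a_high = sparse_code(x, D, lam=0.5)
```

With the larger λ the returned code has no more non-zero entries than with the smaller one, matching the sparsity remark above.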
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Joint Sparse Coding -JSC</head><p>The JSC principle is to jointly encode all local features X test = {x test 1 , . . . , x test N test } simultaneously to overcome the decorrelation problem. We also enforce α k i ≥ 0 in the previous optimization problem. This additional constraint improves pooling performance, since it avoids pooling simultaneously over positive and negative sparse code values and, as a consequence, halves the final vector size. The equation of our modeling is very similar to GSC:</p><formula xml:id="formula_7">L JSC (α i |x i , A test ; D, λ) = min α i ∈R K ‖x i − Dα i ‖ 2 2 + λ ‖α i ‖ 1 + βL ii α i T α i + 2βα i T h i , s.t. α k i ≥ 0,<label>(4)</label></formula><p>where</p><formula xml:id="formula_8">h i = ∑ j≠i N test L i j α j test , L = L i j i, j=1,...,N test is a Laplacian matrix and β a regularization parameter. Here, L = S − W, where W i, j = exp{− ‖x i − x j test ‖ 2 2 / σ 2 } if x j test ∈ V (x i ), W i, j = 0 otherwise, and S i,i = ∑ j=1 N test W i, j .</formula><p>Here, A test ≜ {α test 1 , . . . , α test N test } are computed and stacked initially. In practice N test &lt;&lt; N train , so we only need to store a sparse K × N test matrix.</p><p>Our Laplacian matrix (N test × N test ) is very sparse. Since we do not need to compute the full matrix, one way is to only calculate its non-zero elements, stored as a ((v + 1) × N test ) matrix, with the previous formulation. Each column of this ((v + 1) × N test ) matrix is denoted by L i . To realize this, we use a fast NN-search technique (FLANN) <ref type="bibr" target="#b13">[14]</ref>, which speeds up the computation considerably. The solution of eq. 4 is then given by a modified Feature Sign Search (FSS) algorithm <ref type="bibr" target="#b9">[10]</ref>, obtained by a) adding a positivity constraint on the sparse codes and b) integrating the two rightmost terms (in β) of eq. 4 into the gradient used by the FSS algorithm. JSC is given by Algorithm 1. 
To illustrate the Algorithm 1 Joint Sparse Coding</p><formula xml:id="formula_9">Inputs: D, λ, β, X test , σ and v for i = 1 : N test do [V i , dist i ] = v-nn search of x test i into X test (V i are the indexes of the neighbors of x i in X test ) Compute L i from dist i and σ end for A test = lasso(X test ; D, λ) for i = 1 : N test do α i = JSC(x test i , A test , D, L i , V i , λ, β) end for Output: A test</formula><p>correlation problem observed with SC, we compare the normalized correlation computed between two input vectors with the normalized correlation computed between their respective output vectors. In this example, 300 different pairs, extracted from UIUCsports local features, are chosen. The normalized correlation between x and y is given by ρ(x, y) = x T y / (‖x‖ 2 ‖y‖ 2 ) ∈ [0, 1]. We also introduce the scalar value ∇ρ 2 , which measures the average quadratic difference between the normalized correlations of the input space and of the output space. The lower ∇ρ 2 , the better. Table <ref type="table" target="#tab_1">1</ref> summarizes our results, including the sparsity percentage. The last line presents the ρ(α i , α j ) correlation in the output space, for a strong correlation ρ(x i , x j ) = 90% in the input space. We note that the correlation gain is accompanied by a drop in the sparsity level. Thus, λ increases sparsity while β works in the opposite direction.</p><formula xml:id="formula_10">∇ρ 2 = 2 / (300×299) ∑ i=1 300 ∑ j&lt;i [ρ(x i , x j ) − ρ(α i , α j )] 2</formula></div>
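Algorithm 1 can be sketched as follows. This is a hedged stand-in, not the paper's implementation: the Laplacian is built by brute-force v-nn search instead of FLANN, the graph is symmetrized for simplicity, and a projected-gradient loop replaces the modified Feature-Sign Search. It does, however, enforce the positivity constraint and couple the codes through the Laplacian terms of eq. (4); since α ≥ 0, the ℓ1 term reduces to the (smooth) sum of entries.

```python
import numpy as np

def graph_laplacian(X, v=2, sigma=1.0):
    """Laplacian L = S - W over the columns of X, following eq. (4):
    W_ij = exp(-||x_i - x_j||^2 / sigma^2) if x_j is among the v nearest
    neighbours of x_i (x_i excluded), 0 otherwise; S_ii = sum_j W_ij."""
    N = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((N, N))
    for i in range(N):
        neigh = [j for j in np.argsort(d2[i]) if j != i][:v]
        W[i, neigh] = np.exp(-d2[i, neigh] / sigma ** 2)
    W = np.maximum(W, W.T)          # symmetrize: an assumption of this sketch
    return np.diag(W.sum(axis=1)) - W

def jsc_encode(X, D, lam=0.2, beta=0.2, v=2, sigma=1.0, n_iter=400):
    """Jointly encode all test patches (columns of X) with alpha_k >= 0,
    by projected gradient descent on the joint objective of eq. (4)."""
    Lap = graph_laplacian(X, v=v, sigma=sigma)
    G, DtX = D.T @ D, D.T @ X
    # Step size from a Lipschitz bound of the smooth part of the objective.
    step = 1.0 / (2 * np.linalg.norm(G, 2) + 2 * beta * np.linalg.norm(Lap, 2) + 1e-12)
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        # Per-code gradient: 2 D^T(D a_i - x_i) + lam + 2 beta sum_j L_ij a_j
        grad = 2 * (G @ A - DtX) + lam + 2 * beta * (A @ Lap.T)
        A = np.maximum(A - step * grad, 0.0)    # positivity constraint
    return A

# Toy run: non-negative, unit-norm atoms and non-negative toy patches.
rng = np.random.default_rng(2)
d, K, N = 8, 20, 6
D = np.abs(rng.standard_normal((d, K)))
D /= np.linalg.norm(D, axis=0)
X = np.abs(rng.standard_normal((d, N)))
A = jsc_encode(X, D)
```

With the chosen step size the joint objective decreases monotonically from A = 0, so the reconstruction error never exceeds the norm of the input.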
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Dictionary Learning</head><p>The analytical solution to update a dictionary D ≜ [d 1 , . . . , d K ] off-line exists and is formulated as D = (XA T )(AA T ) −1 , where A ≜ {α i }, i = 1, . . . , N and A ∈ R K×N . The problem comes from the computation of (AA T ) −1 . It is a matrix of size (K × K) and the computational complexity of this matrix inversion is in O(K 3 ). Moreover, we have to store the matrix A in central memory. Thus, we want efficient methods (in terms of complexity and memory occupation) to train such dictionaries under basis constraints.</p><p>One would minimize the regularized empirical risk R N :</p><p>R N (A, D) ≜</p><formula xml:id="formula_11">1 N ∑ i=1 N l(x i ; f (α i , D)) + Γ(A),<label>(5)</label></formula><p>where f (α i , D) = Dα i , l(.) is typically a quadratic loss function and Γ(.) represents the regularization term (for example the SC and GSC regularization terms). Eq. 5 would be optimized iteratively by a (stochastic) gradient descent. Unfortunately, the problem is not jointly convex but only conditionally convex. Alternatively, we can minimize:</p><p>R N (A| D) ≜</p><formula xml:id="formula_12">1 N ∑ i=1 N 1 2 ‖x i − Dα i ‖ 2 2 + Γ(α i ), s.t. α i k ≥ 0<label>(6)</label></formula><p>and</p><formula xml:id="formula_13">R N (D| Â) ≜ 1 N ∑ i=1 N 1 2 ‖x i − D αi ‖ 2 2 s.t. ‖d k ‖ 2 2 = 1 and d k j ≥ 0.<label>(7)</label></formula><p>In order to obtain a suboptimal solution of eq. 5, eq. 6 can be solved efficiently in parallel via SC/GSC procedures, while eq. 7 can be solved by a constrained linear system <ref type="bibr" target="#b12">[13]</ref>.</p><p>5 Experiments</p></div>
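The dictionary half of the alternating scheme, eq. (7), can be sketched as below. This is a heuristic stand-in for the constrained linear system of [13]: each gradient step on 0.5‖X − DA‖²_F is followed by a projection onto the constraints of eq. (7). Note that the unit-norm constraint set is non-convex, so monotone decrease is not guaranteed for this sketch; the toy sizes are illustrative.

```python
import numpy as np

def update_dictionary(X, A, n_iter=100, seed=0):
    """Constrained dictionary update for eq. (7) by projected gradient.

    The unconstrained closed form D = (X A^T)(A A^T)^{-1} costs O(K^3)
    and ignores the constraints ||d_k||_2 = 1 and d_k^j >= 0; here each
    gradient step is followed by a projection onto them.
    """
    d, K = X.shape[0], A.shape[0]
    D = np.abs(np.random.default_rng(seed).standard_normal((d, K)))
    D /= np.linalg.norm(D, axis=0)
    step = 1.0 / (np.linalg.norm(A @ A.T, 2) + 1e-12)
    for _ in range(n_iter):
        D = D - step * (D @ A - X) @ A.T            # gradient of 0.5||X - DA||_F^2
        D = np.maximum(D, 0.0)                      # d_k^j >= 0
        norms = np.linalg.norm(D, axis=0, keepdims=True)
        D = D / np.maximum(norms, 1e-12)            # ||d_k||_2 = 1
    return D

# Toy alternating step: codes A held fixed, dictionary updated.
rng = np.random.default_rng(4)
X = np.abs(rng.standard_normal((10, 50)))
A = np.abs(rng.standard_normal((16, 50)))
D = update_dictionary(X, A)
```

In a full training loop this update would alternate with the SC/GSC encoding of eq. (6), as the text describes.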
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Metrics</head><p>In this section we present results obtained with SC and GSC dictionaries when using SC and JSC for the encoding part. We fix the dictionary size to K = 1024, and a positivity constraint is applied to the dictionary columns and the sparse codes. The regularization parameters are λ = 0.2 for SC, and (λ = 0.4 ; β = 0.2) and (λ = 0.2 ; β = 0.2) for the GSC and JSC encodings. Only the GSC (λ = 0.2, β = 0.2) dictionary will be used. We measure a classification rate given by a 1-vs-all approach with a linear Support Vector Machine (SVM). Its regularization parameter is fixed to C = 0.07. Classification performance is measured by the Average Overall Accuracy (AOA):</p><formula xml:id="formula_14">AOA = 1 M ∑ m=1 M 1 N ∑ i=1 N δ( ŷi,m − y i,m ) ,<label>(8)</label></formula><p>where N represents the number of available data, δ the Kronecker delta (counting correct predictions), M the number of cross-validation runs, and ŷi,m and y i,m the predicted and true labels. We realize our experiments on the UIUCsports database <ref type="bibr" target="#b10">[11]</ref> and the scenes15 database. We extract dense SIFT patches (24×24) <ref type="bibr" target="#b11">[12]</ref> at grey level and on one scale. The grid size is 80 × 80 for the UIUCsports database and 30 × 30 for the scenes15 database. We apply Spatial Pyramid Matching (SPM) <ref type="bibr" target="#b7">[8]</ref>, which is defined on L levels. For UIUCsports, L = 2: pooling is performed on the entire image ((1 × 1) -first layer) and the second layer on a (2 × 2) grid with a stride of 25%. For scenes15, L = 3: we use (1 × 1), (2 × 2) and (4 × 4) sub-regions for SPM. We apply µ-pooling (µ = 2.5) for the pooling step 1 .</p></div>
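Eq. (8) can be sketched directly, reading δ as the indicator of a correct prediction. The labels below are toy values, not data from the paper.

```python
import numpy as np

def aoa(y_true_runs, y_pred_runs):
    """Average Overall Accuracy (eq. 8): the fraction of correctly labelled
    images, averaged over the M cross-validation runs; also returns the
    spread across runs, as in the tables' "mean ± std" entries."""
    accs = [float(np.mean(np.asarray(yt) == np.asarray(yp)))
            for yt, yp in zip(y_true_runs, y_pred_runs)]
    return float(np.mean(accs)), float(np.std(accs))

# Two hypothetical cross-validation runs over four images each.
mean_aoa, std_aoa = aoa([[1, 2, 3, 1], [0, 2, 2, 1]],
                        [[1, 2, 0, 1], [0, 2, 2, 0]])
```

Here each run labels 3 of 4 images correctly, giving an AOA of 0.75 with zero spread.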
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Results on UIUCsports</head><p>Table <ref type="table" target="#tab_2">2</ref> summarizes the obtained results. We observe different behaviours. If we focus on the encoding part variations (horizontal reading), we see that for all dictionary choices SC encoding is the best; no gain is observed for the others, and a similar behaviour is obtained if we read the table vertically. To go further, in order to evaluate whether the SC and JSC models are complementary, we measure the accuracy of the arithmetic and geometric means of their estimates (AOA arithmetic and AOA geometric). AOA arithmetic is defined from the sum of the probabilities of the two selected models, and AOA geometric from the square root of their product. Tables <ref type="table" target="#tab_4">3 and 4</ref>, associated with figures 2 and 3 respectively (only the arithmetic fusion is shown here, because the geometric fusion is lower), summarize the results obtained with the initial models and their associated fusions; table 3 corresponds to a horizontal reading (encoding fusion) and table 4 to a vertical reading (dictionary fusion). Fig. <ref type="figure" target="#fig_1">2</ref> shows the benefits and deficits obtained with the GSC, JSC and arithmetic fusion encodings compared to SC encoding for the different dictionaries. We notice an important relative gain (up to +8 points) with the SC dictionary. This is less significant with the GSC (0.2,0.2) dictionary, where few relative gains are observed. For dictionary fusion (Table <ref type="table">4</ref>, illustrated in figure <ref type="figure">3</ref>), strong relative
gains are observed for SC and the two GSC encoding models (figure 3 shows the benefits and deficits obtained with the GSC and arithmetic fusion dictionaries, compared to the SC dictionary, for the five different encoding method choices on the UIUCsports database). There is no gain for the two JSC encoding models. The best result is obtained for the SC dictionary with SC encoding, and for the fusion of the SC and GSC (0.2,0.2) dictionaries with SC encoding.</p><p>As a reminder, µ-pooling is written as</p><formula xml:id="formula_15">f (v; w, µ) = ∑ m=1 c w m v m µ = w T v µ s.t.</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Results on scenes15</head><p>Table <ref type="table" target="#tab_6">5</ref> summarizes our results: no gain is observed for this dataset, and the best results are for the SC dictionary with SC encoding. The fusion results are summarized in tables 6 and <ref type="table">7</ref>, illustrated in figures 4 and <ref type="figure">5</ref> respectively. Table 7 gives the evolution of the arithmetic and geometric accuracy for the scenes15 database (dictionary fusion); its best result is obtained for the couple of SC and GSC (0.2,0.2) dictionaries associated with SC encoding. We notice that the behaviour is inverted for the two fusion cases. However, the deficits decrease with fusion, more specifically for the GSC (0.2,0.2) dictionary. For the dictionary fusion, it is between the two models that we obtain the most significant gain. The best result is for the couple (SC + GSC) dictionary associated with SC encoding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Weighted fusion</head><p>To go further, we plot the accuracy of a weighted arithmetic fusion. First, the weights are the same for each class, and the curves of figure <ref type="figure">6</ref> illustrate the weighted arithmetic fusion (AOA arith = µ AOA SC + (1 − µ) AOA GSC ). Fig. <ref type="figure">5</ref> shows the benefits and deficits obtained with the GSC and arithmetic fusion dictionaries compared to the SC dictionary for the five different encoding method choices on the scenes15 database. Fig. <ref type="figure">6</ref> shows the evolution of the accuracy with different coefficients: the first point corresponds to the model chosen for fusion and the last point to the SC model. The best result for UIUCsports is obtained with a coefficient of 0.5; for scenes15, it is 0.8 for SC and 0.2 for the GSC (0.2,0.2) dictionary associated with SC. For these two examples, the fusion is between SC (dictionary and encoding) and the GSC (0.2,0.2) dictionary with SC encoding. We notice for UIUCsports that, when we use adapted coefficients with fusion, no improvement is observed and the accuracy decreases considerably for the other couples. For scenes15, a very small improvement is seen, but it does not allow us to conclude on the real benefit of the method. Another alternative would be to compute other means, such as the harmonic or energy means. Also, the considerable gain obtained with the UIUCsports database can be explained by two assumptions: the heterogeneity between the images of the training and testing sets, and the conservation of correlation between the input and output spaces. The study conducted so far shows that the second assumption is the one that goes in the right direction.</p></div>
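The weighted arithmetic fusion and the coefficient sweep of figure 6 can be sketched as follows. The probabilities and labels are toy values, and the grid granularity is an assumption of this sketch.

```python
import numpy as np

def weighted_fusion(P_a, P_b, mu):
    """Weighted arithmetic fusion of two models' class-probability matrices
    (rows = images, columns = classes): mu * P_a + (1 - mu) * P_b.
    mu = 0.5 recovers the blinded arithmetic fusion of the previous sections."""
    return mu * P_a + (1.0 - mu) * P_b

def best_mu(P_a, P_b, y, grid=None):
    # Sweep the fusion coefficient and keep the accuracy-maximising one,
    # mirroring the curves of figure 6.
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    scored = [(float(mu),
               float(np.mean(np.argmax(weighted_fusion(P_a, P_b, mu), axis=1) == y)))
              for mu in grid]
    return max(scored, key=lambda t: t[1])

# Toy example: model A is right on image 0, model B on image 1;
# an intermediate coefficient lets the fusion get both right.
P_a = np.array([[0.9, 0.1], [0.8, 0.2]])
P_b = np.array([[0.4, 0.6], [0.1, 0.9]])
y = np.array([0, 1])
mu_star, acc = best_mu(P_a, P_b, y)
```

Neither model alone exceeds 50% accuracy on this toy pair, while an intermediate coefficient reaches 100%, which is the complementarity effect the section measures.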
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>Although the results obtained with GSC and JSC alone do not live up to our expectations, we highlight the relevance of our proposal thanks to the fusion procedure, which greatly improves on the State-of-the-Art for UIUCsports (88.47 ± 2.32) of <ref type="bibr" target="#b22">[23]</ref> (our modeling: 95.37 ± 1.29). A complete study should be carried out with different couples (λ, β) for the dictionary and encoding parts to find the right setting for the UIUCsports and scenes15 databases. Also, the nature of the images should be considered, and a study of the heterogeneity level of the images could be achieved <ref type="bibr" target="#b21">[22]</ref> through the Shannon entropy measure. However, we think that our modeling can be improved in three ways. The first is to further stabilize the JSC results by adding an outer loop to the JSC algorithm.</p><p>After multiple stages, we can expect some improvements. The second is a direct extension of JSC, integrating a Laplacian regularization computed from a training set of local features. Here, sparse codes would be reconstructed by simultaneously minimizing the deviation from both this training set and the image local features. The fusion could also be improved by a weighted average fusion using statistics from the image codes. Finally, it has been shown that adding orthogonality constraints during the dictionary learning process can improve results <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b16">17]</ref>. Here too, a full study should be conducted with the two sparse code encoding methods.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. UIUCsports dataset (left) -scenes15 (right)</figDesc><graphic coords="6,158.58,235.78,298.20,112.88" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>w 2 2 =</head><label>2</label><figDesc>1 and µ = 0, where v µ = α µ m , m = 1, . . . , c and w m encodes the contribution of the m-image location for specific visual words<ref type="bibr" target="#b6">[7]</ref> </figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>∇ρ 2 and correlation at ρ = 90%, as an example of strong correlation, for SC, GSC and JSC for two couples (λ, β), on testing patches. The lowest ∇ρ 2 is obtained for JSC (0.2, 0.2) and the best result for the correlation parameter ρ is for GSC (0.2, 0.2); however, the lowest sparsity level is obtained for SC.</figDesc><table><row><cell>Method</cell><cell>SC (0.2)</cell><cell>GSC (0.4, 0.2)</cell><cell>JSC (0.4, 0.2)</cell><cell>GSC (0.2, 0.2)</cell><cell>JSC (0.2, 0.2)</cell></row><row><cell>Level Sparsity</cell><cell>5.82%</cell><cell>9.36%</cell><cell>15.05%</cell><cell>17.66%</cell><cell>22.75%</cell></row><row><cell>∇ρ 2</cell><cell>126.75</cell><cell>116.59</cell><cell>81.83</cell><cell>108.77</cell><cell>73.35</cell></row><row><cell>ρ = 90%</cell><cell>31%</cell><cell>75%</cell><cell>63%</cell><cell>79%</cell><cell>70%</cell></row></table></figure>
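The correlation-preservation measure ∇ρ₂ of Table 1 can be sketched as below. The exact scaling of the values reported in the table (e.g. whether they are multiplied by a constant) is an assumption of this sketch; only the definition from the text is implemented.

```python
import numpy as np

def rho(x, y):
    # Normalised correlation rho(x, y) = x^T y / (||x||_2 ||y||_2);
    # it lies in [0, 1] for non-negative vectors such as SIFT descriptors.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def delta_rho2(X, A):
    """Average squared gap between input-space correlations rho(x_i, x_j)
    and code-space correlations rho(alpha_i, alpha_j) over all column
    pairs. Lower is better: the codes then preserve input correlations."""
    n = X.shape[1]
    total = sum((rho(X[:, i], X[:, j]) - rho(A[:, i], A[:, j])) ** 2
                for i in range(n) for j in range(i))
    return 2.0 / (n * (n - 1)) * total

# Sanity check on toy non-negative "patches": identical codes give a gap of 0.
rng = np.random.default_rng(3)
X = np.abs(rng.standard_normal((16, 30)))
perfect = delta_rho2(X, X)
```

Comparing `delta_rho2(X, A)` for codes produced by SC, GSC and JSC reproduces the kind of ranking reported in Table 1.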
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Evolution of the Average Overall Accuracy for UIUCsports database. The best result is obtained with the couple SC dictionary and SC encoding.</figDesc><table><row><cell>Dictionary \ Encoding</cell><cell>SC (0.2)</cell><cell>GSC (0.4, 0.2)</cell><cell>JSC (0.4, 0.2)</cell><cell>GSC (0.2, 0.2)</cell><cell>JSC (0.2, 0.2)</cell></row><row><cell>SC (0.2)</cell><cell>87.27 ± 1.33</cell><cell>80.75 ± 1.69</cell><cell>83.6 ± 1.66</cell><cell>80 ± 2.01</cell><cell>84.17 ± 1.57</cell></row><row><cell>GSC (0.2, 0.2)</cell><cell>84.81 ± 1.87</cell><cell>80.71 ± 2.05</cell><cell>81.6 ± 1.77</cell><cell>80.92 ± 2.15</cell><cell>84.17 ± 1.02</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc></figDesc><table><row><cell>Dictionary</cell><cell>Encoding fusion</cell><cell>SC + GSC (0.4,0.2)</cell><cell>SC + JSC (0.4,0.2)</cell><cell>SC + GSC (0.2,0.2)</cell><cell>SC + JSC (0.2,0.2)</cell></row><row><cell>SC</cell><cell>AOA arithmetic</cell><cell>94.31 ± 1.28</cell><cell>94.77 ± 1.31</cell><cell>94.23 ± 1.3</cell><cell>94.94 ± 1.05</cell></row><row><cell>SC</cell><cell>AOA geometric</cell><cell>93.33 ± 1.23</cell><cell>93.94 ± 1.19</cell><cell>93.37 ± 1.22</cell><cell>94.19 ± 1.2</cell></row><row><cell>GSC (0.2,0.2)</cell><cell>AOA arithmetic</cell><cell>84.42 ± 1.5</cell><cell>85.08 ± 1.67</cell><cell>84.37 ± 1.51</cell><cell>85.12 ± 1.62</cell></row><row><cell>GSC (0.2,0.2)</cell><cell>AOA geometric</cell><cell>84.48 ± 1.52</cell><cell>84.9 ± 1.62</cell><cell>84.5 ± 1.65</cell><cell>84.98 ± 1.61</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 .</head><label>3</label><figDesc>Evolution of the arithmetic and geometric Accuracy for UIUCsports database (encoding fusion). The best result is obtained with the SC dictionary associated with SC and JSC (0.2,0.2) encodings. An illustration is given in figure 2.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 .</head><label>4</label><figDesc>Evolution of the arithmetic and geometric Accuracy for UIUCsports database (dictionary fusion, SC + GSC (0.2,0.2)). The best result is obtained with the couple of SC and GSC (0.2,0.2) dictionaries associated with SC encoding.</figDesc><table><row><cell>Encoding</cell><cell>SC</cell><cell>GSC (0.4,0.2)</cell><cell>JSC (0.4,0.2)</cell><cell>GSC (0.2,0.2)</cell><cell>JSC (0.2,0.2)</cell></row><row><cell>AOA arithmetic</cell><cell>95.37 ± 1.29</cell><cell>92.56 ± 1.11</cell><cell>83.33 ± 1.36</cell><cell>92.46 ± 1.15</cell><cell>84.25 ± 1.22</cell></row><row><cell>AOA geometric</cell><cell>94.62 ± 1.15</cell><cell>92.31 ± 1.42</cell><cell>83.89 ± 1.29</cell><cell>91.21 ± 1.58</cell><cell>84.31 ± 1.57</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5 .</head><label>5</label><figDesc>Evolution of the Average Overall Accuracy for scenes15 database. The best result is obtained with the couple SC dictionary and SC encoding.</figDesc><table><row><cell>Dictionary \ Encoding</cell><cell>SC (0.2)</cell><cell>GSC (0.4, 0.2)</cell><cell>JSC (0.4, 0.2)</cell><cell>GSC (0.2, 0.2)</cell><cell>JSC (0.2, 0.2)</cell></row><row><cell>SC (0.2)</cell><cell>84.69 ± 0.6</cell><cell>80.31 ± 0.6</cell><cell>80.82 ± 0.63</cell><cell>80.59 ± 0.64</cell><cell>81.47 ± 0.47</cell></row><row><cell>GSC (0.2, 0.2)</cell><cell>83.35 ± 0.59</cell><cell>78.79 ± 0.66</cell><cell>78.4 ± 0.79</cell><cell>79.06 ± 0.62</cell><cell>80.81 ± 0.66</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 6 .</head><label>6</label><figDesc>Evolution of the arithmetic and geometric Accuracy for the scenes15 database (encoding fusion). No result improves that of SC. An illustration is given in figure 4 (benefits and deficits obtained with GSC, JSC and arithmetic fusion encodings compared to SC encoding for the three dictionaries).</figDesc><table><row><cell>Encoding fusion (dictionary: SC + GSC (0.2,0.2))</cell><cell>SC</cell><cell>GSC (0.4,0.2)</cell><cell>JSC (0.4,0.2)</cell><cell>GSC (0.2,0.2)</cell><cell>JSC (0.2,0.2)</cell></row><row><cell>Arithmetic AOA</cell><cell>84.66 ± 0.64</cell><cell>79.76 ± 0.62</cell><cell>81.4 ± 0.71</cell><cell>80.41 ± 0.67</cell><cell>82.35 ± 0.75</cell></row><row><cell>Geometric AOA</cell><cell>84.62 ± 0.71</cell><cell>79.76 ± 0.63</cell><cell>81.38 ± 0.69</cell><cell>80.47 ± 0.57</cell><cell>82.2 ± 0.77</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 8 .</head><label>8</label><figDesc>Summary of fusion results (details in Tables 2, 3, 4, 5, 6 and 7).</figDesc><table><row><cell></cell><cell>Initial accuracy</cell><cell>Blinded fusion</cell><cell>Weighted fusion</cell><cell>State-of-the-Art</cell></row><row><cell>UIUCsports</cell><cell>87.27% ± 1.33</cell><cell>95.37% ± 1.29</cell><cell>95.37% ± 1.29</cell><cell>88.47 ± 2.32 [23]</cell></row><row><cell>scenes15</cell><cell>84.69% ± 0.6</cell><cell>84.66% ± 0.64</cell><cell>84.88% ± 0.55</cell><cell>81.04% ± 0.5 [8]</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgement. We thank the Direction Générale de l'Armement (DGA) for its financial support of this research. We thank Lucian Alecu for his comments.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Representing environmental sounds using the separable scattering transform</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bauge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lagrange</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Andén</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mallat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICASSP</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="8667" to="8671" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Learning deep architectures for AI</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Found. Trends Mach. Learn</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="127" />
			<date type="published" when="2009-01">Jan. 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Return of the devil in the details: Delving deep into convolutional nets</title>
		<author>
			<persName><forename type="first">K</forename><surname>Chatfield</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno>CoRR, abs/1405.3531</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Atomic decomposition by basis pursuit</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Donoho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Saunders</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SIAM Journal on Scientific Computing</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="33" to="61" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Nearest neighbors using compact sparse codes</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cherian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Machine Learning (ICML -14)</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Jebara</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Xing</surname></persName>
		</editor>
		<meeting>the 31st International Conference on Machine Learning (ICML -14)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1053" to="1061" />
		</imprint>
	</monogr>
	<note>JMLR Workshop and Conference Proceedings</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Construction and Analysis of a Large Scale Image Ontology</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Do</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>Vision Sciences Society</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Geometric p -norm feature pooling for image classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="2697" to="2704" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lazebnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ponce</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition -Volume 2</title>
				<meeting>the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition -Volume 2<address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="2169" to="2178" />
		</imprint>
	</monogr>
	<note>CVPR &apos;06</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Convolutional networks and applications in vision</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Farabet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ISCAS</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="253" to="256" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Efficient sparse coding algorithms</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Battle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Raina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS</title>
				<imprint>
			<publisher>NIPS</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="801" to="808" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">What, where and who? Classifying events by scene and object recognition</title>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Computer Vision</title>
				<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Object recognition from local scale-invariant features</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Lowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Computer Vision, Volume 2, ICCV &apos;99</title>
				<meeting>the International Conference on Computer Vision, Volume 2, ICCV &apos;99<address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page">1150</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Online dictionary learning for sparse coding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mairal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ponce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sapiro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th Annual International Conference on Machine Learning, ICML &apos;09</title>
				<meeting>the 26th Annual International Conference on Machine Learning, ICML &apos;09<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="689" to="696" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Scalable nearest neighbor algorithms for high dimensional data</title>
		<author>
			<persName><forename type="first">M</forename><surname>Muja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Lowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">A tutorial on the lasso and the &quot;shooting algorithm&quot;</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">V</forename><surname>Pendse</surname></persName>
		</author>
		<editor>P.A.I.N Group</editor>
		<imprint>
			<date type="published" when="2011-02-08">8 February 2011</date>
		</imprint>
		<respStmt>
			<orgName>Imaging and Analysis Group -McLean Hospital ; Harvard Medical School</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Improving the fisher kernel for large-scale image classification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Perronnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mensink</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV&apos;10</title>
				<meeting>the 11th European Conference on Computer Vision: Part IV, ECCV&apos;10<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="143" to="156" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Universal priors for sparse modeling</title>
		<author>
			<persName><forename type="first">I</forename><surname>Ramirez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lecumberry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sapiro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)</title>
				<imprint>
			<date type="published" when="2009-12">Dec. 2009</date>
			<biblScope unit="page" from="197" to="200" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Imagenet large scale visual recognition challenge</title>
		<author>
			<persName><forename type="first">O</forename><surname>Russakovsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Krause</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satheesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Karpathy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Khosla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Video Google: A text retrieval approach to object matching in videos</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sivic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Computer Vision</title>
				<meeting>the International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2003-10">Oct. 2003</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1470" to="1477" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Local Sparse Coding for Image Classification and Retrieval</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Thiagarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">N</forename><surname>Ramamurthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Spanias</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical report</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Regression shrinkage and selection via the lasso</title>
		<author>
			<persName><forename type="first">R</forename><surname>Tibshirani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Royal Statistical Society, Series B</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="page" from="267" to="288" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">LDA versus MMD approximation on mislabeled images for keyword-dependent selection of visual features and their heterogeneity</title>
		<author>
			<persName><forename type="first">S</forename><surname>Tollari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Glotin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)</title>
				<imprint>
			<date type="published" when="2006-05">May 2006</date>
			<biblScope unit="volume">II</biblScope>
			<biblScope unit="page" from="413" to="416" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Max-margin multiple-instance dictionary learning</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 30th International Conference on Machine Learning (ICML-13)</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Dasgupta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Mcallester</surname></persName>
		</editor>
		<meeting>the 30th International Conference on Machine Learning (ICML-13)</meeting>
		<imprint>
			<date type="published" when="2013-05">May 2013</date>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="846" to="854" />
		</imprint>
	</monogr>
	<note>JMLR Workshop and Conference Proceedings</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Large-scale dictionary learning for local coordinate coding</title>
		<author>
			<persName><forename type="first">B</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tao</surname></persName>
		</author>
		<idno type="DOI">10.5244/C.24.36</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the British Machine Vision Conference</title>
				<meeting>the British Machine Vision Conference</meeting>
		<imprint>
			<publisher>BMVA Press</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Graph regularized sparse coding for image representation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Image Processing</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1327" to="1336" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
