<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Visual Scenes Clustering Using Variational Incremental Learning of Infinite Generalized Dirichlet Mixture Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Wentao</forename><surname>Fan</surname></persName>
						</author>
						<author role="corresp">
							<persName><forename type="first">Nizar</forename><surname>Bouguila</surname></persName>
							<email>nizar.bouguila@concordia.ca</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Electrical and Computer Engineering</orgName>
								<orgName type="institution">Concordia University</orgName>
								<address>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Institute for Information Systems Engineering</orgName>
								<orgName type="institution">Concordia University</orgName>
								<address>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Visual Scenes Clustering Using Variational Incremental Learning of Infinite Generalized Dirichlet Mixture Models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B790AACDC1955991282174279C804BE1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we develop a clustering approach based on variational incremental learning of a Dirichlet process of generalized Dirichlet (GD) distributions. Our approach is built on nonparametric Bayesian analysis where the determination of the complexity of the mixture model (i.e. the number of components) is sidestepped by assuming an infinite number of mixture components. By leveraging an incremental variational inference algorithm, the model complexity and all the involved model's parameters are estimated simultaneously and effectively in a single optimization framework. Moreover, thanks to its incremental nature and Bayesian roots, the proposed framework allows to avoid over-and under-fitting problems, and to offer good generalization capabilities. The effectiveness of the proposed approach is tested on a challenging application involving visual scenes clustering.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Incremental clustering plays a crucial role in many data mining and computer vision applications <ref type="bibr" target="#b6">[Opelt et al., 2006;</ref><ref type="bibr" target="#b7">Sheikh et al., 2007;</ref><ref type="bibr" target="#b5">Li et al., 2007]</ref>. Incremental clustering is particularly efficient in the following scenarios: when data points are obtained sequentially, when the available memory is limited, or when large-scale data sets must be handled. Bayesian approaches have been widely used to develop powerful clustering techniques. Bayesian approaches to incremental clustering fall basically into two categories, parametric and nonparametric, and they mimic the human learning process, which is based on the iterative accumulation of knowledge. As opposed to parametric approaches, in which a fixed number of parameters is considered, Bayesian nonparametric approaches use an infinite-dimensional parameter space and allow the complexity of models to grow with the data size. The consideration of an infinite-dimensional parameter space makes it possible to determine the appropriate model complexity, which is normally referred to as the problem of model selection or model adaptation. This is a crucial issue in clustering since it permits capturing the underlying data structure more precisely and avoiding over- and under-fitting problems. This paper focuses on the nonparametric category since it is better adapted to modern data mining applications (i.e. modern applications generally involve dynamic data sets). 
Nowadays, the most popular Bayesian nonparametric formalism is the Dirichlet process (DP) <ref type="bibr" target="#b6">[Neal, 2000;</ref><ref type="bibr" target="#b8">Teh et al., 2004]</ref>, generally translated into a mixture model with a countably infinite number of components, in which the difficulty of selecting the appropriate number of clusters, which usually arises in the finite case, is avoided. A common way to learn a Dirichlet process model is through Markov chain Monte Carlo (MCMC) techniques. Nevertheless, MCMC approaches have several drawbacks, such as their high computational cost and the difficulty of monitoring convergence. These shortcomings can be overcome by adopting an alternative, namely variational inference (or variational Bayes) <ref type="bibr" target="#b0">[Attias, 1999]</ref>, a deterministic approximation technique that requires only a modest amount of computational power. Variational inference has provided promising performance in many applications involving mixture models <ref type="bibr" target="#b2">[Corduneanu and Bishop, 2001;</ref><ref type="bibr">Constantinopoulos et al., 2006;</ref><ref type="bibr" target="#b3">Fan et al., 2012;</ref><ref type="bibr" target="#b3">2013]</ref>. In our work, we employ the incremental version of variational inference proposed by <ref type="bibr" target="#b3">[Gomes et al., 2008]</ref> to learn infinite generalized Dirichlet (GD) mixtures in a setting where data points arrive sequentially. 
The consideration of the GD distribution is motivated by its promising performance when handling non-Gaussian data, in particular proportional data (which are subject to two restrictions: nonnegativity and unit-sum), which arise naturally in several data mining, machine learning, computer vision, and bioinformatics applications <ref type="bibr" target="#b1">[Bouguila and Ziou, 2006;</ref><ref type="bibr" target="#b5">2007;</ref><ref type="bibr" target="#b1">Boutemedjet et al., 2009]</ref>. Examples of applications include the clustering of textual documents (or images), where a given document (or image) is described as a normalized histogram of word (or visual word) frequencies. The main contributions of this paper are as follows: 1) we develop an incremental variational learning algorithm for the infinite GD mixture model, which is much more efficient than the corresponding batch approach when dealing with massive and sequential data; 2) we apply the proposed approach to tackle a challenging real-world problem, namely visual scenes clustering. The effectiveness and merits of our approach are illustrated through extensive simulations. The rest of this paper is organized as follows. Section 2 presents the infinite GD mixture model. The incremental variational inference framework for model learning is described in Section 3. Section 4 is devoted to the experimental results. Finally, our conclusion follows in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The Infinite GD Mixture Model</head><p>Let Y = (Y 1 , . . . , Y D ) be a D-dimensional random vector drawn from an infinite mixture of GD distributions:</p><formula xml:id="formula_0">p( Y | π, α, β) = ∞ j=1 πjGD( Y | αj, βj)<label>(1)</label></formula><p>where π represents the mixing weights, which are positive and sum to one. α j = (α j1 , . . . , α jD ) and β j = (β j1 , . . . , β jD ) are the positive parameters of the GD distribution associated with component j, while GD( Y | α j , β j ) is defined as</p><formula xml:id="formula_1">GD( Y | αj, βj) = D l=1 Γ(α jl + β jl ) Γ(α jl )Γ(β jl ) Y α jl −1 l 1 − l k=1 Y k γ jl<label>(2)</label></formula><p>where D l=1 Y l &lt; 1 and 0 &lt; Y l &lt; 1 for l = 1, . . . , D, γ jl = β jl − α jl+1 − β jl+1 for l = 1, . . . , D − 1, and γ jD = β jD − 1. Γ(•) is the gamma function defined by Γ(x) = ∞ 0 u x−1 e −u du. Furthermore, we exploit an interesting and convenient mathematical property of the GD distribution, which is thoroughly discussed in <ref type="bibr" target="#b1">[Boutemedjet et al., 2009]</ref>, to transform the original data points into another D-dimensional space where the features are conditionally independent, and rewrite the infinite GD mixture model in the following form</p><formula xml:id="formula_2">p( X| π, α, β) = ∞ j=1 πj D l=1 Beta(X l |α jl , β jl )<label>(3)</label></formula><p>where</p><formula xml:id="formula_3">X l = Y l for l = 1 and X l = Y l /(1 − l−1 k=1 Y k ) for l &gt; 1. Beta(X l |α jl , β jl ) is a Beta distribution parameterized with (α jl , β jl ).</formula><p>In this work, we construct the Dirichlet process through a stick-breaking representation <ref type="bibr" target="#b7">[Sethuraman, 1994]</ref>. Therefore, the mixing weights π j are constructed by recursively breaking a unit length stick into an infinite number of pieces as π j = λ j j−1 k=1 (1 − λ k ). 
λ j is known as the stick-breaking variable and is distributed independently according to λ j ∼ Beta(1, ξ), where ξ &gt; 0 is the concentration parameter of the Dirichlet process. For an observed data set ( X 1 , . . . , X N ), we introduce a set of mixture component assignment variables Z = (Z 1 , . . . , Z N ), one for each data point. Each element Z i of Z has an integer value j specifying the component from which X i is drawn. The marginal distribution over Z is given by</p><formula xml:id="formula_4">p( Z| λ) = N i=1 ∞ j=1 λj j−1 k=1 (1 − λ k ) 1[Z i =j] (4)</formula><p>where 1[•] is an indicator function that equals 1 when Z i = j and 0 otherwise. Since our model framework is Bayesian, we need to place prior distributions over the random variables α and β. Because the formal conjugate prior for the Beta distribution is intractable, we adopt Gamma priors G(•) to approximate the conjugate priors of α and β as p( α) = G( α| u, v) and p( β) = G( β| s, t), under the assumption that these parameters are statistically independent.</p></div>
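As an illustrative sketch (not part of the original derivation), the stick-breaking construction of the mixing weights and the GD-to-Beta coordinate transformation described above can be written as follows, assuming NumPy is available:

```python
import numpy as np

def stick_breaking_weights(lam):
    # pi_j = lambda_j * prod over k = 1..j-1 of (1 - lambda_k);
    # with a finite truncation, setting the last lambda to 1 makes
    # the weights sum exactly to one.
    lam = np.asarray(lam, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - lam[:-1])))
    return lam * remaining

def gd_to_beta_coords(Y):
    # Map a proportional vector Y (nonnegative, sum below 1) to the
    # transformed space X_1 = Y_1, X_l = Y_l / (1 - sum over k = 1..l-1 of Y_k),
    # in which the GD density factorizes into independent Beta densities.
    Y = np.asarray(Y, dtype=float)
    prev_sums = np.concatenate(([0.0], np.cumsum(Y[:-1])))
    return Y / (1.0 - prev_sums)
```

For instance, stick variables (0.5, 0.5, 1.0) yield the weights (0.5, 0.25, 0.25).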
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Model Learning</head><p>In our work, we adopt the incremental learning framework proposed in <ref type="bibr" target="#b3">[Gomes et al., 2008]</ref> to learn the proposed infinite GD mixture model through variational Bayes. In this algorithm, data points are processed sequentially in small batches, where each batch may contain one or a group of data points. The model learning framework involves two phases: 1) a model building phase, which infers the optimal mixture model from the currently observed data points; and 2) a compression phase, which estimates which mixture component each group of data points should be assigned to.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Model Building Phase</head><p>For an observed data set X = ( X 1 , . . . , X N ), we define Θ = { Z, α, β, λ} as the set of unknown random variables. The main target of variational Bayes is to estimate a proper approximation q(Θ) to the true posterior distribution p(Θ|X ). This problem can be solved by maximizing the free energy F(X , q), where F(X , q) = q(Θ) ln[p(X , Θ)/q(Θ)]dΘ. In our algorithm, inspired by <ref type="bibr" target="#b0">[Blei and Jordan, 2005]</ref>, we truncate the variational distribution q(Θ) at a value M , such that λ M = 1, π j = 0 when j &gt; M , and M j=1 π j = 1, where the truncation level M is a variational parameter that can be freely initialized and is optimized automatically during the learning process <ref type="bibr" target="#b0">[Blei and Jordan, 2005]</ref>. In order to achieve tractability, we also assume that the approximated posterior distribution q(Θ) can be factorized into disjoint tractable factors as:</p><formula xml:id="formula_5">q(Θ) = [ N i=1 q(Z i )][ M j=1 D l=1 q(α jl )q(β jl )][ M j=1 q(λ j )].</formula><p>By maximizing the free energy F(X , q) with respect to each variational factor, we obtain the following update equations for these factors:</p><formula xml:id="formula_6">q( Z) = N i=1 M j=1 r 1[Z i =j] ij , q( α) = M j=1 D l=1 G(α jl |u * jl , v * jl ) (5) q( β) = M j=1 D l=1 G(β jl |s * jl , t * jl ), q( λ) = M j=1 Beta(λj|aj, bj) (6)</formula><p>where we have defined</p><formula xml:id="formula_7">rij = exp(ρij) M j=1 exp(ρij) (7) ρij = D l=1 R jl + ( ᾱjl − 1) ln X il + ( βjl − 1) ln(1 − X il ) + ln λj + j−1 k=1 ln(1 − λ k ) u * jl = u jl + N i=1 Zi = j [Ψ( ᾱjl + βjl ) − Ψ( ᾱjl ) + βjl ×Ψ ( ᾱjl + βjl )( ln β jl − ln βjl )] ᾱjl s * jl = s jl + N i=1 Zi = j [Ψ( ᾱjl + βjl ) − Ψ( βjl ) + ᾱjl ×Ψ ( ᾱjl + βjl )( ln α jl − ln ᾱjl )] βjl v * jl = v jl − N i=1 Zi = j ln X il , bj = ξj + N i=1 M k=j+1 Zi = k t * jl = t jl − N i=1 Zi = j ln(1 − X il ), aj = 1 + N i=1 Zi = j</formula><p>where Ψ(•) is the digamma function and ⟨•⟩ denotes the expectation. Note that R jl in the expression of ρij is the lower bound of R = ln[Γ(α + β)/(Γ(α)Γ(β))]; since this expectation is intractable, a second-order Taylor series expansion is applied to find the bound. The expected values in the above formulas are given by</p><formula xml:id="formula_8">Z i = j = r ij , ᾱjl = α jl = u * jl /v * jl , βjl = β jl = s * jl /t * jl , ln λ j = Ψ(a j ) − Ψ(a j + b j ), ln(1−λ j ) = Ψ(b j )−Ψ(a j +b j ), ln α jl = Ψ(u * jl )−ln v * jl</formula><p>and ln β jl = Ψ(s * jl ) − ln t * jl . After convergence, the currently observed data points are clustered into M groups according to the corresponding responsibilities r ij through Eq. ( <ref type="formula">7</ref>). According to <ref type="bibr" target="#b3">[Gomes et al., 2008]</ref>, these newly formed groups of data points are also denoted as "clumps". Following <ref type="bibr" target="#b3">[Gomes et al., 2008]</ref>, these clumps are subject to the constraint that all data points X i in a clump c share the same q(Z i ) ≡ q(Z c ), which is a key factor in the following compression phase.</p></div>
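As a minimal numerical sketch (not from the paper itself), the responsibility update of Eq. (7) and the Beta expectations used for the stick-breaking variables can be implemented as follows, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.special import digamma

def expected_log_sticks(a, b):
    # E[ln lambda_j] and E[ln(1 - lambda_j)] under q(lambda_j) = Beta(a_j, b_j),
    # i.e. digamma(a) - digamma(a + b) and digamma(b) - digamma(a + b).
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return digamma(a) - digamma(a + b), digamma(b) - digamma(a + b)

def responsibilities(rho):
    # Eq. (7): r_ij = exp(rho_ij) / sum_j exp(rho_ij), computed with the
    # log-sum-exp trick so that very negative rho values do not underflow.
    rho = np.asarray(rho, dtype=float)
    rho = rho - rho.max(axis=1, keepdims=True)
    r = np.exp(rho)
    return r / r.sum(axis=1, keepdims=True)
```

Each row of the returned responsibility matrix sums to one, giving the soft assignment of one data point over the M truncated components.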
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1</head><p>1: Choose the initial truncation level M . 2: Initialize the values for the hyper-parameters u jl , v jl , s jl , t jl and ξj. 3: Initialize the values of rij by the K-Means algorithm. 4: while More data to be observed do 5:</p><p>Perform the model building phase through Eqs. ( <ref type="formula">5</ref>) and ( <ref type="formula">6</ref>). 6:</p><p>Initialize the compression phase using Eq. (10). 7:</p><p>while MC &lt; C do 8:</p><p>for j = 1 to M do 9:</p><p>if evaluated(j) = false then 10:</p><p>Split component j and refine this split using Eqs. (9). 11: ∆F(j) = change in Eq. ( <ref type="formula">8</ref>). 12: evaluated(j) = true. 13: end if 14:</p><p>end for 15:</p><p>Apply the split with the largest value of ∆F(j). 16: M = M + 1. 17:</p><p>end while 18:</p><p>Discard the currently observed data points.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>19:</head><p>Save resulting components into next learning round. 20: end while</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Compression Phase</head><p>Within the compression phase, we need to estimate which clumps possibly belong to the same mixture component, while taking future arriving data into consideration. Assume that we have already observed N data points; our aim is to make an inference at some target time T, where T ≥ N. We can tackle this problem by scaling the observed data to the target size T, which is equivalent to using the variational posterior distribution of the N observed data points as a predictive model of the future data <ref type="bibr" target="#b3">[Gomes et al., 2008]</ref>. We then have a modified free energy for the compression phase in the following form</p><formula xml:id="formula_9">F = M j=1 D l=1 ln p(α jl |u jl , v jl ) q(α jl ) + ln p(β jl |s jl , t jl ) q(β jl ) + M j=1 ln p(λj|ξj) q(λj) + T N c |nc| ln M j=1 exp(ρcj) (8)</formula><p>where |n c | represents the number of data points in clump c and T N is the data magnification factor. The corresponding update equations for maximizing this free energy function can be obtained as</p><formula xml:id="formula_10">rcj = exp(ρcj) M j=1 exp(ρcj) (9) ρcj = D l=1 R jl + ( ᾱjl − 1) ln X cl + ( βjl − 1) ln(1 − X cl ) + ln λj + j−1 k=1 ln(1 − λ k ) u * jl = u jl + T N c |nc|rcj[Ψ( ᾱjl + βjl ) − Ψ( ᾱjl ) + βjl ×Ψ ( ᾱjl + βjl )( ln β jl − ln βjl )] ᾱjl s * jl = s jl + T N c |nc|rcj[Ψ( ᾱjl + βjl ) − Ψ( βjl ) + ᾱjl ×Ψ ( ᾱjl + βjl )( ln α jl − ln ᾱjl )] βjl v * jl = v jl − T N c |nc|rcj ln X cl t * jl = t jl − T N c |nc|rcj ln(1 − X cl ) aj = 1 + T N c |nc| Zc = j bj = ξj + T N c |nc| M k=j+1 Zc = k</formula><p>where X cl denotes the average over all data points contained in clump c. 
The first step of the compression phase is to assign each clump or data point to the component with the highest responsibility r cj calculated from the model building phase as</p><formula xml:id="formula_11">Ic = arg max j rcj (<label>10</label></formula><formula xml:id="formula_12">)</formula><p>where {I c } denotes which component the clump (or data point) c belongs to in the compression phase. Next, we cycle through each component and split it along its principal component into two subcomponents. Each split is refined by updating Eqs. ( <ref type="formula">9</ref>); after convergence, the clumps are hard-assigned to one of the two candidate components. Among all the potential splits, we select the one that results in the largest change in the free energy (Eq. ( <ref type="formula">8</ref>)).</p><p>The splitting process repeats itself until a stopping criterion is met. According to <ref type="bibr" target="#b3">[Gomes et al., 2008]</ref>, the stopping criterion for the splitting process can be expressed as a limit on the amount of memory required to store the components. In our case, the component memory cost for the mixture model is MC = 2DN c , where 2D is the number of parameters contained in a D-variate GD component, and N c is the number of components. Accordingly, we can define an upper limit C on the component memory cost, and the compression phase stops when MC ≥ C. As a result, both the computational time and the space requirement are bounded in each learning round. After the compression phase, the currently observed data points are discarded, while the resulting components are treated in the same way as data points in the next round of learning.</p><p>Our incremental variational inference algorithm for the infinite GD mixture model is summarized in Algorithm 1. </p></div>
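The candidate split along a component's principal direction and the memory-based stopping criterion can be sketched as follows; this is an illustrative reading of the procedure (assuming NumPy), not the paper's exact implementation, in which the split is subsequently refined by the variational updates of Eqs. (9):

```python
import numpy as np

def component_memory_cost(D, n_components):
    # MC = 2 * D * N_c: each D-variate GD component carries 2D parameters;
    # splitting stops once this cost reaches the chosen limit C.
    return 2 * D * n_components

def split_along_principal_axis(points):
    # Divide the points assigned to one component into two candidate
    # sub-clusters by the sign of their projection onto the principal
    # direction (leading right-singular vector of the centered data).
    X = np.asarray(points, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    mask = centered @ vt[0] >= 0.0
    return X[mask], X[~mask]
```

The two returned point sets seed the pair of candidate components whose refined split is then scored by the change in the free energy of Eq. (8).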
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Visual Scenes Clustering</head><p>In this section, the effectiveness of the proposed incremental infinite GD mixture model (InGDMM) is tested on a challenging real-world application, namely visual scenes clustering. The problem is important, since images are being produced at exponentially increasing rates, and very challenging, owing to the difficulty of capturing the variability in appearance and shape of the diverse objects belonging to the same scene while avoiding confusion with objects from different scenes. In our experiments, we initialize the truncation level M to 15. The initial values of the hyperparameters are set as (u jl , v jl , s jl , t jl , ξ j ) = (1, 0.01, 1, 0.01, 0.1), which were found to be reasonable choices according to our experimental results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Database and Experimental Design</head><p>In this paper, we test our approach on a challenging and publicly available database known as the OT database, which was introduced by Oliva and Torralba <ref type="bibr" target="#b6">[Oliva and Torralba, 2001]</ref> and is available at http://cvcl.mit.edu/database.htm. This database contains 2,688 images of size 256 × 256 pixels, and is composed of eight urban and natural scene categories: coast (360 images), forest (328 images), highway (260 images), inside-city (308 images), mountain (374 images), open country (410 images), street (292 images), and tall building (356 images). Figure <ref type="figure" target="#fig_0">1</ref> shows some sample images from the different categories in the OT database.</p><p>Our methodology is based on the proposed incremental infinite GD mixture model in conjunction with a bag-of-visual-words representation, and can be summarized as follows. Firstly, we use the Difference-of-Gaussians (DoG) interest point detector to extract 128-dimensional Scale-Invariant Feature Transform (SIFT) descriptors <ref type="bibr" target="#b6">[Lowe, 2004]</ref> from each image. Secondly, the K-Means algorithm is adopted to construct a visual vocabulary by quantizing these SIFT vectors into visual words. As a result, each image is represented as a frequency histogram over the visual words. We tested visual vocabulary sizes |W| ranging from 100 to 1000, and the optimal performance was obtained for |W| = 750 according to our experimental results. Then, the Probabilistic Latent Semantic Analysis (pLSA) model <ref type="bibr" target="#b4">[Hofmann, 2001</ref>] is applied to the obtained histograms to represent each image by a 55-dimensional proportional vector, where 55 is the number of latent aspects. Finally, the proposed InGDMM is deployed to cluster the images, which are supposed to arrive in a sequential way. </p></div>
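The vocabulary-building and histogram steps of this pipeline can be sketched as below; this is a simplified illustration assuming scikit-learn and NumPy, with the SIFT extraction and the final pLSA reduction omitted:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, vocab_size=750, seed=0):
    # Quantize local descriptors (e.g. 128-D SIFT vectors pooled over the
    # training images) into a vocabulary of visual words via K-Means.
    km = KMeans(n_clusters=vocab_size, n_init=10, random_state=seed)
    km.fit(np.asarray(all_descriptors, dtype=float))
    return km

def bow_histogram(vocabulary, image_descriptors):
    # Represent one image as a normalized frequency histogram over the
    # visual words; this proportional vector is what the GD mixture models.
    words = vocabulary.predict(np.asarray(image_descriptors, dtype=float))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()
```

In the paper's setting, the resulting histograms are further reduced by pLSA to 55-dimensional proportional vectors before clustering.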
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Experimental Results</head><p>In our experiments, we randomly divided the OT database into two halves: one for constructing the visual vocabulary, the other for testing. Since our approach is unsupervised, the class labels are not involved in our experiments, except for the evaluation of the clustering results. The entire methodology was repeated 30 times to evaluate the performance. For comparison, we have also applied three other mixture-modeling approaches: the finite GD mixture model (FiGDMM), the infinite Gaussian mixture model (InGMM) and the finite Gaussian mixture model (FiGMM). To make a fair comparison, all of the aforementioned approaches are learned through incremental variational inference. Table <ref type="table" target="#tab_0">1</ref> shows the average confusion matrix of the OT database calculated by the proposed InGDMM. According to Table <ref type="table" target="#tab_1">2</ref>, our approach (InGDMM) provides the best performance, with the highest categorization rate (77.47%) among all the tested approaches. In addition, better performances are obtained for the approaches that adopt infinite mixtures (InGDMM and InGMM) than for the corresponding finite mixtures (FiGDMM and FiGMM), which demonstrates the advantage of using infinite mixture models over finite ones. Moreover, the GD mixture achieves higher performance than the Gaussian mixture, which verifies that the GD mixture model has better modeling capability than the Gaussian for proportional data clustering.</p></div>
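The overall categorization rate reported here is the standard diagonal-over-total statistic of a confusion matrix; a minimal sketch of this computation, assuming NumPy:

```python
import numpy as np

def accuracy_from_confusion(cm):
    # Overall categorization rate: correctly clustered images (the diagonal
    # of the confusion matrix) divided by the total number of images.
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()
```

For example, a two-class matrix [[8, 2], [1, 9]] yields a rate of 17/20 = 0.85.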
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>In this work, we have presented an incremental nonparametric Bayesian approach for clustering. The proposed approach is based on infinite GD mixture models within a Dirichlet process framework, and is learned using an incremental variational inference framework. Within this framework, the model parameters and the number of mixture components are determined simultaneously. The effectiveness of the proposed approach has been evaluated on a challenging application, namely visual scenes clustering. Future work could be devoted to applying the proposed algorithm to other data mining tasks involving continually changing or growing volumes of proportional data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Sample images from the OT data set.</figDesc><graphic coords="4,57.86,346.10,56.69,56.69" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Average rounded confusion matrix for the OT database calculated by InGDMM.</figDesc><table><row><cell></cell><cell>C</cell><cell>F</cell><cell>H</cell><cell>I</cell><cell>M</cell><cell>O</cell><cell>S</cell><cell>T</cell></row><row><cell>Coast (C)</cell><cell>127</cell><cell>10</cell><cell>4</cell><cell>2</cell><cell>3</cell><cell>31</cell><cell>2</cell><cell>1</cell></row><row><cell>Forest (F)</cell><cell>2</cell><cell>155</cell><cell>1</cell><cell>2</cell><cell>1</cell><cell>3</cell><cell>0</cell><cell>0</cell></row><row><cell>Highway (H)</cell><cell>0</cell><cell>0</cell><cell>122</cell><cell>1</cell><cell>0</cell><cell>3</cell><cell>3</cell><cell>1</cell></row><row><cell>Inside-city (I)</cell><cell>2</cell><cell>4</cell><cell>2</cell><cell>119</cell><cell>3</cell><cell>2</cell><cell>15</cell><cell>7</cell></row><row><cell>Mountain (M)</cell><cell>6</cell><cell>21</cell><cell>4</cell><cell>5</cell><cell>139</cell><cell>9</cell><cell>1</cell><cell>2</cell></row><row><cell>Open country (O)</cell><cell>2</cell><cell>22</cell><cell>19</cell><cell>15</cell><cell>9</cell><cell>131</cell><cell>3</cell><cell>4</cell></row><row><cell>Street (S)</cell><cell>0</cell><cell>1</cell><cell>4</cell><cell>8</cell><cell>5</cell><cell>5</cell><cell>122</cell><cell>1</cell></row><row><cell>Tall building (T)</cell><cell>4</cell><cell>9</cell><cell>7</cell><cell>23</cell><cell>3</cell><cell>19</cell><cell>3</cell><cell>110</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>The average classification accuracy rate (Acc) (%) obtained over 30 runs using different approaches.</figDesc><table><row><cell cols="5">Method InGDMM FiGDMM InGMM FiGMM</cell></row><row><cell>Acc(%)</cell><cell>77.47</cell><cell>74.25</cell><cell>72.54</cell><cell>70.19</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture</title>
		<author>
			<persName><forename type="first">H</forename><surname>Attias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bouguila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Advances in Neural Information Processing Systems (NIPS)</title>
				<meeting>of Advances in Neural Information essing Systems (NIPS)</meeting>
		<imprint>
			<date type="published" when="1999">1999. 1999. 2005. 2005. 2006</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2657" to="2668" />
		</imprint>
	</monogr>
	<note>Bayesian Analysis</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Highdimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length</title>
		<author>
			<persName><forename type="first">N</forename><surname>Bouguila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziou</surname></persName>
		</author>
		<author>
			<persName><surname>Boutemedjet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1429" to="1443" />
			<date type="published" when="2006">2007. 2007. 2009. 2009. 2006</date>
		</imprint>
	</monogr>
	<note>IEEE Transactions on Pattern Analysis and Machine Intelligence. Constantinopoulos et al.. C. Constantinopoulos</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Bayesian feature and model selection for Gaussian mixture models</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Titsias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Likas</surname></persName>
		</author>
		<author>
			<persName><surname>Corduneanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Bishop</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 8th International Conference on Artificial Intelligence and Statistics (AISTAT)</title>
				<meeting>of the 8th International Conference on Artificial Intelligence and Statistics (AISTAT)</meeting>
		<imprint>
			<date type="published" when="2001">2006. 2001</date>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="27" to="34" />
		</imprint>
	</monogr>
	<note>Variational Bayesian model selection for mixture distributions</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Unsupervised hybrid feature extraction selection for high-dimensional non-Gaussian data clustering with variational inference</title>
		<author>
			<persName><surname>Fan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2008">2012. 2012. 2013. 2008</date>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
	<note>Incremental learning of nonparametric Bayesian mixture models</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Unsupervised learning by probabilistic latent semantic analysis</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hofmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">1/2</biblScope>
			<biblScope unit="page" from="177" to="196" />
			<date type="published" when="2001">2001. 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Optimol: automatic online picture collection via incremental model learning</title>
		<author>
			<persName><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2007">2007. 2007</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Modeling the shape of the scene: A holistic representation of the spatial envelope</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Lowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Neal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oliva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><surname>Opelt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<meeting>of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</meeting>
		<imprint>
			<date type="published" when="2000">2004. 2004. 2000. 2000. 2001. 2001. 2006. 2006</date>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page" from="3" to="10" />
		</imprint>
	</monogr>
	<note>Incremental learning of object detectors using a visual shape alphabet</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A constructive definition of Dirichlet priors</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sethuraman</surname></persName>
		</author>
		<author>
			<persName><surname>Sheikh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the IEEE 11th International Conference on Computer Vision (ICCV)</title>
				<meeting>of the IEEE 11th International Conference on Computer Vision (ICCV)</meeting>
		<imprint>
			<date type="published" when="1994">1994. 1994. 2007. 2007</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
	<note>Mode-seeking by medoidshifts</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Hierarchical Dirichlet processes</title>
		<author>
			<persName><surname>Teh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="page" from="705" to="711" />
			<date type="published" when="2004">2004. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
