<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visual Scenes Clustering Using Variational Incremental Learning of Infinite Generalized Dirichlet Mixture Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wentao Fan</string-name>
<email>wenta_fa@encs.concordia.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nizar Bouguila</string-name>
          <email>nizar.bouguila@concordia.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Electrical and Computer Engineering, Concordia University</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Information Systems Engineering, Concordia University</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper, we develop a clustering approach based on variational incremental learning of a Dirichlet process of generalized Dirichlet (GD) distributions. Our approach is built on nonparametric Bayesian analysis, where the determination of the complexity of the mixture model (i.e. the number of components) is sidestepped by assuming an infinite number of mixture components. By leveraging an incremental variational inference algorithm, the model complexity and all the involved model parameters are estimated simultaneously and effectively in a single optimization framework. Moreover, thanks to its incremental nature and Bayesian roots, the proposed framework avoids over- and under-fitting problems and offers good generalization capabilities. The effectiveness of the proposed approach is tested on a challenging application involving visual scenes clustering.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Incremental clustering plays a crucial role in many data mining and computer vision applications [Opelt et al., 2006; Sheikh et al., 2007; Li et al., 2007]. It is particularly effective in the following scenarios: when data points arrive sequentially, when the available memory is limited, or when large-scale data sets must be processed. Bayesian approaches have been widely used to develop powerful clustering techniques. Applied to incremental clustering, Bayesian approaches fall basically into two categories, parametric and nonparametric, and they mimic the human learning process of iterative knowledge accumulation. As opposed to parametric approaches, in which a fixed number of parameters is considered, Bayesian nonparametric approaches use an infinite-dimensional parameter space and allow the complexity of models to grow with data size. The consideration of an infinite-dimensional parameter space makes it possible to determine the appropriate model complexity, a problem normally referred to as model selection or model adaptation. This is a crucial issue in clustering since resolving it permits capturing the underlying data structure more precisely while avoiding both over- and under-fitting. This paper focuses on the nonparametric category since it is better adapted to modern data mining applications, which generally involve dynamic data sets.</p>
<p>Nowadays, the most popular Bayesian nonparametric formalism is the Dirichlet process (DP) [Neal, 2000; Teh et al., 2004], generally translated into a mixture model with a countably infinite number of components, in which the difficulty of selecting the appropriate number of clusters, which usually occurs in the finite case, is avoided. A common way to learn Dirichlet process models is through Markov chain Monte Carlo (MCMC) techniques. Nevertheless, MCMC approaches have several drawbacks, such as their high computational cost and the difficulty of monitoring convergence. These shortcomings can be addressed by adopting an alternative, namely variational inference (or variational Bayes) [Attias, 1999], a deterministic approximation technique that requires only a modest amount of computational power. Variational inference has provided promising performance in many applications involving mixture models [Corduneanu and Bishop, 2001; Constantinopoulos et al., 2006; Fan et al., 2012; 2013]. In our work, we employ an incremental version of variational inference proposed by [Gomes et al., 2008] to learn infinite generalized Dirichlet (GD) mixtures in a setting where data points arrive sequentially. The consideration of the GD distribution is motivated by its promising performance when handling non-Gaussian data, and in particular proportional data (which are subject to two restrictions: nonnegativity and unit-sum), which arise naturally in several data mining, machine learning, computer vision, and bioinformatics applications [Bouguila and Ziou, 2006; 2007; Boutemedjet et al., 2009]. Examples of applications include clustering textual documents (or images), where a given document (or image) is described as a normalized histogram of word (or visual-word) frequencies.</p>
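As a concrete illustration of the unit-sum restriction mentioned above, here is a minimal sketch of turning a word-frequency histogram into a proportional vector; the counts are invented for illustration and are not from the paper.

```python
# Sketch: turning raw word-frequency counts into a proportional vector
# (nonnegative entries summing to one), the data type the GD models handle.
# The example counts are illustrative only.

def to_proportions(counts):
    """Normalize a nonnegative count histogram to a unit-sum vector."""
    total = float(sum(counts))
    if total == 0.0:
        raise ValueError("empty histogram")
    return [c / total for c in counts]

hist = [3, 0, 5, 2]           # raw word-frequency counts
props = to_proportions(hist)  # [0.3, 0.0, 0.5, 0.2]
assert abs(sum(props) - 1.0) < 1e-12
```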
<p>The main contributions of this paper are the following: 1) we develop an incremental variational learning algorithm for the infinite GD mixture model, which is much more efficient than the corresponding batch approach when dealing with massive and sequential data; 2) we apply the proposed approach to tackle a challenging real-world problem, namely visual scenes clustering. The effectiveness and merits of our approach are illustrated through extensive simulations. The rest of this paper is organized as follows. Section 2 presents the infinite GD mixture model. The incremental variational inference framework for model learning is described in Section 3. Section 4 is devoted to the experimental results. Finally, the conclusion follows in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>The Infinite GD Mixture Model</title>
<p>Let $\vec{Y} = (Y_1, \ldots, Y_D)$ be a $D$-dimensional random vector drawn from an infinite mixture of GD distributions:
$$p(\vec{Y} \mid \vec{\pi}, \vec{\alpha}, \vec{\beta}) = \sum_{j=1}^{\infty} \pi_j \, \mathrm{GD}(\vec{Y} \mid \vec{\alpha}_j, \vec{\beta}_j) \quad (1)$$
where $\vec{\pi}$ represents the mixing weights, which are positive and sum to one. $\vec{\alpha}_j = (\alpha_{j1}, \ldots, \alpha_{jD})$ and $\vec{\beta}_j = (\beta_{j1}, \ldots, \beta_{jD})$ are the positive parameters of the GD distribution associated with component $j$, while $\mathrm{GD}(\vec{Y} \mid \vec{\alpha}_j, \vec{\beta}_j)$ is defined as
$$\mathrm{GD}(\vec{Y} \mid \vec{\alpha}_j, \vec{\beta}_j) = \prod_{l=1}^{D} \frac{\Gamma(\alpha_{jl} + \beta_{jl})}{\Gamma(\alpha_{jl}) \Gamma(\beta_{jl})} \, Y_l^{\alpha_{jl} - 1} \Big(1 - \sum_{k=1}^{l} Y_k\Big)^{\gamma_{jl}} \quad (2)$$
where $\sum_{l=1}^{D} Y_l &lt; 1$ and $0 &lt; Y_l &lt; 1$ for $l = 1, \ldots, D$, $\gamma_{jl} = \beta_{jl} - \alpha_{j,l+1} - \beta_{j,l+1}$ for $l = 1, \ldots, D-1$, and $\gamma_{jD} = \beta_{jD} - 1$. $\Gamma(\cdot)$ is the gamma function defined by $\Gamma(x) = \int_0^{\infty} u^{x-1} e^{-u} \, du$. Furthermore, we exploit an interesting and convenient mathematical property of the GD distribution, thoroughly discussed in [Boutemedjet et al., 2009], to transform the original data points into another $D$-dimensional space where the features are conditionally independent, and rewrite the infinite GD mixture model in the following form:
$$p(\vec{X} \mid \vec{\pi}, \vec{\alpha}, \vec{\beta}) = \sum_{j=1}^{\infty} \pi_j \prod_{l=1}^{D} \mathrm{Beta}(X_l \mid \alpha_{jl}, \beta_{jl}) \quad (3)$$
where $X_1 = Y_1$ and $X_l = Y_l / (1 - \sum_{k=1}^{l-1} Y_k)$ for $l &gt; 1$. $\mathrm{Beta}(X_l \mid \alpha_{jl}, \beta_{jl})$ is a Beta distribution parameterized with $(\alpha_{jl}, \beta_{jl})$.</p>
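The transformation to conditionally independent Beta features described above can be sketched as follows; variable names are ours, and this is a minimal illustration rather than the paper's implementation.

```python
# Sketch of the geometric transformation: X_1 = Y_1 and
# X_l = Y_l / (1 - sum_{k<l} Y_k) for l > 1, which maps a GD-distributed
# vector to D conditionally independent Beta-distributed features.

def gd_to_beta_space(y):
    """Map Y (with sum(Y) < 1 and 0 < Y_l < 1) to Beta coordinates."""
    x, remaining = [], 1.0
    for y_l in y:
        x.append(y_l / remaining)  # X_l = Y_l / (1 - Y_1 - ... - Y_{l-1})
        remaining -= y_l           # shrink the leftover mass
    return x

y = [0.2, 0.4, 0.1]
print(gd_to_beta_space(y))  # X_1 = 0.2, X_2 = 0.4/0.8 = 0.5, X_3 = 0.1/0.4 = 0.25
```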
<p>In this work, we construct the Dirichlet process through a stick-breaking representation [Sethuraman, 1994]. Therefore, the mixing weights $\pi_j$ are constructed by recursively breaking a unit-length stick into an infinite number of pieces as $\pi_j = \lambda_j \prod_{k=1}^{j-1} (1 - \lambda_k)$. $\lambda_j$ is known as the stick-breaking variable and is distributed independently according to $\lambda_j \sim \mathrm{Beta}(1, \psi)$, where $\psi &gt; 0$ is the concentration parameter of the Dirichlet process.</p>
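A minimal sketch of this stick-breaking construction under truncation; the truncation level and concentration value below are illustrative choices, not prescriptions from the paper.

```python
import random

# Sketch of truncated stick breaking: pi_j = lambda_j * prod_{k<j}(1 - lambda_k),
# with lambda_j ~ Beta(1, psi). Setting the last stick variable to 1 makes the
# truncated weights sum to one.

def stick_breaking_weights(M, psi, rng=random):
    """Draw M mixing weights by recursively breaking a unit-length stick."""
    weights, leftover = [], 1.0
    for j in range(M):
        lam = rng.betavariate(1.0, psi)  # stick-breaking variable lambda_j
        if j == M - 1:
            lam = 1.0                    # truncation: lambda_M = 1
        weights.append(lam * leftover)   # pi_j = lambda_j * prod(1 - lambda_k)
        leftover *= 1.0 - lam
    return weights

pi = stick_breaking_weights(M=15, psi=1.0)
assert abs(sum(pi) - 1.0) < 1e-9 and all(w >= 0.0 for w in pi)
```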
<p>For an observed data set $(\vec{X}_1, \ldots, \vec{X}_N)$, we introduce a set of mixture component assignment variables $\vec{Z} = (Z_1, \ldots, Z_N)$, one for each data point. Each element $Z_i$ of $\vec{Z}$ has an integer value $j$ specifying the component from which $\vec{X}_i$ is drawn. The marginal distribution over $\vec{Z}$ is given by
$$p(\vec{Z} \mid \vec{\lambda}) = \prod_{i=1}^{N} \prod_{j=1}^{\infty} \Big[ \lambda_j \prod_{k=1}^{j-1} (1 - \lambda_k) \Big]^{\mathbf{1}[Z_i = j]} \quad (4)$$
where $\mathbf{1}[\cdot]$ is an indicator function which equals 1 when $Z_i = j$, and equals 0 otherwise. Since our model framework is Bayesian, we need to place prior distributions over the random variables $\vec{\alpha}$ and $\vec{\beta}$. Since the formal conjugate prior for the Beta distribution is intractable, we adopt Gamma priors $\mathcal{G}(\cdot)$ to approximate the conjugate priors of $\vec{\alpha}$ and $\vec{\beta}$ as $p(\vec{\alpha}) = \mathcal{G}(\vec{\alpha} \mid \vec{u}, \vec{v})$ and $p(\vec{\beta}) = \mathcal{G}(\vec{\beta} \mid \vec{s}, \vec{t})$, with the assumption that these parameters are statistically independent.</p>
    </sec>
    <sec id="sec-3">
      <title>Model Learning</title>
<p>In our work, we adopt the incremental learning framework proposed in [Gomes et al., 2008] to learn the proposed infinite GD mixture model through variational Bayes. In this algorithm, data points can be processed sequentially in small batches, where each batch may contain one or a group of data points. The model learning framework involves the following two phases: 1) a model building phase, which infers the optimal mixture model from the currently observed data points; and 2) a compression phase, which estimates to which mixture component groups of data points should be assigned.</p>
      <sec id="sec-3-1">
        <title>Model Building Phase</title>
<p>For an observed data set $\mathcal{X} = (\vec{X}_1, \ldots, \vec{X}_N)$, we define $\Theta = \{\vec{Z}, \vec{\alpha}, \vec{\beta}, \vec{\lambda}\}$ as the set of unknown random variables. The main target of variational Bayes is to estimate a proper approximation $q(\Theta)$ for the true posterior distribution $p(\Theta \mid \mathcal{X})$. This problem can be solved by maximizing the free energy $\mathcal{F}(\mathcal{X}, q)$, where $\mathcal{F}(\mathcal{X}, q) = \int q(\Theta) \ln[p(\mathcal{X}, \Theta)/q(\Theta)] \, d\Theta$. In our algorithm, inspired by [Blei and Jordan, 2005], we truncate the variational distribution $q(\Theta)$ at a value $M$, such that $\lambda_M = 1$, $\pi_j = 0$ when $j &gt; M$, and $\sum_{j=1}^{M} \pi_j = 1$, where the truncation level $M$ is a variational parameter which can be freely initialized and will be optimized automatically during the learning process [Blei and Jordan, 2005]. In order to achieve tractability, we also assume that the approximated posterior distribution $q(\Theta)$ can be factorized into disjoint tractable factors as $q(\Theta) = \big[\prod_{i=1}^{N} q(Z_i)\big] \big[\prod_{j=1}^{M} \prod_{l=1}^{D} q(\alpha_{jl}) q(\beta_{jl})\big] \big[\prod_{j=1}^{M} q(\lambda_j)\big]$. By maximizing the free energy $\mathcal{F}(\mathcal{X}, q)$ with respect to each variational factor, we can obtain the following update equations for these factors:</p>
<p>$$q(\vec{Z}) = \prod_{i=1}^{N} \prod_{j=1}^{M} r_{ij}^{\mathbf{1}[Z_i = j]}, \qquad q(\vec{\alpha}) = \prod_{j=1}^{M} \prod_{l=1}^{D} \mathcal{G}(\alpha_{jl} \mid u_{jl}, v_{jl}) \quad (5)$$
$$q(\vec{\beta}) = \prod_{j=1}^{M} \prod_{l=1}^{D} \mathcal{G}(\beta_{jl} \mid s_{jl}, t_{jl}), \qquad q(\vec{\lambda}) = \prod_{j=1}^{M} \mathrm{Beta}(\lambda_j \mid a_j, b_j) \quad (6)$$
Here $\langle X_{cl} \rangle$ denotes the average over all data points contained in clump $c$ (clumps are defined below).</p>
<p>The first step of the compression phase is to assign each clump or data point to the component with the highest responsibility $r_{cj}$ calculated from the model building phase as
$$I_c = \arg\max_j r_{cj} \quad (10)$$
Assume that we have already observed $N$ data points but wish to make an inference at some target time $T$, where $T \gg N$. We can tackle this problem by scaling the observed data to the target size $T$, which is equivalent to using the variational posterior distribution of the $N$ observed data points as a predictive model of the future data [Gomes et al., 2008]. We then have a modified free energy for the compression phase in the following form:</p>
<p>The modified free energy $\mathcal{F}$ (Eq. (8)) weights the contribution of each clump $c$ by $|n_c|$, the number of data points in clump $c$, and by the data magnification factor $T/N$. The corresponding update equations for maximizing this free energy function (Eqs. (9)) follow the form of Eqs. (5) and (6), with each clump's sufficient statistics scaled accordingly. In these equations, $\Psi(\cdot)$ is the digamma function, and $\langle \cdot \rangle$ denotes the expectation. Note that $\widetilde{\mathcal{R}}$ is the lower bound of $\mathcal{R} = \big\langle \ln \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \big\rangle$; since this expectation is intractable, the second-order Taylor series expansion is applied to find its lower bound. The expected values in the above formulas are given by $\langle \mathbf{1}[Z_i = j] \rangle = r_{ij}$, $\bar{\alpha}_{jl} = \langle \alpha_{jl} \rangle = u_{jl}/v_{jl}$, $\bar{\beta}_{jl} = \langle \beta_{jl} \rangle = s_{jl}/t_{jl}$, $\langle \ln \lambda_j \rangle = \Psi(a_j) - \Psi(a_j + b_j)$, $\langle \ln(1 - \lambda_j) \rangle = \Psi(b_j) - \Psi(a_j + b_j)$, $\langle \ln \alpha_{jl} \rangle = \Psi(u_{jl}) - \ln v_{jl}$, and $\langle \ln \beta_{jl} \rangle = \Psi(s_{jl}) - \ln t_{jl}$.</p>
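The expectation formulas above can be checked numerically. The sketch below uses our own helper names and approximates the digamma function by a central difference of lgamma, since $\Psi$ is not in the Python standard library.

```python
from math import lgamma

# Sketch of the expectations used in the updates, e.g.
# <ln lambda_j> = Psi(a_j) - Psi(a_j + b_j) and <alpha_jl> = u_jl / v_jl.

def digamma(x, h=1e-6):
    """Psi(x) = d/dx ln Gamma(x), via a central difference of lgamma."""
    return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)

def expected_log_stick(a, b):
    """(<ln lambda>, <ln(1 - lambda)>) for lambda ~ Beta(a, b)."""
    return digamma(a) - digamma(a + b), digamma(b) - digamma(a + b)

def expected_gamma_mean(shape, rate):
    """Mean of a Gamma prior, e.g. <alpha_jl> = u_jl / v_jl."""
    return shape / rate

ln_lam, ln_one_minus = expected_log_stick(a=2.0, b=3.0)
# Exact value: Psi(2) - Psi(5) = -(1/2 + 1/3 + 1/4) = -13/12
print(round(ln_lam, 4))  # → -1.0833
```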
<p>After convergence, the currently observed data points are clustered into $M$ groups according to the corresponding responsibilities $r_{ij}$ through Eq. (7). Following [Gomes et al., 2008], these newly formed groups of data points are denoted as “clumps”, and they are subject to the constraint that all data points $\vec{X}_i$ in a clump $c$ share the same $q(Z_i) \equiv q(Z_c)$, which is a key factor in the following compression phase.</p>
<p>Algorithm 1
1: Choose the initial truncation level M.
2: Initialize the values of the hyper-parameters u_jl, v_jl, s_jl, t_jl and psi_j.
3: Initialize the values of r_ij by the K-Means algorithm.
4: while more data are to be observed do
5: Perform the model building phase through Eqs. (5) and (6).
6: Initialize the compression phase using Eq. (10).
7: while MC ≤ C do
8: for j = 1 to M do
9: if evaluated(j) = false then
10: Split component j and refine this split using Eqs. (9).
11: ΔF(j) = change in Eq. (8).
12: evaluated(j) = true.
13: end if
14: end for
15: Split the component j with the largest value of ΔF(j).
16: M = M + 1.
17: end while
18: Discard the currently observed data points.
19: Save the resulting components into the next learning round.
20: end while</p>
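The control flow of the splitting loop (steps 7-17) and its memory-budget stopping rule can be sketched as follows. The free-energy gains are invented stand-in numbers, since scoring a real split requires the update and free-energy equations themselves; only the greedy accept/stop logic is illustrated.

```python
# Structural sketch of Algorithm 1's splitting loop, with the memory-cost
# stopping rule MC = 2 * D * N_c <= C. Each candidate split carries an
# illustrative stand-in gain; in the real algorithm this is the change in
# the free energy of Eq. (8).

def split_until_budget(num_components, D, C, gains):
    """Greedily accept splits (largest stand-in gain first) while the
    component memory cost 2*D*N_c stays within the budget C."""
    accepted = []
    for gain in sorted(gains, reverse=True):
        if 2 * D * (num_components + 1) > C or gain <= 0.0:
            break                     # budget exhausted or no useful split
        num_components += 1           # step 16: M = M + 1
        accepted.append(gain)
    return num_components, accepted

# Toy run: D = 55 features, budget C = 55000, 15 initial components.
n, used = split_until_budget(15, D=55, C=55000, gains=[0.4, -0.1, 2.0, 0.7])
print(n, used)  # → 18 [2.0, 0.7, 0.4]
```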
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Compression Phase</title>
<p>Within the compression phase, we need to estimate which clumps possibly belong to the same mixture component, while taking future arriving data into consideration. In Eq. (10), $\{I_c\}$ denotes which component the clump (or data point) $c$ belongs to in the compression phase. The responsibilities and hyper-parameter updates involved take the form
$$r_{cj} = \frac{\exp(\rho_{cj})}{\sum_{j=1}^{M} \exp(\rho_{cj})}, \qquad \rho_{cj} = \sum_{l=1}^{D} [\cdots] + \langle \ln \lambda_j \rangle + \sum_{k=1}^{j-1} \langle \ln(1 - \lambda_k) \rangle$$
$$u_{jl} = u_{jl} + \frac{T}{N} |n_c| \, r_{cj} \big[ \Psi(\bar{\alpha}_{jl} + \bar{\beta}_{jl}) - \cdots \big]$$
Next, we cycle through each component and split it along its principal component into two subcomponents. This split is refined by updating Eqs. (9): the clumps are hard-assigned to one of the two candidate components after convergence for refining the split. Among all the potential splits, we select the one that results in the largest change in the free energy (Eq. (8)). The splitting process repeats itself until a stopping criterion is met. According to [Gomes et al., 2008], the stopping criterion for the splitting process can be expressed as a limit on the amount of memory required to store the components. In our case, the component memory cost for the mixture model is $MC = 2 D N_c$, where $2D$ is the number of parameters contained in a $D$-variate GD component, and $N_c$ is the number of components. Accordingly, we can define an upper limit $C$ on the component memory cost, and the compression phase stops when $MC \geq C$. As a result, the computational time and the space requirement are bounded in each learning round. After the compression phase, the currently observed data points are discarded, while the resulting components are treated in the same way as data points in the next round of learning. Our incremental variational inference algorithm for the infinite GD mixture model is summarized in Algorithm 1.</p>
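The hard assignment of Eq. (10) builds on softmax-style responsibilities; here is a numerically stable sketch, where the per-component log-scores are illustrative values rather than quantities computed from the model.

```python
from math import exp

# Sketch of r_cj = exp(rho_cj) / sum_j exp(rho_cj) in a numerically
# stable form (subtracting the max log-score before exponentiating).

def responsibilities(rho):
    """Softmax over per-component log-scores rho_c1..rho_cM."""
    m = max(rho)
    exps = [exp(r - m) for r in rho]   # avoids overflow for large rho
    s = sum(exps)
    return [e / s for e in exps]

r = responsibilities([1000.0, 1001.0, 999.0])  # naive exp() would overflow
assert abs(sum(r) - 1.0) < 1e-12
print(max(range(3), key=lambda j: r[j]))  # hard assignment I_c = argmax_j r_cj → 1
```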
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Visual Scenes Clustering</title>
<p>In this section, the effectiveness of the proposed incremental infinite GD mixture model (InGDMM) is tested on a challenging real-world application, namely visual scenes clustering. The problem is important since images are being produced at exponentially increasing rates, and very challenging due to the difficulty of capturing the variability in appearance and shape of diverse objects belonging to the same scene while avoiding confusion between objects from different scenes. In our experiments, we initialize the truncation level $M$ as 15. The initial values of the hyperparameters are set as $(u_{jl}, v_{jl}, s_{jl}, t_{jl}, \psi_j) = (1, 0.01, 1, 0.01, 0.1)$, which have been found to be reasonable choices according to our experimental results.</p>
      <sec id="sec-4-1">
        <title>Database and Experimental Design</title>
<p>In this paper, we test our approach on a challenging and publicly available database known as the OT database, introduced by Oliva and Torralba [Oliva and Torralba, 2001] (available at http://cvcl.mit.edu/database.htm). This database contains 2,688 images of size 256 × 256 pixels and is composed of eight urban and natural scene categories: coast (360 images), forest (328 images), highway (260 images), inside-city (308 images), mountain (374 images), open country (410 images), street (292 images), and tall building (356 images). Figure 1 shows some sample images from the different categories in the OT database. Our methodology is based on the proposed incremental infinite GD mixture model in conjunction with a bag-of-visual-words representation, and can be summarized as follows. Firstly, we use the Difference-of-Gaussians (DoG) interest point detector to extract Scale-Invariant Feature Transform (SIFT) descriptors (128-dimensional) [Lowe, 2004] from each image. Secondly, the K-Means algorithm is adopted to construct a visual vocabulary by quantizing these SIFT vectors into visual words; as a result, each image is represented as a frequency histogram over the visual words. We have tested different sizes of the visual vocabulary $|W| \in [100, 1000]$, and the optimal performance was obtained for $|W| = 750$ according to our experimental results. Then, the Probabilistic Latent Semantic Analysis (pLSA) model [Hofmann, 2001] is applied to the obtained histograms to represent each image by a 55-dimensional proportional vector, where 55 is the number of latent aspects. Finally, the proposed InGDMM is deployed to cluster the images, which are supposed to arrive in a sequential way.
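The quantization step of this pipeline can be sketched as follows, with toy 2-D descriptors and a two-word vocabulary standing in for the 128-D SIFT vectors and the $|W| = 750$ K-Means vocabulary; the centroids and descriptors are invented for illustration.

```python
# Sketch of the bag-of-visual-words step: each descriptor is quantized to
# its nearest vocabulary centroid, and the image becomes a unit-sum
# frequency histogram over the visual words.

def nearest_word(desc, vocab):
    """Index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(vocab)),
               key=lambda w: sum((d - c) ** 2 for d, c in zip(desc, vocab[w])))

def bow_histogram(descriptors, vocab):
    """Unit-sum visual-word histogram for one image."""
    counts = [0] * len(vocab)
    for desc in descriptors:
        counts[nearest_word(desc, vocab)] += 1
    n = float(len(descriptors))
    return [c / n for c in counts]

vocab = [(0.0, 0.0), (1.0, 1.0)]                   # toy 2-word vocabulary
descs = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
print(bow_histogram(descs, vocab))                 # → [0.5, 0.5]
```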
In our experiments, we randomly divided the OT database into two halves: one for constructing the visual vocabulary, and another for testing. Since our approach is unsupervised, the class labels are not involved in our experiments, except for evaluating the clustering results. The entire methodology was repeated 30 times to evaluate the performance. For comparison, we have also applied three other mixture-modeling approaches: the finite GD mixture model (FiGDMM), the infinite Gaussian mixture model (InGMM) and the finite Gaussian mixture model (FiGMM). To make a fair comparison, all of the aforementioned approaches are learned through incremental variational inference. Table 1 shows the average confusion matrix of the OT database calculated by the proposed InGDMM. Table 2 illustrates the average categorization performance of the different approaches on the OT database. As we can see from this table, our approach (InGDMM) provides the best performance, with the highest categorization rate (77.47%) among all the tested approaches. In addition, we can observe that the approaches adopting infinite mixtures (InGDMM and InGMM) perform better than the corresponding finite mixtures (FiGDMM and FiGMM), which demonstrates the advantage of using infinite mixture models over finite ones. Moreover, according to Table 2, the GD mixture has higher performance than the Gaussian mixture, which verifies that the GD mixture model has better modeling capability than the Gaussian for proportional data clustering.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
<p>In this work, we have presented an incremental nonparametric Bayesian approach for clustering. The proposed approach is based on infinite GD mixture models within a Dirichlet process framework and is learned using an incremental variational inference framework, in which the model parameters and the number of mixture components are determined simultaneously. The effectiveness of the proposed approach has been evaluated on a challenging application, namely visual scenes clustering. Future work could be devoted to the application of the proposed algorithm to other data mining tasks involving continually changing or growing volumes of proportional data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Attias</source>
          , 1999]
          <string-name>
            <given-names>H.</given-names>
            <surname>Attias</surname>
          </string-name>
          .
          <article-title>A variational Bayes framework for graphical models</article-title>
          .
          <source>In Proc. of Advances in Neural Information Processing Systems (NIPS)</source>
          , pages
          <fpage>209</fpage>
          -
          <lpage>215</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Blei and Jordan</source>
          , 2005]
          <string-name>
            <given-names>D.M.</given-names>
            <surname>Blei</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Variational inference for Dirichlet process mixtures</article-title>
          .
          <source>Bayesian Analysis</source>
          ,
          <volume>1</volume>
          :
          <fpage>121</fpage>
          -
          <lpage>144</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Bouguila and Ziou</source>
          , 2006]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bouguila</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziou</surname>
          </string-name>
          .
          <article-title>A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          ,
          <volume>15</volume>
          (
          <issue>9</issue>
          ):
          <fpage>2657</fpage>
          -
          <lpage>2668</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Bouguila and Ziou</source>
          , 2007]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bouguila</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziou</surname>
          </string-name>
          .
          <article-title>Highdimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>29</volume>
          :
          <fpage>1716</fpage>
          -
          <lpage>1731</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Boutemedjet et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>S.</given-names>
            <surname>Boutemedjet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bouguila</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziou</surname>
          </string-name>
          .
          <article-title>A hybrid feature extraction selection approach for high-dimensional non-Gaussian data clustering</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>31</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1429</fpage>
          -
          <lpage>1443</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Constantinopoulos et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Constantinopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.K.</given-names>
            <surname>Titsias</surname>
          </string-name>
          ,
and
          <string-name>
            <given-names>A.</given-names>
            <surname>Likas</surname>
          </string-name>
          .
          <article-title>Bayesian feature and model selection for Gaussian mixture models</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>28</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1013</fpage>
          -
          <lpage>1018</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Corduneanu and Bishop</source>
          , 2001]
          <string-name>
            <given-names>A.</given-names>
            <surname>Corduneanu</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <article-title>Variational Bayesian model selection for mixture distributions</article-title>
          .
          <source>In Proc. of the 8th International Conference on Artificial Intelligence and Statistics (AISTAT)</source>
          , pages
          <fpage>27</fpage>
          -
          <lpage>34</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Fan et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bouguila</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Ziou</surname>
          </string-name>
          .
          <article-title>Variational learning for finite Dirichlet mixture models and applications</article-title>
          .
          <source>IEEE Transactions on Neural Netw. Learning Syst</source>
          .,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>762</fpage>
          -
          <lpage>774</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Fan et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Wentao</given-names>
            <surname>Fan</surname>
          </string-name>
          , Nizar Bouguila, and
          <string-name>
            <given-names>Djemel</given-names>
            <surname>Ziou</surname>
          </string-name>
          .
          <article-title>Unsupervised hybrid feature extraction selection for high-dimensional non-Gaussian data clustering with variational inference</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>25</volume>
          (
          <issue>7</issue>
          ):
          <fpage>1670</fpage>
          -
          <lpage>1685</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Gomes et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          .
          <article-title>Incremental learning of nonparametric Bayesian mixture models</article-title>
          .
          <source>In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Hofmann</source>
          , 2001]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          .
          <article-title>Unsupervised learning by probabilistic latent semantic analysis</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>42</volume>
          (
          <issue>1</issue>
          /2):
          <fpage>177</fpage>
          -
          <lpage>196</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
[Li et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <article-title>Optimol: automatic online picture collection via incremental model learning</article-title>
          .
          <source>In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>[Lowe</source>
          , 2004]
          <string-name>
            <given-names>D.G.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>60</volume>
          (
          <issue>2</issue>
          ):
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Neal,
          <year>2000</year>
          ]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Neal</surname>
          </string-name>
          .
          <article-title>Markov chain sampling methods for Dirichlet process mixture models</article-title>
          .
          <source>Journal of Computational and Graphical Statistics</source>
          ,
          <volume>9</volume>
          (
          <issue>2</issue>
          ):
          <fpage>249</fpage>
          -
          <lpage>265</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Oliva and Torralba,
          <year>2001</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Oliva</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          .
          <article-title>Modeling the shape of the scene: A holistic representation of the spatial envelope</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>42</volume>
          :
          <fpage>145</fpage>
          -
          <lpage>175</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Opelt et al.,
          <year>2006</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Opelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Incremental learning of object detectors using a visual shape alphabet</article-title>
          .
          <source>In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Sethuraman,
          <year>1994</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sethuraman</surname>
          </string-name>
          .
          <article-title>A constructive definition of Dirichlet priors</article-title>
          .
          <source>Statistica Sinica</source>
          ,
          <volume>4</volume>
          :
          <fpage>639</fpage>
          -
          <lpage>650</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Sheikh et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>Y.A.</given-names>
            <surname>Sheikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.A.</given-names>
            <surname>Khan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kanade</surname>
          </string-name>
          .
          <article-title>Mode-seeking by medoidshifts</article-title>
          .
          <source>In Proc. of the IEEE 11th International Conference on Computer Vision (ICCV)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Teh et al.,
          <year>2004</year>
          ]
          <string-name>
            <given-names>Y.W.</given-names>
            <surname>Teh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Beal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.M.</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <article-title>Hierarchical Dirichlet processes</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          ,
          <volume>101</volume>
          :
          <fpage>705</fpage>
          -
          <lpage>711</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>