<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modelling Data Segmentation for Image Retrieval Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leticia Flores-Pulido</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Starostenko</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gustavo Rodríguez-Gómez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vicente Alarcón-Aquino</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>The analysis of large amounts of data with complex structure is a challenging task in engineering and scientific applications. Data segmentation usually involves either probabilistic or statistical approaches; however, the local-minima disadvantage is still present in both. A combination of these two approaches, based on subspace arrangements, avoids local minima in classification methods. In this paper we propose a new approach that improves the generalized principal component analysis (GPCA) algorithm for knowledge representation in images. Linear algebra concepts provide an abstraction of data sets whose items, such as images, documents, or stellar spectra, can be handled so as to supply a knowledge description for the image classification process. We also describe a solution to the optimization of the GPCA objective function using the Gutmann algorithm for data set segmentation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The best classification methods for visual image retrieval systems perform
poorly when the image sets are noisy or not homogeneous. Knowledge
representation may involve the definition of geometric dispersion, statistical analysis,
and abstract representation of data sets, which improves data segmentation and
classification results. Statistical methods improve classification ability, but
they require initialization parameters that can lead to low performance and incorrect
clustering. Probabilistic methods do not require initialization and provide
significant advantages in the discrimination process for the segmentation of data sets.
Probabilistic methods combined with statistical treatment allow an abstract
representation that increases the reliability of classification tasks, such as the
classification of image data sets, depending on knowledge representation parameters.
In this work we describe a GPCA algorithm
        <xref ref-type="bibr" rid="ref3">(Vidal, 2004)</xref>
        whose adaptive function is
improved with the Gutmann algorithm
        <xref ref-type="bibr" rid="ref14">(Regis et al. 2007)</xref>
        for classification tasks. The main
goal is to decrease the error of data set
classification. Our proposal consists of a hybrid linear model for image data
set segmentation that combines the best features of the statistical and probabilistic
approaches in order to overcome the local minima problem present in
computational methods such as neural networks, EM, PCA, ICA, and others
        <xref ref-type="bibr" rid="ref13">(Ma et al.,
2008)</xref>
        . We propose an alternative to stochastic methods for the optimization of
the iteration process of the GPCA algorithm, improving GPCA
for the segmentation of data sets with high dimensionality (Section 3). A stochastic
method can be used in this phase, but a deterministic method can also
be implemented to optimize GPCA (Section 5). The Gutmann algorithm allows
the optimization and adaptation of objective functions with less complexity
than a stochastic method.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Segmentation Data Problem</title>
      <p>
        A typical learning process uses statistical or probabilistic data analysis. Large
collections handle mixed data, typically modelled as a group of samples
{s1, s2, ..., sn} ⊂ IRn obtained from a learning approach with some
probabilistic distribution. Each sample has a domain of values and is
composed of a set of vectors. Every vector can be modelled as a singular value
array that describes relevant knowledge about the features of a sample such as an
image, a stellar spectrum, or a document. These features represent knowledge
content about the image data set. The main problem here is the implementation of
approaches that not only avoid the local minima disadvantage but also permit
the use of a hybrid approach such as GPCA. The optimization process of GPCA
can be improved not only through evolution strategies but also through other
kinds of approaches such as deterministic methods. One deterministic method
employed to reduce the number of iterations is based on the Gutmann algorithm
        <xref ref-type="bibr" rid="ref14">(Regis
et al. 2007)</xref>
        , which allows a cheaper iterative function. We propose to design a
mathematical abstraction for data segmentation using the GPCA algorithm extended
with an optimization phase. GPCA makes calls to the objective function, but as the
dimensionality of the data increases, its optimization process does not perform as well.
The iteration process in GPCA for data segmentation becomes computationally
intractable as the data dimensionality increases, so we propose an alternative
approach that overcomes these disadvantages by implementing the Gutmann algorithm
inside GPCA. The Gutmann algorithm performs a successful global minimum search and
guarantees a decrease in the number of iterations compared with stochastic methods
such as genetic algorithms, evolution strategies, or tabu search. The Gutmann
algorithm also supports high-dimensional data analysis when the iteration process
would otherwise be computationally intractable.
      </p>
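<p>As a minimal illustration of the mixed-data model above (a sketch of our own, with arbitrary example subspaces, not the paper's code), the following Python fragment draws samples from two low-dimensional subspaces of IR3 and stacks them into one unlabeled collection; the rank of each block shows that no single linear subspace fits the mixture, which is why a hybrid (union-of-subspaces) model is needed:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Two low-dimensional subspaces of IR^3: a line (V1) and a plane (V2).
line_basis = np.array([[1.0, 2.0, -1.0]]).T    # 3x1 basis of V1
plane_basis = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0]]).T    # 3x2 basis of V2

# Samples {s1, ..., sn}: columns are points drawn from each subspace.
line_pts = line_basis @ rng.normal(size=(1, 50))
plane_pts = plane_basis @ rng.normal(size=(2, 50))
samples = np.hstack([line_pts, plane_pts])      # mixed, unlabeled data

# No single linear model fits the mixture: the ranks differ.
print(np.linalg.matrix_rank(line_pts),   # 1
      np.linalg.matrix_rank(plane_pts),  # 2
      np.linalg.matrix_rank(samples))    # 3
```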
    </sec>
    <sec id="sec-3">
      <title>Modelling Data Segmentation</title>
      <p>The general outline of our proposal, titled Modelling Data Segmentation (MDS),
is shown in Figure 1. The proposal is divided into three levels. The abstraction
level describes an improvement of the generalized principal component analysis
(GPCA) algorithm for data segmentation; it is related to
procedures such as rings of polynomials, Veronese maps, and vanishing ideals
for designing an alternative approach based on subspace arrangements. The
operational level is composed of subspace arrangements, the data segmentation method,
and the GPCA improvement procedures; the new contributions are
implemented at this level. Finally, the system level consists of experimentation with
input data sets, and testing and evaluation of the model, including analysis of the
classification results of subspace arrangements with the proposed method.</p>
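<p>The Veronese map used at the abstraction level can be illustrated in a few lines (a toy sketch of our own; the function name and the monomial ordering are assumptions, not part of the proposal). The degree-n Veronese map sends a point to the vector of all its degree-n monomials, which is how GPCA turns subspace membership into polynomial conditions:</p>

```python
import itertools
import numpy as np

def veronese(x, degree):
    """Veronese map of the given degree: the vector of all monomials of
    that degree in the entries of x (lexicographic order)."""
    return np.array([np.prod(m) for m in
                     itertools.combinations_with_replacement(x, degree)])

# Degree-2 Veronese map of (x, y, z): [x^2, xy, xz, y^2, yz, z^2].
v = veronese(np.array([1.0, 2.0, 3.0]), 2)
print(v)  # [1. 2. 3. 4. 6. 9.]
```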
    </sec>
    <sec id="sec-4">
      <title>State of the Art</title>
      <p>
        Several works about segmentation methods for subspace arrangements of data
sets have been developed. Subspace arrangements can be computed using
vanishing ideal in polynomial rings. Some of the works that describe this process
are
        <xref ref-type="bibr" rid="ref1">(Bjorner, 2005)</xref>
        ,
        <xref ref-type="bibr" rid="ref2">(Derksen, 2005)</xref>
        , and
        <xref ref-type="bibr" rid="ref3">(Vidal, 2004)</xref>
        . The concept of an outlier
in subspace arrangements refers to elements that lie outside some well-classified
data group. Research papers by
        <xref ref-type="bibr" rid="ref4">(Sugaya, 2003)</xref>
        and
        <xref ref-type="bibr" rid="ref5">(Yang, 2006)</xref>
        help to detect and to avoid outliers in subspace arrangements. Other models for
detection of subspaces are described in the following papers:
        <xref ref-type="bibr" rid="ref6">(Kanatani, 2001)</xref>
        ,
        <xref ref-type="bibr" rid="ref7">(Kanatani, 2004)</xref>
        and
        <xref ref-type="bibr" rid="ref8">(Roweis 2000)</xref>
        as alternatives to well-known approaches
for subspace arrangements. The most relevant related works that describe GPCA
improvements are
        <xref ref-type="bibr" rid="ref9">(Huang, 2004)</xref>
        ,
        <xref ref-type="bibr" rid="ref13">(Ma, 2008)</xref>
        ,
        <xref ref-type="bibr" rid="ref10">(Ma, 2005)</xref>
        ,
        <xref ref-type="bibr" rid="ref11">(Rao, 2005)</xref>
        and
        <xref ref-type="bibr" rid="ref12">(Vidal,
2005)</xref>
        .
      </p>
    </sec>
    <sec id="sec-5">
      <title>Methodology</title>
      <p>
        The adopted methodology computes subspace arrangements from the knowledge
representation of visual data sets, representing images or multimedia documents
frequently used in engineering and science applications. Our proposal implies a
mathematical treatment of the data using an extension of the GPCA algorithm
        <xref ref-type="bibr" rid="ref13">(Ma et al., 2008)</xref>
        .
The novelty of our proposal is detailed in Figure 2.
      </p>
      <p>
        The GPCA algorithm describes subspace arrangements with special features. The Hilbert
function characterizes a special kind of subspace arrangement that represents a data set with
particular invariances
        <xref ref-type="bibr" rid="ref15">(Lang, 2002)</xref>
        and
        <xref ref-type="bibr" rid="ref1">(Bjorner, 2005)</xref>
        . The radial basis function
is the core of the Gutmann algorithm
        <xref ref-type="bibr" rid="ref14">(Regis et al. 2007)</xref>
        through which the GPCA
improvement can be achieved. In this section the previously mentioned
approaches are presented through formal definitions as follows.
      </p>
      <p>Definition 1. (Subspace arrangement). A subspace arrangement in IFD is the
union</p>
      <p>A = V1 ∪ V2 ∪ ... ∪ Vn (1)</p>
      <p>of n subspaces V1, V2, ..., Vn of IFD.</p>
      <p>Definition 2. (Radial Basis Function Interpolation). Given n distinct points
x1, ..., xn ∈ IRd where the function values f(x1), ..., f(xn) are known, we use
an interpolant of the form</p>
      <p>sn(x) = Σ_{i=1}^{n} λi φ(‖x − xi‖) + p(x),  x ∈ IRd (2)</p>
      <p>
        where ‖ · ‖ is the Euclidean norm, λi ∈ IR for i = 1, ..., n, p ∈ Πmd (the linear
space of polynomials in d variables of degree less than or equal to m), and
φ is a real-valued function that can take many forms. The Gutmann algorithm
        <xref ref-type="bibr" rid="ref14">(Regis et al. 2007)</xref>
        works better with the thin-plate spline φ(r) = r2 log r, r &gt; 0, and φ(0) = 0.
      </p>
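<p>Definition 2 can be sketched numerically as follows (a minimal Python illustration of our own, not the authors' implementation; we use the thin-plate spline with a linear polynomial tail, the standard choice for this φ, and a 1-D test function):</p>

```python
import numpy as np

def phi(r):
    # Thin-plate spline: φ(r) = r^2 log r for r > 0, φ(0) = 0.
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def fit_rbf(X, f):
    """Fit s(x) = Σ λ_i φ(‖x − x_i‖) + p(x) with a linear tail p(x)."""
    n, d = X.shape
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Phi = phi(R)                                   # n x n kernel matrix
    P = np.hstack([np.ones((n, 1)), X])            # linear polynomial basis
    # Interpolation conditions plus orthogonality of λ to the polynomials.
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    coef = np.linalg.solve(A, np.concatenate([f, np.zeros(d + 1)]))
    lam, c = coef[:n], coef[n:]

    def s(x):
        r = np.linalg.norm(X - x, axis=1)
        return phi(r) @ lam + c[0] + c[1:] @ x
    return s

# Interpolate f(x) = sin(x) at 8 scattered points on [0, 3].
X = np.linspace(0.0, 3.0, 8).reshape(-1, 1)
s = fit_rbf(X, np.sin(X).ravel())
```

The interpolant reproduces the data exactly at the nodes x1, ..., xn; between nodes it gives a smooth surrogate that the Gutmann algorithm minimizes instead of the expensive objective function.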
    </sec>
    <sec id="sec-6">
      <title>Experimentation</title>
      <p>
        As a first step, this research implies the generation of data sets and data
functions. The data sets can be generated randomly, and they need a
definition of the vector size and the number of dimensions. This can be done with the
haltonseq.m code written by Daniel Dougherty (MCFE link). An
alternative for random data set generation are the Schoen functions
        <xref ref-type="bibr" rid="ref18">(Schoen, 1992)</xref>
        , useful in global optimization problems because of special features: they are easy
to build for any dimension, the global minimum and maximum are known a
priori, and their smoothness as well as the number and location of stationary
points are controllable. After the generation of random points it is important to
extract the vanishing ideals
        <xref ref-type="bibr" rid="ref13">(Ma et al. 2008)</xref>
        from the data sets. The vanishing ideals can be
obtained by applying the Macaulay2 routine
        <xref ref-type="bibr" rid="ref19">(Grayson, 1997)</xref>
        . One
example of its use is shown in Figure 3(a). Vanishing ideals provide
information about the number and dimension of the subspace arrangements of segmented
data sets. Once the vanishing ideals have been detected, we will carry out
experiments with the original code of Kanatani used for subspace detection,
http://perception.csl.uiuc.edu/gpca
        <xref ref-type="bibr" rid="ref16">(Kanatani, 2002)</xref>
        . This code includes several
tests for the global optimization problem using generalized principal component
analysis applied to data segmentation. We will use this code for applications
with image data sets. Figure 3(b) shows an example of the GPCA algorithm
applied to data sets.
      </p>
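<p>The Halton construction behind haltonseq.m can be sketched in a few lines (our own Python approximation, not Dougherty's code): each coordinate is the radical-inverse of the sample index in a distinct prime base, which yields well-spread quasi-random points for any dimension:</p>

```python
def halton(index, base):
    """Radical inverse of `index` in `base` (van der Corput sequence)."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * f
        f /= base
    return result

def halton_points(n, bases=(2, 3)):
    """n quasi-random points in the unit cube, one prime base per dimension."""
    return [[halton(i, b) for b in bases] for i in range(1, n + 1)]

pts = halton_points(5)  # 5 points in [0, 1)^2
```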
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>We have presented a novel approach for modelling data segmentation. We
propose GPCA for extracting knowledge representation from image data sets.
Our method extends GPCA to high-dimensional data classification. The
methodology involves vanishing ideal computation and subspace detection.
The GPCA improvement rests on the Gutmann algorithm, which is useful for
high-dimensional data sets and works with radial basis function interpolation.
We have explained a methodology that improves knowledge representation
methods for image data sets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Bjorner</source>
          ,
          <year>2005</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bjorner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Peeva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sidman</surname>
          </string-name>
          ,
          <article-title>Subspace arrangements defined by products of linear forms</article-title>
          ,
          <source>J. London Math. Soc. (2) 71</source>
          (
          <year>2005</year>
          )
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Derksen</source>
          , 2005]
          <string-name>
            <given-names>H.</given-names>
            <surname>Derksen</surname>
          </string-name>
          : Hilbert Series of Subspace Arrangements, preprint, arXiv.org,
          <year>2005</year>
          ; available online from http://arxiv.org/abs/math/0510584.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Vidal</source>
          ,
          <year>2004</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Piazzi</surname>
          </string-name>
          ,
          <article-title>A new GPCA algorithm for clustering subspaces by fitting, differentiating and dividing polynomials</article-title>
          ,
          <source>in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition</source>
          (
          <year>2004</year>
          )
          <fpage>510</fpage>
          -
          <lpage>517</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Sugaya</source>
          , 2003]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sugaya</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanatani</surname>
          </string-name>
          ,
          <article-title>Outlier removal for motion tracking by subspace separation</article-title>
          ,
          <source>IEICE Trans. Inform. Systems</source>
          E86-D (
          <year>2003</year>
          )
          <fpage>1095</fpage>
          -
          <lpage>1102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Yang</source>
          ,
          <year>2006</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>Robust statistical estimation and segmentation of multiple subspaces</article-title>
          ,
          <source>in Workshop on 25 years of RANSAC, IEEE International Conference on Computer Vision and Pattern Recognition</source>
          (
          <year>2006</year>
          )
          <fpage>99</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Kanatani</source>
          , 2001]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanatani</surname>
          </string-name>
          ,
          <article-title>Motion Segmentation by Subspace Separation and Model Selection</article-title>
          ,
          <source>The 8th Internacional Conference in Computer Vision July</source>
          (
          <year>2001</year>
          ) Vancouver Canada (
          <volume>2</volume>
          )
          <fpage>586</fpage>
          -
          <lpage>591</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Kanatani</source>
          , 2004]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanatani</surname>
          </string-name>
          ,
          <article-title>For geometric inference from images, what kind of statistical model is necessary?</article-title>
          <source>Systems and Computers in Japan</source>
          (35) 6 (
          <year>2004</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>[Roweis</source>
          , 2000]
          <string-name>
            <given-names>S.</given-names>
            <surname>Roweis</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Saul</surname>
          </string-name>
          ,
          <article-title>Nonlinear dimensionality reduction by locally linear embedding</article-title>
          ,
          <source>Science</source>
          <volume>290</volume>
          (
          <year>2000</year>
          )
          <fpage>2323</fpage>
          -
          <lpage>2326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Huang</source>
          ,
          <year>2004</year>
          ]
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          :
          <article-title>Identification of hybrid linear timeinvariant systems via subspace embedding and segmentation</article-title>
          ,
          <source>in Proceedings of the IEEE Conference on Decision and Control</source>
          <volume>3</volume>
          (
          <year>2004</year>
          )
          <fpage>3227</fpage>
          -
          <lpage>3234</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Ma, 2005]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>A closed form solution to the identification of hybrid ARX models via the identification of algebraic varieties</article-title>
          ,
          <source>in Proceedings of the International Conference on Hybrid Systems Computation and Control</source>
          (
          <year>2005</year>
          )
          <fpage>449</fpage>
          -
          <lpage>465</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Rao</source>
          , 2005]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <article-title>Segmentation of hybrid motions via hybrid quadratic surface analysis</article-title>
          ,
          <source>in Proceedings of IEEE International Conference on Computer Vision</source>
          (
          <year>2005</year>
          )
          <fpage>2</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>[Vidal</source>
          ,
          <year>2005</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sastry</surname>
          </string-name>
          :
          <article-title>Generalized principal component analysis (GPCA)</article-title>
          ,
          <source>IEEE Trans. Pattern Anal. Machine Intelligence</source>
          <volume>27</volume>
          (
          <year>2005</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Ma et al. 2008]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Derksen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fossum</surname>
          </string-name>
          :
          <article-title>Estimation of Subspace Arrangements with Applications in Modelling and Segmenting Mixed Data</article-title>
          ,
          <source>SIAM Review Society for industrial and applied mathematics</source>
          , Vol
          <volume>50</volume>
          Num 3
          <fpage>413</fpage>
          -
          <lpage>458</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Regis et al.
          <year>2007</year>
          ]
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Regis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ch. A.</given-names>
            <surname>Shoemaker</surname>
          </string-name>
          :
          <article-title>Improved strategies for radial basis function methods for global optimization</article-title>
          ,
          <source>J Glob Optim</source>
          (
          <year>2007</year>
          )
          <volume>37</volume>
          :
          <fpage>113</fpage>
          -
          <lpage>135</lpage>
          , DOI 10.1007/s10898-006-9040-1
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Lang</source>
          , 2002] S. Lang: Algebra, Springer-Verlag, New York,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>[Kanatani</source>
          , 2002]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kanatani</surname>
          </string-name>
          ,
          <article-title>Motion segmentation by subspace separation: Model selection and reliability evaluation</article-title>
          ,
          <source>Int. J. Image Graphics</source>
          ,
          <volume>2</volume>
          (
          <year>2002</year>
          ),
          <fpage>179</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Gutmann</source>
          , 2001]
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Gutmann</surname>
          </string-name>
          ,
          <article-title>A Radial Basis Function Method for Global Optimization</article-title>
          ,
          <source>Journal of Global Optimization</source>
          <volume>19</volume>
          :
          <fpage>201</fpage>
          -
          <lpage>227</lpage>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Schoen</source>
          , 1992]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schoen</surname>
          </string-name>
          ,
          <article-title>A wide class of test functions for global optimization</article-title>
          ,
          <source>Journal of Global Optimization</source>
          ,
          <volume>3</volume>
          :
          <fpage>133</fpage>
          -
          <lpage>137</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>[Grayson</source>
          , 1997]
          <string-name>
            <given-names>D.</given-names>
            <surname>Grayson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stillman</surname>
          </string-name>
          , Macaulay 2
          <article-title>- a system for computation in algebraic geometry and commutative algebra</article-title>
          , http://www.math.uiuc.edu/Macaulay2
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>