<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UAIC participation at Robot Vision @ 2012 - An updated vision</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuela Boros</string-name>
          <email>emanuela.boros@info.uaic.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandru Lucian Ginsca</string-name>
          <email>lucian.ginsca@info.uaic.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Iftene</string-name>
          <email>adiftene@info.uaic.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alexandru Ioan Cuza University, Faculty of Computer Science General Berthelot</institution>
          ,
          <addr-line>16, 700483, Iasi</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <abstract>
        <p>In this paper we describe a system that participated in the fourth edition of the Robot Vision task at the ImageCLEF benchmarking activity, for which we approached the task of topological localization without using the temporal continuity of the sequences of images. We provide details of the state-of-the-art methods that were selected: Color Histograms, SIFT (Scale Invariant Feature Transform), ASIFT (Affine SIFT) and RGB-SIFT, and a Bag-of-Visual-Words strategy inspired by the text retrieval community. We focused on finding the optimal set of features, and a deepened analysis was carried out. We offer an analysis of the different features and similarity measures, and a performance evaluation of combinations of the proposed methods for topological localization. Also, we detail a genetic algorithm that was used for eliminating false positive results. In the end, we draw several conclusions targeting the advantages of using proper configurations of visual appearance descriptors, similarity measures and classifiers.</p>
      </abstract>
      <kwd-group>
        <kwd>Robot Topological Localization</kwd>
        <kwd>Global Features</kwd>
        <kwd>Invariant Local Features</kwd>
        <kwd>Visual Words</kwd>
        <kwd>SVMs</kwd>
        <kwd>Genetic Algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we present an approach to vision-based mobile robot localization that uses a single perspective camera within an office environment. The robot should be able to answer the question “where are you?” when presented with a test sequence representing a room category seen during training [
        <xref ref-type="bibr" rid="ref25 ref30 ref33">30, 33, 25</xref>
        ].
We analyze the problem without taking into consideration the temporal continuity of the sequences of images. We perform an exhaustive evaluation and introduce a new statistical comparison between quantization techniques over a large set of features, from which different system configurations are picked and tested.
      </p>
      <p>
        Traditionally, robot vision systems have relied heavily on different methods for robotic topological localization, such as topological map building, which makes good use of temporal continuity [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], panoramic vision creation [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ],
simultaneous localization and mapping [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], appearance-based place recognition for
topological localization [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ], Monte-Carlo localization [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ].
      </p>
      <p>
        The problem of mobile topological localization has three main dimensions: the type of environment (indoor, outdoor, natural outdoor), the perception (sensing modality), and the localization model (probabilistic, basic). Numerous papers deal with indoor environments [
        <xref ref-type="bibr" rid="ref10 ref21 ref37 ref38">37, 38, 10, 21</xref>
        ] and a few deal with outdoor
environments, natural or urban [
        <xref ref-type="bibr" rid="ref13 ref36">36, 13</xref>
        ].
      </p>
      <p>
        Current work on robot localization in indoor environments has focused on introducing probabilistic models to improve local feature matching and on the integration of specific kernels. Experimental results for wide-baseline image matching suggest the need for local invariant descriptors of images. Invariant features have achieved relative success in object detection and image matching. There has also been research into the development of fully invariant features [
        <xref ref-type="bibr" rid="ref26 ref27 ref4">4, 26,
27</xref>
        ]. In his milestone paper [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], D. Lowe proposed the scale-invariant feature transform (SIFT), which is invariant to image scaling and rotation, as well as to illumination and viewpoint changes. Lately, a new method has been proposed, Affine-SIFT (ASIFT), which simulates all the views obtainable by varying the two camera-axis orientation parameters, namely the latitude and longitude angles [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
      </p>
      <p>
        The Bag-of-Visual-Words [
        <xref ref-type="bibr" rid="ref12 ref8">8, 12</xref>
        ] model is a great addition to place recognition and was initially inspired by the bag-of-words models in text classification, where a document is represented by an unsorted set of the contained words. This data modeling technique was first introduced in the context of video retrieval [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. Due to its efficiency and effectiveness, it became very popular in the fields of image retrieval and classification [
        <xref ref-type="bibr" rid="ref20 ref43">20, 43</xref>
        ].
      </p>
      <p>
        The classification of images relies more on unsupervised than supervised learning techniques. Categorizing in unsupervised learning scenarios is a much harder problem, due to the absence of class labels that would guide the search for relevant information. In supervised learning scenarios, image categorization has been studied widely in the literature. Among supervised learning techniques, the most popular in this context are Bayesian classifiers [
        <xref ref-type="bibr" rid="ref12 ref18 ref19 ref8">8, 18, 12, 19</xref>
        ] and Support
Vector Machines (SVM) [
        <xref ref-type="bibr" rid="ref18 ref39 ref44 ref8">39, 8, 18, 44</xref>
        ]. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] also uses random forests. Indeed, state-of-the-art results are due to SVM classifiers: the method described in [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ], which combines a local matching of the features with specific kernels based on the Earth Mover's Distance [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] or the χ² distance [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], yielded the best results.
      </p>
      <p>
        Our approach represents an extension of our previous work [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], where each RGB image is processed to extract sets of SIFT keypoints, from which the descriptors are defined. Making use of global and local features, a quantization technique, SVMs, and a genetic algorithm that aims at eliminating the false positives, we approached the recognition task with different configurations; the one that obtained the best results was reviewed in the 2012 Robot Vision task of the ImageCLEF international campaign.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Image Analysis</title>
      <p>In this section, we describe the image features that have been used in this work in order to obtain a precise and effective model for the topological localization task. In order to obtain an image representation which captures the essential appearance of the location and is robust to occlusions and changes in image brightness, we compare two different image descriptors and their associated distance measures. In the first case, we use integrated color histograms, and in the second case each image is represented by a set of local scale-invariant features, quantized in bags of visual words.</p>
      <sec id="sec-2-1">
        <title>Global Features</title>
        <p>
          Many recognition systems based on images use global features that describe the entire image: an overall view of the image that is transformed into histograms of frequencies. Adopting the analysis of global features has brought great improvement in robot localization systems, as in [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] or in content-based image retrieval systems, as in the medical image analysis of [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. Such features are important because they produce very compact representations of images, where each image corresponds to a point in a high-dimensional feature space.
        </p>
        <p>In the following, we attempt to model image densities using two different color spaces, RGB and HSV.</p>
      </sec>
      <sec id="sec-2-2">
        <title>RGB (Red, Green, and Blue) Color Model</title>
        <p>The RGB color model is composed of the primary colors Red, Green, and Blue. They are considered the additive primaries, since the colors are added together to produce the desired color. White is produced when all three primary colors are at the maximum light intensity (255). The RGB space has the major deficiency of not being perceptually uniform, which is the motivation for adding HSV color histograms.</p>
        <p>HSV (Hue, Saturation, and Value) Color Model defines colors in terms of three constituent components: hue, saturation, and value (brightness). The hue and saturation components are intimately related to the way the human eye perceives color, because they capture the whole spectrum of colors. The value represents the intensity of a color, which is decoupled from the color information in the represented image. This color model is attractive because color image processing performed independently on the color channels does not introduce false colors (hues). However, it also has the inconvenience of the necessary nonlinearity in forward and reverse transformations with the RGB space.</p>
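        <p>As an aside illustrating the decoupling described above, a standard RGB-to-HSV conversion (sketched here with Python's built-in colorsys module; this is an illustration, not code from our system) maps a uniformly brightened color to the same hue and saturation:</p>

```python
import colorsys

# Two pixels with the same chromaticity but different brightness.
dark = (0.2, 0.4, 0.1)
bright = (0.4, 0.8, 0.2)  # every channel doubled

h1, s1, v1 = colorsys.rgb_to_hsv(*dark)
h2, s2, v2 = colorsys.rgb_to_hsv(*bright)

# Hue and saturation are unchanged; only the value (intensity) differs.
print(round(h1, 6) == round(h2, 6), round(s1, 6) == round(s2, 6))  # True True
print(v1, v2)  # 0.4 0.8
```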
        <p>A color histogram denotes the joint probabilities of the intensities of the three color channels and is computed by discretizing the colors within the image and counting the number of pixels of each color. Since the number of colors is finite, it is usually more convenient to transform the three-channel histogram into a single-variable histogram, therefore a quantization of the histograms is needed. The histogram dimension (the number of histogram bins) n is determined by the color representation scheme and the quantization level. Most color spaces represent a color as a three-dimensional vector with real values (e.g. RGB, HSV). We quantize the color space into k bins for the first axis, l bins for the second axis and m bins for the third axis. The histogram can then be represented as an n-dimensional vector, where n = k × l × m. Because retrieval performance saturates when the number of bins is increased beyond some value, a normalized color histogram difference can be a satisfactory measure of frame dissimilarity, even when colors are quantized into only 64 bins (4 Green × 4 Red × 4 Blue). In conclusion, we chose an 18 × 10 × 10 multidimensional HSV histogram and a 10 × 10 × 10 multidimensional RGB histogram, as the differences between colors of the office environment have a high level of similarity, with only slight changes in hues.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Local Features</title>
        <p>
          A different paradigm is to use local features, which are descriptors of local image neighborhoods computed at multiple interest points. Many local features have been developed in recent years for image analysis, with the outstanding SIFT as the most popular. In the literature there are several works studying the different features and their descriptors; for instance, [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] evaluates the performance of
local descriptors, and [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ] shows a study on the performance of different features for object recognition.
        </p>
        <p>
          The three types of features used in our experiments are SIFT (Scale Invariant Feature Transform), ASIFT (Affine Scale Invariant Feature Transform) and RGB-SIFT (RGB Scale Invariant Feature Transform). These features were extracted using [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Also, the localization experiments using these features show
advantages and disadvantages of using one or another.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>SIFT (Scale Invariant Feature Transform) [23, 4, 24]</title>
        <p>
          SIFT features correspond to highly distinguishable image locations which can be detected efficiently and have been shown to be stable across wide variations of viewpoint and scale. The algorithm extracts features that are invariant to rotation and scaling, and partially invariant to changes in illumination and affine transformations. This feature has been explained in our previous work, being one of the key levels of our systems [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>ASIFT (Affine Scale Invariant Feature Transform)</title>
        <p>
          ASIFT, as described in [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], simulates with sufficient accuracy all distortions caused by a variation of the camera's optical-axis direction, and then applies the SIFT method. In other words, ASIFT simulates three parameters: the scale, the camera longitude angle and the latitude angle, and normalizes the other three (translation and rotation), which SIFT lacked.
        </p>
        <p>
          RGB-SIFT (RGB Scale Invariant Feature Transform) descriptors are computed for every RGB channel independently. Each channel is therefore normalized separately, which brings another important property to SIFT: invariance to light color changes. For a color image, the SIFT descriptors are computed independently for each RGB component and concatenated into a 384-dimensional local feature (RGB-SIFT) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
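        <p>The per-channel computation described for RGB-SIFT can be sketched as follows (illustrative only; sift_descriptor stands in for a real 128-dimensional SIFT extractor, which is assumed rather than implemented here):</p>

```python
def l2_normalize(vec):
    """Normalize a descriptor to unit length (zero vectors are left as-is)."""
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else list(vec)

def rgb_sift(channels, sift_descriptor):
    """Compute a 384-d RGB-SIFT descriptor for one keypoint.

    `channels` is an (R, G, B) triple of single-channel images and
    `sift_descriptor` maps one channel to a 128-d SIFT descriptor.
    Each channel is described and normalized independently, then the
    three 128-d vectors are concatenated.
    """
    desc = []
    for channel in channels:
        desc.extend(l2_normalize(sift_descriptor(channel)))
    return desc

# Toy stand-in extractor: a constant 128-d vector per channel.
fake_sift = lambda channel: [float(channel)] * 128
d = rgb_sift((1, 2, 3), fake_sift)
print(len(d))  # 384
```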
      </sec>
      <sec id="sec-2-6">
        <title>Feature Matching</title>
        <p>
          In this subsection we introduce different dissimilarity measures to compare features; that is, a measure of dissimilarity between two features, and thus between the underlying images, is calculated. Many of the features presented are in fact histograms (color histograms, invariant feature histograms). As the comparison of distributions is a well-known problem, many comparison measures have been proposed and compared before [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ].
        </p>
        <p>In the following, dissimilarity measures to compare two histograms H and
K are proposed. Each of these histograms has n bins and Hi is the value of the
i-th bin of histogram H.</p>
        <p>
          – Minkowski-form Distance (the L1 distance is often used for computing dissimilarity between color images, and has also been tested in color histogram comparison [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]):
        </p>
        <p>DLr(H, K) = (Σi |Hi − Ki|^r)^(1/r)   (1)</p>
        <p>
          – Jensen-Shannon Divergence (also referred to as the Jeffrey Divergence [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], it is an empirical extension of the Kullback-Leibler Divergence; it is symmetric and numerically more stable):
        </p>
        <p>DJSD(H, K) = Σi (Hi log(2Hi / (Hi + Ki)) + Ki log(2Ki / (Ki + Hi)))   (2)</p>
        <p>
          – χ² Distance (measures how unlikely it is that one distribution was drawn from the population represented by the other [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]):
        </p>
        <p>Dχ²(H, K) = Σi (Hi − Ki)² / Hi   (3)</p>
        <p>
          – Bhattacharyya Distance [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] (measures the similarity of two discrete or continuous probability distributions; for discrete probability distributions H and K over the same domain, it is defined as):
        </p>
        <p>DB(H, K) = −ln Σi √(Hi Ki)   (4)</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Classification</title>
      <p>Many of the features presented in Section 2 are in fact histograms (color histograms, invariant feature histograms, texture histograms, local feature histograms). As the comparison of distributions is a well-known problem, many comparison measures were proposed in Section 2.3. To analyze the different distance measures, we summarize a well-known choice for supervised classification.</p>
      <p>
        Support Vector Machines are state-of-the-art large-margin classifiers, which recently gained popularity within visual pattern and object recognition [
        <xref ref-type="bibr" rid="ref15 ref18 ref40 ref42 ref44 ref8">15, 8, 18, 44, 40, 42</xref>
        ]. Choosing the most appropriate kernel highly depends on the problem at hand, and fine-tuning its parameters can easily become a tedious task. For our experimental setup, we chose the linear kernel (which is trivial and will not be presented), the radial basis function kernel and the χ² kernel, presented below. The Gaussian kernel is an example of a radial basis function kernel; the χ² kernel comes from the χ² distribution.
      </p>
      <p>Kg(x, y) = exp(−‖x − y‖² / (2σ²))</p>
      <p>
        Recent advances in the image recognition field have shown that bag-of-visual-words [
        <xref ref-type="bibr" rid="ref12 ref8">8, 12</xref>
        ] approaches - a strategy that draws inspiration from the text retrieval community - are a good method for many image classification problems. BoVW representations have recently become popular for content-based image classification because of their simplicity and extremely good performance.
      </p>
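      <p>For illustration, the two kernels can be written as follows (a sketch; the exact χ² kernel parameterization is not spelled out above, so the symmetric χ² form and the γ parameter here are assumptions):</p>

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """RBF (Gaussian) kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def chi_square_kernel(x, y, gamma=1.0):
    """A common chi-square kernel: exp(-gamma * sum((xi-yi)^2 / (xi+yi)))."""
    d = sum((xi - yi) ** 2 / (xi + yi) for xi, yi in zip(x, y) if xi + yi > 0)
    return math.exp(-gamma * d)

h = [0.5, 0.3, 0.2]
print(gaussian_kernel(h, h), chi_square_kernel(h, h))  # 1.0 1.0
```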
      <p>Basically, to give an estimation of the distribution, we create histograms of the local features. The key idea of the bag-of-visual-words representation is to quantize each keypoint into one of the visual words, which are often derived by clustering; typically, k-means clustering is used. The size of the vocabulary k is a user-supplied parameter, and the visual words are the k cluster centers. The baseline of our tests is a bag of visual words with 100 visual words, i.e. 100-means clustering. The resulting k n-dimensional cluster centers cj represent the visual words.</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental Setup</title>
      <p>In this section, we explain the experimental setup, then we present and discuss the results. The different choices of distance measures and classification parameters are analyzed, also performing a comparison with previous work. Conclusions are drawn for the benefit of an accurate solution for topological localization, data modeling and classification.</p>
      <sec id="sec-4-1">
        <title>Datasets (Benchmark)</title>
        <p>The chosen dataset contains images from nine sections of an office, obtained from CLEF (the Conference on Multilingual and Multimodal Information Access Evaluation).</p>
        <p>
          Detailed information about the dataset can be found in the overviews
and ImageCLEF publications [
          <xref ref-type="bibr" rid="ref25 ref30 ref33">30, 33, 25</xref>
          ]. The dataset has already been split into three training sets of images, as shown in Table 1, each different from the others. The provided images are in the RGB color space. The sequences are acquired within the same building and floor, but there can be variations in the lighting conditions (sunny, cloudy, night) or in the acquisition procedure (clockwise and counter-clockwise).
        </p>
        <p>Table 1 lists the nine areas: Corridor, ElevatorArea, LoungeArea, PrinterRoom, ProfessorOffice, StudentOffice, TechnicalRoom, Toilet, VisioConference.</p>
        <p>Finally, a method for the elimination of unwanted results is applied: the retrieved classes for images (Corridor, LoungeArea, etc.) depend on a threshold, and those below this value are rejected, meaning that the system does not recognize the image. This becomes an optimization problem of finding the best value that will cut the unwanted results, considering that it is better to have no results than inconsistent results.</p>
        <p>
          We adapted the implementation of the genetic algorithm described in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In order to capture the particularities of the distance measures that are correlated with the rooms on which they are used, we considered a different threshold for each room. As a justification for choosing multiple thresholds rather than a single one, let us consider the case in which we are trying to classify images taken from a room that is more distinguishable from the others. The values returned by the similarity measures when comparing these images to others taken from the same room are further apart from the values returned when comparing them with images taken from different rooms. In contrast, if we consider a room that is visually similar to others, these values will be closer on the real axis. This is why it is harder to correctly separate erroneous classifications from the good ones with a single threshold.
        </p>
        <p>For the genetic algorithm, the chromosomes are vectors of length 9, representing the thresholds for the 9 rooms. For the genetic operators, we used the binary representation of these vectors. The fitness function evaluates the quality of the thresholds and is the measure used to score runs in the Robot Vision task. As a selection strategy, we used rank selection, which sorts the chromosomes according to their value given by the fitness function. In the crossover process, we do not allow the parent chromosomes which are the input for the crossover to be the same individual, as this could lead to early convergence. To prevent this from happening, we first select one chromosome from the population and then run the selection process in a loop until a different chromosome is returned. We also used elitism in order to ensure the survival of the best chromosomes of each generation. In order to balance the diversity of the population, this method is accompanied by a slightly increased mutation probability.</p>
        <p>For these experiments, we used a population of 200 individuals, a mutation probability of 0.15, and a crossover probability of 0.7. The optimization process is stopped after 1000 generations.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Results Interpretation</title>
        <p>
          We are interested in observing the performance of the final configurations, to see which features/dissimilarity measures lead to good results and which do not. As it is well known that combinations of different methods lead to good results [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], an objective is to combine the briefly presented features. However, it is not obvious how to combine them. To analyze the characteristics of the features and which features have similar properties, we perform an evaluation on selected configurations, as shown in Table 2 (methods: RGB-Only, HSV-Only, RGB-HSV, Basic-BoVW-SIFT, Basic-BoVW-SIFT+HSV+RGB, Basic-BoVW-ASIFT+HSV+RGB, SVM-RBF-BoVW-SIFT+HSV+RGB, SVM-LINEAR-BoVW-SIFT+HSV+RGB, SVM-χ²-BoVW-SIFT+HSV+RGB). The evaluation was performed choosing Training 1 and 3 (Table 1) for training and Training 2 for testing.
        </p>
        <p>The first column gives a description of the used training method. The descriptions of the configurations are straightforward: for example, Basic-BoVW-SIFT+HSV+RGB means a configuration combining RGB and HSV color histograms with Basic-BoVW-SIFT, a bag of visual words formed with SIFT feature vectors. The distance measures were chosen as follows: Jeffrey Divergence for RGB histograms, Bhattacharyya for HSV histograms and Minkowski for SIFT feature vectors. The second column gives the recall values for the training data, and the third the precision. The F-measure is computed and represented in the fourth column of the table. The table also shows that feature selection alone is not sufficient to increase the recognition rate; more flexibility is needed here, and this fact led to different combinations.</p>
        <p>The results are improved by the addition of the SVM classification step. We also note that an SVM classification of SIFT features mapped on visual words reaches at most 52% accuracy, but these results are very reassuring in the context of a configuration that involves other feature descriptors. Thereby, the configuration that combines SIFT words, HSV and RGB histograms, and classification with an SVM with an RBF kernel yielded the most satisfying result.</p>
      </sec>
      <sec id="sec-4-4">
        <title>ImageCLEF 2012 Robot Vision Task</title>
        <p>The fourth edition of the Robot Vision challenge focused on the problem of multi-modal place classification. We had to classify functional areas on the basis of image sequences, captured by a perspective camera and a Kinect mounted on a mobile robot within an office environment with nine rooms. We ranked third out of seven registered groups.</p>
        <p>Final ranking (group, score): 1. CIII UTN FRC, Universidad Tecnologica Nacional, Ciudad Universitaria, Cordoba, Argentina, 2071.0; 2. NUDT, Department of Automatic Control, College of Mechatronics and Automation, National University of Defense Technology, China, 1817.0; 3. Faculty of Computer Science, Alexandru Ioan Cuza University (UAIC), Iasi, Romania, 1348.0; 4. USUroom409, Yekaterinburg, Russian Federation, 1225.0; 5. SKB Kontur Labs, Yekaterinburg, Russian Federation, 1028.0; 6. CBIRITU, Istanbul Technical University, Turkey, 551.0; 7. Intelligent Systems and Data Mining Group (SIMD), University of Castilla-La Mancha, Albacete, Spain, 462.0; 8. BuffaloVision, University at Buffalo, NY, United States, -70.0.</p>
        <p>Our approach to topological localization is currently applied to an office environment of nine sections: Corridor, ProfessorOffice, StudentOffice, LoungeArea, PrinterRoom, Toilet, VisioConference, ElevatorArea and TechnicalRoom. To address the problem of recognizing these sections separately, we approached the classification with specific thresholds for taking the final decision over the selected room. These thresholds create constraints that have to be loosened in order to obtain an accurate result when treating situations of great similarity between two different rooms. One of the main inconveniences that can appear in this case is that the rooms are highly connected, so difficult situations can arise as the robot moves around the office. For example, if the robot is in the Corridor and looks to its right, it sees the LoungeArea, but its position is still in the Corridor. This type of situation creates noise that cannot be neglected; therefore, a proper threshold needs to handle these results so that they correspond to a humanized interaction with the environment. The threshold on the final decision quality was chosen to avoid erroneous localizations, thus favoring a result that does not specify any room and giving fewer correct localizations but also fewer false assumptions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this work, we approached the task of topological localization without using the temporal continuity of the images, involving a broad variety of features for image recognition. The provided information about the environment is contained in images taken with a perspective color camera mounted on a robot platform, and it represents an office environment dataset offered by ImageCLEF.</p>
      <p>The main contribution of this work lies in quantifiable examinations of a wide variety of different configurations for a computer vision-based system, and in significant results. The experiments show that configurations built from different feature descriptors and distance measures depend on the proper combinations.</p>
      <p>Given that most of the works cited are from the last couple of years, topological localization is a new and active area of research, which is generating increasing interest and encouraging further development. An important contribution to this field is given in this paper, along with notable experimental results, but there is still room for improvement and further research.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>The research presented in this paper was funded by the Sector Operational Program for Human Resources Development through the project “Development of the innovation capacity and increasing of the research impact through postdoctoral programs” POSDRU/89/1.5/S/49944.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Boros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rosca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Iftene</surname>
          </string-name>
          . UAIC:
          <article-title>Participation in ImageCLEF 2009 RobotVision task</article-title>
          .
          <source>Proceedings of the CLEF 2009 Workshop.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>E.</given-names>
            <surname>Boros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rosca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Iftene</surname>
          </string-name>
          .
          <article-title>Using SIFT method for global topological localization for indoor environments</article-title>
          .
          <source>Multilingual Information Access Evaluation II. Multimedia Experiments [Lecture Notes in Computer Science Volume 6242 Part II]</source>
          ,
          <volume>6242</volume>
          :
          <fpage>277</fpage>
          {
          <fpage>282</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Munoz</surname>
          </string-name>
          .
          <article-title>Image classification using random forests and ferns</article-title>
          .
          <source>ICCV</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Brown</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>Invariant features from interest point groups</article-title>
          .
          <source>The 13th British Machine Vision Conference</source>
          , Cardiff University, UK, pages
          <fpage>253</fpage>
          -
          <lpage>262</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Burghouts</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Geusebroek</surname>
          </string-name>
          .
          <article-title>Performance evaluation of local color invariants</article-title>
          .
          <source>CVIU</source>
          ,
          <volume>113</volume>
          (
          <issue>1</issue>
          ):
          <fpage>48</fpage>
          -
          <lpage>62</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>E.</given-names>
            <surname>Choi</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Feature extraction based on the Bhattacharyya distance</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>36</volume>
          :
          <fpage>1703</fpage>
          -
          <lpage>1709</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>H.</given-names>
            <surname>Choset</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Nagatani</surname>
          </string-name>
          .
          <article-title>Topological simultaneous localization and mapping (SLAM): toward exact localization without explicit localization</article-title>
          .
          <source>IEEE Trans. Robot. Automat.</source>
          ,
          <volume>17</volume>
          (
          <issue>2</issue>
          ):
          <fpage>125</fpage>
          -
          <lpage>137</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>C.</given-names>
            <surname>Dance</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Willamowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bray</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Csurka</surname>
          </string-name>
          .
          <article-title>Visual categorization with bags of keypoints</article-title>
          .
          <source>ECCV International Workshop on Statistical Learning in Computer Vision</source>
          , Prague,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Keysers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>Features for image retrieval: An experimental comparison</article-title>
          .
          <source>Information Retrieval</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>G.</given-names>
            <surname>Dudek</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Jugessur</surname>
          </string-name>
          .
          <article-title>Robust place recognition using local appearance based methods</article-title>
          .
          <source>IEEE Intl. Conf. on Robotics and Automation (ICRA)</source>
          , pages
          <fpage>1030</fpage>
          -
          <lpage>1035</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gînscă</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Iftene</surname>
          </string-name>
          .
          <article-title>Using a genetic algorithm for optimizing the similarity aggregation step in the process of ontology alignment</article-title>
          .
          <source>Proceedings of the 9th International Conference RoEduNet, IEEE</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D.</given-names>
            <surname>Gokalp</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Aksoy</surname>
          </string-name>
          .
          <article-title>Scene classification using bag-of-regions representations</article-title>
          .
          <source>Proceedings of CVPR, pages 1-8</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J.-J.</given-names>
            <surname>Gonzalez-Barbosa</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          .
          <article-title>Rover localization in natural environments by indexing panoramic images</article-title>
          .
          <source>Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA)</source>
          , pages
          <fpage>1365</fpage>
          -
          <lpage>1370</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          .
          <article-title>OpenIMAJ and ImageTerrier: Java libraries and tools for scalable multimedia analysis and indexing of images</article-title>
          .
          <source>ACM Multimedia</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ke</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sukthankar</surname>
          </string-name>
          .
          <article-title>PCA-SIFT: A more distinctive representation for local image descriptors</article-title>
          .
          <source>Proceedings of the Conference on Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>511</fpage>
          -
          <lpage>517</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J.</given-names>
            <surname>Kittler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hatef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P. W.</given-names>
            <surname>Duin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          .
          <article-title>On combining classifiers</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>226</fpage>
          -
          <lpage>239</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Kurhe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Satonka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. B.</given-names>
            <surname>Khanale</surname>
          </string-name>
          .
          <article-title>Color matching of images by using Minkowski-form distance</article-title>
          .
          <source>Global Journal of Computer Science and Technology, Global Journals Inc. (USA)</source>
          ,
          <volume>11</volume>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>D.</given-names>
            <surname>Larlus</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Jurie</surname>
          </string-name>
          .
          <article-title>Latent mixture vocabularies for object categorization</article-title>
          .
          <source>BMVC</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>D.</given-names>
            <surname>Larlus</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Jurie</surname>
          </string-name>
          .
          <article-title>Latent mixture vocabularies for object categorization and segmentation</article-title>
          .
          <source>Journal of Image &amp; Vision Computing</source>
          ,
          <volume>27</volume>
          (
          <issue>5</issue>
          ):
          <fpage>523</fpage>
          -
          <lpage>534</lpage>
          ,
          April
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>N.</given-names>
            <surname>Lazic</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Aarabi</surname>
          </string-name>
          .
          <article-title>Importance of feature locations in bag-of-words image classification</article-title>
          .
          <source>Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing</source>
          ,
          <volume>1</volume>
          :
          <fpage>I641</fpage>
          -
          <lpage>I644</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>L.</given-names>
            <surname>Ledwich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <article-title>Reduced SIFT features for image retrieval and indoor localisation</article-title>
          .
          <source>Australasian Conf. on Robotics and Automation</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>H.</given-names>
            <surname>Lejsek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. H.</given-names>
            <surname>Asmundsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thor-Jonsson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Amsaleg</surname>
          </string-name>
          .
          <article-title>Scalability of local image descriptors: A comparative study</article-title>
          .
          <source>ACM Int. Conf. on Multimedia</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>D.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>60</volume>
          (
          <issue>2</issue>
          ):
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>Proceedings of the 7th International Conference on Computer Vision</source>
          , pages
          <fpage>1150</fpage>
          -
          <lpage>1157</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>W.</given-names>
            <surname>Lucetti</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Luchetti</surname>
          </string-name>
          .
          <article-title>Combination of classifiers for indoor room recognition, CGS participation at ImageCLEF 2010 Robot Vision task</article-title>
          .
          <source>Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>K.</given-names>
            <surname>Mikolajczyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <article-title>An affine invariant interest point detector</article-title>
          .
          <source>Proceedings of the 7th European Conference on Computer Vision</source>
          , pages
          <fpage>128</fpage>
          -
          <lpage>142</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>K.</given-names>
            <surname>Mikolajczyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <article-title>Scale &amp; affine invariant interest point detectors</article-title>
          .
          <source>IJCV</source>
          ,
          <volume>60</volume>
          (
          <issue>1</issue>
          ),
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>K.</given-names>
            <surname>Mikolajczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Harzallah</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>van de Weijer</surname>
          </string-name>
          .
          <article-title>Learning object representations for visual object class recognition</article-title>
          .
          <source>Visual Recognition Challenge</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>J.</given-names>
            <surname>Morel</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>ASIFT: A new framework for fully affine invariant image comparison</article-title>
          .
          <source>SIAM Journal on Imaging Sciences</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>438</fpage>
          -
          <lpage>469</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <given-names>A.</given-names>
            <surname>Pronobis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Mozos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Caputo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Jensfelt</surname>
          </string-name>
          .
          <article-title>Multi-modal semantic place classification</article-title>
          .
          <source>Int. J. Robot. Res.</source>
          ,
          <volume>29</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>298</fpage>
          -
          <lpage>320</lpage>
          ,
          February
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <given-names>J.</given-names>
            <surname>Puzicha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rubner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tomasi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Buhmann</surname>
          </string-name>
          .
          <article-title>Empirical evaluation of dissimilarity measures for color and texture</article-title>
          .
          <source>Proc. International Conference on Computer Vision</source>
          , Vol.
          <volume>2</volume>
          ,
          pages
          <fpage>1165</fpage>
          -
          <lpage>1173</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rubner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tomasi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Guibas</surname>
          </string-name>
          .
          <article-title>The earth mover's distance as a metric for image retrieval</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>40</volume>
          (
          <issue>2</issue>
          ):
          <fpage>99</fpage>
          -
          <lpage>121</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <given-names>O.</given-names>
            <surname>Saurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fraundorfer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pollefeys</surname>
          </string-name>
          .
          <article-title>Visual localization using global visual features and vanishing points</article-title>
          .
          <source>Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Shyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Brodley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Kak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kosaka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aisen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Broderick</surname>
          </string-name>
          .
          <article-title>Local versus global features for content-based image retrieval</article-title>
          .
          <source>Proc. IEEE Workshop of Content-Based Access of Image and Video Databases</source>
          , pages
          <fpage>30</fpage>
          -
          <lpage>34</lpage>
          ,
          June
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <given-names>J.</given-names>
            <surname>Sivic</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <article-title>Video Google: A text retrieval approach to object matching in videos</article-title>
          .
          <source>Proceedings of the 9th International Conference on Computer Vision</source>
          , pages
          <fpage>1470</fpage>
          -
          <lpage>1478</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Takeuchi</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hebert</surname>
          </string-name>
          .
          <article-title>Finding images of landmarks in video sequences</article-title>
          .
          <source>Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          .
          <article-title>Learning metric-topological maps for indoor mobile robot navigation</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>99</volume>
          :
          <fpage>21</fpage>
          -
          <lpage>71</lpage>
          ,
          February
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <given-names>I.</given-names>
            <surname>Ulrich</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Nourbakhsh</surname>
          </string-name>
          .
          <article-title>Appearance-based place recognition for topological localization</article-title>
          .
          <source>IEEE Intl. Conf. on Robotics and Automation</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Statistical learning theory</article-title>
          .
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <given-names>C.</given-names>
            <surname>Wallraven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Caputo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Graf</surname>
          </string-name>
          .
          <article-title>Recognition with local features: the kernel recipe</article-title>
          .
          <source>Proc. ICCV</source>
          , pages
          <fpage>257</fpage>
          -
          <lpage>264</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <given-names>J.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Burgard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Burkhardt</surname>
          </string-name>
          .
          <article-title>Robust vision-based localization for mobile robots using an image retrieval system based on invariant features</article-title>
          .
          <source>Proc. of the IEEE International Conference on Robotics and Automation (ICRA)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <given-names>L.</given-names>
            <surname>Wolf</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Shashua</surname>
          </string-name>
          .
          <article-title>Kernel principal angles for classification machines with applications to image sequence interpretation</article-title>
          .
          <source>Proc. CVPR, I:635-640</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. G.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Hauptmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Ngo</surname>
          </string-name>
          .
          <article-title>Evaluating bag-of-visual-words representations in scene classification</article-title>
          .
          <source>ACM MIR</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marszalek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lazebnik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <article-title>Local features and kernels for classification of texture and object categories: A comprehensive study</article-title>
          .
          <source>IJCV</source>
          ,
          <volume>73</volume>
          (
          <issue>2</issue>
          ):
          <fpage>213</fpage>
          -
          <lpage>238</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>