<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Multi Cue Discriminative Approach to Semantic Place Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Fornoni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesus Martinez-Gomez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Caputo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Idiap Research Institute Centre Du Parc</institution>
          ,
          <addr-line>Rue Marconi 19 P.O. Box 592, CH-1920 Martigny</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of Idiap-MULTI in the Robot Vision task at ImageCLEF 2010. Our approach was based on a discriminative classification algorithm using multiple cues. Specifically, we used an SVM and combined up to four different histogram-based features with the kernel averaging method. For each frame, we took as output of the classifier the label and its associated margin, which we used as a measure of the confidence of the decision. If the margin is below a threshold, determined via cross-validation during training, the classifier abstains from assigning a label to the incoming frame. This method was submitted to the obligatory task, obtaining a maximum score of 662, which ranked second in the overall competition. We then extended this algorithm for the optional task, where it is possible to exploit the temporal continuity of the sequence. We implemented a door detector to infer when the robot has entered a new room. Then, we designed a stability estimation algorithm for determining the label of the room the robot has entered, and we used this knowledge as a prior for the upcoming frames. Our approach obtained a score of up to 2052 in the optional task, ranking first.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper describes the algorithms used by the Idiap-MULTI team at the third
edition of the Robot Vision task, held under the umbrella of the ImageCLEF
2010 evaluation challenge. Since its first edition in 2009, the focus of the Robot
Vision task has been semantic place localization for mobile robots using visual
information. This year, the task posed two distinct research questions to
participants: (1) can we design visual recognition algorithms able to recognize room
categories, and (2) can we equip robots with methods for detecting unknown
rooms?</p>
      <p>The Idiap-MULTI team took a multi-cue discriminative approach to address
both issues. The core of our methods, in both the obligatory and optional
tasks, is an SVM classifier trained on a large number of visual features, combined
via a flat average of kernels [<xref ref-type="bibr" rid="ref6">6</xref>]. For each frame
of the testing sequence, the classifier outputs a classification label and a measure of the
confidence in the decision. These two pieces of information are then used to decide whether
the perceived room is one of those already seen or is unknown to the system. Figure 1
gives an overview of the training and classification steps.</p>
      <p>In the rest of the paper we provide a detailed description of each step outlined
in the diagrams: Section 2 gives an overview of the oversampling strategy, devised
to increase robustness. Section 3 describes the features used, and Section 4 the
cue integration approach. Sections 5 and 6 describe in detail the algorithms
used for the obligatory and optional tasks. We report the experimental results
in Section 7. The paper concludes with an overall discussion.</p>
      <p>[Figure 1: diagrams of the training process and of the classification process.]</p>
      <p>The capability to recognize room categories implies robustness to slight
variations in the rooms' appearance. To achieve this, we propose, as a pre-processing
step, to increase the number of training frames by applying simulated
illumination changes to the original training frames. We generate new frames using
the original training frames as templates. We apply colour modifications to these
templates, trying to emulate the effect of extremely low or high lighting
conditions (Figure 2). We extended the original training set with an additional
sequence (its size was 30% of the original training sequence) of images
generated by increasing or decreasing the luminance component of all the pixels. Even
though in principle categorical variations are different from lighting variations,
preliminary experiments indicate that this pre-processing step is beneficial.</p>
      <p>
        As features, we chose a variety of global descriptors representing different
properties of the images. We opted for histogram-based global features, mostly in
the spatial-pyramid scheme introduced in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This representation scheme was
chosen because it combines the structural and statistical approaches: it takes
into account the spatial distribution of features over an image, while the local
distribution is in turn estimated by means of histograms; moreover, it proved
to be more versatile and to achieve higher accuracies in our experiments.
      </p>
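      <p>The luminance-based oversampling step described above can be illustrated with a minimal sketch. It assumes frames stored as rows of (r, g, b) floats in [0, 1] and uses the Y channel of the YIQ colour space as the luminance component; the scaling factors 0.5 and 1.5 and the function names are hypothetical choices made for illustration, and only the 30% fraction comes from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import colorsys
import random


def scale_luminance(pixels, factor):
    """Return a copy of `pixels` (rows of (r, g, b) floats in [0, 1])
    with the luminance (Y channel of YIQ) multiplied by `factor`."""
    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            y, i, q = colorsys.rgb_to_yiq(r, g, b)
            y = min(1.0, max(0.0, y * factor))  # keep luminance in range
            new_row.append(colorsys.yiq_to_rgb(y, i, q))
        out.append(new_row)
    return out


def oversample(frames, fraction=0.3, factors=(0.5, 1.5), seed=0):
    """Generate extra frames (about `fraction` of the input size) by
    darkening or brightening randomly chosen template frames.
    The factor values are illustrative assumptions."""
    rng = random.Random(seed)
    n_extra = round(fraction * len(frames))
    templates = rng.sample(frames, n_extra)
    return [scale_luminance(frame, rng.choice(factors)) for frame in templates]
```
]]></preformat>
      <p>The generated frames would then be appended to the original training sequence, each keeping the room label of its template.</p>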
      <p>
        The descriptors we extracted belong to five different families:
Pyramid Histogram of Oriented Gradients (PHOG) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], SIFT-based Pyramid
Histogram Of visual Words (PHOW) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Pyramid histogram of Local Binary
Patterns (PLBP) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Self-Similarity-based PHOW (SS-PHOW) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and
Composed Receptive Field Histogram (CRFH) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Among these descriptors, CRFH
is the only one that is not computed over a spatial pyramid. For the remaining families we
extracted an image descriptor for every pyramid level L ∈ {0, 1, 2, 3}, so that the
total number of descriptors extracted per image is 25 (4+4 PHOG, 4+4
PHOW, 4 PLBP, 4 SS-PHOW, 1 CRFH). The exact settings for each descriptor
are summarized in Table 1.
      </p>
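      <p>To make the role of the pyramid level L concrete, the following sketch builds a spatial-pyramid histogram from a grid of already quantized features (for example orientation bins or visual-word indices). It assumes that the descriptor for level L concatenates the cell histograms of all levels from 0 to L and is normalized to unit sum; both conventions, and the function name, are assumptions made for illustration rather than the exact settings used in our experiments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def pyramid_histogram(labels, n_bins, level):
    """Concatenate per-cell histograms for pyramid levels 0..level.

    `labels` is a 2-D list of integer bin indices in [0, n_bins).
    At level l the image is split into 2**l x 2**l cells.
    """
    height, width = len(labels), len(labels[0])
    descriptor = []
    for l in range(level + 1):
        cells = 2 ** l
        for cy in range(cells):
            for cx in range(cells):
                hist = [0] * n_bins
                y0, y1 = cy * height // cells, (cy + 1) * height // cells
                x0, x1 = cx * width // cells, (cx + 1) * width // cells
                for y in range(y0, y1):
                    for x in range(x0, x1):
                        hist[labels[y][x]] += 1
                descriptor.extend(hist)
    total = sum(descriptor) or 1  # avoid division by zero on empty input
    return [value / total for value in descriptor]
```
]]></preformat>
      <p>Under these assumptions, a descriptor with K = 20 bins at level 3 has 20 × (1 + 4 + 16 + 64) = 1700 dimensions.</p>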
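      <table-wrap>
        <label>Table 1.</label>
        <caption>
          <p>Settings of the image descriptors</p>
        </caption>
        <table>
          <thead>
            <tr><th>Descriptor</th><th>Settings</th><th>L</th></tr>
          </thead>
          <tbody>
            <tr><td>PHOG180</td><td>range = [0, 180] and K = 20</td><td>{0, 1, 2, 3}</td></tr>
            <tr><td>PHOG360</td><td>range = [0, 360] and K = 40</td><td>{0, 1, 2, 3}</td></tr>
            <tr><td>PHOWgray</td><td>M = 10, V = 300 and r = {4, 8, 12, 16}</td><td>{0, 1, 2, 3}</td></tr>
            <tr><td>PHOWcolor</td><td>M = 10, V = 300 and r = {4, 8, 12, 16}</td><td>{0, 1, 2, 3}</td></tr>
            <tr><td>PLBP<sup>riu2</sup><sub>8,1</sub></td><td>P = 8, R = 1, RotationInvariantUniform2 version</td><td>{0, 1, 2, 3}</td></tr>
            <tr><td>SS-PHOW</td><td>M = 5, V = 300, S = 5, R = 40, nRad = 4 and nTheta = 20</td><td>{0, 1, 2, 3}</td></tr>
            <tr><td>CRFH</td><td>Gaussian-Derivatives = {Lx, Ly}, K = 14 and s = {1, 2, 4, 8}</td><td /></tr>
          </tbody>
        </table>
      </table-wrap>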
    </sec>
    <sec id="sec-2">
      <title>Cue Integration</title>
      <p>
        Categorization is a difficult task, and this is particularly true for indoor visual
place categorization. Indoor environments are characterized by a high
variability in visual appearance within each category, mainly due to
clutter, occlusion, partial visibility of the rooms and local illumination changes. To
further complicate matters, a robot is supposed to interact responsively with
its environment, and efficiency is therefore a strong requirement. We decided to
address these two issues by using a very efficient cue integration scheme, namely
kernel averaging [
        <xref ref-type="bibr" rid="ref5 ref6">6, 5</xref>
        ]. Our approach consists of two steps:
      </p>
      <list list-type="order">
        <list-item><p>pre-select the visual cues that are found to maximize performance when integrated together;</p></list-item>
        <list-item><p>compute the average kernel over the pre-selected features, as an effective and efficient cue integration method.</p></list-item>
      </list>
      <p>To select the best visual cues to combine, we performed
a pre-selection over the corresponding pre-computed kernel matrices, using a
simple depth-first search on the tree of all possible combinations of features.
Since the number of all possible combinations of n features is 2<sup>n</sup> − 1, we
adopted the following efficiency measures to make the computation feasible:</p>
      <list list-type="bullet">
        <list-item><p>estimate the accuracy of a given combination of kernels using only a
subsample of the training and validation sets (10 and 30 percent);</p></list-item>
        <list-item><p>prune the tree of all possible combinations by imposing a condition
on the improvement that has to be satisfied in order to explore a branch:
if the improvement achieved by averaging the kernel k2 with the kernel k1 is
less than or equal to a threshold (ratio × accuracy of k1), the branch is not
explored further. However, if averaging k3 with k1 does satisfy the condition,
the branch k1-k3-k2 is explored. Finally, if the branch k1-k2 has already been
explored, the branch k1-k3-k2 is not explored again.</p></list-item>
      </list>
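      <p>As an illustration of kernel averaging, the sketch below computes one kernel matrix per cue and averages them with equal weights. It assumes an exponential χ<sup>2</sup> kernel of the form exp(−γ · d), with d the χ<sup>2</sup> distance between two histograms; the kernel form, the absence of a 1/2 factor in the distance and all function names are assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (no 1/2 factor)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def chi2_kernel(features, gamma):
    """Kernel matrix exp(-gamma * chi2) for a list of histograms."""
    n = len(features)
    return [[math.exp(-gamma * chi2_distance(features[i], features[j]))
             for j in range(n)] for i in range(n)]


def average_kernel(kernels):
    """Flat (equal-weight) average of several kernel matrices."""
    n = len(kernels[0])
    m = len(kernels)
    return [[sum(k[i][j] for k in kernels) / m for j in range(n)]
            for i in range(n)]
```
]]></preformat>
      <p>The averaged matrix can then be passed to any SVM implementation that accepts a precomputed kernel, so the cost of integrating a new cue is a single matrix addition.</p>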
      <p>An example exploration is shown in Figure 3.</p>
      <p>Fig. 4. Performance of the best combinations returned by the algorithm (ordered with
respect to the number of kernels used), as measured by the accuracy on the validation set.</p>
      <p>The best-performing combinations selected are shown in Figure 4, where we
sorted them with respect to the number of image descriptors used and
took into account only the best combinations with a maximum of four cues.
For our final runs we used the following combinations of visual cues:</p>
      <list list-type="bullet">
        <list-item><p>PHOG360 L3, CRFH</p></list-item>
        <list-item><p>PHOG180 L3, CRFH</p></list-item>
        <list-item><p>PHOG360 L0, PHOG180 L3, CRFH</p></list-item>
        <list-item><p>PHOG180 L0, PHOG360 L3, CRFH</p></list-item>
        <list-item><p>PHOG180 L1, PHOG180 L3, CRFH</p></list-item>
        <list-item><p>PLBP L0, PHOG180 L2, PHOG180 L1, CRFH</p></list-item>
      </list>
      <p>which correspond to the two best combinations with 2 cues, the three best
combinations with 3 cues and the best combination with 4 cues.</p>
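      <p>The pruned depth-first pre-selection described in this section can be sketched as follows. The sketch assumes a user-supplied <monospace>accuracy</monospace> callable that returns the validation accuracy of the averaged kernel for a given set of cue names; that callable, the default ratio of 0.01 and the function name are hypothetical, while the limit of four cues follows the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def search_combinations(kernel_names, accuracy, ratio=0.01, max_size=4):
    """Depth-first search over kernel subsets with improvement pruning.

    A branch is extended with a new kernel only if the accuracy gain is
    strictly larger than `ratio` times the accuracy of the current subset.
    Returns a dict mapping each explored subset to its accuracy.
    """
    scores = {}

    def explore(combo, combo_acc):
        scores[combo] = combo_acc
        if len(combo) == max_size:
            return
        for name in kernel_names:
            if name in combo:
                continue
            candidate = combo | {name}
            if candidate in scores:
                continue  # already explored through another ordering
            cand_acc = accuracy(candidate)
            if cand_acc - combo_acc <= ratio * combo_acc:
                continue  # improvement too small: prune this branch
            explore(candidate, cand_acc)

    for name in kernel_names:
        root = frozenset([name])
        if root not in scores:
            explore(root, accuracy(root))
    return scores
```
]]></preformat>
      <p>Because subsets are stored as frozensets, a combination reached through two different orderings is evaluated only once, which mirrors the rule that an already explored branch is not explored again.</p>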
    </sec>
    <sec id="sec-3">
      <title>Obligatory Task: The Algorithm</title>
      <p>For the obligatory task, each test frame has to be classified without taking into
account the continuity of the test sequence. Each test frame is therefore classified
using only the SVM, after the feature extraction and cue integration steps. Our
first approach was simply to label each test frame with the room (class) that obtained
the highest output value, but wrong classifications are penalized with negative values in the
task score (as will be observed in Section 7).</p>
      <p>Our algorithm post-processes the output of the SVM to avoid
classifying a test frame when it is not sufficiently confident about the class. We normalize
the output of the SVM classifier over the test sequence, obtaining (for
each test frame) 8 numeric values between −1.0 and +1.0, one for each
of the training rooms. A test frame f<sub>i</sub> is labelled with class C<sub>j</sub> only
when the normalized output value for that class, O<sub>i,j</sub>, is above a threshold value
and O<sub>i,j</sub> clearly exceeds all the other output values.</p>
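      <p>A minimal sketch of this abstention rule is given below. It assumes one threshold per class and models "clearly exceeds" as a fixed margin over the second-best output; the margin value of 0.2 and the function name are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def label_frame(outputs, thresholds, margin=0.2):
    """Return the index of the winning class, or None to abstain.

    `outputs` holds the normalized SVM outputs of one frame (one value per
    class, in [-1, 1]); `thresholds` holds one threshold per class.
    """
    ranked = sorted(range(len(outputs)), key=lambda j: outputs[j], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if outputs[best] <= thresholds[best]:
        return None  # winning output is not above its class threshold
    if outputs[best] - outputs[runner_up] < margin:
        return None  # winner does not clearly exceed the other outputs
    return best
```
]]></preformat>
      <p>Frames for which the function returns None are left unclassified, which avoids the penalty associated with a wrong label.</p>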
      <p>All thresholds were obtained in preliminary experiments using the
validation sequence provided by the task organizers. In these preliminary
experiments, we observed that for a large percentage of the validation sequence only
one class obtained a positive output value. Moreover, Large Office and Small Office
proved to be the most problematic rooms, while Printer Area, Recycle Area and Toilet
obtained the best accuracy.</p>
      <p>For the parameter tuning, we used a classical Hill Climbing algorithm for
all thresholds (we have 8 thresholds for each feature combination). A threshold
value of 0.0 means that none of the test frames will be classified using that class,
and 1.0 is used if we highly trust the classification algorithm for a selected
class.</p>
      <p>In the Hill Climbing algorithm, we tested positive and negative variations of
the threshold values. A variation is kept if the score of
the resulting run (using the validation sequence
as test sequence) does not decrease. This greedy method has a high risk of falling
into local optima, so we performed three executions using 0.25, 0.5 and 0.75 as
initial values, selecting as the final threshold value the one achieving the highest score.</p>
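      <p>The threshold tuning can be sketched as a coordinate-wise hill climbing with three restarts. The sketch assumes a user-supplied <monospace>score</monospace> callable that evaluates a list of 8 thresholds on the validation sequence; the step size of 0.05, the round limit and the function name are hypothetical. Unlike the description above, a move is accepted only when the score strictly increases, a simplification that guarantees termination on plateaus.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def hill_climb(score, n_thresholds=8, starts=(0.25, 0.5, 0.75),
               step=0.05, max_rounds=100):
    """Greedy per-threshold search, restarted from several initial values.

    Returns the best threshold list found and its score.
    """
    best_thresholds, best_score = None, float("-inf")
    for start in starts:
        current = [start] * n_thresholds
        current_score = score(current)
        for _ in range(max_rounds):
            improved = False
            for j in range(n_thresholds):
                for delta in (step, -step):
                    trial = list(current)
                    trial[j] = min(1.0, max(0.0, trial[j] + delta))
                    trial_score = score(trial)
                    if trial_score > current_score:
                        current, current_score = trial, trial_score
                        improved = True
            if not improved:
                break  # local optimum reached for this start
        if current_score > best_score:
            best_thresholds, best_score = current, current_score
    return best_thresholds, best_score
```
]]></preformat>
      <p>Restarting from 0.25, 0.5 and 0.75 and keeping the best result reduces, but does not remove, the risk of ending in a poor local optimum.</p>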
    </sec>
    <sec id="sec-4">
      <title>Optional Task: The Algorithm</title>
      <p>For the optional task, we are allowed to exploit the temporal continuity of the
sequence. We therefore implemented a door detector for estimating when the robot
moves into a new room. This information, coupled with a stability estimation
algorithm, can be useful for classifying a sequence of consecutive test frames. We
estimate the stability of the classification process using the last n frames and
their associated labels, obtained with the classification algorithm used for the
obligatory task. A room is selected as the most probable label for the incoming
data if at least the last n frames were classified with that label. This method
is used for labelling frames for which the classification algorithm has a low level
of confidence and therefore abstains from taking a decision. The process is restarted
every time the door detector signals that the robot has entered a new room.</p>
      <sec id="sec-4-1">
        <title>Door detection algorithm</title>
        <p>
          We developed a basic door detection algorithm for indoor office environments
such as those used for the Robot Vision task. When the robot moves from one room to
another, the acquired images show two vertical rectangles with the same colour. The
width of both rectangles increases as the robot gets closer to the door. The
image processing algorithm first applies a Canny filter [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] to extract all the edges of
the images. After this step, we use the Hough transform [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] for line detection and
we discard all the non-vertical lines. Finally, we measure the average colour value
between each pair of vertical lines, discarding regions with a non-homogeneous colour
distribution; the remaining regions are the blobs. The whole process can be observed in Fig. 5,
where we detect three colour-homogeneous blobs (two of them can be used to detect the door crossing).
        </p>
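        <p>The last two steps of this pipeline (keeping only vertical lines and testing colour homogeneity between them) can be sketched as below. The sketch assumes that edge and line detection have already produced Hough lines in the usual (rho, theta) normal form, where a vertical line has theta close to 0 or π, that the image is a grid of grey values, and that only consecutive vertical lines are paired; the angular tolerance, the standard-deviation limit and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def vertical_line_columns(lines, tolerance_deg=5.0):
    """Keep Hough lines (rho, theta) that are nearly vertical and
    return their column positions, sorted from left to right."""
    tol = math.radians(tolerance_deg)
    columns = []
    for rho, theta in lines:
        if min(theta, math.pi - theta) <= tol:
            columns.append(round(rho / math.cos(theta)))
    return sorted(columns)


def homogeneous_blobs(image, columns, max_std=10.0):
    """Return (left, right, mean) for each strip between consecutive
    vertical lines whose grey values have a small standard deviation."""
    blobs = []
    for left, right in zip(columns, columns[1:]):
        values = [row[x] for row in image for x in range(left, right)]
        if not values:
            continue
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        if std <= max_std:
            blobs.append((left, right, mean))
    return blobs
```
]]></preformat>
        <p>Each returned blob carries its average value, which is the quantity compared across frames when looking for two growing blobs of the same colour.</p>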
        <p>Once we have extracted all the key blobs from a frame, we have to study
the temporal correspondence of these blobs between this frame and the previous frames.
If two blobs with the same average colour grow over successive frames, we are
approaching a door and both blobs are marked as preliminary candidates. Preliminary
candidates are selected as definitive ones if one of the two blobs starts shrinking
after reaching its largest size at the left (right) of the image. Figure 6 shows four
consecutive training frames, where white rectangles represent blobs, preliminary
candidates are labelled with a P and definitive candidates with a P. Green
rectangles in the bottom images represent the temporal correspondence of each blob in
the last frames.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Additional Processing</title>
        <p>For the additional processing we use the output of the algorithm
for the obligatory task. For each test frame, that algorithm either assigns a class
or leaves the frame unclassified. Our processing tries to detect two situations:
a stable process and an unstable process. All the internal threshold values were
obtained from preliminary experiments carried out with the validation sequence
as testing set:</p>
        <list list-type="bullet">
          <list-item><p>Stable estimation. The process is stable if, after crossing the last door, most
of the frames were preliminarily labelled with the same class C<sub>i</sub>. In this
situation we use C<sub>i</sub> to label the test frames not labelled by the classification
algorithm. The process is considered stable when at least the last 18
frames have been classified with the same label.</p></list-item>
          <list-item><p>Unstable estimation. Instability appears when the preliminary class values for
the last frames are not the same, or when those frames were not labelled. If this situation
continues for a large number of frames, the process will not be able to reach
a stable situation. We assume that the process is unstable when the number
of frames since we last achieved a stable situation is greater than 50. Facing
an unstable situation, the additional processing assigns the special
label "Unknown" to all the frames not labelled previously. This label is used
for classifying new rooms not imaged in the training/validation sequences.</p></list-item>
        </list>
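        <p>The stable and unstable cases above can be sketched as a single pass over the preliminary labels. The sketch assumes that abstentions are encoded as None, that an abstention interrupts the run of identical labels, and that the frame counter is reset at every door crossing; these readings and the function name are assumptions, while the values 18 and 50 come from the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
UNKNOWN = "Unknown"


def fill_abstentions(labels, door_frames, stable_run=18, unstable_after=50):
    """Fill unlabelled frames (None) using the stability estimate.

    `labels` are the preliminary per-frame labels; `door_frames` are the
    indices at which the door detector reports a room change.
    """
    door_frames = set(door_frames)
    stable_label, run_label, run_length, since_stable = None, None, 0, 0
    result = []
    for index, label in enumerate(labels):
        if index in door_frames:  # new room: restart the estimation
            stable_label, run_label, run_length, since_stable = None, None, 0, 0
        if label is not None and label == run_label:
            run_length += 1
        else:
            run_label = label
            run_length = 1 if label is not None else 0
        if run_length >= stable_run:
            stable_label, since_stable = run_label, 0
        else:
            since_stable += 1
        if label is not None:
            result.append(label)
        elif since_stable > unstable_after:
            result.append(UNKNOWN)
        else:
            result.append(stable_label)  # still None before stability
    return result
```
]]></preformat>
        <p>Frames labelled by the classifier are never overwritten; only abstentions are filled, either with the stable room label or with the "Unknown" label.</p>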
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>Our algorithms were evaluated following the procedure proposed by the
organizers of the RobotVision@ImageCLEF 2010 competition. Each frame of a test set containing
2741 frames had to be classified with a room label, marked as unknown, or left
unclassified. Performance was evaluated according to a pre-defined score function.</p>
      <sec id="sec-5-1">
        <title>Obligatory Task: Results</title>
        <p>We submitted a total of twelve runs to the obligatory task. These runs were
divided into two sets, one with parameters determined via cross-validation and one
without, as described in Section 5. Each of the two sets comprises six experiments
using exactly the same feature combinations.</p>
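        <table-wrap>
          <table>
            <thead>
              <tr><th>Rank</th><th>Feature Combination</th><th>Score</th><th>Cross-Validation</th></tr>
            </thead>
            <tbody>
              <tr><td>2</td><td>PHOG180 L0, PHOG360 L3, CRFH</td><td>662</td><td>X</td></tr>
              <tr><td>3</td><td>PHOG360 L0, PHOG180 L3, CRFH</td><td>657</td><td /></tr>
              <tr><td>4</td><td>PHOG180 L1, PHOG180 L3, CRFH</td><td>645</td><td>X</td></tr>
              <tr><td>5</td><td>PHOG180 L0, PHOG360 L3, CRFH</td><td>644</td><td /></tr>
              <tr><td>9</td><td>PHOG360 L3, CRFH</td><td>637</td><td>X</td></tr>
              <tr><td>10</td><td>PHOG360 L0, PHOG180 L3, CRFH</td><td>636</td><td>X</td></tr>
              <tr><td>11</td><td>PHOG180 L1, PHOG180 L3, CRFH</td><td>629</td><td /></tr>
              <tr><td>12</td><td>PLBP L0, PHOG180 L2, PHOG360 L1, CRFH</td><td>628</td><td>X</td></tr>
              <tr><td>13</td><td>PHOG360 L3, CRFH</td><td>620</td><td /></tr>
              <tr><td>14</td><td>PLBP L0, PHOG180 L2, PHOG360 L1, CRFH</td><td>612</td><td /></tr>
              <tr><td>15</td><td>PHOG180 L3, CRFH</td><td>605</td><td /></tr>
              <tr><td>17</td><td>PHOG180 L3, CRFH</td><td>596</td><td>X</td></tr>
            </tbody>
          </table>
        </table-wrap>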
        <p>Our best score ranked second in the competition, with a difference with
respect to the winner of only 2.22%. It was obtained using the cue combination
PHOG180 L0 PHOG360 L3 CRFH, with γ and C estimated via cross-validation.
Our second best score (ranked third) was also obtained with a combination of
the type PHOG L0 PHOG L3 CRFH, but in this case the quantization of the orientation
space was swapped: 360 for the PHOG L0 and 180 for the PHOG L3.</p>
        <p>As shown in Figure 8, most of the experiments where the cross-validation step
was performed obtained a higher score. This improvement is also confirmed
by the average score of the two sets.</p>
        <p>[Figure 8: score of each feature combination (PHOG360_L3 CRFH; PHOG180_L3 CRFH; PHOG180_L0 PHOG360_L3 CRFH; PHOG180_L1 PHOG180_L3 CRFH; PHOG360_L0 PHOG180_L3 CRFH; PLBP_L0 PHOG180_L2 PHOG360_L1 CRFH) and of their average; the score axis ranges from 560 to 680.]</p>
        <p>This can be explained by the parameter values obtained: the γ parameters
estimated via cross-validation (0 &lt; γ &lt; 4) were in general much lower than
the corresponding values obtained as the average pairwise χ<sup>2</sup> distance between
histograms (1.5 &lt; γ &lt; 16). The SVM C parameter obtained by
cross-validation (5 &lt; C &lt; 35) was also not far from the default value of 10, which
is an unusually low value for a classification task (common values are often
between 100 and 1000). The low value of the C parameter enforced a stronger
regularization of the solution, thus improving the generalization capability of the
classifier.</p>
        <p>It is important to note, however, that our second best performance on this task
was obtained without any cross-validation step and nonetheless
outperformed the corresponding cross-validated run. In this case too, one
explanation for the failure of the cross-validation can be found by looking again
at the γ values obtained: one of the three kernels averaged (PHOG360 L0)
obtained a very low γ value (0.000031). With this setting the kernel matrix is almost
1 for most pairs of frames, and in the average kernel it only plays
the role of a (smoothing) constant. The cross-validation algorithm in this case
got stuck in a local minimum in which the information added by the PHOG360 L0
kernel to the combination was very limited, and the final performance was not
improved.</p>
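        <p>The effect of a very small γ can be checked numerically. The sketch below assumes an exponential kernel of the form exp(−γ · d), where d is the χ<sup>2</sup> distance between two histograms; the kernel form, the sample distances and the function name are assumptions made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def kernel_value(distance, gamma):
    """Exponential kernel exp(-gamma * distance)."""
    return math.exp(-gamma * distance)


# Compare the reported very low gamma with a moderate one over a range
# of illustrative distances.
for distance in (0.5, 2.0, 16.0):
    low = kernel_value(distance, 0.000031)
    moderate = kernel_value(distance, 1.0)
    print(f"d={distance:5.1f}  gamma=0.000031: {low:.6f}  gamma=1.0: {moderate:.6f}")
```
]]></preformat>
        <p>With γ = 0.000031 the kernel stays above 0.999 over this whole range of distances, so the corresponding matrix is nearly constant and contributes almost no discriminative information to the average.</p>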
      </sec>
      <sec id="sec-5-2">
        <title>Optional Task: Results</title>
        <p>For the optional task we submitted twelve runs, using the same combinations of
features and the same cross-validation settings as for the obligatory task. Fig. 9 shows all the
results, where it can be observed that our runs took all the top positions for
this task.</p>
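        <table-wrap>
          <table>
            <thead>
              <tr><th>Rank</th><th>Feature Combination</th><th>Score</th><th>Cross-Validation</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>PLBP L0, PHOG180 L2, PHOG360 L1, CRFH</td><td>2052</td><td>X</td></tr>
              <tr><td>2</td><td>PHOG180 L3, CRFH</td><td>1770</td><td>X</td></tr>
              <tr><td>3</td><td>PHOG180 L0, PHOG360 L3, CRFH</td><td>1361</td><td>X</td></tr>
              <tr><td>4</td><td>PHOG180 L1, PHOG180 L3, CRFH</td><td>1284</td><td>X</td></tr>
              <tr><td>5</td><td>PHOG360 L0, PHOG180 L3, CRFH</td><td>1262</td><td>X</td></tr>
              <tr><td>6</td><td>PHOG180 L1, PHOG180 L3, CRFH</td><td>1090</td><td /></tr>
              <tr><td>7</td><td>PHOG180 L0, PHOG360 L3, CRFH</td><td>1028</td><td /></tr>
              <tr><td>8</td><td>PHOG360 L0, PHOG180 L3, CRFH</td><td>1019</td><td /></tr>
              <tr><td>9</td><td>PHOG360 L3, CRFH</td><td>963</td><td>X</td></tr>
              <tr><td>10</td><td>PHOG360 L3, CRFH</td><td>916</td><td /></tr>
              <tr><td>11</td><td>PLBP L0, PHOG180 L2, PHOG360 L1, CRFH</td><td>886</td><td /></tr>
              <tr><td>12</td><td>PHOG180 L3, CRFH</td><td>682</td><td /></tr>
            </tbody>
          </table>
        </table-wrap>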
        <p>Only two other groups (CAOR and DYNILSIS) submitted runs for this task,
and their best scores were lower than those of all our runs (62.0 and 67.0,
respectively). Therefore, our group was the winner of the optional task. Comparing
Figures 9 and 7, our algorithm for the optional task increases
the final score for all the feature combinations. This increase shows the benefit
of our proposal for exploiting the continuity of the test sequence.</p>
        <p>Fig. 10 shows a complete comparison of the scores obtained for each feature
combination, with and without the cross-validation step. It is worth noting
that the final score was always noticeably improved by the cross-validation
step.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This paper describes the participation of Idiap-MULTI in the Robot Vision task
at ImageCLEF 2010. We participated in both the obligatory and optional tracks
with algorithms based on an SVM cue integration approach. Our best runs in
the two tracks ranked second (obligatory track) and first (optional
track), respectively, showing the effectiveness of our approach.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.H.</given-names>
            <surname>Ballard</surname>
          </string-name>
          .
          <article-title>Generalizing the Hough transform to detect arbitrary shapes</article-title>
          .
          <source>Pattern recognition</source>
          ,
          <volume>12</volume>
          (
          <issue>2</issue>
          ):
          <fpage>111</fpage>
          –
          <lpage>122</lpage>
          ,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Munoz</surname>
          </string-name>
          .
          <article-title>Image classification using random forests and ferns</article-title>
          .
          <source>In International Conference on Computer Vision</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          . Citeseer
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Munoz</surname>
          </string-name>
          .
          <article-title>Representing shape with a spatial pyramid kernel</article-title>
          .
          <source>In Proceedings of the 6th ACM international conference on Image and video retrieval, page 408. ACM</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Canny</surname>
          </string-name>
          .
          <article-title>A computational approach to edge detection</article-title>
          .
          <source>Readings in computer vision: issues, problems, principles, and paradigms</source>
          ,
          <fpage>184</fpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          .
          <article-title>Can learning kernels help performance?</article-title>
          <source>In invited talk at International Conference on Machine Learning</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Gehler</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowozin</surname>
          </string-name>
          .
          <article-title>On feature combination for multiclass object classification</article-title>
          .
          <source>In Proc. ICCV</source>
          , volume
          <volume>1</volume>
          , page 6,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Lazebnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schmid</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ponce</surname>
          </string-name>
          .
          <article-title>Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories</article-title>
          .
          <source>In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          , volume
          <volume>2</volume>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>O.</given-names>
            <surname>Linde</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Lindeberg</surname>
          </string-name>
          .
          <article-title>Object recognition using composed receptive field histograms of higher dimensionality</article-title>
          .
          <source>In Proc. ICPR. Citeseer</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Ojala</surname>
          </string-name>
          , M. Pietikainen, and T. Maenpaa.
          <article-title>Gray scale and rotation invariant texture classi cation with local binary patterns</article-title>
          .
          <source>Computer Vision-ECCV</source>
          <year>2000</year>
          , pages
          <fpage>404</fpage>
          {
          <fpage>420</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>E.</given-names>
            <surname>Shechtman</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Irani</surname>
          </string-name>
          .
          <article-title>Matching local self-similarities across images and videos</article-title>
          .
          <source>In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07)</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>