<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SVM classification of moving objects tracked by Kalman filter and Hungarian method</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gábor Szűcs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dávid Papp</string-name>
          <email>pappdavid27@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dániel Lovas</string-name>
          <email>lovas.daniel@simonyi.bme.hu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics</institution>
          ,
          <addr-line>Magyar Tudósok krt. 2., H-1117, Budapest</addr-line>
          ,
          <country country="HU">Hungary</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Fishery video data often require laborious visual analysis; therefore a video-based fish identification challenge was announced in the LifeCLEF campaign for automatic fish categorization and enumeration. We developed a complex system to detect, classify and track objects (fish) in underwater video by examining each image frame. For detection we used background subtraction and morphological methods, and our solution then calculated bounding boxes from the object contours. We used a Kalman filter to track the moving objects, but because many fish were present, an additional matching method was required to pair the objects in consecutive time steps. We used the Hungarian method for this matching problem. We categorized the detected fish with a C-SVC classifier, a variant of the SVM (Support Vector Machine). The classifier used high-level descriptors based on the SURF vectors extracted from each object. To optimize the C-SVC classifier we conducted a preliminary test and used the best parameters to train the classifier. We predicted the fish species in the official test video set, and our predictions were officially evaluated by NCS (Normalized Counting Score).</p>
      </abstract>
      <kwd-group>
        <kwd>SVM method</kwd>
        <kwd>fish classification</kwd>
        <kwd>tracking</kwd>
        <kwd>Kalman filter</kwd>
        <kwd>Hungarian method</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The analysis of video data usually requires time-consuming and
expensive input from human observers, and underwater videos are no exception,
although statistics from the collected data would be very useful for exploratory
applications, in particular in fisheries and biology. This analytical
"bottleneck" greatly restricts the use of powerful video technologies and
calls for effective methods of automatic content analysis that can provide
analytical information proactively. To address this problem, a
challenge was announced in the LifeCLEF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] campaign of ImageCLEF [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
In this challenge two datasets (a training set with ground truth and a test
set) were released for the video-based fish identification task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The goal
was to automatically count fish per species in video segments (e.g., video X
contains N1 instances of fish of species 1, ..., Nn instances of fish of species n).
We divided the problem into subtasks: object detection, classification
and tracking, where the objects were the fish; and we implemented a
video analysis system to solve these tasks. The applied methods and our
solution are described below.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Object detection, classification and tracking system</title>
      <p>Object classification and tracking are different tasks, but both are based
on object detection (fish detection) in the video frames. We
implemented fish detection in OpenCV in such a way that the bounding boxes of
detected fish are stored as small images with the corresponding information
(timestamp, identifier, etc.). The subtask common to both
problems is mapping, i.e. linking bounding box images to fish
identifiers (these identifiers are generated only to distinguish individual fish), because
the results of the mapping can be used for both classification and tracking.
The bounding box images are thus the input of the classification method, which
estimates the species of each fish. Consecutive images with the same fish
identifier may be classified into different species; therefore the final decision
of classification in our solution is based on majority voting.</p>
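      <p>The majority voting step can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name, the input layout (pairs of fish identifier and predicted label) and the species labels are hypothetical, and the tie-breaking rule is an arbitrary assumption.</p>
      <preformat>
```python
from collections import Counter, defaultdict


def vote_species(frame_predictions):
    """Return one species per fish identifier by majority vote.

    frame_predictions: iterable of (fish_id, predicted_species) pairs,
    one pair per bounding-box image (hypothetical input layout).
    """
    votes = defaultdict(Counter)
    for fish_id, species in frame_predictions:
        votes[fish_id][species] += 1
    # most_common(1) returns the label with the highest count; ties go to
    # the label seen first, which is an arbitrary choice in this sketch.
    return {fish_id: counts.most_common(1)[0][0]
            for fish_id, counts in votes.items()}


# Hypothetical per-frame predictions for two tracked fish.
predictions = [
    (1, "species A"),
    (1, "species B"),
    (1, "species A"),
    (2, "species C"),
]
final_labels = vote_species(predictions)
```
      </preformat>
      <p>Here fish 1 receives two votes for one label and one for another, so the more frequent label is kept as its final species.</p>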
    </sec>
    <sec id="sec-3">
      <title>Detection of many fishes</title>
      <p>
        One of the challenges in fish detection was the presence of many
objects in a single image. For object detection we first used background
subtraction [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to separate the foreground from the background. The
most common morphological methods (erosion, dilation and their
combination, closing) were then applied to obtain smooth and solid
edges [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. After this smoothing we applied the contour-following algorithm
developed by Suzuki and Abe [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Using the contours our
solution calculates the bounding boxes and the object centres. Some of the detected
objects were too small to be useful in classification; hence
objects smaller than 15x15 pixels were filtered out (deleted).
      </p>
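      <p>The last two steps (bounding boxes with centres, and the size filter) can be illustrated with a short sketch. It assumes that background subtraction, morphology and contour extraction have already produced each contour as a list of (x, y) pixel coordinates; the function names are hypothetical. The text does not say whether a box is discarded when one side or both sides fall below 15 pixels, so the sketch assumes that either side below the limit is enough.</p>
      <preformat>
```python
MIN_SIZE = 15  # pixels; the 15x15 limit stated in the text


def bounding_box(contour):
    """Axis-aligned box (x, y, width, height) around a list of (x, y) pixels."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    x0, y0 = min(xs), min(ys)
    # +1 because both the first and the last pixel belong to the box.
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def centre(box):
    """Centre point of a box, later usable as the tracker measurement."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def keep_large_boxes(contours, min_size=MIN_SIZE):
    boxes = [bounding_box(c) for c in contours]
    # Assumption: a box is dropped when either side is below min_size.
    return [b for b in boxes if b[2] >= min_size and b[3] >= min_size]


# Hypothetical contours: the first yields a 31x21 box, the second a 6x5 box.
contours = [
    [(10, 10), (40, 10), (40, 30), (10, 30)],
    [(100, 100), (105, 100), (105, 104)],
]
boxes = keep_large_boxes(contours)
centres = [centre(b) for b in boxes]
```
      </preformat>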
    </sec>
    <sec id="sec-4">
      <title>Fish tracking with Kalman filter and Hungarian method</title>
      <p>
        After detection, our solution uses a Kalman filter [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to track
objects in three steps: (i) initialization, followed by a repeated cycle of
(ii) prediction and (iii) correction.
      </p>
      <p>Initialization: at the first frame in which any object was detected, the Kalman filter
was initialized, and an identity number and a
confidence value (CFV) were attached to every detected object.</p>
      <p>Prediction: in this step the Kalman filter made a prediction for each
detected object (using the calculated object centre) to forecast the future
position of the investigated fish.</p>
      <p>
        Correction: After prediction, the new detections (in the next frame of the video)
provide the measurements, which are compared with the predictions.
These measurements were used to correct the
Kalman filter objects. To obtain the best pairing of tracked and measured objects
we applied the Hungarian method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], supplemented with the restriction that
we removed those objects that do not belong to any new measurement. We
present the applied method below.
      </p>
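      <p>The initialization, prediction and correction cycle can be sketched with a small constant-velocity Kalman filter. The text does not specify the motion model, the noise parameters or the library routine that was used, so everything below is an assumption made for illustration: one independent filter per image axis, a state made of position and velocity, and hypothetical noise values q and r.</p>
      <preformat>
```python
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis.

    State: (position, velocity). Only the position is measured.
    """

    def __init__(self, position, q=1e-2, r=1.0):
        self.p, self.v = float(position), 0.0
        # Covariance [[c00, c01], [c10, c11]] with a large initial uncertainty.
        self.c00, self.c01, self.c10, self.c11 = 100.0, 0.0, 0.0, 100.0
        self.q, self.r = q, r  # process / measurement noise (hypothetical)

    def predict(self, dt=1.0):
        # x = F x and C = F C F^T + Q, with F = [[1, dt], [0, 1]].
        self.p += dt * self.v
        c00 = self.c00 + dt * (self.c01 + self.c10) + dt * dt * self.c11 + self.q
        c01 = self.c01 + dt * self.c11
        c10 = self.c10 + dt * self.c11
        c11 = self.c11 + self.q
        self.c00, self.c01, self.c10, self.c11 = c00, c01, c10, c11
        return self.p

    def correct(self, z):
        # Measurement model H = [1, 0]; gain K = C H^T / (H C H^T + r).
        s = self.c00 + self.r
        k0, k1 = self.c00 / s, self.c10 / s
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        # C = (I - K H) C
        c00 = (1 - k0) * self.c00
        c01 = (1 - k0) * self.c01
        c10 = self.c10 - k1 * self.c00
        c11 = self.c11 - k1 * self.c01
        self.c00, self.c01, self.c10, self.c11 = c00, c01, c10, c11
        return self.p


class Track:
    """One tracked fish: an identifier plus one filter per axis."""

    def __init__(self, track_id, centre):
        self.track_id = track_id
        self.kx, self.ky = AxisKalman(centre[0]), AxisKalman(centre[1])

    def predict(self):
        return self.kx.predict(), self.ky.predict()

    def correct(self, centre):
        return self.kx.correct(centre[0]), self.ky.correct(centre[1])


# Hypothetical fish moving 2 px right and 1 px down per frame.
track = Track(0, (0.0, 0.0))
for step in range(1, 11):
    track.predict()
    track.correct((2.0 * step, 1.0 * step))
next_x, next_y = track.predict()  # forecast for frame 11
```
      </preformat>
      <p>After ten noise-free measurements the forecast for the next frame lies close to (22, 11), the position implied by the assumed motion.</p>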
      <p>Let v<sub>i</sub> be a tracked object, where i = 1...k, and let w<sub>j</sub> be a newly detected
object, where j = 1...l. Before applying the Hungarian method, our solution calculated a k×l
matrix M, where M[i,j] denotes the Euclidean distance
(measured in pixels in the image) between objects v<sub>i</sub> and w<sub>j</sub>. Rows and columns whose
elements were all higher than a given threshold were removed to prevent false
matching. If the row corresponding to v<sub>i</sub> was removed from M, then we
lowered the confidence value (CFV) of that particular track. On the other hand,
removing w<sub>j</sub> means that we should start tracking this object, i.e. it is probably a
new object (a new Kalman filter was created for it), because
it is far from all the others.</p>
      <p>At this point the Hungarian method was performed on M, which yielded
the optimal (v,w) tracking pairs, i.e. those with the minimal sum of distances.
Fig. 1 illustrates the mechanism of the applied Hungarian method,
where the v<sub>i</sub> objects are represented by a set of black and grey vertices (T)
and the w<sub>j</sub> objects by red and pink ones (N). The thick edges connect the matched
(v,w) pairs, whereas grey and pink denote the nodes that have no
pair. After the Hungarian method, the matched object pairs are the input
of the correction phase of the Kalman filter (black vertices are tracked objects
and the red ones are the measurements).</p>
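      <p>The gating and matching steps described above can be sketched as follows. This is not the authors' code: it assumes NumPy and SciPy are available and uses SciPy's linear_sum_assignment as the solver for the assignment problem. It also adopts one reading of the gating rule, namely that a row or column is removed only when all of its distances exceed the threshold. The function name and the sample coordinates are hypothetical.</p>
      <preformat>
```python
import math

import numpy as np
from scipy.optimize import linear_sum_assignment


def match_tracks(predicted, detected, threshold):
    """Pair predicted track positions with new detections.

    Returns (pairs, lost_tracks, new_detections) as indices into the inputs.
    """
    if not predicted or not detected:
        return [], list(range(len(predicted))), list(range(len(detected)))
    # M[i, j]: Euclidean distance in pixels between track i and detection j.
    M = np.array([[math.dist(v, w) for w in detected] for v in predicted])
    # Gating (assumed reading): keep a row or column only if at least one
    # of its distances is within the threshold.
    rows = [i for i in range(M.shape[0]) if threshold >= M[i].min()]
    cols = [j for j in range(M.shape[1]) if threshold >= M[:, j].min()]
    pairs = []
    if rows and cols:
        sub = M[np.ix_(rows, cols)]
        r_idx, c_idx = linear_sum_assignment(sub)  # minimal total distance
        pairs = [(rows[r], cols[c]) for r, c in zip(r_idx, c_idx)]
    matched_tracks = {i for i, _ in pairs}
    matched_detections = {j for _, j in pairs}
    # Unpaired tracks would have their confidence lowered; unpaired
    # detections would start new Kalman filters.
    lost = [i for i in range(len(predicted)) if i not in matched_tracks]
    new = [j for j in range(len(detected)) if j not in matched_detections]
    return pairs, lost, new


# Hypothetical predicted centres and new detections (pixels).
predicted = [(10, 10), (50, 50), (200, 200)]
detected = [(52, 49), (11, 12), (400, 400)]
pairs, lost, new = match_tracks(predicted, detected, threshold=30)
```
      </preformat>
      <p>In this example the third track and the third detection are far from everything else, so they are removed before matching and stay unpaired, while the remaining two tracks are paired with their nearby detections.</p>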
    </sec>
    <sec id="sec-5">
      <title>Fish classification</title>
    </sec>
    <sec id="sec-6">
      <title>Elaboration of image descriptors</title>
      <p>The first part of the classification process is the representation of each image
based on its visual content. This consists of three steps that are usual
in computer vision: (i) feature detection, (ii) feature description and
(iii) image description.</p>
      <p>
        Feature detection and description: Many different feature types can be
detected in an image, e.g. corners, edges and ridges, as “interesting” parts of the
image. In our solution we used the Fast-Hessian detector [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to determine
the “key points” in each image, and the SURF (Speeded Up Robust Features) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
descriptor to extract local information at each key point. SURF is based
on Lowe’s SIFT (Scale Invariant Feature Transform) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and is
intended to be faster and more robust. Both methods are widely used in
practice as well as in theoretical work, and both leave room for further
development; we chose SURF because processing speed
is more important for videos. A SURF descriptor vector belongs to only one interest
point of an image, but an image possesses many feature descriptor vectors,
which should be aggregated into a single image descriptor.
      </p>
      <p>
        Image description: The next step in creating the representation is to
build a high-level representation of each image. We applied the BoW
(bag-of-words) model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for this purpose, where images are treated as
text documents. Accordingly, “visual words” (so-called “codewords”) in
images need to be defined from the feature descriptors. The whole set of
codewords gives the codebook (similar to a dictionary in text tasks). To determine
the codebook we clustered the SURF descriptors with the K-means [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
algorithm; the resulting cluster centres represent the codewords, since a
centroid represents similar feature descriptors. We experimented with
different numbers of clusters (k=5000 and k=10000); these are discussed later. At
this point, the codebook with its codewords was available for further
calculations, and it can be considered a concise representation of the whole
image set. Based on the codebook, the next step is to create a descriptor
that specifies the distribution of the visual codewords in any image, called the
high-level descriptor. We built histograms as high-level descriptors for
each image:
      </p>
      <p>Let <inline-formula><tex-math>H^{(r)} = (h^{(r)}_1, h^{(r)}_2, \ldots, h^{(r)}_k)</tex-math></inline-formula> be the initial histogram of the rth image,
where k denotes the size of the codebook (each element of H represents
a codeword).</p>
      <p>We performed the 1NN (1-nearest neighbour) algorithm for the ith
SURF descriptor to find the closest codeword (based on Euclidean
distance), then the corresponding element of H was incremented by
1, where i cycles through the descriptors created for the rth image.</p>
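      <p>The codebook and histogram construction can be sketched in a few lines. The example is only illustrative: it uses toy 2-dimensional vectors instead of 64- or 128-dimensional SURF descriptors, a plain batch K-means with fixed initial centres (the text does not state which variant or initialisation was used), and hypothetical function names. It also assumes that the initial histogram is all zeros.</p>
      <preformat>
```python
import math


def nearest_index(vector, centres):
    """Index of the centre closest to vector (Euclidean distance, 1NN)."""
    return min(range(len(centres)), key=lambda c: math.dist(vector, centres[c]))


def build_codebook(descriptors, initial_centres, iterations=20):
    """Batch K-means; the final cluster centres are the codewords."""
    centres = [tuple(c) for c in initial_centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for d in descriptors:
            clusters[nearest_index(d, centres)].append(d)
        # New centre = mean of the members; an empty cluster keeps its centre.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for old, members in zip(centres, clusters)
        ]
        if updated == centres:
            break
        centres = updated
    return centres


def bow_histogram(descriptors, codebook):
    """High-level descriptor: number of descriptors assigned to each codeword."""
    histogram = [0] * len(codebook)  # assumed initial histogram: all zeros
    for d in descriptors:
        histogram[nearest_index(d, codebook)] += 1
    return histogram


# Toy descriptors from the training images and a two-word codebook.
training = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
codebook = build_codebook(training, initial_centres=[(0.0, 0.0), (10.0, 10.0)])
# Histogram of one image that has three descriptors.
hist = bow_histogram([(0.5, 1.0), (9.0, 11.0), (11.0, 10.0)], codebook)
```
      </preformat>
      <p>With a codebook of 5000 or 10000 codewords the same procedure yields one 5000- or 10000-bin histogram per bounding-box image.</p>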
    </sec>
    <sec id="sec-7">
      <title>Training the classifier</title>
      <p>
        For the classification task we divided the labelled image set into two
subsets: a training set and a test set. We used the former to create the codebook
and train the classifiers, and the latter for preliminary testing. The
histograms were created for each training image, then we applied a variant
of the SVM (Support Vector Machine), the C-SVC (C-support vector classification)
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], with a linear kernel function to train the classifier (classification
model). The SVM is basically a binary linear classifier; thus, in order to extend it to
several categories, the one-against-all technique was used.
With this method a binary classifier was created for each category in the
training set. We performed k-fold cross-validation before the
preliminary test to determine the parameter of the C-SVC classifier. (After
the training, the codebook was already available.)
      </p>
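      <p>The one-against-all extension can be illustrated with a short prediction sketch. The text does not state how the outputs of the binary classifiers were combined; the sketch assumes the common convention of choosing the category whose linear decision value w·x + b is largest. The weights, biases and category names are hypothetical, not trained values.</p>
      <preformat>
```python
def decision_value(weights, bias, x):
    """Signed score w.x + b of one binary linear classifier."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict_one_vs_all(models, x):
    """models: dict mapping category to (weights, bias).

    Returns the category whose binary classifier gives the largest score.
    """
    return max(models, key=lambda label: decision_value(*models[label], x))


# Hypothetical binary models over a 3-bin histogram, one per category.
models = {
    "species A": ([1.0, -0.5, 0.0], -0.2),
    "species B": ([-0.5, 1.0, 0.0], 0.1),
    "species C": ([0.0, -0.2, 1.0], 0.0),
}
label = predict_one_vs_all(models, [0.2, 0.7, 0.1])
```
      </preformat>
      <p>For the sample histogram the three scores are -0.35, 0.70 and -0.04, so the second category is selected.</p>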
    </sec>
    <sec id="sec-8">
      <title>Preliminary test of classification</title>
      <p>To maximize the quality of the classification we conducted
a preliminary test as the validation phase of the learning process. The labelled
image set was divided into a training set (11221 images) and a preliminary test set
(11220 images). The training phase was executed with different parameters:
the number of codewords (5000 and 10000), the number
of dimensions of the SURF vectors (64 and 128), and the number of SURF vectors
per image (80, 200, and 500). Besides accuracy we also measured the
speed of the algorithm; the results can be seen in Tables 1 and 2.
After the preliminary testing with 5000 codewords, the case of 10000
codewords was tested with only 200 and 500 SURF vectors, because 80 vectors
had resulted in very poor accuracy.
The best result of the preliminary testing was an accuracy of 0.663, obtained with 128
dimensions, 500 vectors and 5000 codewords. The run time of the training phase
was long, but the largest part of it (92-94%) was the SVM training, which
was important for good accuracy, while the other parts were fast (creating
codewords: 3-6%, histogram calculation: less than 1%); furthermore, the test
phase was also quick (10 minutes). The run times were measured on a
PC (Intel Core i7-4770K processor, 16 GB RAM and SSD).</p>
      <p>For the prediction on the official test set we chose the SVM model with the
best parameters (128 dimensions, 500 vectors and 5000 codewords), and we
submitted 2 prediction files (as the results of 2 runs). Both runs used the same
method (as described above); there was only a small difference
between the two submissions. The first run (BME TMIT RUN1) contains false
detections caused by glances (blinks) and by the low threshold used in
detection (a higher threshold could avoid these), while in the second run (BME
TMIT RUN2) these detections were filtered out, and the corresponding
predictions in the affected videos were deleted.</p>
    </sec>
    <sec id="sec-9">
      <title>Evaluation</title>
    </sec>
    <sec id="sec-10">
      <title>Evaluation metrics</title>
      <p>In the official evaluation the normalised counting score is measured (instead
of the accuracy used in our preliminary testing). The counting score (CS) is defined
in Equation (1), where d is the difference between the
number of occurrences in the run (per species) and the number of occurrences in
the ground truth (N<sub>gt</sub>).
The precision (Pr) is defined as Pr = TP/(TP+FP), where TP and FP are
the true positives and the false positives, respectively. The normalised counting
score (NCS) is defined as NCS = CS x Pr.</p>
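      <p>The metric can be made concrete with a small calculation. Equation (1) is not reproduced in this text, so the sketch assumes the exponential form CS = exp(-d / N<sub>gt</sub>) with d taken as an absolute difference; this form and the sample counts are assumptions, and the function names are hypothetical.</p>
      <preformat>
```python
import math


def counting_score(n_run, n_gt):
    """Assumed form of Equation (1): CS = exp(-d / N_gt), d = |n_run - n_gt|.

    Requires n_gt to be positive.
    """
    return math.exp(-abs(n_run - n_gt) / n_gt)


def precision(tp, fp):
    """Pr = TP / (TP + FP); requires at least one positive prediction."""
    return tp / (tp + fp)


def normalised_counting_score(n_run, n_gt, tp, fp):
    """NCS = CS x Pr, as defined in the text."""
    return counting_score(n_run, n_gt) * precision(tp, fp)


# Hypothetical counts for one species: 8 predicted occurrences against
# 10 in the ground truth, of which 6 are true and 2 are false positives.
ncs = normalised_counting_score(n_run=8, n_gt=10, tp=6, fp=2)
```
      </preformat>
      <p>Under these assumptions CS is exp(-0.2), roughly 0.82, and Pr is 0.75, giving an NCS of roughly 0.61.</p>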
    </sec>
    <sec id="sec-11">
      <title>Final official results</title>
      <p>Our final official results for each fish species can be seen in Table 3, which
presents the occurrences of the fish species in the test set and the NCS (Normalized
Counting Score) results.</p>
      <p>The aggregated official results over the different fish species can be seen in Table 4
(the last column is the product of the previous ones), and we can conclude that
the second run of our submissions was better.</p>
      <p>We developed a complex system to detect and track objects (fish) in
underwater video by examining each image frame. For the detection
process we used background subtraction and morphological methods, and
our solution then calculated bounding boxes based on the object contours. The
first time we detect an object, we assign a Kalman filter to it, which is able to
predict the probable location of that object in the next frame. We are likely to
detect that particular object (the same fish) in the next frame as well, but we
should not assign a new Kalman filter to it. To deal with this, we used the Hungarian
method to pair the existing Kalman filters with the new “candidate” ones.
We then erased the candidate Kalman filters that had a matching pair and kept
the unpaired ones. In this way we were able to track the detected objects. We also
categorized the detected fish with a C-SVC classifier; however, this required
representing the objects based on visual information. For this purpose, our
system calculated SURF descriptors for each object and then clustered them
with the K-means algorithm. Our solution built histograms for each fish based on
the resulting cluster centres (according to the BoW model), and these histograms
were the input of the C-SVC. To optimize the C-SVC classifier we
conducted a preliminary test, and we used the best parameters to train the
classifier. We predicted the fish species in the official test video set based on
our implemented model, and the number of occurrences of each fish species
was counted. In the official evaluation we reached an NCS (Normalized
Counting Score) of 0.34 with the second submission.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeau</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>LifeCLEF 2015: multimedia life species identification challenges</article-title>
          ,
          <source>Proceedings of CLEF 2015</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolajczyk</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herrera</surname>
            ,
            <given-names>A. G. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bromuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amin</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohammed</surname>
            ,
            <given-names>M. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acar</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uskudarli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marvasti</surname>
            ,
            <given-names>N. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aldana</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García</surname>
            ,
            <given-names>M. M. R.</given-names>
          </string-name>
          :
          <article-title>General Overview of ImageCLEF at CLEF2015 Labs</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          , Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Spampinato</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Boom</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>LifeCLEF Fish Identification Task 2015</article-title>
          ,
          <source>CLEF working notes 2015</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>KaewTraKulPong</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bowden</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection</article-title>
          ,
          <source>In Proc. 2nd European Workshop on Advanced Video Based Surveillance Systems, AVBS01. Sept</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perkins</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfart</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Mathematical Morphology</source>
          , (
          <year>2003</year>
          ) http://homepages.inf.ed.ac.uk/rbf/HIPR2/matmorph.htm
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Suzuki</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Abe</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Topological Structural Analysis of Digitized Binary Images by Border Following</article-title>
          .
          <source>Computer Vision, Graphics, and Image Processing</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>32</fpage>
          -
          <lpage>46</lpage>
          . (
          <year>1985</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kalman</surname>
            ,
            <given-names>R. E.</given-names>
          </string-name>
          :
          <article-title>A New Approach to Linear Filtering and Prediction Problems</article-title>
          .
          <source>Journal of Basic Engineering</source>
          ,
          <volume>82</volume>
          (
          <issue>1</issue>
          ),
          <fpage>35</fpage>
          -
          <lpage>45</lpage>
          . (
          <year>1960</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Welch</surname>
            ,
            <given-names>G. F.</given-names>
          </string-name>
          :
          <article-title>Kalman Filter</article-title>
          .
          <source>Computer Vision: A Reference Guide</source>
          ,
          <fpage>435</fpage>
          -
          <lpage>437</lpage>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kuhn</surname>
            ,
            <given-names>H. W.</given-names>
          </string-name>
          :
          <article-title>The Hungarian method for the assignment problem</article-title>
          .
          <source>Naval research logistics quarterly</source>
          ,
          <volume>2</volume>
          (
          <issue>1‐2</issue>
          ),
          <fpage>83</fpage>
          -
          <lpage>97</lpage>
          . (
          <year>1955</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On Kuhn's Hungarian method-a tribute from Hungary</article-title>
          .
          <source>Naval Research Logistics (NRL)</source>
          ,
          <volume>52</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2</fpage>
          -
          <lpage>5</lpage>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bay</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuytelaars</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>SURF: Speeded Up Robust Features</article-title>
          ,
          <source>9th European Conference on Computer Vision</source>
          , (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Object recognition from local scale-invariant features</article-title>
          . In: ICCV (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D. G.</given-names>
          </string-name>
          :
          <article-title>Distinctive Image Features from Scale-Invariant Keypoints</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          ,
          <volume>60</volume>
          ,
          <issue>2</issue>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          , (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Torralba</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Recognizing and Learning Object Categories</article-title>
          ,
          <source>IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lazebnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ponce</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories</article-title>
          ,
          <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , New York, Vol.
          <volume>2</volume>
          , pp.
          <fpage>2169</fpage>
          -
          <lpage>2178</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>MacQueen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          ,
          <source>Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability</source>
          , Vol.
          <volume>1</volume>
          , pp.
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          (
          <year>1967</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Boser</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>A Training Algorithm for Optimal Margin Classifiers</article-title>
          ,
          <source>Proc. of the 5th Annual ACM Workshop on Computational Learning Theory</source>
          , pp.
          <fpage>144</fpage>
          -
          <lpage>152</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Support-vector networks</article-title>
          ,
          <source>Machine Learning</source>
          , Vol.
          <volume>20</volume>
          , No.
          <issue>3</issue>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>