<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reduction dimension of bags of visual words with FCA</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ngoc Bich Dao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karell Bertet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnaud Revel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Laboratoire L3i, University of La Rochelle</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In image retrieval involving bags of visual words, dimension reduction is a fundamental data preprocessing task. In recent years, several methods have been proposed for the supervised and unsupervised cases. In the supervised case, the problem has been addressed with encouraging results. In the unsupervised case, however, dimension reduction remains a challenge. In this article, we propose an application to image retrieval of a logical dimension reduction method based on Formal Concept Analysis (FCA). This method reduces a closure system without, theoretically, any loss of information. In our context, combining this method with bags of visual words is original. Experimental results on five data sets (COREL, CALTECH256, VOC2005, VOC2012 and MIR Flickr) are analyzed to show the influence of the data structures and the parameters on the reduction factor.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Thanks to the generalization of multimedia devices, huge collections of digital images are available today. As far as mining in multimedia documents is concerned, web search engines usually give poor results: such results fall far short of expectations regarding the semantics of the documents. Content Based Image Retrieval (CBIR)[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] has been investigated for decades in order to answer this problem. The main idea is to build a description based on the image content, and to find similarities between descriptions. Classically, visual features are extracted from images and then compiled into an index or signature giving a dense description of each image. To perform the retrieval, a similarity function compares the index of the query with those of the collection, and a ranking of the results according to the computed similarity is proposed to the user. The detection of visual features can be performed by a SIFT detector[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or a dense grid, both of which select a large number of interest points (up to several thousands) from the images. Each of these points is then described by a SIFT-like descriptor. To limit the dimension of the description space, a vector quantization (usually k-means) is performed in order to cluster similar interest points into "visual words" and to generate a dictionary of visual words (usually up to 1000 words). The signature of an image is then composed of the set of visual words corresponding to the feature points detected in the image (forming a "bag of visual words"[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Comparing two images then consists in comparing their bags of visual words. The processing cost introduced by these techniques makes them difficult to use with large amounts of images, such as a query on the Internet.
      </p>
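      <p>As a rough sketch of the pipeline above (detect interest points, describe them, quantize the descriptors into a dictionary of visual words, and histogram them into a signature), the following Python toy uses random vectors in place of real SIFT descriptors and a minimal k-means; the function names are illustrative, not from the paper.</p>

```python
import numpy as np

def build_dictionary(descriptors, k, iters=20, seed=0):
    """Toy k-means: cluster descriptors into k 'visual words' (cluster centers)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest center, then recompute centers
        dist = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bovw_signature(descriptors, dictionary):
    """Bag of visual words: occurrence histogram of the nearest word of each point."""
    dist = np.linalg.norm(descriptors[:, None] - dictionary[None], axis=2)
    return np.bincount(dist.argmin(axis=1), minlength=len(dictionary))

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(200, 8))      # stand-ins for SIFT descriptors
dictionary = build_dictionary(descriptors, k=10)
signature = bovw_signature(descriptors, dictionary)
```

      <p>Two images are then compared through a similarity function on their signatures, for instance histogram intersection or cosine similarity.</p>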
      <p>Supervised data is labeled (the data has ground truth), and classification methods are required to deal with the categorization problem. In the unsupervised case, data is unlabeled, so clustering methods are used to gather similar observations into the same cluster. Classification and clustering have many applications across computer science, such as bioinformatics, numerical analysis, machine learning, data mining and pattern recognition, where the data may contain a very large set of features: the description of the data is high-dimensional and therefore needs to be reduced. However, reducing the features while preserving the quality of the data is still challenging.</p>
      <p>To manage high-dimensional description spaces, reduction techniques have been proposed. These techniques are widely used as a data preprocessing step in machine learning and pattern recognition. This step can usually increase the accuracy of subsequent steps such as classification or clustering, while the computational cost and running time of those steps may be significantly decreased. In the statistics and machine learning literature, two main strategies are distinguished: feature transformation and feature selection, both usable in the supervised and unsupervised cases. Feature transformation consists in transforming the given set of features into a new one: when the new feature set is larger than the original one, this is called feature generation; when it is smaller, feature extraction. Feature selection methods instead select features from the original set. This approach is interesting in domains that prefer to keep the existing features in order to maintain their physical meaning.</p>
      <p>In this article, we propose a logical, unsupervised feature reduction method issued from FCA to address the visual word reduction problem in a CBIR system. In FCA, data are organised into a "context" by a set of observations (called "objects", "samples" or "experimental units" in other fields) and a set of features (also known as "attributes", "parameters" or "variables" in the computer science, machine learning and statistics communities) associated with each observation.</p>
      <p>Context reduction is a simple, polynomial treatment in FCA, classically applied to the whole context, thus reducing both observations and features. This treatment is based on a nice result establishing that the concept lattice of the context can be reduced to a minimal one, while preserving its graph structure, by deleting some redundant observations and features. For example, when two attributes are shared by exactly the same objects, they belong to the same concepts of the concept lattice; they are therefore redundant, and one of these two attributes can be deleted while preserving the concept lattice structure. In our case, we focus on feature reduction of a context. Our algorithm accepts as input the closure operator of the context on the attribute set, and returns the redundant attributes. This algorithm thus extends the classical attribute reduction of a context to the more general case of data described by a closure operator. Moreover, we propose a new application in image analysis: feature reduction of visual words.</p>
      <p>This paper is organized as follows. In order to introduce our approach, we recall some definitions of formal concepts in Section 2.1. Section 2.2 details our proposed method. Section 3 shows some experimental results on real data. Finally, Section 4 ends this paper with a conclusion and perspectives.</p>
    </sec>
    <sec id="sec-2">
      <title>The proposed features selection method</title>
      <p>The feature reduction algorithm we propose is a logical, unsupervised method stemming from FCA, where a concept lattice, defined from a binary table, represents the description of all object-attribute combinations. When the concept lattice structure is preserved after the deletion of some attributes and objects, these attributes are "redundant" for the lattice structure and can be deleted from the initial data without affecting the structure of object-attribute combinations. Therefore, from a theoretical point of view, the description of the data is equivalently represented by a concept lattice from which the "redundant" attributes and objects have been deleted.</p>
      <p>The reduction is a simple, polynomial treatment in FCA, classically decomposed into two steps: attribute reduction and object reduction. In this article, we focus on attribute/feature reduction, thus on the detection of attributes that are redundant for the concept lattice structure restricted to attributes. A nice result establishes that both parts of a concept (A, B) are closed sets defined on the object and attribute sets, and the concept lattice restricted to the attributes/objects is called a closure lattice.</p>
      <p>In the first subsection, we introduce the notions of closure lattice according to a closure operator, reduced closure lattice and redundant attributes. In the second subsection, we present the reduction algorithm aiming at removing redundant attributes, with a closure operator as input. It is thus a generic algorithm that can be applied either to the attributes or to the objects of a binary table, but also to any closure system.</p>
      <sec id="sec-2-1">
        <title>Reduced lattice</title>
        <p>
          In FCA, the relationship between a set of attributes I and a set of objects O is described by a formal context (O, I, (f, g)), where f(A) is the set of attributes shared by a subset A of objects, and g(B) is the set of objects sharing a subset B of attributes. One can derive two closure systems from a context. The first one is defined on the set of attributes I, with f∘g as closure operator. The second one is defined on the set of objects O, with g∘f as closure operator[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. A closure system (φ, S) is defined by a closure operator φ on a set S, i.e. a map on P(S) satisfying the three following properties: φ is isotone, extensive and idempotent. A subset X ⊆ S is called closed if φ(X) = X (see Table 2). The set system F of all closed subsets, fitted out with the inclusion relation ⊆, forms a lattice usually called the closure lattice (see Fig. 1a). See the survey of Caspard and Monjardet[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] for more details about closure systems. There are infinitely many set systems whose closure lattices are isomorphic. A reduced closure lattice is a closure lattice defined on a set S of smallest size among all isomorphic closure lattices. A nice result[
          <xref ref-type="bibr" rid="ref18 ref20">20,18</xref>
          ] establishes that a closure system is reduced when, for each x ∈ S, the closure φ(x) is a join-irreducible (Equation 1):
        </p>
        <p>∀x ∈ S, ∀Y ⊆ S such that x ∉ Y: φ(x) ≠ φ(Y)   (1)</p>
        <p>Therefore, a non-reduced closure system contains reducible elements, i.e. elements which do not satisfy Equation 1. Each reducible element x ∈ S is then equivalent to a set Ex ⊆ S of equivalent elements with x ∉ Ex and φ(x) = φ(Ex). Reducible elements can be removed without affecting the structure of the closure lattice. The reduction of a closure system then consists in removing or replacing each reducible element x ∈ S by its equivalent set Ex.</p>
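        <p>To make the definitions concrete, here is a small runnable sketch (not from the paper): the closure operator f∘g of a toy binary context, and a brute-force check of the join-irreducibility condition of Equation (1). The names `closure` and `reducible` and the toy context are illustrative.</p>

```python
import numpy as np
from itertools import combinations

def closure(context, attrs):
    """phi(B) = f(g(B)): attributes shared by every object having all of B."""
    objs = np.all(context[:, list(attrs)], axis=1)                   # g(B)
    return frozenset(np.flatnonzero(np.all(context[objs], axis=0)))  # f(g(B))

def reducible(context, x):
    """Naive Equation (1) test: does some Y with x not in Y give phi(Y) = phi({x})?"""
    others = [y for y in range(context.shape[1]) if y != x]
    cx = closure(context, [x])
    return any(closure(context, Y) == cx
               for r in range(len(others) + 1)
               for Y in combinations(others, r))

# toy context (objects x attributes): attribute 2 holds exactly when 0 and 1 both hold
ctx = np.array([[1, 1, 1],
                [1, 0, 0],
                [0, 1, 0]], dtype=bool)
# here attribute 2 is reducible (phi({2}) = phi({0, 1})); attributes 0 and 1 are not
```

      <p>This brute-force test is exponential in |S|, which is exactly why the polynomial algorithm of the next subsection works on the precedence graph instead.</p>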
      </sec>
      <sec id="sec-2-2">
        <title>Proposed reduction algorithm</title>
        <p>The algorithm we propose is a generic reduction algorithm, since it only needs a closure operator as input. Thus it can be applied with the same complexity to any closure system, and in particular to a context by considering the attributes, using f∘g as closure operator.</p>
        <p>(a) The context</p>
        <p>A direct application of the definition (see Eq. 1) would imply an exponential cost, by checking whether any subset Y ⊆ S is equivalent to each x ∈ S. We use the precedence relation (precedence graph) for a polynomial reduction. The precedence graph is defined on the set S, with an edge between two elements x, y ∈ S when φ(x) ⊆ φ(y). This graph is clearly acyclic for a reduced closure system. We propose a generic algorithm in 3 steps:</p>
        <p>Step 1: Standardization. Check if there exist x, y ∈ S such that φ(x) = φ(y). When φ(x) = φ(y), then x and y belong to the same strongly connected component of the graph. Each strongly connected component X ⊆ S contains elements xi, xj such that φ(xi) = φ(xj) for all xi ≠ xj ∈ X. Thus we can delete all elements of X except one representative element x ∈ X. The obtained precedence graph is then acyclic.</p>
        <p>Step 2: Clarification. Check if there exists x ∈ S such that φ(x) = φ(∅). When such an x exists, then φ(x) is included in φ(y) for any y ∈ S, so x is the only source of the precedence graph. The clarification test therefore only has to be performed for graphs with one source.</p>
        <p>Step 3: Reduction. Check, for any x ∈ S, if there exists a set Ex ⊆ S such that x ∉ Ex and φ(x) = φ(Ex). One can observe that an element x with only one immediate predecessor y is not reducible, because it would be equivalent to y, and would thus belong to a strongly connected component already removed in the first step. If there exists Ex ⊆ S such that φ(x) = φ(Ex), then the elements of Ex are clearly predecessors of x in the precedence graph since φ(y) ⊆ φ(x) for all y ∈ Ex. Moreover, this test can be restricted to the maximal predecessors of x. Therefore, this treatment only has to be performed for elements with more than one immediate predecessor, and the equality has to be checked against the set of immediate predecessors of x.</p>
        <p>This algorithm takes a closure operator φ on a set S as input. Its output is the reducible element set X ⊆ S and the equivalent element set Ex for each x ∈ X.</p>
        <p>Alg. 1 reduces a closure system in O(|S|·cφ + |S|² log |S|), where cφ is the cost of a closure generation and |S| is the number of nodes. Indeed, the precedence graph can be initialized in O(|S|·cφ + |S|² log |S|) by computing the closures in O(|S|·cφ) and then comparing pairs of closures in O(|S|² log |S|). The strongly connected components can then be computed using Kosaraju's algorithm by two passes of depth-first search, thus with a complexity in O(|S| + |A|) ⊆ O(|S|²), where |A| is the number of edges in the graph. Standardization and clarification are clearly in O(|S|) by a simple pass over the graph. Finally, the reduction step considers the immediate predecessors of each x ∈ S in O(|S|²), and then computes and compares two closures in O(|S|·cφ + |S|² log |S|). Therefore, Alg. 1 computes the attribute-reduced context in O(|I|²|O| + |I|² log |I|), since a closure can be obtained in O(|I|·|O|).</p>
        <p>Input: a closure operator φ on a set S
Output: the reducible element set X ⊆ S, and the equivalent element set Ex for each x ∈ X

init a set Res with ∅;
init a graph G with S as set of nodes;
// Precedence graph
foreach (x, y) ∈ S × S do
    if φ(x) ⊆ φ(y) then
        add the edge (x, y) in G;
    end
end
compute the set CFC of the strongly connected components of G;
let source be the sources of the graph G;
// Step (1): Standardization
foreach C ∈ CFC do
    choose y ∈ C;
    foreach x ∈ C such that x ≠ y do
        add x in Res with Ex = {y}; delete x from the graph G;
    end
end
// Step (2): Clarification
if |source| = 1 and φ(source) = φ(∅) then
    add source in Res with Esource = ∅; delete source from G;
end
// Step (3): Reduction
foreach x ∈ G do
    let P be the set of immediate predecessors of x in G;
    if |P| ≠ 1 and φ(x) = φ(P) then
        add x in Res with Ex = P; delete x from the graph G;
    end
end
return Res, (Ex)x∈Res;</p>
        <p>Algorithm 1: Reduction of a closure system</p>
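        <p>A runnable Python sketch of Algorithm 1 follows (not the authors' implementation): for brevity it detects the strongly connected components by direct closure equality rather than Kosaraju's algorithm, and tests precedence with subset comparisons. The names `make_phi` and `reduce_closure_system` are illustrative.</p>

```python
import numpy as np

def make_phi(context):
    """Closure operator f∘g of a binary (objects x attributes) context."""
    def phi(attrs):
        objs = np.all(context[:, list(attrs)], axis=1)
        return frozenset(np.flatnonzero(np.all(context[objs], axis=0)))
    return phi

def reduce_closure_system(S, phi):
    """Return {reducible element x: its equivalent set Ex}."""
    res = {}
    cl = {x: phi([x]) for x in S}
    # Step (1) standardization: equal closures = same strongly connected
    # component; keep one representative per component.
    groups = {}
    for x in S:
        groups.setdefault(cl[x], []).append(x)
    alive = []
    for members in groups.values():
        alive.append(members[0])
        for x in members[1:]:
            res[x] = {members[0]}
    # precedence: y precedes x when phi(y) is strictly contained in phi(x)
    preds = {x: {y for y in alive if cl[y] < cl[x]} for x in alive}
    # Step (2) clarification: a unique source whose closure equals phi(empty set)
    sources = [x for x in alive if not preds[x]]
    if len(sources) == 1 and cl[sources[0]] == phi([]):
        res[sources[0]] = set()
        alive.remove(sources[0])
    # Step (3) reduction: x whose closure equals that of its immediate predecessors
    for x in list(alive):
        p = preds[x] & set(alive)
        imm = {y for y in p if not any(cl[y] < cl[z] for z in p)}
        if len(imm) > 1 and phi(imm) == cl[x]:
            res[x] = imm
            alive.remove(x)
    return res

# toy context: attribute 2 holds exactly when attributes 0 and 1 both hold
ctx = np.array([[1, 1, 1],
                [1, 0, 0],
                [0, 1, 0]], dtype=bool)
reduced = reduce_closure_system(range(ctx.shape[1]), make_phi(ctx))
# attribute 2 is reported reducible with E = {0, 1}
```

      <p>On this toy context the algorithm removes attribute 2, equivalent to the set {0, 1}, exactly as the reduction step of Algorithm 1 prescribes.</p>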
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimentation</title>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>In our experiments, we compare the performance of the proposed method on different image data sets. Each image in a data set is described by a vector composed of the occurrence frequencies of its visual words, where a set of visual words is defined for each data set. Table 3 describes the different data sets used in our experiments, and the methods applied to generate the whole bag of visual words.</p>
        <p>Database | Images nb | Features nb | Detector | Descriptor | Dictionary of visual words
VOC2012[<xref ref-type="bibr" rid="ref21">21</xref>] | 17124 | 4096 | HarrisLaplace | CMI (Colour Moment Invariants)[<xref ref-type="bibr" rid="ref22">22</xref>] | Random selection of all key points
MIR Flickr[<xref ref-type="bibr" rid="ref23">23</xref>] | 24991 | 4096 | HarrisLaplace | CMI¹ | Random selection of all key points
COREL[<xref ref-type="bibr" rid="ref24">24</xref>] | 4998 | 500 | SIFT | SIFT[<xref ref-type="bibr" rid="ref2">2</xref>] | K-means[<xref ref-type="bibr" rid="ref25">25</xref>] (OpenCV)
CALTECH256[<xref ref-type="bibr" rid="ref26">26</xref>] | 30607 | 500 | SIFT | SIFT² | K-means (OpenCV)
Dataset 1 (VOC2005)[<xref ref-type="bibr" rid="ref27">27</xref>] | 1354 | 262 | HarrisLaplace and Laplacian³ | SIFT | K-means (OpenCV)</p>
        <p>As mentioned earlier, the algorithm we propose requires binary values indicating, for each object, whether it possesses a given attribute or not. Since each image is described by a visual word occurrence frequency vector, its values can vary from 0 to a maximum value depending on the image size and the quantity of visual words in the image. For instance, if an image is entirely painted black, there is only one visual word "black" for the whole image, with a large frequency, and the vector will be sparse. Conversely, an image with a patchwork of colors is described by a frequency vector mainly composed of low but non-zero values. To be able to compare several images, it is thus necessary to normalize their frequency vectors before binarization.
1 http://koen.me/research/colordescriptors/
2 http://www.robots.ox.ac.uk/~vgg/research/affine/#software
3 http://lear.inrialpes.fr/people/dorko/downloads.html</p>
        <p>Normalization As mentioned before, the visual word occurrence frequency can be very high in some images and insignificant in others. In order to compare the visual words, several strategies can be adopted.</p>
        <p>First of all, it is necessary to find a "max" value in the data set and then divide the visual word frequency by this max value, to map the values into the range 0 to 1. Two ways to define the max value are considered in this article.</p>
        <p>Normalization by line (image) With this type of normalization, a max value is computed for each image, as the maximum frequency value of the corresponding image. The interpretation of this normalization is that we consider as significant the ratio between the different attributes of a given image. This kind of normalization depends neither on the database size nor on the image size. However, the normalized values do not account for the ratio of the same attribute between the images in the database.</p>
        <p>Normalization by column (feature) Normalization by column finds the maximum frequency value for each attribute in the database. With this approach, the correspondence between the images in the database is taken into account. The drawback is that each time a new image is inserted into the database, the normalized values must be recomputed. Besides, the image size must also be taken into account. Table 4 gives an illustrated example.</p>
        <p>(a) Initial data
      f1   f2   f3   f4
img1   1    0   50    5
img2  10    9    1    8
img3   0    0    0   99</p>
        <p>(b) After normalization by line
      f1    f2    f3    f4
img1  0.02  0     1     0.1
img2  1     0.9   0.1   0.8
img3  0     0     0     1</p>
        <p>(c) After normalization by column
      f1    f2    f3    f4
img1  0.1   0     1     0.05
img2  1     1     0.02  0.08
img3  0     0     0     1</p>
        <p>Binarization After the normalization, we simply binarize the normalized values by comparing them with a threshold varying from 0 to 0.9. At a threshold of one, in the normalization by line case, most of the attributes in an image could fall below the threshold. To avoid removing all the visual words from an image, the highest threshold has been set to 0.9.</p>
        <p>Reduction The next phase of the algorithm is to apply our reduction method, which is itself composed of three steps (clarification, standardization, reduction). Indeed, before applying the proposed method to bags of visual words, we must remove all the visual words that appear in every image (resp. in no image). This step corresponds to the clarification. The standardization step removes a feature whose image vector is identical to the image vector of another feature. At last, in the reduction step, all the features which are combinations of other features are removed.</p>
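        <p>The two normalizations and the binarization can be reproduced in a few lines; a minimal numpy sketch, where the matrix reproduces Table 4(a), `binarize` is an illustrative name, and every row and column is assumed to have a non-zero maximum:</p>

```python
import numpy as np

# Table 4(a): visual word frequencies for 3 images and 4 features
freq = np.array([[ 1., 0., 50.,  5.],
                 [10., 9.,  1.,  8.],
                 [ 0., 0.,  0., 99.]])

# normalization by line: divide each image (row) by its own maximum frequency
by_line = freq / freq.max(axis=1, keepdims=True)
# normalization by column: divide each feature (column) by its maximum over images
by_column = freq / freq.max(axis=0, keepdims=True)

def binarize(normalized, threshold):
    """Keep a visual word only when its normalized frequency exceeds the threshold."""
    return normalized > threshold
```

      <p>Note that with a high threshold, normalization by line still keeps at least the most frequent word of each image (its normalized value is 1), whereas normalization by column can leave an image with no word at all.</p>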
      </sec>
      <sec id="sec-3-2">
        <title>Results</title>
        <p>In this section, we detail the results obtained with our reduction method on the 5 data sets described in Section 3.1. To analyze the behavior of our method, and the contribution of each step of the algorithm, we introduce the ratio of removed features for each step of the reduction algorithm as follows:</p>
        <p>ρ1 = a / Natt,   ρ2 = b / (Natt − a),   ρ3 = c / (Natt − a − b)</p>
        <p>where a (resp. b and c) is the number of attributes removed in the standardization (resp. clarification and reduction) step, and Natt is the total number of attributes. Figure 2 shows the evolution of ρ1, ρ2, ρ3 with regard to the threshold level, for both normalization types: line and column.</p>
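        <p>These ratios are straightforward to compute; a minimal sketch, where the sample numbers are purely illustrative:</p>

```python
def removal_ratios(n_att, a, b, c):
    """Removal ratios of the standardization (a), clarification (b) and
    reduction (c) steps, each relative to the attributes still present."""
    rho1 = a / n_att
    rho2 = b / (n_att - a)
    rho3 = c / (n_att - a - b)
    return rho1, rho2, rho3

# e.g. 100 attributes: 10 removed, then 18 of the remaining 90, then 36 of 72
rho1, rho2, rho3 = removal_ratios(100, 10, 18, 36)
```
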
        <p>The maximum ratios of removed attributes for the data sets (CALTECH, COREL, VOC2005, MIR Flickr, VOC2012) are approximately equal to 0.67%, 2.6%, 22.5%, 95% and 96% respectively. The impact of the reduction is more noticeable on the last three data sets. This phenomenon can be explained by the bag of visual words generation, since the two data sets MIR Flickr and VOC2012 are composed of randomly selected visual words stemming from the keypoint set. Conversely, the data sets CALTECH, COREL and VOC2005 are composed of bags of visual words defined by the SIFT detector and descriptor, and by a K-means clustering. Thus, the randomly selected visual words are less consistent.</p>
        <p>We can also observe that the percentage of removed attributes increases as the binarization threshold increases. With an increasing threshold, only the most frequent words are kept, so more attributes are potentially equivalent and removed.</p>
        <p>At last, there is no attribute reduction in step 1 (the ρ1 value) with a normalization by column, because this kind of normalization cannot generate empty columns. Moreover, a normalization by line keeps the most frequent attributes in each image, whereas a normalization by column keeps the most frequent images for each attribute. To summarize, the number of removed attributes depends on the visual words generation, on the chosen binarization threshold and on the normalization process (by line or column). However, care should be taken that the greater the binarization threshold, the smaller the number of remaining images, except in the case of normalization by line.</p>
        <p>Fig. 2: Evolution of ρ1, ρ2, ρ3 with regard to the binarization threshold for CALTECH, COREL, VOC2005, MIR Flickr and VOC2012: (a) normalization by line; (b) normalization by column.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and perspective</title>
      <p>In this article, we have presented a logical feature selection method for bags of visual words. This method, stemming from Formal Concept Analysis, is a closure system reduction without, theoretically, any loss of information: the lattice describing the data is preserved by the reduction treatment. In our context, combining the proposed method with bags of visual words is original. The experiments show that the number of deleted features can be significant, depending on the data set and the binarization treatment. Moreover, it is possible to perform both an object and an attribute reduction.</p>
      <p>A finer analysis could be obtained in the supervised case, by comparing classification performance before and after reduction. Moreover, the number of potentially deleted objects could also be useful to automatically define a good binarization threshold in the supervised case: while the suppression of objects belonging to the same class is to be encouraged, we must avoid removing objects of different classes. Object reduction can easily be performed by applying our reduction algorithm to the object set.</p>
      <p>At last, we plan to study the number of deleted attributes and deleted objects (of the same class / of different classes) to evaluate the complexity of a data set and the quality of its visual words.</p>
      <p>Acknowledgment: We would like to thank Thierry URRUTY, Nhu Van NGUYEN and Dounia AWAD, who extracted the bags of visual words we used in this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Smeulders</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Worring</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
          </string-name>
          , R.:
          <article-title>Content-based image retrieval at the end of the early years</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>22</volume>
          (
          <year>2000</year>
          )
          <fpage>1349</fpage>
          -
          <lpage>1380</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          :
          <article-title>Object recognition from local scale-invariant features</article-title>
          .
          <source>In: Proceedings of the International Conference on Computer Vision</source>
          , Kerkyra (
          <year>1999</year>
          )
          <fpage>1150</fpage>
          -
          <lpage>1157</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Munoz</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Scene Classification Via pLSA</article-title>
          . In Leonardis,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Bischof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Pinz</surname>
          </string-name>
          , A., eds.
          <source>: 9th European Conference on Computer Vision</source>
          . Volume
          <volume>3954</volume>
          of Lecture Notes in Computer Science, Graz, Austria, Springer Berlin Heidelberg (
          <year>2006</year>
          )
          <fpage>517</fpage>
          -
          <lpage>530</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tufféry</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Data mining et statistique décisionnelle: L'intelligence des données</article-title>
          .
          <source>Technip edn.</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Belohlavek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kruse</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vychodil</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Discovery of optimal factors in binary data via a novel method of matrix decomposition</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>76</volume>
          (
          <year>2010</year>
          )
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Fisher, R.A.:
          <article-title>The use of multiple measurements in taxonomic problems</article-title>
          .
          <source>The Annals of Eugenics</source>
          <volume>7</volume>
          (
          <year>1936</year>
          )
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hotelling</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Analysis of a complex of statistical variables into principal components</article-title>
          .
          <source>Journal of Educational Psychology</source>
          <volume>24</volume>
          (
          <year>1933</year>
          )
          <fpage>417</fpage>
          -
          <lpage>441</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
          </string-name>
          , H.:
          <article-title>Feature selection for high-dimensional data: A fast correlation-based filter solution</article-title>
          .
          <source>In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003)</source>
          , Washington DC (
          <year>2003</year>
          )
          <fpage>856</fpage>
          –
          <lpage>863</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Correlation-based feature subset selection for machine learning</article-title>
          .
          <source>PhD thesis</source>
          , University of Waikato, Hamilton, New Zealand
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Battiti</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using mutual information for selecting features in supervised neural net learning</article-title>
          .
          <source>IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council</source>
          <volume>5</volume>
          (
          <year>1994</year>
          )
          <fpage>537</fpage>
          –
          <lpage>550</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rakotomalala</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lallich</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Construction d'arbres de décision par optimisation</article-title>
          .
          <source>Revue Extraction des Connaissances et Apprentissage</source>
          <volume>16</volume>
          (
          <year>2002</year>
          )
          <volume>685</volume>
          {
          <fpage>703</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kononenko</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Estimating attributes: Analysis and extensions of RELIEF</article-title>
          . In
          <string-name>
            <surname>Bergadano</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , eds.:
          <source>Machine Learning: ECML-94. Volume 784 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>1994</year>
          )
          <fpage>171</fpage>
          –
          <lpage>182</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niyogi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Laplacian score for feature selection</article-title>
          .
          <source>In: Neural Information Processing Systems Foundation</source>
          , MIT Press (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Devaney</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ram</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Efficient Feature Selection in Conceptual Clustering</article-title>
          .
          <source>In: Machine Learning: Proceedings of the Fourteenth International Conference</source>
          , Nashville, TN (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Dy</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brodley</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          :
          <article-title>Feature Selection for Unsupervised Learning</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>5</volume>
          (
          <year>2004</year>
          )
          <fpage>845</fpage>
          –
          <lpage>889</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shashua</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>6</volume>
          (
          <year>2005</year>
          )
          <fpage>1855</fpage>
          –
          <lpage>1887</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Elghazel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aussem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Unsupervised feature selection with ensemble learning</article-title>
          .
          <source>Machine Learning</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Barbut</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monjardet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Ordre et classification: algèbre et combinatoire</article-title>
          . Hachette, Paris (
          <year>1970</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Caspard</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monjardet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The lattices of closure systems, closure operators, and implicational systems on a finite set: a survey</article-title>
          .
          <source>Discrete Applied Mathematics</source>
          <volume>127</volume>
          (
          <year>2003</year>
          )
          <fpage>241</fpage>
          –
          <lpage>269</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Birkhoff</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <source>Lattice Theory</source>
          . 1st edn. American Mathematical Society
          (
          <year>1940</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Everingham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C.K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Winn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>The PASCAL Visual Object Classes (VOC) Challenge</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Mindru</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuytelaars</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gool</surname>
            ,
            <given-names>L.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moons</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Moment invariants for recognition under changing viewpoint and illumination</article-title>
          .
          <source>Computer Vision and Image Understanding</source>
          <volume>94</volume>
          (
          <year>2004</year>
          )
          <fpage>3</fpage>
          –
          <lpage>27</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Huiskes</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lew</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          :
          <article-title>The MIR Flickr retrieval evaluation</article-title>
          .
          <source>In: Proceedings of the 1st ACM international conference on Multimedia information retrieval - MIR '08</source>
          , New York, USA, ACM Press (
          <year>2008</year>
          )
          <fpage>39</fpage>
          –
          <lpage>43</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Carneiro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreno</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasconcelos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Supervised Learning of Semantic Classes for Image Annotation and Retrieval</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>29</volume>
          (
          <year>2007</year>
          )
          <fpage>394</fpage>
          –
          <lpage>410</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Macqueen</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          :
          <article-title>Some Methods for Classification and Analysis of Multivariate Observations</article-title>
          .
          <source>In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability</source>
          . (
          <year>1967</year>
          )
          <fpage>281</fpage>
          –
          <lpage>297</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Griffin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holub</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Caltech-256 Object Category Dataset</article-title>
          .
          <source>Technical report</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Everingham</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>C.K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , et al.
          :
          <article-title>The 2005 PASCAL Visual Object Classes Challenge</article-title>
          .
          <source>In: First PASCAL Machine Learning Challenges Workshop</source>
          ,
          MLCW
          <year>2005</year>
          . Volume
          <volume>3944</volume>
          of Lecture Notes in Computer Science, Berlin, Heidelberg, Springer Berlin Heidelberg (
          <year>2005</year>
          )
          <fpage>117</fpage>
          –
          <lpage>176</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>