<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pivot Selection for Dimension Reduction Using Annealing by Increasing Resampling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasunobu Imamura</string-name>
          <email>imamura.kit@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naoya Higuchi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetsuji Kuboyama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kouichi Hirata</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takeshi Shinohara</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gakushuin University</institution>
          ,
          <addr-line>Mejiro 1-5-1, Toshima, Tokyo 171-8588</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kyushu Institute of Technology</institution>
          ,
          <addr-line>Kawazu 680-4, Iizuka 820-8502</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <abstract>
        <p>In order to select an optimal set of pivots for dimension reduction, such as Simple-Map and sketches based on ball partitioning, we propose a method named Annealing by Increasing Resampling (AIR, for short). AIR assumes that every state is evaluated by using a sample set. Starting from an arbitrary initial state, AIR repeatedly makes state transitions by hill climbing, evaluating states on resampled sets whose size is initially small and gradually increases. Experiments verify that AIR can find better sets of pivots than the conventional method, and in a shorter time than simulated annealing.</p>
      </abstract>
      <kwd-group>
        <kwd>Similarity Search</kwd>
        <kwd>Dimension Reduction</kwd>
        <kwd>Pivot Selection</kwd>
        <kwd>Simulated Annealing</kwd>
        <kwd>Annealing by Increasing Resampling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Similarity search is one of the most important tasks for information retrieval of
multidimensional data. In this paper, we deal with similarity search in metric spaces,
where objects within smaller distance are considered similar. Thus, similarity search
is a task to find objects near to a given query object.</p>
      <p>
        When the dimensionality of objects is m, the computational cost to measure the
distance between two objects is O(m), and when the number of database objects is n, a
naïve similarity search in a sequential manner needs O(mn) cost, which is unrealistic
for large m and n. In order to weaken the effect of n, hierarchical index structures
such as R-Tree [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and M-Tree [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] have been developed. On the other hand,
dimension reduction is a method to reduce the influence of m.
      </p>
      <p>
        Dimension reductions for Euclidean spaces include K-L transformation (or
principal component analysis, PCA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and FastMap [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. On the other hand, dimension
reductions such as H-Map [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Simple-Map (S-Map) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are applicable to any
metric space, such as spaces with the L1 distance, Hamming distance, string edit distance, and
so on [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Sketches [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref9">9–13</xref>
        ] are representations of objects in multidimensional data by
compact bit strings to reduce search spaces. In conventional search with sketches, the
Hamming distance between sketches is adopted. However, a sketch mapping used with the
Hamming distance can never imply a dimension reduction. On the other hand, since the
quantization of S-Map based on Lp distance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is regarded as a kind of sketch and
provides a method to compute a lower bound of the distance between queries and
sketches, it can provide a sketch mapping that implies a dimension reduction.
      </p>
      <p>The dimension reduction not only reduces the cost of distance computation but also
avoids the so-called “curse of dimensionality”. For example, it is known that the
efficiency of R-Tree decreases as the dimension increases, but the
performance can be improved if the R-Tree is constructed on objects projected into a lower
dimensional space by S-Map.</p>
      <p>In dimension reduction, an object is projected to low dimensional data or a
compact bit string so that the projected distance does not exceed the
original distance. Although the projected objects of low dimension cannot
completely maintain the original distance relationship, it is important to reduce the information
loss. Because the projected distance does not exceed the original distance, it is
guaranteed that objects that are distant in the projection space are also distant in the original space, so
“safe pruning” can be done by searching in the projection space. However, if the
shrinkage of the distance is large, objects actually outside the retrieval range may
become close in the projection space, resulting in deterioration of retrieval efficiency.
For PCA, an analytically optimal projection can be obtained. On the other hand, for
H-Map, S-Map and sketches, no analytically optimal solution is known, and
therefore it is necessary to use random selection guided by an evaluation function or a
heuristic method such as annealing.</p>
      <p>
        In S-Map, the reference object is selected as a pivot [
        <xref ref-type="bibr" rid="ref15 ref16 ref17">15–17</xref>
        ], and the distance
between each object and the pivot is used as a coordinate value, so the number of
coordinates is given by the number of pivots. The number of pivots is then
the dimensionality of the projection space, and the distance between objects in the
projection space is given by the L∞ distance. Here, ball partitioning (BP) assigns
0 and 1 to the inside and the outside of a ball of radius r centered on the reference
object p, respectively. Then, a sketch using BP can be regarded as a quantization
of the S-Map image to 0 or 1, depending on whether the distance from the pivot p of the
S-Map is not less than the radius r [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Note that conventional search methods such as random selection, local search,
simulated annealing, and a binary quantization method using the distribution characteristics
of the data [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] have been adopted to search for a set of pivots for S-Map and BP sketches.
All of them optimize an evaluation value computed on samples. For
S-Map, the distance preservation ratio is adopted as the evaluation value to be maximized.
For BP, the collision probability is adopted as the evaluation value to be minimized. In
this paper, we propose a new method named annealing by increasing resampling
(AIR) as an optimization method and verify its effectiveness in pivot selection.
      </p>
      <p>Simulated annealing (SA) is a search method that makes stochastic transitions,
controlled by a temperature, from the current provisional solution to its
neighbors according to their evaluation values. It starts at a high temperature and gradually
lowers the temperature. At high temperature, the probability of a transition to a
poorly evaluated state is high. At low temperature, it makes transitions only according to
the evaluation value, that is, it behaves as a local search. On the other hand, this
paper proposes a method named Annealing by Increasing Resampling (AIR, for short),
where hill climbing is carried out using a subsample resampled from the sample set
used for evaluation, and the resampling size is gradually increased. While the
resampling size is small, the evaluation error with respect to the entire sample set is large, so
the probability of making a transition to a poorly evaluated state is high. That is, a
transition using a small number of samples resembles a random transition at high
temperature in SA. As the resampling size increases, the evaluation error
gradually decreases and the behavior approaches a local search. In this way, the behavior of AIR is
very similar to that of SA.</p>
      <p>Empirically, in order to obtain a good solution over a wide area by SA, it is necessary
to make many transitions at high temperature, so the processing at high temperature takes
much time. In AIR, on the other hand, the high-temperature phase of SA corresponds to
transitions using a small number of resamples, and evaluation
with a small number of samples is cheap. Therefore, AIR can realize the counterpart of
processing at high temperature, which needs high cost in conventional SA, at low cost.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>In this section, we briefly introduce dimension reduction, Simple-Map, and ball
partitioning (BP) sketch to which the optimization method proposed in this paper is
applied.</p>
      <p>Let (U, D) and (U′, D′) be two metric spaces, where D and D′ are distance
functions satisfying the triangle inequality. The dimensionality of data x is denoted
dim(x). A mapping φ:U→U′ is called a dimension reduction, if the following
conditions are satisfied for any x, y ∈ U.</p>
      <p>dim(φ(x)) ≤ dim(x)   (1)</p>
      <p>D( (x), ( y))  D(x, y) (2)
Condition (1) means that it reduces the dimensionality, and condition (2) means that
D′ gives a lower bound of D, respectively.</p>
      <p>A Simple-Map (S-Map) is based on the projection φ_p, using a point p called a
pivot, defined as follows.</p>
      <p>φ_p(x) = D(p, x)</p>
      <p>From the triangle inequality, the following formula holds for any x, y ∈ U.</p>
      <p>|φ_p(x) − φ_p(y)| ≤ D(x, y)</p>
      <p>Using a set P = {p1, …, pm} of pivots, we define an S-Map φ_P and a distance D′ as
follows.</p>
      <p>φ_P(x) = (φ_p1(x), …, φ_pm(x))</p>
      <p>D′(φ_P(x), φ_P(y)) = max_{1 ≤ i ≤ m} |φ_pi(x) − φ_pi(y)|</p>
      <p>Thus, when m is smaller than the original dimension, φ_P becomes a dimension
reduction.</p>
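      <p>As a concrete illustration (not part of the paper), the following Python sketch computes the S-Map projection φ_P and the projected L∞ distance D′ for objects represented as numeric vectors; the function names and the choice of the L1 distance as the original distance D are our assumptions.</p>
      <preformat>
def l1(x, y):
    """Original distance D: the L1 (Manhattan) distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def smap_project(x, pivots, dist=l1):
    """phi_P(x) = (D(p1, x), ..., D(pm, x)): one coordinate per pivot."""
    return [dist(p, x) for p in pivots]

def projected_distance(px, py):
    """D'(phi_P(x), phi_P(y)) = max_i |phi_pi(x) - phi_pi(y)| (the L-infinity distance)."""
    return max(abs(a - b) for a, b in zip(px, py))
</preformat>
      <p>By the triangle inequality, each coordinate difference |D(pi, x) − D(pi, y)| is at most D(x, y), so their maximum is a lower bound of the original distance, which is exactly condition (2).</p>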
      <p>When objects are projected with S-Map, the distance between them may shrink. This
shrinkage, that is, the distance deficiency, is desired to be small for similarity search.
Increasing the projection dimension reduces the shrinkage of the distance, but it is
strongly influenced by “the curse of dimensionality.” Thus, it is important to
minimize the shrinkage of the distance in a lower dimension. The distance preservation
ratio for a set S of pairs (xi, yi) of points is the following ratio of sums of distances.</p>
      <p>Σ D′(φ(xi), φ(yi)) / Σ D(xi, yi)</p>
      <p>
        Sketches [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref9">9–13</xref>
        ] are compact bit sequences representing multidimensional data. In
this paper, we consider sketches based on ball partitioning (BP). A pivot for BP is a
pair (p, r) of a point p and a radius r. A BP projection φ_(p,r) using a pivot (p, r) is
defined as follows.
      </p>
      <p>0 if D( p, x)  r,
 ( p,r) (x)  1 otherwise.</p>
      <p>A sketch mapping φ_P of width w bits is defined by a set of pivots P = {(p1, r1), …,
(pw, rw)} as follows.</p>
      <p>φ_P(x) = φ_(p1,r1)(x) ⋯ φ_(pw,rw)(x)</p>
      <p>For example, let us consider 4 points A, B, C and D in a Euclidean plane as in Figure
1. Then, the sketches using the pivots P = {(p1, r1), (p2, r2)} are φ_P(A) = 01, φ_P(B) = 00,
φ_P(C) = 10 and φ_P(D) = 11.</p>
      <p>Fig. 1. Four points A, B, C and D in a Euclidean plane with two pivots (p1, r1) and (p2, r2); the
resulting sketches are A:01, B:00, C:10 and D:11.</p>
      <p>
        The conventional similarity search using sketches consists of two stages. First,
candidates are selected based on the Hamming distances between sketches. Then, the
answer is selected from the candidates using the actual distance. As long as the Hamming
distance is used, a sketch mapping can never imply a dimension reduction. Ohno et al.
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposed a method to compute a lower bound of the distance using sketches.
Therefore, such a sketch mapping can imply a kind of dimension reduction. We use the
collision probability of a sketch mapping using a set P of pivots as the evaluation
function in optimization. We say that a collision occurs when two distinct points share the
same sketch.
      </p>
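      <p>The collision probability used as the evaluation function for sketches can be estimated on a sample set as follows; counting unordered pairs of distinct points that share a sketch is one natural reading of the definition above, and the function name is ours. The sketch helper is the one defined earlier.</p>
      <preformat>
from collections import Counter

def collision_probability(sample, pivots, dist=l1):
    """Fraction of unordered pairs of distinct sample points whose sketches collide."""
    counts = Counter(sketch(x, pivots, dist) for x in sample)
    n = len(sample)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (n * (n - 1) // 2)
</preformat>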
    </sec>
    <sec id="sec-3">
      <title>Annealing by Increasing Resampling</title>
      <p>First, we give several assumptions for the optimization problem. Let Φ be the search
space of possible solutions. We call an element of Φ a state. A cost function f gives
the evaluation value of a state x with respect to a sample set S. Roughly speaking, an
optimization problem is to find a solution x from Φ whose evaluation value is the
smallest. The cost function f is desired to satisfy the following formula for any
sample sets S1, S2 ⊆ S and any state x ∈ Φ.</p>
      <p>min{f(S1, x), f(S2, x)} ≤ f(S1 ∪ S2, x) ≤ max{f(S1, x), f(S2, x)}   (3)</p>
      <p>For example, if f is defined by the average of evaluation values for individual samples,
as is the distance preservation ratio of S-Map, then f(S1 ∪ S2, x) is essentially a weighted
mean of f(S1, x) and f(S2, x) and thus lies between them, so f satisfies formula (3). The
collision probability of sketches approximately satisfies formula (3), except for smaller
sample sets. Further, we assume that the neighborhood N(x) of a state x always satisfies the
following statement.</p>
      <p>∀x, y ∈ Φ, y ∈ N*(x).
Here, N* is the reflexive and transitive closure of N. The above statement claims
that we can reach any state y from any state x by finitely many applications of N.</p>
      <p>We present the algorithm of Annealing by Increasing Resampling (AIR, for short)
in Figure 2. Here, the iteration number i of the loop is considered as time, t: ℕ→(0, 1]
is a monotonically increasing function giving the ratio of the resampling size with respect
to the total sample set S, and T is the total number of state transitions. Note that the resampling
at the i-th iteration should be done independently of the preceding resamplings,
because it is not appropriate for AIR to build the larger set by incrementally adding samples to the
previous smaller set.</p>
      <p>function AIR(S: samples): state;
begin
  x := any state;
  for i := 1 to T do
  begin
    R := randomly selected samples of size t(i)×|S| from S;
    if f(x, R) &gt; f(y, R) for some y ∈ N(x) then
      x := y;
  end;
  return x;
end
Fig. 2. The algorithm of Annealing by Increasing Resampling (AIR).</p>
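      <p>The following Python sketch mirrors the pseudocode in Figure 2 under the assumptions of this section: cost(x, R) plays the role of f(x, R) and is to be minimized, neighbors(x) enumerates (a subset of) N(x), and t is a monotonically increasing resampling schedule such as t(i) = i/T. All names and the linear default schedule are our assumptions.</p>
      <preformat>
import random

def air(samples, initial_state, neighbors, cost, T, t=lambda i, T: i / T):
    """Annealing by Increasing Resampling: hill climbing evaluated on resampled
    subsets whose size grows from a small fraction of |samples| to the whole set."""
    x = initial_state
    for i in range(1, T + 1):
        # Resample independently at every iteration (no incremental growth).
        k = max(1, int(t(i, T) * len(samples)))
        R = random.sample(samples, k)
        # Move to a neighbor that evaluates better on the current resample.
        fx = cost(x, R)
        for y in neighbors(x):
            fy = cost(y, R)
            if fy &lt; fx:
                x, fx = y, fy
    return x
</preformat>
      <p>With t(i) = 1 for every i, the call degenerates into the local search mentioned next.</p>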
      <p>Note that, when t(i) = 1 for every i, AIR always uses the total sample set for state
evaluation, and thus it behaves like a so-called local search. We do not specify the details of the
method for selecting a state from the neighborhood N(x). In practice, we may select the state with the
best evaluation value within a subset of N(x), in a steepest-descent manner.</p>
      <p>Since the resampled set R is small at the beginning stage, the
error of f(x, R) with respect to f(x, S) is large with high probability, and therefore
AIR may make transitions to poorly evaluated states. Thus, AIR
makes random walks, as SA does at high temperature. Finally, when t(i) becomes close to
1, AIR behaves as a local search because f(x, R) ≈ f(x, S).</p>
      <p>As for the advantage of AIR, it can make the search at the beginning stage faster,
because state evaluation using smaller samples can be done at low cost. On the other
hand, a conventional SA pays a high cost for state transitions at high temperature.
There is no significant difference between AIR and SA in convergence speed, because AIR
can behave almost the same as SA by using resampling sizes corresponding to the
annealing schedule.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>In this section, we give experimental results on the optimization of the dimension reductions
S-Map and BP sketch by the proposed method. We use two kinds of data: feature
data of images (images) and the SISAP colors database (colors). The number of data in
images is 6.8 million, extracted from 1,700 videos, and the dimensionality n of data in
images is 64. On the other hand, the number of data in colors is about 0.1 million and
the dimensionality n of data in colors is 112. For both images and colors, each axis
has an integer value from 0 to 255 and the distances between objects are L1.</p>
      <p>4.1 Simple-Map</p>
      <p>
In this experiment, we adopt m = 8 as the dimensionality of S-Map, which shows the
best performance in similarity search using an R-Tree constructed on the S-Map images.
We use the average value (Ave.) and the standard deviation (S.D.) of the distance
preservation ratio (DPR) to evaluate pivot sets, using 5,000 randomly selected pairs of
features. AIR searches for a pivot set with the maximum distance preservation ratio. A pivot set
P = {p1, … , pm} consists of mn integers corresponding to m pivots of n dimensions.
The neighborhood N(P) of a pivot set P is defined as the set of pivot sets P′ that
differ from P in exactly one of the mn integers. For the images data, features
consist of 8-bit integers from 0 to 255. Therefore, N(P) consists of 256mn = 256×8×64
pivot sets. In our experiments, we implement AIR so that it randomly chooses one of the mn
integers of P, tries every value from 0 to 255 for it, and moves to the best of these 256 neighbors of P.
That is, AIR makes a hill climbing using subsets of neighbors, as sketched below.</p>
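      <p>The neighbor move described above can be sketched as follows; evaluate is assumed to score a pivot set (for example, the DPR on the current resample, larger being better), and the representation of P as a list of m lists of n integers is our assumption.</p>
      <preformat>
import random

def best_coordinate_move(P, evaluate):
    """One hill-climbing step for S-Map pivot selection: pick one of the mn
    coordinates at random, try all 256 values, and keep the best pivot set."""
    i = random.randrange(len(P))       # which pivot
    j = random.randrange(len(P[i]))    # which coordinate of that pivot
    best, best_score = P, evaluate(P)
    for v in range(256):               # try every 8-bit value for the coordinate
        Q = [row[:] for row in P]      # copy of the pivot set
        Q[i][j] = v
        score = evaluate(Q)
        if score > best_score:
            best, best_score = Q, score
    return best
</preformat>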
      <p>
        We compare AIR with conventional simulated annealing (SA), binary quantization
(BQ)[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and local search (LS). BQ is a heuristic method using stochastic properties of
the data, which can find a relatively good pivot set within a small computation time. Table
1 shows the results for images. We repeat each method 10 times. The computing
times of BQ and LS are about 50 and 100 seconds, respectively. For SA and AIR, we
tuned the number of state transition trials, which corresponds to T
in Figure 2, so that their computing times are comparable with those of BQ and LS. We also run SA and
AIR for about 500 seconds.
      </p>
      <p>From Table 1, we can observe that AIR can find better pivot sets than BQ in about
50 seconds and than LS in about 100 seconds. On the other hand, the pivot sets found by SA are
almost comparable with those of BQ and LS. For every computing time of about 50, 100
and 500 seconds, the number of state transition trials by AIR is about 8 times as large
as that by SA. This experimentally confirms AIR’s merit over SA pointed out in
Section 3.</p>
      <p>From Table 2, which shows the results for colors, we can observe behavior of AIR
similar to that for images in Table 1.</p>
      <p>In the next experiment, on BP sketches, we adopt w = 32 bits as the width of the sketch. Neighbors of a pivot
set are defined similarly to those for S-Map. The radius of a pivot is selected so that the ball divides the
space equally (see the sketch below). The set S of samples for evaluating pivot sets consists of 10,000 points randomly
selected from the database. We use the collision probability (CP), to be minimized, to evaluate
pivot sets.</p>
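      <p>One plausible reading of selecting the radius so that the ball divides the space equally is to take the median distance from the pivot to the sample points, so that roughly half of the data falls inside the ball; the following one-liner sketches this under that assumption (names ours).</p>
      <preformat>
import statistics

def balanced_radius(p, sample, dist=l1):
    """Radius r such that the ball around p contains about half of the sample,
    i.e., the BP bit is 0 for roughly 50% of the data."""
    return statistics.median(dist(p, x) for x in sample)
</preformat>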
      <p>We compare AIR with conventional ball partitioning with random selection (BP) and
BP using binary quantization (QBP). To observe the search performance,
we also show their precision. Nearest neighbor search using sketches consists of two
stages. At the first stage, we select candidates using the Hamming distance, that is, we select
the top K nearest data in terms of the Hamming distance. At the second stage, we
select the nearest neighbor from the K candidates. The search precision is the probability
that the top K candidates include the exact nearest neighbor. For both databases, images
and colors, we adopt K as 0.1% of the database size, which is reasonable from
the viewpoints of both speed and precision. A sketch of this two-stage search follows.</p>
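      <p>A minimal sketch of the two-stage search described above, filtering by the Hamming distance between sketches and refining the top K candidates with the actual distance; the database is assumed to be a list of objects with precomputed sketches, and all helper names (including sketch and l1 from the earlier sketches) are ours.</p>
      <preformat>
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(ca != cb for ca, cb in zip(a, b))

def two_stage_nn(query, db, db_sketches, pivots, K, dist=l1):
    """Stage 1: top-K candidates by Hamming distance between sketches.
    Stage 2: the nearest neighbor among the candidates by the actual distance."""
    q = sketch(query, pivots, dist)
    order = sorted(range(len(db)), key=lambda i: hamming(q, db_sketches[i]))
    candidates = order[:K]
    return min(candidates, key=lambda i: dist(query, db[i]))
</preformat>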
      <p>Tables 3 and 4 show the results for sketches on images and colors, respectively.</p>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>In this paper, we have proposed a method named Annealing by Increasing Resampling
(AIR, for short) to select an optimal set of pivots for dimension reduction. As shown
in Tables 1, 2, 3 and 4, AIR can efficiently find better sets of pivots than the
conventional methods from the viewpoint of the evaluation function used for optimization.
However, Tables 3 and 4 also show that, from the viewpoint of search precision, the best pivot
set is found by the conventional method QBP; this is entirely a matter
of the evaluation function. It is a future work to explain the behavior of AIR
theoretically. For example, we expect that the solution found by AIR eventually converges
to the optimum one. It is also an important future work for similarity search to
investigate evaluation functions other than the distance preservation ratio for S-Map and
the collision probability for sketches.</p>
      <sec id="sec-5-1">
        <title>Time</title>
        <p>(sec)
116
106
97.8</p>
      </sec>
      <sec id="sec-5-2">
        <title>Time</title>
        <p>(sec)
174
163
306
tigate other evaluation functions than distance preservation ratio for S-Map and
collision probability for sketch.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Guttman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>R-trees: A dynamic index structure for spatial searching</article-title>
          ,
          <source>Proc. SIGMOD'84</source>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>57</lpage>
          (
          <year>1984</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ciaccia</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patella</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zezula</surname>
            <given-names>P.:</given-names>
          </string-name>
          <article-title>M-tree: An Efficient Access Method for Similarity Search in Metric Spaces</article-title>
          ,
          <source>Proc. VLDB'97</source>
          , pp.
          <fpage>426</fpage>
          -
          <lpage>435</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Zezula</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savino</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amato</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rabitti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Approximate similarity retrieval with mtrees</article-title>
          ,
          <source>VLDB J</source>
          . vol.
          <volume>7</volume>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>293</lpage>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fukunaga</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Statistical pattern recognition. (Second edition</article-title>
          ), Academic Press (
          <year>1990</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Faloutsos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>K.I.:</given-names>
          </string-name>
          <article-title>FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets</article-title>
          ,
          <source>Proc. ACM SIGMOD'95</source>
          vol.
          <volume>24</volume>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>174</lpage>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Shinohara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishizaka</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>H-Map: A dimension reduction mapping for approximate retrieval of multi-dimensional data</article-title>
          ,
          <source>Proc. DS'99</source>
          , LNAI vol.
          <volume>1721</volume>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>305</lpage>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Shinohara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishizaka</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>On dimension reduction mappings for approximate retrieval of multi-dimensional data</article-title>
          ,
          <source>Progress of Discovery Science</source>
          , LNCS vol.
          <volume>2281</volume>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Deza</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deza</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Encyclopedia of distances (Second edition</article-title>
          ), Springer (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charikar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Asymmetric distance estimation with sketches for similarity search in high-dimensional spaces</article-title>
          ,
          <source>Proc. ACM SIGIR'08</source>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novak</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zezula</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Improving sketches for similarity search</article-title>
          ,
          <source>Proc. MEMICS'15</source>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>57</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novak</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zezula</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Speeding up similarity search by sketches</article-title>
          ,
          <source>Proc. SISAP'16</source>
          , pp.
          <fpage>250</fpage>
          -
          <lpage>258</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shinohara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Efficient similarity search by reducing I/O with compressed sketches</article-title>
          ,
          <source>Proc. SISAP'09</source>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>38</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Josephson</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charikar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Sizing sketches: A rank-based analysis for similarity search</article-title>
          ,
          <source>Proc. ACM SIGMETRICS'07</source>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>168</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ohno</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murakami</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawaguchi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kishikawa</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shinohara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Similarity search of multi-dimensional database by quantization dimension reduction mapping</article-title>
          ,
          <source>Hinokuni Information Processing Symposium 2012, IPSJ Kyushu (in Japanese)</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Chavez</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Navarro</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marroquín</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Searching in metric spaces</article-title>
          .
          <source>ACM Comput. Surv</source>
          . vol.
          <volume>33</volume>
          , pp.
          <fpage>273</fpage>
          --
          <lpage>321</lpage>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranker</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranker</surname>
            ,
            <given-names>D. P.</given-names>
          </string-name>
          :
          <article-title>Pivot Selection: dimension reduction for distancebased indexing</article-title>
          .
          <source>J. Discret. Algo</source>
          . vol.
          <volume>13</volume>
          ,
          <fpage>32</fpage>
          --
          <lpage>46</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Pivot selection for metric-space indexing</article-title>
          ,
          <source>Internat. J. Mach. Learn. Cybernet</source>
          . vol.
          <volume>7</volume>
          , pp.
          <fpage>311</fpage>
          --
          <lpage>323</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Hau</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shinohara</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Pivot selection in dimension reduction projection Simple-Map with quantization</article-title>
          ,
          <source>Hinokuni Information Processing Symposium 2015, IPSJ Kyushu (in Japanese)</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>