<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Benchmarking Nearest Neighbor Search: Influence of Local Intrinsic Dimensionality and Result Diversity in Real-World Datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Aumuller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Ceccarello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>IT University of Copenhagen</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization concepts are introduced that allow to get a more fine-grained overview of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>might serve as a characterization of the di culty of For a given query, they either nd all true nearest
k-NN queries. One purpose of this paper is to shed neighbors or not a single one. This behavior is
light on this statement. not shared by the two other approaches that we</p>
      <p>A focus of this paper is an empirical study of consider; both yield a continuous transition from
how the LID in uences the performance of NN \ nding no nearest neighbors" to \ nding all nearest
algorithms. To be precise, we will benchmark neighbors".
four di erent implementations [12] which employ Finally, we will challenge two more common
di erent approaches to NN search. Three of them benchmarking approaches to NN search. First, we
(HNSW [15], FAISS-IVF [10], Annoy [3]) stood out as will empirically observe how using small query sets
most performant in the empirical study conducted (less than 1% of the dataset) a ects the robustness
by Aumuller et al. in [2]. Another one (ONNG) of measurements. Second, we will discuss how
diwas proposed very recently [8] and shown to be verse results on the considered real-world datasets
competitive to the former approaches there. We are. Ideally, we want to benchmark on a collection
base our experiments on [2] and describe their of \interesting" datasets that show the strengths
benchmarking approach and the changes we made and weaknesses of individual approaches. We will
to their system in Section 3. We analyze the LID conclude that there is little diversity among the
distribution of real-world datasets in Section 4. We considered datasets: While the individual
perforwill see that there is a substantial di erence between mance observations change from dataset to dataset,
the LID distributions among these datasets. We the relative performance between implementations
will then conduct two experiments: First, we x a stays the same.
dataset and choose as query set the set of points
with smallest, medium, and largest estimated LIDs. 1.1 Our Contributions The main
contribuAs we will see, the larger the LID, the more di cult tions of this paper are
the query for all implementations. Next, we will
study how the di erent LID distributions between a detailed evaluation of the LID distribution of
datasets in uence the running time performance. many real-world datasets used in benchmarking
In a nutshell, it cannot be concluded that LID by frameworks,
itself is a good indicator for the relative performance an evaluation of the in uence of the LID to the
of a xed implementation over datasets. These performance of NN search implementations,
statements will be made precise in the evaluation
that is discussed in Section 5. considerations about the diversity of results</p>
      <p>In the rst part of our evaluation, we will over many di erent datasets, and
work in the \classical evaluation setting of nearest an exploration of di erent visualization
techneighbor search". This means that we relate niques that shed light on individual properties
a performance measure (such as the achieved of certain implementation principles.
throughput measured in queries per second) to a
quality measure (such as the average fraction of true 1.2 Related Work on Benchmarking
Framenearest neighbors found over all queries). While works for NN We use the benchmarking system
this is the most commonly employed evaluation described in [2] as the starting point for our study.
method, we will reason that this way of representing Di erent approaches to benchmarking nearest
neighresults in fact hides interesting details about the bor search are described in [4, 5, 14]. We refer to [2]
inner workings of an implementation. Using non- for a detailed comparison between the frameworks.
traditional visualization techniques will provide
new insights into their query behavior on real- 2 Local Intrinsic Dimensionality
world datasets. As one example, we will see We consider a distance-space (Rd; D) with a
disthat reporting average recall on the graph-based tance function D : Rd Rd ! R. As described in [1],
approaches from [15, 8] hides an important detail: we consider the distribution of distances within this
space with respect to a reference point x. Such a dis- building) and search (query) methods with certain
tribution is induced by sampling n points from the parameter choices. Implementations are tested on
space Rd under a certain probability distribution. a large set of parameters usually provided by the
We let F : R ! [0; 1] be the cumulative distribution original authors of an implementation. The answers
function of distances to the reference point x. to queries are recorded as the indices of the points
returned. Ann-benchmarks stores these parameters
De nition 2.1 ([6]). The local continuous intrinsic together with further statistics such as individual
dimension of F at distance r is given by query times, index size, and auxiliary information
ln(F ((1 + ")r)=F (r)) provided by the implementation. We refer to [2] for
IDF (r) = "li!m0 ln((1 + ")r=r) ; more details.</p>
      <p>Compared to the system described in [2], we
whenever this limit exists. added tools to estimate the LID based on (2.1),</p>
      <p>The measure relates the increase in distance pick \challenging query sets" according to the LID
to the increase in probability mass (the fraction of individual points, and added new datasets and
imof points that are within the ball of radius r and plementations. Moreover, we implemented a
mech(1 + ")r around the query point). Intuitively, the anism that allows an implementation to provide
larger the LID, the more di cult it is to distinguish further query statistics after answering a query. To
true nearest neighbors at distance r from the rest showcase this feature, all implementations
considof the dataset. As described in [7], in the context ered in this study report the number of distance
of k-NN search we set r as the distance of the k-th computations performed to answer a query.1
nearest neighbor to the reference point x.</p>
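      <p>For intuition, consider points distributed uniformly within an m-dimensional ball of radius R around x, so that F(r) = (r/R)^m for 0 ≤ r ≤ R. The definition then recovers the ambient dimension:</p>
      <p>\[ \mathrm{ID}_F(r) = \lim_{\epsilon \to 0} \frac{\ln\big(F((1+\epsilon)r)/F(r)\big)}{\ln(1+\epsilon)} = \lim_{\epsilon \to 0} \frac{m \ln(1+\epsilon)}{\ln(1+\epsilon)} = m. \]</p>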
      <p>Estimating LID. We use the Maximum-Likelihood estimator (MLE) described in [13, 1] to estimate the LID of x at distance r. Let r_1 ≤ … ≤ r_k be the sequence of distances of the k-NN of x. The MLE is then</p>
      <p>\[ \widehat{\mathrm{ID}}_x = -\left( \frac{1}{k} \sum_{i=1}^{k} \ln \frac{r_i}{r_k} \right)^{-1}. \tag{2.1} \]</p>
      <p>Amsaleg et al. showed in [1] that the MLE estimates the LID well.</p>
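      <p>Estimator (2.1) translates directly into code. The following is a minimal Python sketch of (2.1), assuming positive k-NN distances that are not all identical:</p>
      <preformat>
import numpy as np

def lid_mle(knn_dists):
    """MLE estimate (2.1) of the LID of a point, given the distances
    r_1, ..., r_k to its k nearest neighbors."""
    r = np.sort(np.asarray(knn_dists, dtype=float))
    # (2.1): ID_x = -( (1/k) * sum_i ln(r_i / r_k) )^(-1)
    return -1.0 / np.mean(np.log(r / r[-1]))
      </preformat>
      <p>As a sanity check, for distances of the form r_i = r_k · u_i^{1/m} with u_i uniform in (0, 1), which is how k-NN distances behave under the uniform-ball model above, the estimate concentrates around m as k grows.</p>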
    </sec>
    <sec id="sec-3">
      <title>3 Overview over the Benchmarking Framework</title>
      <p>We use the ann-benchmarks system described in [2] to conduct our experimental study. Ann-benchmarks is a framework for benchmarking NN search algorithms. It covers dataset creation, performing the actual experiment, and storing the results of these experiments in a transparent and easy-to-share way. Moreover, results can be explored through various plotting functionalities, e.g., by creating a website containing interactive plots for all experimental runs.</p>
      <p>Ann-benchmarks interfaces with a NN search implementation by calling its preprocess (index building) and search (query) methods with certain parameter choices. Implementations are tested on a large set of parameters usually provided by the original authors of an implementation. The answers to queries are recorded as the indices of the points returned. Ann-benchmarks stores these parameters together with further statistics such as individual query times, index size, and auxiliary information provided by the implementation. We refer to [2] for more details.</p>
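      <p>To make this contract concrete, the following sketch mimics such a wrapper around a brute-force scan. The class and method names here are illustrative only; the framework's actual adapter base class lives in the ann-benchmarks repository:</p>
      <preformat>
import numpy as np

class BruteForceWrapper:
    """Illustrative adapter exposing the two entry points the framework
    calls: preprocess (index building) and search (query answering)."""

    def __init__(self):
        self.data = None
        self.distance_computations = 0   # auxiliary statistic (see below)

    def preprocess(self, dataset):
        # A brute-force "index" is just the dataset itself.
        self.data = np.asarray(dataset, dtype=float)

    def search(self, q, k):
        # Answers are recorded as the indices of the returned points.
        dists = np.linalg.norm(self.data - q, axis=1)
        self.distance_computations += len(self.data)
        return np.argsort(dists)[:k]
      </preformat>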
      <p>Compared to the system described in [2], we added tools to estimate the LID based on (2.1), pick "challenging query sets" according to the LID of individual points, and added new datasets and implementations. Moreover, we implemented a mechanism that allows an implementation to provide further query statistics after answering a query. To showcase this feature, all implementations considered in this study report the number of distance computations performed to answer a query.¹</p>
      <p>¹ We thank the authors of the implementations for their help and responsiveness in adding this feature to their library.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Algorithms and Datasets</title>
      <sec id="sec-4-1">
        <title>4.1 Algorithms</title>
        <p>Nearest neighbor search algorithms for high dimensions are usually graph-, tree-, or hashing-based. We refer the reader to [2] for an overview over these principles and available implementations. In this study, we concentrate on the implementations considered most performant in [2], namely HNSW [15], Annoy [3], and FAISS-IVF [10] (named IVF from now on), and we consider the very recent graph-based approach ONNG [8] as well.</p>
        <p>HNSW and ONNG are graph-based approaches. This means that they build a k-NN graph during the preprocessing step. In this graph, each vertex is a data point and a directed edge (u, v) means that the data point associated with v is "close" to the data point associated with u in the dataset. At query time, the graph is traversed to generate candidate points. Algorithms differ in details of the graph construction, how they build a navigation structure on top of the graph, and how the graph is traversed.</p>
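        <p>The query-time traversal that these approaches share can be sketched as a greedy beam search. The following simplified Python sketch illustrates the principle only; it is not the actual HNSW or ONNG procedure, and the beam-width parameter ef is named after HNSW's:</p>
        <preformat>
import heapq
import numpy as np

def graph_search(graph, data, q, k, ef, entry):
    """Greedy beam search over a neighborhood graph.
    graph[u] lists the out-neighbors of node u; ef is the beam width."""
    dist = lambda u: float(np.linalg.norm(data[u] - q))
    visited = {entry}
    frontier = [(dist(entry), entry)]        # min-heap of nodes to expand
    best = [(-dist(entry), entry)]           # max-heap of the ef best so far
    while frontier:
        d_u, u = heapq.heappop(frontier)
        if d_u > -best[0][0] and len(best) >= ef:
            break                            # frontier cannot improve the beam
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                d_v = dist(v)
                if ef > len(best) or -best[0][0] > d_v:
                    heapq.heappush(frontier, (d_v, v))
                    heapq.heappush(best, (-d_v, v))
                    if len(best) > ef:
                        heapq.heappop(best)  # drop the worst beam entry
    return [v for _, v in sorted((-d, v) for d, v in best)[:k]]
        </preformat>
        <p>Increasing the beam width trades time for recall, which is exactly the parameter dimension explored in Section 5.</p>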
    <sec id="sec-2">
      <title>Dimensions</title>
      <p>avg(LID)
median(LID)</p>
    </sec>
    <sec id="sec-3">
      <title>Metric</title>
        <p>Annoy is an implementation of a random projection forest, which is a collection of random projection trees. Each node in a tree is associated with a set of data points. It splits these points into two subsets according to a chosen hyperplane. If the dataset in a node is small enough, it is stored directly and the node is a leaf. Annoy employs a data-dependent splitting mechanism in which a splitting hyperplane is chosen as the one splitting two "average points" obtained by repeatedly sampling dataset points. In the query phase, trees are traversed using a priority queue until a predefined number of points is found.</p>
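        <p>The data-dependent split can be sketched as follows. This is a simplified illustration of the idea described above, with sample sizes and iteration counts chosen arbitrarily:</p>
        <preformat>
import numpy as np

def split_node(points, iters=5, seed=0):
    """Choose a hyperplane between two 'average points' and split the
    node's points (a simplified sketch of the splitting idea)."""
    rng = np.random.default_rng(seed)
    a, b = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(iters):
        # Refine the two average points from a sample of the node's points.
        sample = points[rng.choice(len(points), size=min(len(points), 32))]
        da = np.linalg.norm(sample - a, axis=1)
        db = np.linalg.norm(sample - b, axis=1)
        if (db > da).any():
            a = sample[db > da].mean(axis=0)   # average of points closer to a
        if (da > db).any():
            b = sample[da > db].mean(axis=0)   # average of points closer to b
    normal = a - b                             # hyperplane normal ...
    offset = normal.dot((a + b) / 2.0)         # ... through the midpoint
    mask = points.dot(normal) > offset
    return points[mask], points[~mask], (normal, offset)
        </preformat>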
        <p>IVF builds an inverted file based on clustering the dataset around a predefined number of centroids. It splits the dataset based on these centroids by associating each point with its closest centroid. During a query, it finds the closest centroids and checks the points in the dataset associated with those.</p>
        <p>We remark that both the IVF and HNSW implementations are the ones provided with FAISS.²</p>
        <p>² https://github.com/facebookresearch/faiss</p>
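        <p>As an illustration of the inverted-file principle (not of FAISS's optimized implementation; parameter names are ours), a plain k-means sketch suffices:</p>
        <preformat>
import numpy as np

def build_ivf(data, n_lists, iters=10, seed=0):
    """Cluster the data around n_lists centroids and bucket every
    point under its closest centroid."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=n_lists, replace=False)].copy()
    for _ in range(iters):                     # Lloyd iterations
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        assign = np.argmin(d, axis=1)
        for c in range(n_lists):
            if (assign == c).any():
                centroids[c] = data[assign == c].mean(axis=0)
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists

def ivf_query(q, data, centroids, lists, k, n_probe):
    """Scan only the points whose centroid is among the n_probe closest."""
    order = np.argsort(np.linalg.norm(centroids - q, axis=1))[:n_probe]
    cand = np.concatenate([lists[c] for c in order])
    return cand[np.argsort(np.linalg.norm(data[cand] - q, axis=1))[:k]]
        </preformat>
        <p>Note that locating the closest centroids itself takes one distance computation per centroid, which matters when distance computations are counted in Section 5.3.</p>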
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Datasets</title>
        <p>Table 1 presents an overview over the datasets that we consider in this study. We restrict our attention to datasets that are usually used in connection with Euclidean distance and Angular/Cosine distance. For each dataset, we compute the LID distribution with respect to the 100-NN as discussed in Section 2. SIFT, MNIST, and GLOVE are among the most-widely used datasets for benchmarking nearest neighbor search algorithms. Fashion-MNIST is considered as a replacement for MNIST, which is usually considered too easy for machine learning tasks [19].</p>
        <p>[Table 1: Overview of the considered datasets, with their number of dimensions, metric, avg(LID), and median(LID).]</p>
        <p>Figure 1 provides a visual representation of the estimated LID distribution of each dataset, for k = 1 000. While the datasets differ widely in their original dimensionality, the median LID ranges from around 13 for MNIST to about 23 for GLOVE-2M. The distribution of LID values is asymmetric and shows a long tail behavior. MNIST, Fashion-MNIST, SIFT, and GNEWS are much more concentrated around the median compared to the two glove-based datasets.</p>
        <p>[Figure 1: LID distribution for each dataset. Ticks below the distribution curves represent single queries. Lines within each distribution curve correspond to the 25, 50 and 75 percentile. The red line marks the 10 000 largest estimated LID, which we use as a threshold value to define hard query sets.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>This section reports on the results of our
experiments. Due to space constraints, we only present
some selected results. More results can be explored
2https://github.com/facebookresearch/faiss
via interactive plots at http://ann-benchmarks. Choosing Query Sets All runs are based on
com/edml19/, which also contains a link to the 10 000 queries chosen from the dataset. Depending
source code repository. For a xed implementation, on the question we try to answer, we employ
the plots presented here consider the pareto frontier di erent selection strategies for choosing the query
over all parameter choices [2]. Tested parameter set.
choices and the associated plots are available on the To answer Q1 we employ random splits as done
website. in the cross-validation technique: 10 000 randomly</p>
      <p>Experimental Setup Experiments were run selected points are used as queries, while the rest
on 2x 14-core Intel Xeon E5-2690v4 (2.60GHz) with are used as data points. Repeating the procedure
512GB of RAM using Ubuntu 16.10 with kernel 10 times yields 10 queryset/dataset pairs.
4.4.0. Index building was multi-threaded, queries For the discussion of Q2{Q5 we select three
where answered sequentially in a single thread. di erent querysets from each dataset: (i) The</p>
      <p>Quality and Performance Metrics As qual- dataset that contains the 10 000 points with the
ity metric we measure the individual recall of each lowest estimated LID (which we denote easy ), 10 000
query, i.e., the fraction of points reported by the points around the data point with median estimated
implementation that are among the true k-NN. As LID (denoted medium), and the 10 000 points with
performance metric, we record individual query the largest estimated LID (dubbed hard ). Figure 1
times and the total number of distance computa- marks with a red line the LID used as a threshold
tions needed to answer all queries. We usually to build the hard queryset.
report on the throughput (the average number of
queries that can be answered in one second, in the 5.1 Robustness of Random Split Figure 2
plots denoted as QPS for queries per second ), but shows the result of ten cross-validation runs on
we will also inspect individual query times. random splits of GLOVE for ONNG and Annoy. The</p>
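      <p>In code, the two plot axes used throughout this section reduce to the following minimal sketch:</p>
      <preformat>
import numpy as np

def recall(reported, true_knn):
    """Fraction of the true k-NN that the implementation reported."""
    return len(set(reported).intersection(true_knn)) / len(true_knn)

def plot_point(query_times, recalls):
    """One point per parameter configuration: average recall (x-axis)
    versus throughput QPS, the inverse of the mean query time (y-axis)."""
    return float(np.mean(recalls)), 1.0 / float(np.mean(query_times))
      </preformat>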
      <p>Objectives of the Experiments. Our experiments are tailored to answer the following questions:</p>
      <list list-type="simple">
        <list-item><p>(Q1) How robust are measurements when splitting query set and dataset at random multiple times? (Section 5.1)</p></list-item>
        <list-item><p>(Q2) How does the LID of a query set influence the running time performance? (Section 5.2)</p></list-item>
        <list-item><p>(Q3) How diverse are measurements obtained on datasets? Do relative differences between the performance of different implementations stay the same over multiple datasets? (Section 5.3)</p></list-item>
        <list-item><p>(Q4) How well does the number of distance computations reflect the relative running time performance of the tested implementations? (Section 5.3)</p></list-item>
        <list-item><p>(Q5) How concentrated are quality and performance measures around their mean for the tested implementations? (Section 5.4)</p></list-item>
      </list>
      <p>Choosing Query Sets. All runs are based on 10 000 queries chosen from the dataset. Depending on the question we try to answer, we employ different selection strategies for choosing the query set. To answer Q1 we employ random splits as done in the cross-validation technique: 10 000 randomly selected points are used as queries, while the rest are used as data points. Repeating the procedure 10 times yields 10 queryset/dataset pairs. For the discussion of Q2-Q5 we select three different querysets from each dataset: (i) the 10 000 points with the lowest estimated LID (which we denote easy), (ii) the 10 000 points around the data point with median estimated LID (denoted medium), and (iii) the 10 000 points with the largest estimated LID (dubbed hard). Figure 1 marks with a red line the LID used as a threshold to build the hard queryset. A sketch of this selection is given below.</p>
      <sec id="sec-5-1">
        <title>5.1 Robustness of Random Split</title>
        <p>Figure 2 shows the result of ten cross-validation runs on random splits of GLOVE for ONNG and Annoy. The plot relates the average number of queries that can be answered in one second to the average recall achieved with a certain parameter configuration. This means that we want data points to be up and to the right. The plot shows that the 10 splits give very similar results and represent the performance of the implementation accurately. It is interesting to see that a query set representing less than 1% of the dataset points shows such a robust behavior. We conclude that the random split allows for robust measurements.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Influence of LID on Performance</title>
        <p>Figure 3 shows results for the influence of using only points with low, middle, and large estimated LID as query points, in SIFT and GLOVE-2M. We observe a clear influence of the LID of the query set on the performance: the larger the LID, the more down and to the left the graphs move. This means that, for higher LID, it is more expensive, in terms of time, to answer queries with good recall. For all datasets except GLOVE-2M, all implementations were still able to achieve close to perfect recall with the parameters set. This means that all but one of the tested datasets do not contain too many "noisy queries". Already the queries around the median prove challenging for most implementations. For the most difficult queries (according to LID), only IVF and ONNG achieve close to perfect recall on GLOVE-2M.</p>
        <p>Figure 4 reports on the results of ONNG and Annoy on selected datasets. Comparing results to the LID measurements depicted in Figure 1, the estimated median LID gives a good estimate on the relative performance of the algorithms on the datasets. As an exception, SIFT is much easier than predicted by its LID distribution. In particular for Annoy, the hard SIFT instance is as challenging as the medium GLOVE version. The easy version of GLOVE-2M turns out to be efficiently solvable by both implementations (taking about the same time as it takes to answer the hard instance of Fashion-MNIST, which has a much higher LID). From this, we cannot conclude that LID as a single indicator explains performance differences of an implementation on different datasets. However, more careful experimentation is needed before drawing a final conclusion. In our setting, the LID estimation is conducted for k = 1 000, while queries are only searching for the 10 nearest neighbors. Moreover, the estimation using MLE might not be accurate enough on these datasets. We leave the investigation of these two directions as future work.</p>
        <p>As a side note, we remark that Fashion-MNIST is as difficult to solve as MNIST for all implementations, and it is by far the easiest dataset for all implementations. Thus, while there is a big difference in the difficulty of solving the classification task [19], there is no measurable difference between these two datasets in the context of NN search.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Diversity of Results</title>
        <p>Figure 5 gives an overview over how algorithms compare to each other among all "medium difficulty" datasets. We consider two metrics, namely the number of queries per second (top plot), and the number of distance computations (bottom plot). For two different average recall thresholds (0.75 and 0.9) we then select, for each algorithm, the best performing parameter configuration that attains at least that recall. For each dataset, the plots report the ratio with the best performing algorithm on that dataset; therefore, the best performer is reported with ratio 1.</p>
        <p>[Figure 5: Ranking of algorithms on five different datasets, according to recall 0.75 and 0.9, and according to two different performance measures: number of queries per second (top) and number of distance computations (bottom). Both plots report the ratio with the best performing algorithm on each dataset: for the queries per second metric a larger ratio is better; for the number of distance computations metric, a smaller ratio is better.]</p>
        <p>Considering different datasets, we see that there is little variation in the ranking of the algorithms. Only the two graph-based approaches trade ranks; all other rankings are stable. Interestingly, Annoy makes much fewer distance computations but is consistently outperformed by IVF.³</p>
        <p>³ We note that IVF counts the initial comparisons to find the closest centroids as distance computations, whereas Annoy did not count the inner product computations during tree traversal. We plan to report on updated results in the workshop.</p>
        <p>Comparing the number of distance computations to running time performance, we see that an increase in the number of distance computations is not reflected in a proportional decrease in the number of queries per second. This means that the candidate set generation is in general more expensive for graph-based approaches, but the resulting candidate set is of much higher quality and fewer distance computations have to be carried out. Generally, both graph-based algorithms are within a factor 2 from each other, whereas the other two need much larger candidate lists to achieve a certain recall. The relative difference usually ranges from 5x to 30x more distance computations for the non-graph-based approaches, in particular at high recall. This translates well into the performance differences we see in this setting: consider for instance Figure 3, where the lines corresponding to HNSW and ONNG upper bound the lines relative to the other two algorithms.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4 Reporting the Distribution of Performance</title>
        <p>In the previous sections, we made extensive use of recall/queries-per-second plots, where each configuration of each algorithm results in a single point, namely the average recall and the inverse of the average query time. As we shall see in this section, concentrating on averages can hide interesting information in the context of k-NN queries. In fact, not all queries are equally difficult to answer. Consider the plots in Figure 6, which report the performance of the four algorithms⁴ on the GLOVE-2M dataset, medium difficulty. The left plot reports the recall versus the number of queries per second, and black dots correspond to the averages. Additionally, for each configuration, we report the distribution of the recall scores: the baseline of each recall curve is positioned at the corresponding queries-per-second performance. Similarly, the right plot reports on the inverse of the individual query times (the average of these is the QPS in the left plot) against the average recall. In both plots, the best performance is achieved towards the top-right corner.</p>
        <p>⁴ In order not to clutter the plots, we fixed parameters as follows: IVF: number of lists 8192; Annoy: number of trees 100; HNSW: efConstruction 500; ONNG: edge 100, outdegree 10, indegree 120.</p>
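        <p>The recall distributions underlying these curves are cheap to compute from the per-query recalls; a minimal sketch:</p>
        <preformat>
import numpy as np

def recall_histogram(per_query_recalls, bins=20):
    """Distribution of per-query recall for one parameter
    configuration, rather than only its mean."""
    hist, edges = np.histogram(per_query_recalls, bins=bins, range=(0.0, 1.0))
    return hist / len(per_query_recalls), edges   # fraction of queries per bin
        </preformat>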
        <p>Plotting the distributions, instead of just reporting the averages, uncovers some interesting behaviour that might otherwise go unnoticed, in particular with respect to the recall. The average recall gradually shifts towards the right as an effect of more and more queries achieving good recalls. Perhaps surprisingly, for graph-based algorithms this shift is very sudden: most queries go from having recall 0 to having recall 1, taking no intermediate values. Taking the average recall as a performance metric is convenient in that it is a single number to compare algorithms with. However, the same average recall can be attained with very different distributions: looking at such distributions can provide more insight.</p>
        <p>For the plot on the right, we observe that individual query times of Annoy, IVF, and ONNG are well concentrated around their mean. However, for HNSW query times fluctuate widely around their mean; in fact, the average query time is not well reflected in the query time distribution. For space reasons, we do not report other parameter configurations and datasets, which nonetheless show similar behaviours.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Summary</title>
      <p>In this paper we studied the influence of LID on the performance of nearest neighbor search algorithms. We showed that LID allows to choose query sets of a wide range of difficulty from a given dataset. We also showed how different LID distributions influence the running time performance of the algorithms. In this respect, we could not conclude that the LID alone can predict running time differences well. In particular, SIFT is usually easier for the algorithms, while GLOVE's LID distribution would predict it to be the easier dataset of the two. We introduced novel visualization techniques to show the uncertainty within the answer to a set of queries, which made it possible to show a clear difference between the graph-based algorithms and the other approaches.</p>
      <p>We hope that this study initiates the search for more diverse datasets, or for theoretical reasoning why certain algorithmic principles are generally better suited for nearest neighbor search. From a more practical side, we would be interested in seeing whether the LID can be used in the design of NN algorithms to guide the search process or parameter selection. For example, algorithms could adapt to the LID of the current set of k-NN found so far, stopping early or spending more time depending on the LID of the candidates.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Trans. Pattern</given-names>
            <surname>Anal</surname>
          </string-name>
          . Mach. Intell.
          <volume>33</volume>
          (
          <issue>1</issue>
          ),
          <volume>117</volume>
          {
          <fpage>128</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          (
          <year>2011</year>
          ) [1]
          <string-name>
            <surname>Amsaleg</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chelly</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , [
          <volume>10</volume>
          ]
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
          </string-name>
          , H.:
          <article-title>Billion-scale</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Houle</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawarabayashi</surname>
            ,
            <given-names>K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nett</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Esti- similarity search with gpus</article-title>
          .
          <source>CoRR abs/1702.08734</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>mating local intrinsic dimensionality</article-title>
          . In: KDD'
          <fpage>15</fpage>
          . (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          pp.
          <volume>29</volume>
          {
          <fpage>38</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2015</year>
          ) [11]
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , W.B.,
          <string-name>
            <surname>Lindenstrauss</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Schechtman, [2] Aumuller,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Bernhardsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Faithfull</surname>
          </string-name>
          , A.: G.:
          <article-title>Extensions of lipschitz maps into banach</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>ANN-Benchmarks</surname>
          </string-name>
          :
          <article-title>A Benchmarking Tool for spaces</article-title>
          .
          <source>Israel Journal of Mathematics</source>
          <volume>54</volume>
          (
          <issue>2</issue>
          ),
          <volume>129</volume>
          {
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Approximate</given-names>
            <surname>Nearest</surname>
          </string-name>
          <article-title>Neighbor Algorithms</article-title>
          . In:
          <volume>138</volume>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>SISAP'17</source>
          . pp.
          <volume>34</volume>
          {
          <issue>49</issue>
          (
          <year>2017</year>
          ) [12]
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schubert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>The (black)</source>
          [3]
          <string-name>
            <surname>Bernhardsson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          : Annoy, https://github.com
          <article-title>/ art of runtime evaluation: Are we comparing</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>spotify/annoy algorithms or implementations? Knowl</article-title>
          . Inf. Syst. [4]
          <string-name>
            <surname>Curtin</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cline</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slagle</surname>
            ,
            <given-names>N.P.</given-names>
          </string-name>
          ,
          <source>March</source>
          ,
          <volume>52</volume>
          (
          <issue>2</issue>
          ),
          <volume>341</volume>
          {
          <fpage>378</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>W.B.</given-names>
            ,
            <surname>Ram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.A.</given-names>
            ,
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.</given-names>
            : ML- [13]
            <surname>Levina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Bickel</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.J.</surname>
          </string-name>
          : Maximum likelihood
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>PACK: A scalable C+</surname>
          </string-name>
          <article-title>+ machine learning library. estimation of intrinsic dimension</article-title>
          .
          <source>In: NIPS'15</source>
          . pp.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>J. of Machine Learning Research</source>
          <volume>14</volume>
          ,
          <volume>801</volume>
          {
          <fpage>805</fpage>
          (
          <year>2013</year>
          )
          <volume>777</volume>
          {
          <fpage>784</fpage>
          (
          <year>2005</year>
          ) [5]
          <string-name>
            <surname>Edel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soni</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Curtin</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          : An automatic [
          <volume>14</volume>
          ]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          , Zhang, W.,
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>benchmarking system</article-title>
          .
          <source>In: NIPS 2014 Workshop on Lin</source>
          ,
          <string-name>
            <surname>X.</surname>
          </string-name>
          :
          <article-title>Approximate nearest neighbor search on</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>Software Engineering for Machine Learning (</article-title>
          <year>2014</year>
          )
          <article-title>high dimensional data - experiments, analyses</article-title>
          , and [6]
          <string-name>
            <surname>Houle</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          : Dimensionality, discriminability,
          <source>improvement (v1.0)</source>
          .
          <source>CoRR abs/1610</source>
          .02455 (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>density and distance distributions</article-title>
          .
          <source>In: Data Mining</source>
          [
          <volume>15</volume>
          ]
          <string-name>
            <surname>Malkov</surname>
            ,
            <given-names>Y.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yashunin</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          : E cient and
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Workshops</surname>
          </string-name>
          (ICDMW). pp.
          <volume>468</volume>
          {
          <fpage>473</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>robust approximate nearest neighbor search using [7]</article-title>
          <string-name>
            <surname>Houle</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schubert</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimek</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On the Hierarchical Navigable Small World graphs</article-title>
          . ArXiv
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>correlation between local intrinsic dimensionality e-prints (</article-title>
          <year>Mar 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <article-title>and outlierness</article-title>
          .
          <source>In: SISAP'18</source>
          . pp.
          <volume>177</volume>
          {
          <issue>191</issue>
          (
          <year>2018</year>
          ) [16]
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          : [8]
          <string-name>
            <surname>Iwasaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miyazaki</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Optimization of In- E cient estimation of word representations in</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>dexing Based on k-Nearest Neighbor Graph for vector space</article-title>
          .
          <source>CoRR abs/1301</source>
          .3781 (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>Proximity Search in High-dimensional Data</article-title>
          .
          <source>ArXiv</source>
          [17]
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.: Glove:
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          e-prints (
          <year>Oct 2018</year>
          )
          <article-title>Global vectors for word representation</article-title>
          . In: Em[9]
          <string-name>
            <surname>Jegou</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmid</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Product pirical Methods in Natural Language Processing</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>quantization for nearest neighbor search</article-title>
          .
          <source>IEEE (EMNLP)</source>
          . pp.
          <volume>1532</volume>
          {
          <issue>1543</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>