<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ScaDANN: A Scalable Disk-Based Graph Indexing Method for ANN</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hyein</forename><surname>Gu</surname></persName>
							<email>hyein99@kaist.ac.kr</email>
							<affiliation key="aff0">
								<orgName type="institution">Korea Advanced Institute of Science and Technology</orgName>
								<address>
									<country key="KR">Republic of Korea</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Min-Soo</forename><surname>Kim</surname></persName>
							<email>minsoo.k@kaist.ac.kr</email>
							<affiliation key="aff0">
								<orgName type="institution">Korea Advanced Institute of Science and Technology</orgName>
								<address>
									<country key="KR">Republic of Korea</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ScaDANN: A Scalable Disk-Based Graph Indexing Method for ANN</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">3E6B94EDF3C29A4F6BF0A2B494801A4F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Graph</term>
					<term>ANN</term>
					<term>Nearest neighbor search</term>
					<term>Vector search</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Approximate Nearest Neighbor (ANN) indexing constitutes a fundamental component of modern vector-based databases, facilitating efficient and accurate information retrieval for applications such as retrieval-augmented generation. However, scaling graph-based ANN indices to billion-scale datasets poses substantial challenges, including high memory demands and inefficiencies in handling partitioned graphs. To address these issues, we propose ScaDANN, a scalable disk-based graph indexing method tailored for large-scale datasets under limited memory conditions. ScaDANN introduces two novel techniques: overlapping block-level insertion and grid block merge, which enable the efficient construction of unified graph indices while preserving high search performance. Our approach achieves notable advancements in index construction speed, search accuracy, and memory efficiency, establishing ScaDANN as a robust and effective solution for scalable ANN indexing in resource-limited environments.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Large Language Models (LLMs) have revolutionized natural language processing by excelling in tasks such as text generation <ref type="bibr" target="#b0">[1]</ref>, summarization <ref type="bibr" target="#b1">[2]</ref>, and question answering <ref type="bibr" target="#b2">[3]</ref>. Their ability to leverage pre-trained knowledge has facilitated applications across diverse domains, including healthcare <ref type="bibr" target="#b3">[4]</ref>, education <ref type="bibr" target="#b4">[5]</ref>, and customer service <ref type="bibr" target="#b5">[6]</ref>. However, despite their strengths, LLMs face critical limitations, such as hallucination <ref type="bibr" target="#b6">[7]</ref>, where they generate plausible but incorrect outputs, outdated knowledge due to static training data, and privacy concerns when processing sensitive information <ref type="bibr" target="#b7">[8]</ref>.</p><p>Retrieval-Augmented Generation (RAG) <ref type="bibr" target="#b8">[9]</ref> has been introduced as a promising framework to mitigate these limitations by integrating LLMs with external knowledge retrieval systems. By leveraging vector databases (VectorDBs) <ref type="bibr" target="#b9">[10]</ref>, RAG retrieves relevant and up-to-date information to enhance the accuracy and reliability of generated responses. This framework not only reduces the impact of hallucination but also addresses privacy concerns by accessing only the information necessary for a given query.</p><p>Approximate Nearest Neighbor (ANN) indexing serves as a foundational component in RAG systems, enabling efficient information retrieval from VectorDBs. 
ANN indexes can be broadly categorized into clustering-based <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>, hash-based <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>, and graph-based methods <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21]</ref>. Clustering-based methods partition datasets into smaller clusters, facilitating memory-efficient index construction but often incurring longer construction times and reduced search accuracy. Hash-based methods employ hash functions to compress datasets and reduce dimensionality, which accelerates construction and search processes but at the cost of reduced accuracy due to compression. Graph-based methods, in contrast, represent vectors as nodes and their relationships as edges, connecting nodes based on proximity <ref type="bibr" target="#b21">[22]</ref>. These methods construct neighbor lists through vector operations between nodes (as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>) and are particularly effective for large-scale datasets, offering superior search accuracy and performance despite higher memory demands during construction.</p><p>The need for scalable ANN indexing becomes increasingly apparent as data is segmented into smaller chunks to meet the input constraints of LLMs. For instance, dividing a 100 MB document into 4 KB chunks suitable for LLM input results in approximately 25,600 chunks, a 25,600-fold increase in data chunks relative to the original document count. 
This exponential growth in the number of chunks significantly increases the size of the ANN index, necessitating the development of methods capable of efficiently handling billion-scale datasets.</p><p>Building large-scale ANN indexes, however, poses significant challenges <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b23">24]</ref>. First, the substantial size of datasets and indexes requires considerable memory resources. Second, achieving competitive search performance on disk-based systems, which are necessary for handling such datasets, is particularly challenging. To address these issues, we propose ScaDANN, a Scalable Disk-based Graph Indexing Method for ANN, specifically designed to overcome these limitations.</p><p>This study pursues two primary objectives: (1) to enable the construction of large-scale ANN indexes in memory-constrained environments and (2) to enhance search performance through the use of merged graph structures that improve efficiency and accuracy during the search process. By focusing on these goals, we provide a scalable and efficient solution for constructing and querying ANN indexes in scenarios where both large-scale datasets and memory limitations are critical challenges.</p><p>This work makes the following key contributions:</p><p>• We propose a graph-based ANN method that leverages grid blocks to efficiently construct and scale billion-scale indexes in memory-constrained environments. • Our approach outperforms existing disk-based methods, achieving index construction speeds 1.3 to 1.5 times faster and search speeds 1.2 to 1.4 times faster, while maintaining comparable accuracy. 
• We demonstrate that our method enables the construction of single indexes up to 10 times larger than those of existing approaches within the same memory constraints.</p><p>These contributions mark a significant advancement in ANN indexing, particularly in terms of efficiency and scalability for large-scale, memory-constrained applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">ANN search</head><p>Nearest Neighbor (NN) search identifies the data point(s) in a dataset closest to a given query point based on a defined distance metric, such as Euclidean distance or cosine similarity. While exact NN search computes the distance between the query and every data point to determine the nearest neighbor, this approach becomes computationally infeasible for large datasets. ANN search addresses this challenge by finding points that are approximately close to the true nearest neighbors, significantly reducing computational costs. Although ANN sacrifices some accuracy, it delivers faster query response times, which is particularly advantageous in high-dimensional and large-scale datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 1 (ANN).</head><p>Given a dataset 𝐷 = {𝑥 1 , 𝑥 2 , … , 𝑥 𝑛 } and a query point 𝑞, the task of nearest neighbor search (whether exact or approximate) is to find the point 𝑥 𝑁 𝑁 ∈ 𝐷 such that the distance 𝑑(𝑞, 𝑥 𝑁 𝑁 ) is minimized, where 𝑑(⋅, ⋅) represents a distance metric. In the case of ANN search, the algorithm guarantees that the returned point 𝑥 * satisfies:</p><formula xml:id="formula_0">𝑑(𝑞, 𝑥 * ) ≤ (1 + 𝜖) ⋅ 𝑑(𝑞, 𝑥 𝑁 𝑁 ),</formula><p>where 𝜖 is a small approximation factor, and 𝑥 𝑁 𝑁 is the exact nearest neighbor of the query point 𝑞.</p><p>To evaluate ANN algorithms, two primary metrics are used: Recall and Queries Per Second (QPS). Recall quantifies the effectiveness of an algorithm in retrieving true nearest neighbors, defined as the ratio of true nearest neighbors retrieved to the total number of true nearest neighbors. Higher recall indicates greater accuracy but may increase search time. QPS measures the system's throughput, i.e., the number of queries processed per second. Higher QPS reflects greater efficiency, which is critical for large-scale and real-time applications.</p></div>
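To make Definition 1 and the Recall metric concrete, the following sketch computes an exact nearest neighbor by brute force and checks the (1 + 𝜖) bound. The helper names (`exact_nn`, `recall_at_k`) and the random data are illustrative, not part of any benchmarked system.

```python
import numpy as np

def exact_nn(data, q):
    # Exact NN search: compute the distance from q to every point (O(n) per query).
    d = np.linalg.norm(data - q, axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])

def recall_at_k(retrieved_ids, true_ids, k):
    # Recall@k: fraction of the true top-k neighbors that were retrieved.
    return len(set(retrieved_ids[:k]).intersection(true_ids[:k])) / k

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 16))
q = rng.normal(size=16)

_, d_nn = exact_nn(data, q)
eps = 0.1
# Definition 1: an ANN answer x* is acceptable when d(q, x*) does not
# exceed (1 + eps) * d(q, x_NN).
d_approx = 1.05 * d_nn   # a hypothetical ANN result within the bound
assert (1 + eps) * d_nn >= d_approx
```

An ANN algorithm trades a small, bounded increase in distance for a large reduction in the number of distance computations per query.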
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Existing graph ANN indexing methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Vamana graph</head><p>Vamana <ref type="bibr" target="#b24">[25]</ref> is a widely used graph-based ANN indexing algorithm and serves as the foundational graph structure in many recent ANN studies. Its construction is based on two key techniques: Greedy Search and Robust Pruning.</p><p>Greedy Search enables efficient exploration of nearest neighbors within the graph. Beginning from an initial node, the algorithm iteratively identifies and prioritizes nodes closest to the query. It maintains a priority queue of visited nodes, extracting the nearest unvisited node at each step, adding its neighbors to the queue, and repeating this process until all relevant nodes are explored.</p><p>Robust Pruning optimizes the graph by removing unnecessary edges discovered during Greedy Search. This technique utilizes the Sparse Neighborhood Graph (SNG) property <ref type="bibr" target="#b25">[26]</ref> and applies a distance threshold parameter, alpha (𝛼), to determine whether an edge should be retained. Specifically, an edge between nodes 𝑝 and 𝑝′ is removed if the distance between them is at least 𝛼 times the distance between 𝑝′ and an already selected node 𝑝*:</p><formula xml:id="formula_1">𝛼 ⋅ 𝑑(𝑝 * , 𝑝 ′ ) ≤ 𝑑(𝑝, 𝑝 ′ ) (𝛼 &gt; 1)</formula><p>By retaining longer edges under relaxed conditions, Vamana enhances search efficiency while maintaining graph connectivity.</p></div>
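The two Vamana primitives can be sketched as follows. This is a simplified single-threaded illustration: the function names, the beam-trimming details, and the candidate ordering are our own, not the reference implementation.

```python
import numpy as np

def greedy_search(graph, vecs, q, start, beam=4):
    # GreedySearch: repeatedly expand the closest unexpanded candidate,
    # enqueue its neighbors, and keep only the `beam` best candidates.
    dist = lambda i: float(np.linalg.norm(vecs[i] - q))
    frontier = {start: dist(start)}
    expanded = set()
    while True:
        todo = [v for v in frontier if v not in expanded]
        if not todo:
            break
        v = min(todo, key=frontier.get)
        expanded.add(v)
        for u in graph[v]:
            frontier.setdefault(u, dist(u))
        frontier = dict(sorted(frontier.items(), key=lambda kv: kv[1])[:beam])
    return sorted(frontier, key=frontier.get), expanded

def robust_prune(p, candidates, vecs, alpha=1.2, R=4):
    # RobustPrune: starting from the closest candidate, drop any p_prime
    # that a kept p_star dominates, i.e. alpha * d(p_star, p_prime) does
    # not exceed d(p, p_prime); with alpha above 1, longer edges survive,
    # which preserves connectivity across sparse regions.
    d = lambda a, b: float(np.linalg.norm(vecs[a] - vecs[b]))
    cand = sorted(set(candidates) - {p}, key=lambda c: d(p, c))
    kept = []
    while cand:
        if len(kept) >= R:
            break
        p_star = cand.pop(0)
        kept.append(p_star)
        cand = [c for c in cand if alpha * d(p_star, c) > d(p, c)]
    return kept
```

In Vamana, the set of nodes visited by `greedy_search` becomes the candidate pool that `robust_prune` filters down to at most R out-neighbors per node.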
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">DiskANN</head><p>DiskANN <ref type="bibr" target="#b24">[25]</ref> extends the Vamana graph to a disk-based environment, making it well-suited for large-scale datasets. It employs a divide-and-conquer approach, partitioning the dataset into subgraphs and merging them into a unified structure.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates the partitioning and merging process of DiskANN. <ref type="bibr" target="#b0">(1)</ref> The dataset is first divided into multiple overlapping partitions using a clustering technique, with each node assigned to at least two partitions. (2) Independent subgraphs are constructed within each partition. (3) These subgraphs are then merged by randomly selecting nodes from the partitioned graphs, followed by merging and pruning operations. This process updates the neighbor lists with merged results, ensuring connectivity between partitions while reducing memory and computational costs. DiskANN's partitioning and merging strategy enables scalable graph construction but introduces challenges in maintaining search performance across partitions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3.">Batch insertion for graph construction</head><p>Efficient parallelization is essential for graph construction in large-scale datasets. Batch insertion, as introduced in ParlayANN <ref type="bibr" target="#b26">[27]</ref>, addresses the challenge of concurrently adding nodes to a graph while ensuring conflict-free operations. Conflicts are avoided by grouping nodes into independent batches and processing each batch in isolation, ensuring that nodes within a batch do not interfere with each other during insertion.</p><p>The batch insertion process occurs in two steps. First, nodes in a batch are independently connected to their nearest neighbors within the existing graph. These connections are established in parallel, as nodes within a batch do not share dependencies, preventing conflicts during edge creation. This step ensures that all newly added nodes are integrated into the graph without disrupting its structure. Second, reversed edges are added to the graph. A reversed edge points from each selected neighbor back to the newly inserted node, making the new connections bidirectional.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>By separating the insertion of neighbors and the addition of reversed edges into distinct steps, batch insertion guarantees structural consistency while leveraging parallel processing. This approach is particularly effective in large-scale datasets, as it minimizes computational overhead while maintaining the integrity and quality of the graph. Moreover, the absence of interdependencies between batches ensures that the graph remains stable throughout the construction process, enabling efficient and scalable graph-based ANN indexing.</p></div>
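The two-phase procedure can be sketched as follows. `find_neighbors` stands in for a greedy-search call and is a hypothetical callback, so this illustrates the conflict-avoidance idea rather than ParlayANN's actual code.

```python
from collections import defaultdict

def batch_insert(graph, batches, find_neighbors, max_degree=8):
    for batch in batches:
        # Phase 1: each node in a batch picks forward neighbors
        # independently; nodes in the same batch share no dependencies,
        # so this loop is safe to run in parallel.
        fwd = {v: find_neighbors(v, graph) for v in batch}
        for v, nbrs in fwd.items():
            graph[v] = list(nbrs)[:max_degree]
        # Phase 2: reversed edges are grouped per target node and applied
        # in a separate pass, so no adjacency list is mutated concurrently
        # with phase 1.
        rev = defaultdict(list)
        for v, nbrs in fwd.items():
            for u in nbrs:
                rev[u].append(v)
        for u, backs in rev.items():
            graph[u] = list(dict.fromkeys(graph.get(u, []) + backs))[:max_degree]
    return graph

# Insert nodes 2 and 3 as one batch into a 2-node graph; the stand-in
# neighbor finder always returns node 0.
g = batch_insert({0: [1], 1: [0]}, [[2, 3]], lambda v, graph: [0])
assert g[2] == [0] and g[0] == [1, 2, 3]
```

A real implementation would also re-prune any node whose degree bound is exceeded in phase 2; the truncation here is a simplification.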
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">ScaDANN Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Grid partitioning</head><p>Figure <ref type="figure" target="#fig_2">3</ref> illustrates the structure of Graph ANN using a grid format, where each node in the graph consists of a base vector and a neighbor list. Grid partitioning divides large-scale datasets into grid blocks, with rows representing source vertices and columns representing destination vertices. This approach organizes datasets into manageable segments, enabling memory-and disk-efficient graph construction. For example, constructing a Vamana graph for a 1-billion-point dataset with 100 dimensions, requiring 400GB for vector storage, is infeasible under memory constraints of 256GB  or 512GB. However, grid partitioning ensures scalability by allowing the dataset to be partitioned into grid blocks, making it feasible to construct the graph even when the dataset size exceeds the available memory.</p></div>
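A minimal sketch of the grid-block addressing, assuming equal-sized partitions (the paper's dynamic partitioning would size them adaptively; the function name is ours):

```python
def grid_block(src, dst, part_size):
    # Rows index the source partition, columns the destination partition.
    # Diagonal blocks (row == col) hold intra-partition subgraphs; the
    # off-diagonal blocks hold edges between partitions.
    return (src // part_size, dst // part_size)

n, num_parts = 12, 3
part_size = n // num_parts                    # 4 vertices per partition
assert grid_block(1, 2, part_size) == (0, 0)  # diagonal: inside p0
assert grid_block(9, 5, part_size) == (2, 1)  # off-diagonal: p2 to p1
```

Because every edge maps to exactly one block, blocks can be loaded, filled, and offloaded independently, which is what makes construction feasible when the dataset exceeds main memory.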
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Each grid block captures intra- and inter-partition relationships. A grid block where the source and destination partitions are the same forms a subgraph, which follows the Vamana graph structure. Diagonal blocks represent subgraphs within individual partitions, while off-diagonal blocks denote connections between partitions. For example, the red block (𝑝2, 𝑝1) in Figure <ref type="figure" target="#fig_2">3</ref> corresponds to connections from partition 𝑝2 (green) to partition 𝑝1 (blue). This structure improves scalability and ensures conflict-free parallel processing during node insertion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Overlapping block-level insertion</head><p>The overlapping block-level insertion method incrementally constructs the graph by processing data in grid block units. This method employs overlapping blocks, where a block from the previous step is reused in the next step, allowing two blocks to be processed simultaneously. As detailed in Algorithm 1, the process begins with the construction of the graph for the first partition (𝐵 1 = 0). This step (line 3) generates the subgraph for the first partition, ensuring that the intra-partition edges are established within the initial grid block.</p><p>In subsequent steps, pairs of overlapping blocks are processed to extend the graph. For each step, the data corresponding to the current blocks is loaded into memory to construct the graph (lines 9-11). Particularly, line 11 involves filling four grid blocks by constructing the subgraphs for two partitions and establishing connections between them. This ensures that intra- and inter-partition edges are created simultaneously, improving efficiency and maintaining the global structure of the graph. Once the graph is constructed, the earlier block is offloaded from memory (lines 12-15), and the next block is loaded (lines 5-10).</p><p>By reusing overlapping blocks during the insertion process, ScaDANN integrates graph construction and merging into a single operation, eliminating the need for a separate, computationally expensive merging step. This approach ensures that inter-block connections are captured as the graph is built, thereby avoiding the creation of disjoint subgraphs. Additionally, the overlapping mechanism significantly reduces I/O overhead by combining insertion and merging, which are traditionally distinct steps.</p><p>Figure <ref type="figure">2</ref> illustrates the construction process. Initially, blocks 0 and 1 are loaded into memory, and edges between the two blocks are created. 
In the second step, block 0 is offloaded, block 1 is retained, and block 2 is loaded. The graph is extended based on connections between blocks 1 and 2. Finally, in the third step, block 1 is offloaded, block 2 is retained, and block 0 is reloaded to extend the graph further by connecting blocks 2 and 0. This iterative process efficiently captures all relevant connections between blocks, enabling the construction of a unified graph while minimizing memory usage and I/O costs.</p></div>
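The load/offload schedule described above can be sketched as a sliding window over blocks; the wrap-around pairing generalizes the three-block example of Figure 2 and is our reading of the method, not code from the paper.

```python
def overlap_schedule(num_blocks):
    # Each step retains the previous block and loads one new block; the
    # final step wraps around so the last and first blocks are also merged.
    return [(b, (b + 1) % num_blocks) for b in range(num_blocks)]

# Three blocks reproduce the steps of Figure 2:
# step 1 is (0, 1), step 2 is (1, 2), step 3 is (2, 0) with block 0 reloaded.
assert overlap_schedule(3) == [(0, 1), (1, 2), (2, 0)]

# At most two blocks are ever resident; count the resulting block loads.
resident, loads = [], 0
for a, b in overlap_schedule(3):
    for blk in (a, b):
        if blk not in resident:
            if len(resident) == 2:
                resident.pop(0)      # offload the older block
            resident.append(blk)
            loads += 1
assert loads == 4   # blocks 0, 1, 2 loaded once each; block 0 reloaded once
```

Keeping one block resident across consecutive steps is what halves the block traffic relative to loading both members of every pair from disk.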
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Grid block merge</head><p>The grid block merge process establishes connections between two blocks loaded into memory and ensures bidirectional connectivity through reverse edges. Reverse edges, where the source and destination vertices are swapped, maintain symmetric inter-block connections, ensuring consistent and efficient search performance.</p><p>In conventional methods, merging requires all data to be loaded into memory to calculate distances between nodes. ScaDANN overcomes this limitation by storing distance information directly within the neighbor list of each node. Because distances are stored alongside neighbors, block 1 retains the neighbor list formed in step 1. For example, after step 1, 𝑣 5 in block 1 (green) has already stored 𝑣 3 and 𝑣 2 from block 0 (yellow) as its neighbors. This allows distance comparisons between existing and new neighbors without reloading offloaded data, reducing memory usage and improving scalability.</p><p>Figure <ref type="figure" target="#fig_3">4</ref> illustrates the merging process during step 2 in Figure <ref type="figure">2</ref>, where blocks 1 and 2 are loaded into memory while block 0 remains stored on disk. When updating the neighbor list of 𝑣 5 , the algorithm uses pre-stored distance data from step 1 to compare block 0 neighbors with block 2 neighbors, avoiding redundant distance calculations. As a result, 𝑣 5 's neighbor list is updated in place using the stored distances.</p><p>The merging process follows the Vamana graph construction method, incorporating a greedy search to identify candidate neighbors and a robust pruning step to refine the neighbor list. Inter-block edges are established by adding neighbors from the second block to the neighbor list of the first block. The reverse edges are then added, ensuring bidirectional connectivity, where nodes in the first block are also linked as neighbors of nodes in the second block. 
This comprehensive approach ensures a robust and efficient merging process.</p></div>
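A sketch of the distance-reusing merge: each neighbor entry carries its pre-computed distance, so candidates from an offloaded block can be ranked against new ones without touching disk. The tuple layout, function name, and example distances are all illustrative.

```python
def merge_neighbors(stored, new_cands, max_degree=4):
    # `stored`: (neighbor_id, distance) pairs carried over from earlier
    # steps; the neighbor's vector may sit in an offloaded block, but its
    # distance was recorded while that block was still resident.
    # `new_cands`: (neighbor_id, distance) pairs from the block in memory.
    merged = {}
    for nid, d in stored + new_cands:
        if merged.get(nid, float("inf")) > d:
            merged[nid] = d
    # Rank by distance with no vector reloads, then truncate to the degree
    # bound (a stand-in for Vamana's robust pruning step).
    return sorted(merged.items(), key=lambda kv: kv[1])[:max_degree]

# v5 kept v3 and v2 (from the now-offloaded block 0) with their distances;
# block 2 supplies new candidates. All distance values here are made up.
stored = [(3, 0.9), (2, 1.4)]
new_cands = [(8, 0.7), (9, 1.1)]
assert merge_neighbors(stored, new_cands, 3) == [(8, 0.7), (3, 0.9), (9, 1.1)]
```

A full merge would additionally emit the reverse edges for each kept inter-block neighbor, as described above.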
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Complexity</head><p>Table <ref type="table">1</ref> compares the time and space complexities of Vamana, DiskANN, and ScaDANN in the index-building process, focusing on scalability and efficiency. The space complexity considers storage requirements, including neighbor lists.</p><p>Vamana constructs a single large-scale graph, resulting in a time complexity of 𝑂(𝑆 1.16 ) and a space complexity of 𝑂(𝑆 ⋅ 𝑅). Since it does not partition the dataset, the entire graph must fit in memory, making it less scalable for large-scale data. Both DiskANN and ScaDANN partition the dataset to construct subgraphs. Partitioning reduces memory usage and enables disk-based processing. The time complexity is inversely proportional to the number of partitions, as increasing 𝑃 allows for more localized graph construction, reducing the computational cost.</p><p>In DiskANN, each node is assigned to multiple overlapping partitions (typically 𝐾 = 2 ), as shown in Figure <ref type="figure" target="#fig_0">1</ref>. Since DiskANN performs clustering to determine partitions, the build time complexity includes the clustering time ( 𝐶 ). In contrast, ScaDANN utilizes dynamic partitioning, where the data structure size is determined based on the dataset size. This process does not require separate clustering and is treated as an internal computational step, which is why it is not included in the time complexity. Unlike DiskANN, which incurs overhead due to overlapping partitions, ScaDANN eliminates these redundancies, streamlining subgraph construction and merging. This results in a computationally efficient process that scales well with increasing dataset size.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head><p>Vamana: build time 𝑂(𝑆 1.16 ), space 𝑂(𝑆 ⋅ 𝑅). DiskANN: build time 𝑂(𝐾 1.16 ⋅ 𝑃 −0.84 ⋅ 𝑆 1.16 + 𝐶), space 𝑂(𝑆 ⋅ 𝑅 ⋅ 𝐾 ⋅ 𝑃 −1 ). ScaDANN: build time 𝑂(𝑃 −0.84 ⋅ 𝑆 1.16 ), space 𝑂(𝑆 ⋅ 𝑅 ⋅ 𝑃 −1 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Comparison of time and space complexities for Vamana, DiskANN, and ScaDANN, where 𝑆 is the dataset size, 𝑅 is the maximum degree of the graph, 𝐾 is the number of overlapping partitions per node, 𝑃 is the number of partitions, and 𝐶 is the clustering time.</p></div>
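The asymptotic ratios in Table 1 can be checked numerically; constant factors and DiskANN's clustering term 𝐶 are ignored, and the parameter values below are arbitrary illustrations.

```python
# Arbitrary illustrative parameters.
S = 1e9     # dataset size
P = 10      # number of partitions
K = 2       # DiskANN's overlapping partitions per node
R = 64      # maximum graph degree

vamana_time  = S ** 1.16
diskann_time = (K ** 1.16) * (P ** -0.84) * (S ** 1.16)   # plus clustering C
scadann_time = (P ** -0.84) * (S ** 1.16)

# Removing the K-fold overlap makes ScaDANN's build K ** 1.16 times
# cheaper than DiskANN's (about 2.2x for K = 2), before even counting C.
assert round(diskann_time / scadann_time, 3) == round(K ** 1.16, 3)
assert vamana_time > diskann_time > scadann_time

# Space: both partitioned methods shrink peak storage by a factor of P,
# but DiskANN pays the K-fold duplication for its overlapping partitions.
assert (S * R * K / P) == K * (S * R / P)
```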
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental setup</head><p>The experiments were conducted to evaluate the performance of ScaDANN under controlled conditions. The hardware environment included two Intel Xeon Gold 6326 16-core processors and 1.9 TB of main memory, while the software environment was based on Ubuntu 20.04.6 LTS. To simulate memory-constrained scenarios, the available memory was deliberately restricted to 256 GB during the experiments. Under these constraints, both Vamana and ParlayANN encountered Out of Memory (OOM) errors, making them unsuitable for comparison. As a result, our evaluation focused on comparing ScaDANN with DiskANN, a state-of-the-art disk-based ANN indexing method.</p><p>Table <ref type="table">2</ref> provides a summary of the datasets used in the experiments, including their data types and dimensions. The datasets used were BIGANN, Microsoft SPACEV, Yandex DEEP, and Yandex Text-to-Image, each posing unique challenges for ANN indexing. BIGANN consists of 1 billion 128-dimensional SIFT descriptors from a large-scale image dataset, serving as a benchmark for high-dimensional similarity search. Microsoft SPACEV includes 1 billion 100-dimensional document and query vectors reflecting real-world web search scenarios. Yandex DEEP comprises 1 billion 96-dimensional image descriptors designed to evaluate deep learning-based image representations. Yandex Text-to-Image contains 1 billion multi-modal embeddings, where image embeddings serve as the database and text embeddings form the query set, representing a cross-modal retrieval task with distinct distributions for database and query vectors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Summary of datasets used in the experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The bottom graph in Figure <ref type="figure" target="#fig_5">5</ref> shows the memory usage during the index construction process. Both DiskANN and ScaDANN employed dynamic partitioning, which adjusts the number of partitions based on available memory capacity and efficiently divides the data. Since the graph size and memory usage vary depending on the degree of the neighbor list, dynamic partitioning enables optimized partitioning by adapting to these variations. Notably, ScaDANN required a higher number of partitions than DiskANN. This is because ScaDANN stores additional distance information within the graph, which increases memory consumption per block. As a result, ScaDANN divides the data into more partitions than DiskANN to accommodate the additional memory usage. For instance, on the BIGANN dataset, ScaDANN used around 200 GB of memory for indexing; DiskANN also remained under the 256 GB limit but required fewer partitions to store its graph.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results and Analysis</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Billion-scale index in limited memory</head><p>We conducted a detailed comparison of the index building process with DiskANN, as illustrated in Figure <ref type="figure">6</ref>. In the subgraph construction phase, our method completed approximately twice as fast as DiskANN, with the BIGANN dataset showing a significant speedup. However, the subgraph merging phase took longer in our approach. This is due to the finer-grained distance comparisons performed during the merging process, which operates at the level of grid blocks. Unlike DiskANN, which uses a more straightforward merging process, ScaDANN's overlapping block approach requires additional computations during the merging phase to ensure accurate distance calculations between partitioned blocks.</p><p>In terms of I/O costs, our method demonstrated greater efficiency. This improvement can be attributed to the use of overlapping block techniques, which optimize data loading between memory and disk. Specifically, ScaDANN's approach minimizes unnecessary disk reads and efficiently manages memory, leading to a faster construction process despite the increased memory overhead. This is evident in the I/O time breakdown in Figure <ref type="figure">6</ref>, where ScaDANN exhibits a more balanced distribution of time between subgraph building, partitioning, and merging, compared to DiskANN, which spends a significant portion of time in I/O operations.</p><p>In the search-performance plots that follow, the x-axis shows Recall@10 and the y-axis QPS; higher placement indicates lower latency at a given recall level, so a method positioned higher achieves faster query processing while maintaining similar accuracy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Disk-based search performance</head><p>Figure <ref type="figure" target="#fig_7">7</ref> compares the disk-based search performance of ScaDANN and DiskANN. The experiments were carried out using the disk-based search method for evaluation. Our proposed ScaDANN method demonstrated better search performance than DiskANN for most datasets in terms of QPS, especially for the BIGANN-1B dataset. As shown in the graph, ScaDANN achieved a higher QPS while maintaining competitive recall rates. For example, ScaDANN processed queries at a faster rate than DiskANN for BIGANN and MSSPACEV-1B, achieving a significant speedup while still maintaining similar or slightly better recall at top-10 results. However, for the DEEP-1B dataset, ScaDANN showed a slight performance drop when Recall@10 exceeded 95%. This discrepancy can be attributed to the disk search method used, which is specifically optimized for DiskANN.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Ablation study</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.1.">Impact of partition size on index construction</head><p>Figure <ref type="figure">8</ref> illustrates the effects of varying the number of partitions on the BIGANN 100M dataset. Increasing the number of partitions led to a significant increase in building time, with approximately a 70% increase per step. However, RAM usage decreased by around 40% per step, demonstrating the trade-off between memory efficiency and construction time. Despite the changes in partition numbers, there were no significant differences observed in QPS relative to Recall@10 across the tested configurations. This indicates that increasing the partition number primarily affects the construction phase without adversely impacting search performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">In-memory graph construction experiments</head><p>The in-memory graph construction experiments on BIGANN-100M and Yandex Text-to-Image-100M compare ParlayANN, ScaDANN, and DiskANN in terms of indexing time and query performance. Since ParlayANN encounters out-of-memory (OOM) issues at the billion-scale, the experiments were conducted on 100 million data points without partitioning for ScaDANN and DiskANN. ScaDANN achieves indexing speeds comparable to or faster than ParlayANN by storing distances within the graph, reducing redundant computations. Both methods also benefit from batch insertion, leading to faster construction than DiskANN. ScaDANN not only achieves a faster build time but also maintains strong search performance, striking a balance between efficient indexing and high query throughput.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This study proposed ScaDANN, a scalable disk-based graph indexing method for billion-scale ANN datasets in memory-constrained environments. By introducing overlapping block-level insertion and grid block merge, ScaDANN mitigates the high memory demands and inefficiencies of existing methods: these techniques enable seamless integration of blocks and efficient merging of partitioned graphs while minimizing I/O overhead by reusing stored distance information. Experimental results showed that ScaDANN builds indexes up to 1.5× faster and delivers up to 1.4× higher query throughput than DiskANN, while maintaining competitive accuracy across diverse datasets. Furthermore, within the same memory constraints, ScaDANN can construct a single index 10× larger than existing approaches, establishing it as an efficient and scalable solution for real-world applications that require high-performance ANN indexing.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: DiskANN partitioning and merging process.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Graph ANN with grid format. The grid format represents the structure of Graph ANN, where each node consists of a base vector and a neighbor list.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Grid block merge process with distance information.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5</head><label>5</label><figDesc>Figure 5 presents the results of indexing at billion scale under a 256GB memory constraint. The top graph in Figure 5 illustrates the index building time across different datasets. ScaDANN significantly reduced the index-building time compared to DiskANN across all datasets. For example, on the BIGANN dataset, ScaDANN completed the indexing task in 10.7 hours versus 16.5 hours for DiskANN, approximately a 35% reduction in build time. This efficiency improvement can be attributed to ScaDANN's optimized data partitioning and overlapping block techniques, which minimize disk I/O and improve memory access patterns. The bottom graph in Figure 5 shows the memory usage during the index construction process. Both DiskANN and ScaDANN employed dynamic partitioning, which adjusts the number of partitions based on available memory capacity and efficiently divides the data. Since the graph size and memory usage vary depending on the degree of the neighbor list, dynamic partitioning enables optimized partitioning by adapting to these variations. Notably, ScaDANN required a</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5: Figure 6:</head><label>56</label><figDesc>Figure 5: Billion-scale index building experiment results, including building time and memory usage, conducted under a 256GB memory constraint. The memory usage values indicate the number of partitions dynamically determined during the indexing process.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 7:</head><label>7</label><figDesc>Figure 7: Disk-based search performance. The graph compares Recall@10 (x-axis) and QPS (y-axis), where higher placement indicates lower latency at a given recall level. A method positioned higher in the graph achieves faster query processing while maintaining similar accuracy.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 8:</head><label>8</label><figDesc>Figure 8: Experimental results of BIGANN 100M index building with varying numbers of partitions.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: In-memory experimental results of BIGANN 100M and Yandex Text-to-Image 100M. The numbers in parentheses next to each method indicate the index building time in minutes.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Figure 2</head><label>2</label><figDesc>Figure 2: ScaDANN building process with three partitions and corresponding grid blocks at each step. The graph is constructed by loading two partitions into memory at a time, ensuring efficient graph generation. In the next step, the previously used partition is overlapped with a new partition to continue the building process. A reversed edge is defined as an edge where the source vertex and destination vertex are swapped, ensuring bidirectional connectivity between nodes. For example, if an edge connects node 𝑣 𝑖 (source) to node 𝑣 𝑗 (destination), the reversed edge connects 𝑣 𝑗 back to 𝑣 𝑖 . During this step, the neighbors of each newly added node are updated with reversed edges, completing the bidirectional connections required for robust graph-based indexing.</figDesc></figure>
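The reversed-edge step described above can be sketched as follows; the adjacency-dict representation and node IDs are illustrative assumptions, not the paper's actual on-disk layout:

```python
def add_reversed_edges(graph):
    """For every edge (src -> dst), insert the reversed edge (dst -> src),
    making all connections bidirectional."""
    out = {v: set(nbrs) for v, nbrs in graph.items()}
    for src, nbrs in graph.items():
        for dst in nbrs:
            out.setdefault(dst, set()).add(src)
    return out

g = {0: {1, 2}, 1: set(), 2: {0}}
bidir = add_reversed_edges(g)
print(bidir[1])  # node 1 gains the reversed edge back to node 0
```

In the actual merge, the updated neighbor lists would then be re-pruned to the maximum degree; this sketch shows only the connectivity step.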
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Algorithm 1: ScaDANN Construction</head><label></label><figDesc>Input: block size 𝑃 and dataset 𝐷. Output: Graph.</figDesc><table><row><cell>1: /* Initialize first block and interval */ 𝐵 1 = 0; 𝐵 2 = 1; 𝑖𝑡𝑣 = 1; 𝑁 𝐵𝑅 𝑖 = ∅;</cell></row><row><cell>2: /* Read dataset for the first block */ 𝐷 1 = ReadDataset(𝐵 1 );</cell></row><row><cell>3: 𝑁 𝐵𝑅 1 = vamana(𝐵 1 , 𝑁 𝐵𝑅 1 );</cell></row><row><cell>4: while 𝑖𝑡𝑣 &lt; 𝑃 do</cell></row><row><cell>5: 𝐵 2 ≡ (𝐵 1 + 𝑖𝑡𝑣) (mod 𝑃);</cell></row><row><cell>6: if 𝐵 1 = 𝐵 2 then</cell></row><row><cell>7: /* Increase interval size */ 𝑖𝑡𝑣 = 𝑖𝑡𝑣 * 2;</cell></row><row><cell>8: continue</cell></row><row><cell>9: /* Read the second dataset and graph */ 𝐷 2 = ReadDataset(𝐵 2 );</cell></row><row><cell>10: 𝑁 𝐵𝑅 2 = ReadGraph(𝐵 2 );</cell></row><row><cell>11: 𝑁 𝐵𝑅 1 , 𝑁 𝐵𝑅 2 = vamana((𝐷 1 , 𝐷 2 ), (𝑁 𝐵𝑅 1 , 𝑁 𝐵𝑅 2 ));</cell></row><row><cell>12: WriteGraph(𝑁 𝐵𝑅 1 , 𝐵 1 );</cell></row><row><cell>13: 𝐷 1 = 𝐷 2 ;</cell></row><row><cell>14: 𝑁 𝐵𝑅 1 = 𝑁 𝐵𝑅 2 ;</cell></row><row><cell>15: 𝐵 1 = 𝐵 2 ;</cell></row><row><cell>16: WriteGraph(𝑁 𝐵𝑅 1 , 𝐵 1 );</cell></row></table></figure>
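Algorithm 1 can be read as a sliding-pair schedule over the 𝑃 blocks: the block just built stays resident in memory and is paired with the block 𝑖𝑡𝑣 positions ahead, and the interval doubles between passes. A sketch of that schedule only (ReadDataset, ReadGraph, vamana, and WriteGraph are omitted, and the per-interval stopping rule, left implicit in the listing, is assumed here to be one full pass over the blocks):

```python
def pairing_schedule(num_blocks):
    """Return the order in which block pairs (B1, B2) share memory during
    construction: B2 = (B1 + itv) mod P, the newer block stays resident,
    and the interval itv doubles after each full pass."""
    pairs, itv = [], 1
    while itv < num_blocks:
        b1 = 0
        for _ in range(num_blocks):   # assumed: one full pass per interval
            b2 = (b1 + itv) % num_blocks
            pairs.append((b1, b2))    # blocks b1 and b2 are built jointly
            b1 = b2                   # overlap: keep the second block loaded
        itv *= 2                      # widen the interval (line 7)
    return pairs

# with P = 3 blocks, every ordered pair of distinct blocks meets once
print(pairing_schedule(3))
```

Because only two blocks are ever resident at once, peak memory stays bounded regardless of the total number of partitions.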
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>Neighbor list update during grid block merge. Each node stores its neighbor list as (neighbor ID, distance) pairs under a maximum degree of 3; after the added neighbors are merged, the updated neighbor list of 𝑣 1 contains 𝑣 11 , 𝑣 3 , and 𝑣 2 .</figDesc></figure>
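The neighbor-list update above can be sketched as follows: because each list stores (ID, distance) pairs, merged candidates can be pruned back to the degree bound without recomputing any distance. The node IDs and distance values below mirror the figure but are illustrative:

```python
def update_neighbor_list(current, added, max_degree=3):
    """Merge added neighbors into a node's (id, distance) list and keep
    only the max_degree closest, reusing the stored distances."""
    merged = dict(current)                  # id -> stored distance
    for nid, dist in added:
        merged[nid] = min(dist, merged.get(nid, dist))
    # prune by smallest stored distance; no distance is recomputed
    return sorted(merged.items(), key=lambda p: p[1])[:max_degree]

# one existing neighbor plus three candidates from the merge; the
# farthest candidate (distance 473) is pruned by the degree bound
print(update_neighbor_list([(4, 236.0)],
                           [(11, 219.0), (3, 387.0), (2, 473.0)]))
```

Keeping distances alongside IDs is what lets the grid block merge avoid re-reading base vectors from disk during pruning.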
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A survey of knowledge-enhanced text generation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="issue">11s</biblScope>
			<biblScope unit="page" from="1" to="38" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.02901</idno>
		<title level="m">A comprehensive survey on process-oriented automatic text summarization with exploration of LLM-based methods</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">ToolQA: A dataset for LLM question answering with external tools</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="50117" to="50143" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios</title>
		<author>
			<persName><forename type="first">M</forename><surname>Cascella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Montomoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Bellini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bignami</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Medical Systems</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">33</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Detecting LLM-generated text in computing education: Comparative study for ChatGPT cases</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Orenstrakh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Karnalim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Suarez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liut</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2024">2024. 2024</date>
			<biblScope unit="page" from="121" to="126" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Pandya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Holia</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.05421</idno>
		<title level="m">Automating customer service using LangChain: Building custom open-source GPT chatbot for organizations</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<title level="m">A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>ACM Transactions on Information Systems</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<title level="m">A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly, High-Confidence Computing</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page">100211</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Küttler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T.</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="9459" to="9474" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Survey of vector database management systems</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The VLDB Journal</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1591" to="1615" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">SPANN: Highly-efficient billion-scale approximate nearest neighborhood search</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="5199" to="5212" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Hierarchical clustering-based graphs for large scale approximate nearest neighbor search</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">V</forename><surname>Munoz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Gonçalves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">D S</forename><surname>Torres</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">96</biblScope>
			<biblScope unit="page">106970</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Engels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shun</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2312.03940</idno>
		<title level="m">PECANN: Parallel efficient clustering with graph-based approximate nearest neighbor search</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Product quantization for nearest neighbor search</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jegou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmid</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="117" to="128" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Locally optimized product quantization for approximate nearest neighbor search</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kalantidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Avrithis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2321" to="2328" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The inverted multi-index</title>
		<author>
			<persName><forename type="first">A</forename><surname>Babenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lempitsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1247" to="1260" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Efficient k-nearest neighbor graph construction for generic similarity measures</title>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Moses</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th International Conference on World Wide Web</title>
				<meeting>the 20th International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="577" to="586" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cai</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1707.00143</idno>
		<title level="m">Fast approximate nearest neighbor search with the navigating spreading-out graph</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Approximate nearest neighbor algorithm based on navigable small world graphs</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Malkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ponomarenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Logvinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Krylov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="61" to="68" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Malkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Yashunin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Pattern Analysis and Machine Intelligence</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="824" to="836" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Fast approximate kNN graph construction for high-dimensional data via recursive Lanczos bisection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>-R. Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Saad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">9</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.12631</idno>
		<title level="m">A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">New trends in high-d vector similarity search: AI-driven, progressive, and distributed</title>
		<author>
			<persName><forename type="first">K</forename><surname>Echihabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zoumpatianos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Palpanas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="3198" to="3201" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Results of the NeurIPS&apos;21 challenge on billion-scale approximate nearest neighbor search</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">V</forename><surname>Simhadri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Aumüller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Babenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Baranchuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hosseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krishnaswamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Srinivasa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NeurIPS 2021 Competitions and Demonstrations Track</title>
				<imprint>
			<publisher>PMLR</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="177" to="189" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">DiskANN: Fast accurate billion-point nearest neighbor search on a single node</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Jayaram</forename><surname>Subramanya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Devvrit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">V</forename><surname>Simhadri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krishnaswamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kadekodi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Approximate nearest neighbor queries in fixed dimensions</title>
		<author>
			<persName><forename type="first">S</forename><surname>Arya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Mount</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SODA</title>
		<imprint>
			<biblScope unit="volume">93</biblScope>
			<biblScope unit="page" from="271" to="280" />
			<date type="published" when="1993">1993</date>
			<publisher>Citeseer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">ParlayANN: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Manohar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Blelloch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dhulipala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">V</forename><surname>Simhadri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming</title>
		<meeting>the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="270" to="285" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
