=Paper= {{Paper |id=Vol-3741/paper13 |storemode=property |title=Efficient and Effective Multi-Vector Dense Retrieval with EMVB |pdfUrl=https://ceur-ws.org/Vol-3741/paper13.pdf |volume=Vol-3741 |authors=Franco Maria Nardini,Cosimo Rulli,Rossano Venturini |dblpUrl=https://dblp.org/rec/conf/sebd/NardiniRV24 }} ==Efficient and Effective Multi-Vector Dense Retrieval with EMVB== https://ceur-ws.org/Vol-3741/paper13.pdf
                                Efficient and Effective Multi-Vector Dense Retrieval
                                with EMVB
                                Franco Maria Nardini1 , Cosimo Rulli1,* and Rossano Venturini2
                                1
                                    ISTI-CNR, Pisa, Italy
                                2
                                    University of Pisa, Italy


                                               Abstract
                                               Dense retrieval techniques utilize large pre-trained language models to construct a high-dimensional
                                               representation of queries and passages. These representations assess the relevance of a passage con-
                                               cerning a query through efficient similarity measures. Multi-vector representations, while enhancing
                                               effectiveness, cause a one-order-of-magnitude increase in memory footprint and query latency by encod-
                                               ing queries and documents on a per-token level. The current state-of-the-art approach, namely PLAID,
                                               has introduced a centroid-based term representation to mitigate the memory impact of multi-vector
                                               systems. By employing a centroid interaction mechanism, PLAID filters out non-relevant documents,
                                               reducing the cost of subsequent ranking stages. This paper 1 introduces "Efficient Multi-Vector dense
                                               retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector
                                               dense retrieval. Firstly, EMVB utilizes an optimized bit vector pre-filtering step for passages, enhancing
                                               efficiency. Secondly, the computation of centroid interaction occurs column-wise, leveraging SIMD
                                               instructions to reduce latency. Thirdly, EMVB incorporates Product Quantization (PQ) to decrease the
                                               memory footprint of storing vector representations while facilitating fast late interaction. Lastly, a
                                               per-document term filtering method is introduced, further improving the efficiency of the final step.
                                               Experiments conducted on MS MARCO and LoTTE demonstrate that EMVB achieves up to a 2.8Γ— speed
                                               improvement while reducing the memory footprint by 1.8Γ—, without compromising retrieval accuracy
                                               compared to PLAID.

                                               Keywords
                                               Dense Retrieval, Multi-Vector, Efficiency, Bit Vectors.


                                1. Introduction
                                The widely acknowledged capability of Large Language Models (LLMs) to model semantic and
                                context has been extensively used in Information Retrieval. In Dense Retrieval, LLMs are used
                                to encode documents and queries into 𝑑-dimensional vectors. This enables the modeling of
                                document-query relevance using simple metrics like Euclidean distance. In this line, a successful
                                strategy involves using multi-vector representations for documents and queries, where a 𝑑-
                                dimensional vector is produced for each token in the text. In this context, the similarity between
                                the query and the passage is measured using the so-called late interaction mechanism. This
                                mechanism works by computing the sum of the maximum similarities between each term of the
                                query and each term of a candidate passage. Although multi-vector representations enhance
                                effectiveness, they come at the cost of increased computational burden, including a larger
                                memory footprint and longer retrieval time.

                                1
                                 This paper is an extended abstract of Nardini et al. [1].
                                SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
                                *
                                  Corresponding author.
                                             Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
   Various approaches have been proposed to enhance the efficiency and reduce memory
demands in multi-vector systems. ColBERT [2] exploits an inverted index to store all the
terms embeddings and retrieve the candidate passages, but it necessitates maintaining the
full-precision representation of each document term in memory, which can be substantial (e.g.,
140 GB for MSMARCO). ColBERTv2 [3] introduces a centroid-based compression technique
where each embedding is stored by saving the id of the closest centroid and then compressing
the residual (i.e., the element-wise difference) by using 1 or 2 bits per component. ColBERTv2
saves up to 10Γ— space compared to ColBERT but sacrifices retrieval efficiency requiring up to 3
seconds to perform query processing on CPU. PLAID [4] builds on the embedding compressor
of ColBERTv2 and leverages the centroid-based representation to discard non-relevant passages
(centroid interaction), thus performing the late interaction exclusively on a carefully selected
batch of passages. PLAID allows for massive speedup compared to ColBERTv2, but its average
query latency can be up to 400 msec. on CPU with single-thread execution [4].
   This paper introduces EMVB, a novel framework designed for efficient query processing in
multi-vector dense retrieval. The key focus is on addressing the most time-consuming steps
identified in PLAID, which include: i) extracting the top-nprobe closest centroids during candi-
date passage selection, ii) computing the centroid interaction mechanism, and iii) decompressing
quantized residuals. To address the first two steps, we propose a highly efficient passage filtering
approach based on optimized bit vectors. This approach significantly reduces the cost of top-
nprobe extraction by identifying a small set of crucial centroid scores. Additionally, it decreases
the number of passages for which centroid interaction computation is necessary. We further
enhance efficiency in the second step by introducing a highly efficient column-wise reduction
leveraging SIMD instructions. For the third step, late interaction efficiency is improved by
introducing Product Quantization (PQ) [5]. This method provides comparable or superior per-
formance compared to PLAID’s bitwise compressor, while being up to 3Γ— faster. Additionally,
we introduce a dynamic passage-term-selection criterion for late interaction, reducing the cost
of this step by up to 30%.
   Experimental evaluations on MS MARCO [6] passage (in-domain) and LoTTE [3] (out-of-
domain) datasets demonstrate the effectiveness of EMVB compared to PLAID. On MS MARCO,
EMVB achieves up to a 2.8Γ— speed improvement while reducing the memory footprint by 1.8Γ—
without compromising retrieval accuracy. In the out-of-domain evaluation, EMVB delivers up
to a 2.9Γ— speedup compared to PLAID with minimal loss in retrieval quality.

2. PLAID
In a multi-vector dense retrieval scenario, an LLM encodes a passage 𝒫 into a collection of 𝑛𝑑
dense 𝑑-dimensional vector 𝑇𝑗 where 𝑛𝑑 is the number of tokens in the passage. Encoding each
token in each passage generates large collection, e.g., almost 600M of vectors for the 8.8M of
passages in MSMARCO. In virtue of this, ColBERTv2 [3] and successively PLAID [4] exploit a
centroid-based vector compression technique. First, the K-Means algorithm is used to identify
a set of π‘˜ centroids π’ž = {𝐢𝑖 }𝑛𝑖=1
                                 𝑐
                                   . The residual π‘Ÿ between a vector π‘₯ and its closest centroid
Β―
𝐢 is computed so that π‘Ÿ = π‘₯ βˆ’ 𝐢     Β― is computed, and then. compressed into Λœπ‘Ÿ using a 𝑏-bit
encoder that represents each dimension of π‘Ÿ using 𝑏 bits, with 𝑏 ∈ {1, 2}. This way, the memory
footprint of each vector is given by ⌈log2 |𝐢|βŒ‰ bits for the centroid index and 𝑑 Γ— 𝑏 bits for the
compressed residual. At scoring time, decompressing the residual encoding is inefficient. For
this reason, PLAID aims at decompressing as few candidate documents as possible by hinging on
the so-called centroid interaction filtering step [4]. We now detail the PLAID retrieval system [4].
After the K-Means algorithm, each centroid is linked with a posting list containing the ids of the
candidate passages. A passage belongs to a centroid 𝐢𝑖 candidate list if at lest one of its tokens
have 𝐢𝑖 as its closest centroid. The query processing starts by computing the top-nprobe closest
centroids for each query term π‘žπ‘– , with 𝑖 = 1, . . . , π‘›π‘ž , according to the dot product similarity
measure. From the set of set of closest centroids, the candidates passages are retrieved, thanks
to the previously built posting lists. In the centroid interaction step, the distance between the
𝑖-th query term π‘žπ‘– and a token embedding 𝑇𝑗 with 𝑗 = 1, . . . , 𝑛𝑝 is computed as
                                                       𝑇
                                                   Β― 𝑗 = π‘‡Λœ 𝑖,𝑗 .
                                    π‘žπ‘– Β· 𝑇𝑗 ≃ π‘žπ‘– Β· 𝐢                                            (1)
         𝑇
      Β― 𝑗 is the closest centroid to 𝑇𝑗 . We estimate the score of a passage 𝑃 with 𝑛𝑝 terms as
where 𝐢
                                             π‘›π‘ž
                                   Β― π‘ž,𝑃 =                   Β― 𝑇𝑗
                                             βˆ‘οΈ
                                   𝑆                max π‘žπ‘– Β· 𝐢                                  (2)
                                                   𝑗=1...𝑛𝑑
                                             𝑖=1

In the decompression phase the full-precision representation of 𝑃 is reconstructed by combin-
ing the centroids and the residuals. Only the top-ndocs passages from the previous centroid
interaction step move to this step. Finally, PLAID applied late interaction [2, 3] to computed
the score of a re-constructed candidate passage against a query π‘ž. The late interaction measure
is defined by Equation 3. Passages are then ranked according to their similarity score and the
top-π‘˜ passages are selected.
                                           π‘›π‘ž
                                           βˆ‘οΈ
                                   π‘†π‘ž,𝑃 =       max π‘žπ‘– Β· 𝑇𝑗 .                                (3)
                                                   𝑗=1...𝑛𝑑
                                             𝑖=1

PLAID execution time. We present a detailed analysis of PLAID’s execution time, delineat-
ing it into distinct phases such as retrieval, filtering, decompression, and late interaction. The
experimentation adheres to the settings outlined in Section 4. The resulting execution times are
reported for various values of π‘˜, representing the number of retrieved passages.

             k = 10                                                        Retrieval
                                                                           Filtering
                                                                           Decompression
         k = 100                                                           Late Interaction

        k = 1000
                      0     50         100          150         200       250         300
                                             Execution time (msec.)
Figure 1: Breakdown of the PLAID average query latency (in milliseconds) on CPU across its four
phases.
3. EMVB
Fast Closest Centroids Selection. The retrieval phase in PLAID is time consuming, as
shown in Figure 1. This step concists of i) matrix multiplication between the query matrix and
centroids for distance computation, ii) identify top-nprobe closest centroids, for each query
term. Maybe surprisingly, the former step is the most time consuming (3Γ— slower than matrix
multiplication), even when performed with asymptotically linear algorithms such as quickselect.
Our pre-filtering strategy, as explained in the subsequent paragraph, effectively accelerate the
selection of the top-nprobe by minimizing the number of evaluated elements. In practice, we
efficiently discard centroids with scores below a predefined threshold, and then exclusively apply
quickselect to the remaining ones. As a result, in EMVB, the cost associated with extracting the
top-nprobe becomes negligible, showcasing a speed improvement of two orders of magnitude
when compared to extracting the top-nprobe from the complete set of centroids.
Pre-filtering using bitvectors. Let us recall the definition of π‘‡Λœ 𝑖,𝑗 , which represents the
approximate score of the 𝑗-th token of passage 𝑃 with respect to the 𝑖-th term of the query π‘žπ‘– ,
as defined in Equation 1. Estimating whether π‘‡Λœ 𝑖,𝑗 has a large value is a proxy for estimating
the importance of a passage 𝑃 w.r.t to the query. Given a passage 𝑃 , our pre-filtering consists
in determining whether π‘‡Λœ 𝑖,𝑗 , for 𝑖 = 1, . . . , π‘›π‘ž , 𝑗 = 1, . . . , 𝑛𝑑 is large or not. Recall that π‘‡Λœ 𝑖,𝑗
represents the approximate score of the 𝑗-th token of passage 𝑃 with respect to the 𝑖-th term
of the query π‘žπ‘– , as defined in Equation 1. Our pre-filtering approach works by checking if the
centroid associated with 𝑇𝑗 (𝐢    Β― 𝑇𝑗 ) belongs to the set of the closest centroids of π‘žπ‘– . We define
closeπ‘‘β„Ž 𝑖 as the set of centroids whose scores surpass a specified threshold π‘‘β„Ž in relation to a
query term π‘žπ‘– . For a certain passage 𝑃 , we also introduce the list of centroids ids 𝐼𝑃 , where
                          Β― 𝑇𝑗 . The similarity of a passage with respect to a query can be rapidly
𝐼𝑃𝑗 is the centroid id of 𝐢
estimated with our novel filtering function 𝐹 (𝑃, π‘ž) ∈ [0, π‘›π‘ž ] with the following equation:
                                            π‘›π‘ž
                                                  1(βˆƒ 𝑗 s.t. 𝐼𝑃𝑗 ∈ closeπ‘‘β„Ž
                                            βˆ‘οΈ
                               𝐹 (𝑃, π‘ž) =                               𝑖 ).                             (4)
                                            𝑖=1

For a passage 𝑃 , this counts how many query terms have at least one similar passage term in
𝑃 , where β€œsimilar” describes the belonging of 𝑇𝑗 to closeπ‘‘β„Ž  𝑖 .
   In Figure 2 (left), we present a performance comparison of our innovative pre-filter operating
in conjunction with the centroid interaction mechanism (depicted by orange, blue, and green
lines) against the performance of the centroid interaction mechanism applied to the entire set
of candidate documents (indicated by the red dashed line) on the MS MARCO dataset. The plot
illustrates that our pre-filtering approach efficiently eliminates non-relevant passages without
adversely affecting the recall of the subsequent centroid interaction phase. For instance, we can
significantly reduce the candidate passage set to just 1000 elements using a threshold of 0.4
without any compromise in R@100. In the subsequent sections, we detail the implementation
of this pre-filter for optimal efficiency.
Building the bit vectors. Let 𝐢𝑆 = π‘ž Β· 𝐢 𝑇 , with 𝐢𝑆 ∈ [βˆ’1, 1]π‘›π‘ž Γ—|𝐢| , with π‘›π‘ž is the number of
query tokens, and |𝐢| is the number of centroids. For each 𝑖-th row of 𝐢𝑆, we want to scan it and
pick those 𝑗 s.t. 𝐢𝑆𝑖,𝑗 > π‘‘β„Ž. This conceptually trivial algorithm can be implemented by leverag-
ing SIMD instructions featured by modern CPUs. In particular, the AVX512 instruction set allows
                                                                  10.0
        0.8650                                                                     Naive IF
                                                                   7.5             Branchless




                                                      Time (ms)
        0.8625
R@100
                                                                                   VecBranchless
                                                                   5.0             Vectorized IF
        0.8600       0.3          0.5
                     0.4          w/o pre-filtering                2.5
        0.8575
              1000 2000 3000 4000 5000                                   0.2         0.4       0.6
                    #Scored Passages                                           Threshold value
Figure 2: R@100 with various values of the threshold (left). Comparison of different algorithms to
construct closeπ‘‘β„Ž
               𝑖 , for different values of π‘‘β„Ž (right).




to compare 16 fp32 values at a time thanks to the _mm512_cmp_epi32_mask instruction and
store the comparison result in a π‘šπ‘Žπ‘ π‘˜ variable. Those indexes 𝐽 = {𝑗 ∈ [0, 15] | π‘šπ‘Žπ‘ π‘˜π‘— = 1} (if
any) can be efficiently extracted by means of the _mm512_mask_compressstore instruction.
   The effectiveness of algorithms employing if-based structures is largely contingent on the
branch misprediction ratio. Contemporary CPUs speculate about the if condition’s outcome
by identifying patterns in the algorithm’s execution flow. If an incorrect branch prediction
occurs, a control hazard arises, leading to a pipeline flush with a delay of 15 to 20 clock
cycles, approximately 10𝑛𝑠. To address the inefficiency associated with branch misprediction,
we introduce a branchless algorithm. For a detailed description of the algorithm and of its
vectorized version, refer to the original paper [1].
   Figure 2 (right) presents a comparison of our different approaches, namely "Naive IF," the
"Vectorized IF," the "Branchless," and the "VecBranchless" described above. Branchless algorithms
present a constant execution time, regardless of the value of the threshold, while if-based
approaches offer better performances as the value of π‘‘β„Ž increases. With π‘‘β„Ž β‰₯ 0.3, "Vectorized
IF" is the most efficient approach, with a speedup up to 3 times compared to its naive counterpart.
Fast set membership. We now move to the problem of computing Equation 4, assuming
closeπ‘‘β„Ž 𝑖 to be known. Observe that this is a integer set membership problem, where we have
to test if at least one member of 𝐼𝑃 belongs to closeπ‘‘β„Ž  𝑖 , with 𝑖 = 1, . . . , π‘›π‘ž . Bit vectors (or bit
array) are a widely adopted solution for implementing sets of integer values. A bit vector maps
a set of integers up to 𝑁 into an array of 𝑁 bits, where the 𝑒-th bit is set to one if and only if
the integer 𝑒 is part of the set. Operations like addition and searching for any integer 𝑒 can be
executed in constant time using bit manipulation operators. In terms of memory occupancy, a
bit vectors requires 𝑁/8 bytes. In our scenario, given that |𝐢| = 218 , a bit vector only needs
32𝐾 bytes for storage.
   We further improve the efficiency of bit vectors by relying on the specific properties of
our setting. As we need to search through all the π‘›π‘ž bit vectors at a time, we rearrange the
representation of the bit vectors by stacking them vertically (Figure 3). This allows to search a
centroid index through all the closeπ‘‘β„Ž 𝑖 at a time. The bits corresponding to the same centroid
for different query terms are consecutive and fit a 32-bit word. This way, we can simultaneously
test the membership for all the queries in constant time with a single bitwise operation. In
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1.5                                   20




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           query (ms)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Time per




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Time per
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         doc ( s)
          Stacked
         Bit Vectors                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     1 m=0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1.0                      Baseline 10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   xor
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   AAACCHicbVBLSgNBFOyJvxh/UZduGoPgKsxIUJdBNy4jmA9khtDT6SRNerqH7jdiGHIBD+BWj+BO3HoLT+A17ElmYRILHhRV7/GKCmPBDbjut1NYW9/Y3Cpul3Z29/YPyodHLaMSTVmTKqF0JySGCS5ZEzgI1ok1I1EoWDsc32Z++5Fpw5V8gEnMgogMJR9wSsBKvh8RGAGkT0pPe+WKW3VnwKvEy0kF5Wj0yj9+X9EkYhKoIMZ0PTeGICUaOBVsWvITw2JCx2TIupZKEjETpLPMU3xmlT4eKG1HAp6pfy9SEhkziUK7mWU0y14m/ud1ExhcBymXcQJM0vmjQSIwKJwVgPtcMwpiYgmhmtusmI6IJhRsTQtfQqXGQEKTNeMt97BKWhdV77Jau69V6jd5R0V0gk7ROfLQFaqjO9RATURRjF7QK3pznp1358P5nK8WnPzmGC3A+foFRAebeg==




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    0.5                      Vectorized
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     0 0 0 0
         0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        0   1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              0.2        0.4     0.6
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   xor
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   AAACCHicbVBLSgNBFOyJvxh/UZduGoPgKsxIUJdBNy4jmA9khtDT6SRNerqH7jdiGHIBD+BWj+BO3HoLT+A17ElmYRILHhRV7/GKCmPBDbjut1NYW9/Y3Cpul3Z29/YPyodHLaMSTVmTKqF0JySGCS5ZEzgI1ok1I1EoWDsc32Z++5Fpw5V8gEnMgogMJR9wSsBKvh8RGAGkT0pPe+WKW3VnwKvEy0kF5Wj0yj9+X9EkYhKoIMZ0PTeGICUaOBVsWvITw2JCx2TIupZKEjETpLPMU3xmlT4eKG1HAp6pfy9SEhkziUK7mWU0y14m/ud1ExhcBymXcQJM0vmjQSIwKJwVgPtcMwpiYgmhmtusmI6IJhRsTQtfQqXGQEKTNeMt97BKWhdV77Jau69V6jd5R0V0gk7ROfLQFaqjO9RATURRjF7QK3pznp1358P5nK8WnPzmGC3A+foFRAebeg==




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     0 0 1 0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         th
         0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        1   0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 PLAID




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           query (ms)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Time per
32 bit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               0 1 1 0
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Ours                     20
word
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 3                                                                        10
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      closei
         close0




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        AAACE3icbVBLSgNBFOyJvxh/UXHlpjEIcRNmJKgbISiIywjmA0kIPZ1O0qSne+h+I4RhjuEB3OoR3IlbD+AJvIY9SRYmseBBUfUeryg/FNyA6347mZXVtfWN7GZua3tndy+/f1A3KtKU1agSSjd9YpjgktWAg2DNUDMS+II1/NFt6jeemDZcyUcYh6wTkIHkfU4JWKmbP7q7xu2AwBAgDlVIJSTF4KybL7gldwK8TLwZKaAZqt38T7unaBQwCVQQY1qeG0InJho4FSzJtSPDQkJHZMBalkoSMNOJJ/ETfGqVHu4rbUcCnqh/L2ISGDMOfLuZJjWLXir+57Ui6F91Yi7DCJik00f9SGBQOO0C97hmFMTYEkI1t1kxHRJNKNjG5r74So2A+CaxzXiLPSyT+nnJuyiVH8qFys2soyw6RieoiDx0iSroHlVRDVEUoxf0it6cZ+fd+XA+p6sZZ3ZziObgfP0CaVeeoQ==




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        F = popcnt(m)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      2000 4000 6000 8000 10000
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  …
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             #Candidate Documents
         AAACDnicbVDLSsNAFJ3UV62vqEs3g0VwVRIp6rLoxmUF+4A2hMl02g6dZMLMTaGE/IMf4FY/wZ249Rf8An/DSZuFbT1w4XDOvdzDCWLBNTjOt1Xa2Nza3invVvb2Dw6P7OOTtpaJoqxFpZCqGxDNBI9YCzgI1o0VI2EgWCeY3Od+Z8qU5jJ6glnMvJCMIj7klICRfNvuhwTGACkVUrPMd3y76tScOfA6cQtSRQWavv3TH0iahCwCKojWPdeJwUuJAk4Fyyr9RLOY0AkZsZ6hEQmZ9tJ58gxfGGWAh1KZiQDP1b8XKQm1noWB2cxz6lUvF//zegkMb72UR3ECLKKLR8NEYJA4rwEPuGIUxMwQQhU3WTEdE0UomLKWvgRSToAEOjPNuKs9rJP2Vc29rtUf69XGXdFRGZ2hc3SJXHSDGugBNVELUTRFL+gVvVnP1rv1YX0uVktWcXOKlmB9/QJ7750f




                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      AAACDnicbVDLSsNAFJ3UV62vqEs3wSK4KokUdVl047KCfUAbwmQ6bYdOZsLMTaGE/IMf4FY/wZ249Rf8An/DSZuFbT1w4XDOvdzDCWPONLjut1Xa2Nza3invVvb2Dw6P7OOTtpaJIrRFJJeqG2JNORO0BQw47caK4ijktBNO7nO/M6VKMymeYBZTP8IjwYaMYDBSYNv9CMMYICVcapoFLLCrbs2dw1knXkGqqEAzsH/6A0mSiAogHGvd89wY/BQrYITTrNJPNI0xmeAR7RkqcES1n86TZ86FUQbOUCozApy5+vcixZHWsyg0m3lOverl4n9eL4HhrZ8yESdABVk8GibcAenkNTgDpigBPjMEE8VMVoeMscIETFlLX0IpJ4BDnZlmvNUe1kn7quZd1+qP9WrjruiojM7QObpEHrpBDfSAmqiFCJqiF/SK3qxn6936sD4XqyWruDlFS7C+fgHXAJ1Y




Figure 3: Vectorized Fast Set Membership                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  Figure 4: Vectorized vs naΓ―ve Fast Set
algorithm based on bit vectors.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Membership (up). Ours vs PLAID filtering (down).


detail, our algorithm starts by initializing a mask π‘š of π‘›π‘ž = 32 bits at zeros (Step 1, Figure 3).
Subsequently, for each term in the candidate documents, it performs a bitwise xor between the
mask and the 32-bit word representing membership to all the query terms (Step 2, Figure 3).
Consequently, Equation 4 can be derived by counting the number of 1s in π‘š at the end of the
execution using the popcnt operation available in modern CPUs (Step 3, Figure 3).
   Figure 4 (up) showcases that our algorithm described in Figure 3 (β€œVectorized”) is 10Γ— to 16Γ—
faster than the β€œBaseline” usage of bit vectors, and up to 30Γ— faster than the centroid-interaction
proposed in PLAID [4], cf. Figure 4 (down).
Fast Filtering of Candidate Passages
   Our pre-filtering approach allows us to efficiently filter out non-relevant passages and is
employed upstream of PLAID’s centroid interaction (Equation 2). The efficiency of the centroid
interaction itself can be improved by using our column-wise reduction approach. For reason
of space, we do not report the description of the algorithm in this discussion paper, and
we encourage the reader to refer to the original work [1]. We implement PLAID’s centroid
interaction in C++ and we compare its filtering time against our SIMD-based solution. The
results of the comparison are reported for different values of candidate documents in Figure 4
(down). Thanks to the proficient read-write pattern and the highly efficient column-wise
max-reduction, our method can be up to 1.8Γ— faster than the filtering proposed in PLAID.
Late Interaction We propose to the 𝑏-bit residual compressor [3, 4] Product Quantization
(PQ) [5]. Introducing PQ has two main advantages. On the one hand, it allows to compute
the dot product between an input query vector π‘ž and the compressed residual π‘Ÿπ‘π‘ž without
decompression. On the other hand, it allows to re-use the Consider a query π‘ž and a candidate
passage 𝑃 . We decompose the computation of the dot product between the query terms π‘žπ‘– and
the closest centroid 𝐢¯ 𝑇𝑗 . Equation 3 becomes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   π‘›π‘ž                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               π‘›π‘ž
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Β― 𝑇𝑗 + π‘žπ‘– Β· π‘Ÿπ‘‡π‘— ) ≃                                   𝑇𝑗
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Β― 𝑇𝑗 + π‘žπ‘– Β· π‘Ÿπ‘π‘ž
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   βˆ‘οΈ                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               βˆ‘οΈ
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  π‘†π‘ž,𝑃 =                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          max (π‘žπ‘– Β· 𝐢                              max (π‘žπ‘– Β· 𝐢               ),         (5)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 𝑗=1...𝑛𝑑                                 𝑗=1...𝑛𝑑
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   𝑖=1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              𝑖=1

                     Β― 𝑇𝑗 . Our experimental evaluation shows that PQ is both faster (up to
where and π‘Ÿπ‘‡π‘— = 𝑇𝑗 βˆ’ 𝐢
3.6Γ—) and more effective compared to the 𝑏-bit compressor used in previous work. We propose
to further improve the efficiency of the scoring phase by hinging on the properties of Equation 5.
                                                   𝑇𝑗
                                      Β― 𝑇𝑗 > π‘žπ‘– Β· π‘Ÿπ‘π‘ž
In many cases, we have that π‘žπ‘– Β· 𝐢                    ; hence the max operator on 𝑗 is lead by the
score between the query term and the centroid, rather than the score between the query term
and the residual. We argue that it is possible to compute the scores on the residuals only for a
reduced set of document terms 𝐽¯ 𝑖 , where 𝑖 identifies the index of the query term. In particular,
               Β― 𝑇𝑗 > π‘‘β„Žπ‘Ÿ }, where π‘‘β„Žπ‘Ÿ is a second threshold that determines whether the score with
𝐽¯ 𝑖 = {𝑗|π‘žπ‘– Β· 𝐢
the centroid is sufficiently large. With the introduction of this new per-term filter, Equation 5
now becomes computing the max operator on the set of passages in 𝐽¯ 𝑖 , i.e.,
                                       π‘›π‘ž
                                                         𝑇         𝑇
                                                      Β― 𝑗 + π‘žπ‘– Β· π‘Ÿπ‘π‘žπ‘— ).
                                       βˆ‘οΈ
                              π‘†π‘ž,𝑃 =         max(π‘žπ‘– Β· 𝐢                                         (6)
                                               ¯𝑖
                                             π‘—βˆˆπ½
                                       𝑖=1

In practice, we compute the residual scores only for those document terms whose centroid score
is large enough. If 𝐽¯ 𝑖 = βˆ…, we compute π‘†π‘ž,𝑃 as in Equation 5. We experimentally verify that
this allows to save up to 30% in the late interaction with no performance degradation.

4. Experimental Evaluation
Experimental Settings. In this section, we compare our methodology with the state-of-the-art
engine for multi-vector dense retrieval, namely PLAID [4]. Our experiments are conducted on
the MS MARCO passages dataset [6] for in-domain evaluation and on LoTTE [3] for out-of-
domain evaluation. Embeddings for MS MARCO are generated using the ColBERTv2 model,
resulting in a dataset composed of about 600 million 𝑑-dimensional vectors, with 𝑑 = 128. The
implementation of Product Quantization utilizes the FAISS [7] library and is optimized using the
JMPQ [8] technique. The experiments are carried out on an Intel Xeon Gold 5318Y CPU clocked
at 2.10 GHz, equipped with the AVX512 instruction set, and executed with single-threading. The
code is compiled using GCC 11.3.0 with -O3 compilation options on a Linux 5.15.0-72 machine.
Evaluation. Table 1 compares EMVB against PLAID on the MS MARCO dataset, in terms of
memory requirements (num. of bytes per embedding), average query latency (in milliseconds),
MRR@10, and Recall@100, and 1000. With π‘š = 16, EMVB almost halves the per-vector
memory load compared to PLAID, achieving up to 2.8Γ— faster processing with minimal impact
on retrieval effectiveness. Doubling the number of sub-partitions per vector, i.e., π‘š = 32,
EMVB surpasses PLAID’s performance in terms of MRR and Recall while maintaining the same
memory footprint, achieving up to 2.5Γ— speedup.
   Table 2 presents a comparison between EMVB and PLAID in the out-of-domain evaluation on
the LoTTE dataset. Similar to PLAID [4], Success@5 and Success@100 are employed as retrieval
quality metrics. On this dataset, EMVB exhibits slightly lower performance in terms of retrieval
quality. It’s worth noting that JMPQ [8] cannot be applied in the out-of-domain evaluation due
to the absence of training queries. Instead, we utilize Optimized Product Quantization (OPQ) [9],
which searches for an optimal rotation of the dataset vectors to mitigate the quality degradation
associated with PQ. To address the retrieval quality loss, PQ is experimented with π‘š = 32, as
an increased number of partitions offers a better representation of the original vector. However,
EMVB provides a substantial speedup of up to 2.9Γ— compared to PLAID in this out-of-domain
         π‘˜          Method         Latency (π‘šπ‘ π‘’π‘.)    Bytes    MRR@10     R@100     R@1000
                    PLAID              131 (1.0Γ—)       36         39.4         -           -
         10         EMVB (m=16)         62 (2.1Γ—)       20         39.4         -           -
                    EMVB (m=32)         61 (2.1Γ—)       36         39.7         -           -
                    PLAID              180 (1.0Γ—)       36         39.8      90.6           -
         100        EMVB (m=16)         68 (2.6Γ—)       20         39.5      90.7           -
                    EMVB (m=32)         80 (2.3Γ—)       36         39.9      90.7           -
                    PLAID              260 (1.0Γ—)       36         39.8      91.3      97.5
         1000       EMVB (m=16)         93 (2.8Γ—)       20         39.5      91.4      97.5
                    EMVB (m=32)        104 (2.5Γ—)       36         39.9      91.4      97.5

Table 1
Comparison between EMVB and PLAID in terms of average query latency, number of bytes per vector
embeddings, MRR, and Recall on MS MARCO.


evaluation. This larger speedup, compared to MS MARCO, is attributed to the larger average
document lengths in LoTTE. In this context, filtering non-relevant documents using our bit
vector-based approach significantly impacts efficiency. It’s noteworthy that for the out-of-
domain evaluation, our pre-filtering method could be integrated into PLAID. This integration
could maintain PLAID’s accuracy while benefiting from EMVB’s efficiency. Combinations of
PLAID and EMVB are left for future exploration.

             π‘˜       Method         Latency (π‘šπ‘ π‘’π‘.)    Bytes    Success@5    Success@100
                     PLAID               131 (1.0Γ—)       36          69.1              -
             10
                     EMVB (m=32)          82 (1.6Γ—)       36          69.0              -
                     PLAID               202 (1.0Γ—)       36          69.4           89.9
             100
                     EMVB (m=32)         129 (1.6Γ—)       36          69.0           89.9
                     PLAID               411 (1.0Γ—)       36          69.6           90.5
             1000
                     EMVB (m=32)         142 (2.9Γ—)       36          69.0           90.1

Table 2
Comparison between EMVB and PLAID in terms of average query latency, number of bytes per vector
embeddings, Success@5, and Success@100 on LoTTE.

Acknowledgments
This work was supported by the EU - NGEU, by the PNRR - M4C2 - Investimento 1.3, Partenariato
Esteso PE00000013 - β€œFAIR - Future Artificial Intelligence Research” - Spoke 1 β€œHuman-centered
AI” funded by the European Commission under the NextGeneration EU program, by the PNRR
ECS00000017 Tuscany Health Ecosystem Spoke 6 β€œPrecision medicine & personalized healthcare”,
by the European Commission under the NextGeneration EU programme, by the Horizon Europe
RIA β€œExtreme Food Risk Analytics” (EFRA), grant agreement n. 101093026, by the β€œAlgorithms,
Data Structures and Combinatorics for Machine Learning” (MIUR-PRIN 2017), and by the
β€œAlgorithmic Problems and Machine Learning” (MIUR-PRIN 2022).
References
[1] F. M. Nardini, C. Rulli, R. Venturini, Efficient multi-vector dense retrieval with bit vectors,
    in: Proceedings of the 46th European Conference on Information Retrieval (ECIR 2024),
    2024.
[2] O. Khattab, M. Zaharia, Colbert: Efficient and effective passage search via contextualized
    late interaction over bert, in: Proceedings of the 43rd International ACM SIGIR conference
    on research and development in Information Retrieval, 2020, pp. 39–48.
[3] K. Santhanam, O. Khattab, J. Saad-Falcon, C. Potts, M. Zaharia, Colbertv2: Effective and
    efficient retrieval via lightweight late interaction, in: Proceedings of the 2022 Conference
    of the North American Chapter of the Association for Computational Linguistics: Human
    Language Technologies, 2022.
[4] K. Santhanam, O. Khattab, C. Potts, M. Zaharia, Plaid: an efficient engine for late interaction
    retrieval, in: Proceedings of the 31st ACM International Conference on Information &
    Knowledge Management, 2022.
[5] H. Jegou, M. Douze, C. Schmid, Product quantization for nearest neighbor search, IEEE
    Transactions on Pattern Analysis and Machine Intelligence (2010).
[6] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, Ms marco: A
    human-generated machine reading comprehension dataset (????).
[7] J. Johnson, M. Douze, H. Jegou, Billion-scale similarity search with gpus, IEEE Transactions
    on Big Data (2021).
[8] Y. Fang, J. Zhan, Y. Liu, J. Mao, M. Zhang, S. Ma, Joint optimization of multi-vector
    representation with product quantization, in: Natural Language Processing and Chinese
    Computing, 2022.
[9] T. Ge, K. He, Q. Ke, J. Sun, Optimized product quantization, IEEE Transactions on Pattern
    Analysis and Machine Intelligence (2013).