<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>AMW</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>cKdtree: a Compact Kdtree for Spatial Data⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gilberto Gutiérrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodrigo Torres-Avilés</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mónica Caniupán</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento Ciencias de la Computación y Tecnologías de Información, Universidad del Bío-Bío</institution>
          ,
          <addr-line>Chillán</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Departamento Sistemas de Información, Universidad del Bío-Bío</institution>
          ,
          <addr-line>Concepción</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>16</volume>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>We introduce cKd-tree, a compact data structure designed to represent a Kd-tree index eficiently. The structure cKd-tree is essentially an encoding of the spiral code sequence of points within an implicit Kdtree (iKd-tree) using Directly Addressable Codes (DACs). The unique feature of cKd-tree lies in its ability to perform spiral encoding and decoding of points by relying solely on knowledge of their parent points within the iKd-tree. This inherent property, combined with DACs' direct access capability to sequence elements, enables cKd-tree to traverse and explore the tree while decoding only the nodes relevant to queries. We compare cKd-tree with iKd-tree and 2-tree data structures evaluating compression eficiency and execution time of point queries and the range queries. cKd-tree achieves a compression ratio comparable to that of 2-tree, approximately 70%, demonstrating heightened eficiency, particularly in scenarios characterized by sparse data. Additionally, 2-tree exhibits superior performance in querying individual points, whereas cKd-tree outperforms in the context of aggregate data queries, such as range queries.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Compression</kwd>
        <kwd>indices</kwd>
        <kwd>spatial data</kwd>
        <kwd>spatial points</kwd>
        <kwd>spatial queries</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Several multidimensional data structures have been devised for the storage and eficient querying
of set of points in both primary and secondary memory. Noteworthy among these structures
are the Kd-tree [1] and Quadtree [2], which enable the execution of single and aggregate
queries without necessitating a full scan of the entire dataset. A Kd-tree is a hierarchical
structure (binary tree), designed for recursive division of multi-dimensional space [1]. Within
this structure, each node contains data representing a -dimensional point in the space. When
dealing with static sets, data structures can be focused on implementing query operations,
which do not alter set size. This results in more cost-efective implementations in terms of
storage, as the algorithm proposed in [3] for a static Kd-tree. This static Kd-tree is implicitly
stored in an array, without using pointers, making it occupy less storage while maintaining
navigation eficiency comparable to its dynamic counterpart.
p7
p2</p>
      <p>p6
p5</p>
      <p>p3
a)
p1
p4
13
1121 R p5
18976523410 p7 p2 p6 p1 p3 p4
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13
b)</p>
      <p>
        We refer to a balanced Kd-tree represented in an array as iKd-tree [4]. The construction process
of iKd-tree is described in [3]. As an illustration, Figure 1 represents an iKd-tree constructed from
a set of points  = ⟨1(
        <xref ref-type="bibr" rid="ref6 ref8">8, 6</xref>
        ), 2(
        <xref ref-type="bibr" rid="ref4 ref7">4, 7</xref>
        ), 3(
        <xref ref-type="bibr" rid="ref10 ref3">10, 3</xref>
        ), 5(
        <xref ref-type="bibr" rid="ref11 ref2">2, 11</xref>
        ), 6(
        <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
        ), 4(
        <xref ref-type="bibr" rid="ref12 ref8">12, 8</xref>
        ), 7(
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        )⟩ with a
resulting arrangement stored in array  = [(
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ), (
        <xref ref-type="bibr" rid="ref4 ref7">4, 7</xref>
        ), (
        <xref ref-type="bibr" rid="ref11 ref2">2, 11</xref>
        ), (
        <xref ref-type="bibr" rid="ref3 ref5">5, 3</xref>
        ), (
        <xref ref-type="bibr" rid="ref10 ref3">10, 3</xref>
        ), (
        <xref ref-type="bibr" rid="ref6 ref8">8, 6</xref>
        ), (
        <xref ref-type="bibr" rid="ref12 ref8">12, 8</xref>
        )].
The algorithm ensures the creation of a balanced binary tree, with a tree depth of log2(), as
depicted in Figure 1a). Navigating or exploring the iKd-tree is straightforward. Generally, the
position of the root node within the subtree, defined by the initial positions  and final positions
 of array , is situated at position ⌊︀ + ⌋︀ .
      </p>
      <p>2</p>
      <p>In this work, we introduce a compact version of the Kd-tree designed for storing a set of
static points  ⊆ N2, named the cKd-tree. We focused our attention on the Kd-tree given its
prominence as one of the most widely employed structures for indexing spatial and geometric
data [5].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>The BSP-tree, very similar to the Kd-tree, is a multidimensional data structure that shares the
concept of recursively partitioning space using  − 1 dimensional hyperplanes. These
hyperplanes in the BSP-tree need not be iso-oriented. In contrast to the Kd-tree, where partitions
alternate between diferent dimensions, the BSP-tree adapts its partitions based on the
distribution of objects within the space or subspace. Another important data structure is the Quad-tree
[6]. This structure, like the Kd-tree, employs recursive division of space or subspace through
iso-oriented hyperplanes. However, a key distinction lies in the fact that each internal node of
the Quad-tree typically has 2 children. Unlike Kd-trees, these structures are predominantly
designed for dynamic object sets and are commonly implemented using pointers.</p>
      <p>Over the past few decades, there has been a pronounced surge in the exploration and
development of compact data structures (CDS). These structures compress static data, minimizing
storage requirements and facilitating data processing without the need for prior decompression
allowing the processing of data directly in main memory.</p>
      <p>Compact data structures ofer an alternative for representing points, exemplified by the
2-tree [7]. In this structure, points are derived from a binary matrix , where a point (, ) in
the set is symbolized by a stored value of 1 in the cell [, ]. The 2-tree allows to store static
sets of points without relying on pointers, making it a foundational structure for comparative
analysis in our study. This compact data structure has been used in diverse scenarios such as,
graph representations [7], raster data [8], and points in N2 [9].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Our Proposal: The cKdtree</title>
      <p>cKd-tree is built upon the implicit version of a Kd-tree and leverages spiral encoding to represent
a set of points in a multidimensional (-dimensional) space. Moreover, cKd-tree employs DACs
encoding for a sequence of integers [10]. This dual encoding approach enables the structure to
store information more eficiently.</p>
      <p>The spiral encoding method involves assigning a positive integer to a point  in N2 based
on another point . Formally, considering points (, ) and (, ) from the set  ⊆  ×  ,
for ,  ⊆ N with dimensions || and | |, the spiral encoding of  with respect to  is denoted
as a function  : N2 → {0, . . . , || · |  |}. In this representation, () represents
the distance, measured in the number of cells, from the cell of  to the cell of  following a
spiral path with the origin at  (cell 0). Figure 1c) illustrates the spiral paths used to encode
points  and  starting from the reference point . In this example, () is equal to 10,
and () is equal to 22. The function () : {0, . . . , || · |  |} → N2 facilitates
retrieving the coordinates of a point  from its spiral code with respect to the point .</p>
      <p>The acronym DACs [10] stands for Directly Addressable Codes, which represent a
variablelength encoding of sequences of non-negative integers, typically arrays. This method divides
the binary representation of the integers of the array into blocks of  bits, adding a bit in the
most significant bit of the chunk, set to 0 when the chunk holds the most significant bits of the
integer, and 1 otherwise. Then, the chunks of size  + 1 are grouped together, in order, and
each group is called a level. With this construction, DACs keeps direct access to any element
of the encoded sequence without the need of any sampling method, therefore without using
asymptotically any extra space.</p>
      <p>Although it can be used a fixed block size  for DACs, it is possible to choose a diferent block
size at each level  (). This flexibility can be advantageous to achieve specific goals, such as
optimizing compression.</p>
      <p>A cKd-tree is represented as a tuple ⟨, ⟩, where  is a sequence of integers encoded
using DACs, representing an iKd-tree, and  ∈  denotes the root of the tree with  ⊆ N2 a set
of points. The sequence  is derived from the DACs encoding through a spiral codification of
the points in the iKd-tree.</p>
      <p>The algorithms for the construction of the cKd-tree together with the algorithms for
processing queries on sets of points are presented in [11]. We addressed two fundamental queries. The
point query which determines whether a given point  is part of the set of points represented
by the cKd-tree, and the range query that retrieves all points stored in the cKd-tree within the
bounds of the specified iso-oriented rectangle for the query range.</p>
      <p>a)</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We compare cKd-tree against iKd-tree and 2-tree, with the latter being known for its eficiency
in compression and query execution. The iKd-tree and cKd-tree structures were implemented
in C++, while we utilized the 2-tree implementation from [12]1.</p>
      <p>We utilized synthetic two-dimensional datasets comprising points. The dataset sizes ranged
from 250, 000 to 100, 000, 000, distributed according to both Uniform and Gaussian distributions
in space. The datasets were generated within spaces (matrices) of dimensions 214 × 214 and
215 × 215.</p>
      <p>Figure 2 shows the compression percentages for both cKd-tree and 2-tree compact data
structures in two space scenarios (sizes 16, 384 × 16, 384 and 32, 768 × 32, 768, respectively).
The structures exhibit competitive compression percentages ranging from approximately 18%
to 90%. Both structures improve the compression performance when density increases. At
higher densities (≥ 12) and for Gaussian distribution (G in the charts), 2-tree marginally
outperforms cKd-tree. Conversely, at lower densities (≤ 6), cKd-tree outpaces 2-tree. The
enhanced compression performance of cKd-tree can be attributed to the spatial proximity of
points. In such scenarios, the resulting spiral codes of the points are generally smaller, leading
to a highly compressible sequence of integers by DACs.</p>
      <p>Figure 3 a) and b) displays the execution times of three structures in solving the point query
for Uniform and Gaussian distributions of points. As expected, 2-tree outperforms cKd-tree by
a significant margin (about 200 times faster), and iKd-tree also performs faster (11 times) than
cKd-tree. The advantage of 2-tree is attributed to its (log2 ) query execution time for one
point, while iKd-tree avoids the cost of decompressing points stored with DACs.</p>
      <p>Figure 3 c) and d) illustrates the execution times of range query over a matrix of size 32, 768 ×
32, 768 for the range of 3, 276 × 3, 276 over Uniform and Gaussian distribution. Notably,
cKdtree significantly outperforms 2-tree achieving a speedup of approximately 4.7 times. And,
as expected, iKd-tree outperforms cKd-tree in range query, albeit to a lesser extent, with an
average speedup of about 3 times for range queries of size 3, 276 × 3, 276. For the Gaussian
distribution 2-tree has better results, but it is still outperformed by cKd-tree, in a lesser extend.
The improved performance of 2-tree in this distribution is attributed to its optimal operation
1Available at https://github.com/simongog/sdsl-lite
a)
c)
0.012 cKD−tree</p>
      <p>iKD−tree
0.01 k2−tree
when points are concentrated in specific areas of space. This characteristic makes 2-tree
sensitive to point distribution, whereas both iKd-tree and cKd-tree exhibit similar behavior
across diferent distributions.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>We present a novel compact data structure named cKd-tree designed for representing an implicit
static Kd-tree. Through experimental evaluations against 2-tree and iKd-tree, our findings
reveal that cKd-tree achieves compression rates ranging from 45% to 77% depending on the
density. The percentage of compression is higher when the density increases.</p>
      <p>In terms of execution time for point queries, cKd-tree exhibits a slower performance compared
to 2-tree, as expected. However, cKd-tree showcases superior execution times for range queries,
significantly outperforming 2-tree, especially in scenarios with Uniform distribution where
cKd-tree is approximately 4.7 times faster. This continues to widen with increasing data density.</p>
      <p>cKd-tree extends the versatility of Kd-tree by providing a competitive alternative, enabling the
direct implementation of these algorithms in a compact form. As evidenced by our experiments,
cKd-tree exhibits superior performance compared to 2-tree, particularly when querying
aggregate data. This suggests its potential for enabling faster execution on more intricate queries,
including but not limited to  nearest neighbors and Pareto set calculations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Bentley</surname>
          </string-name>
          ,
          <article-title>Multidimensional binary search trees used for associative searching</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>18</volume>
          (
          <year>1975</year>
          )
          <fpage>509</fpage>
          -
          <lpage>517</lpage>
          . URL: https://doi.org/10.1145/361002.361007. doi:
          <volume>10</volume>
          .1145/ 361002.361007.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Samet</surname>
          </string-name>
          ,
          <article-title>The quadtree and related hierarchical data structures</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>16</volume>
          (
          <year>1984</year>
          )
          <fpage>187</fpage>
          -
          <lpage>260</lpage>
          . URL: https://doi.org/10.1145/356924.356930. doi:
          <volume>10</volume>
          .1145/356924. 356930.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Brown</surname>
          </string-name>
          , Building a balanced
          <article-title>-d tree in ( log ) time</article-title>
          ,
          <source>Journal of Computer Graphics Techniques (JCGT) 4</source>
          (
          <year>2015</year>
          )
          <fpage>50</fpage>
          -
          <lpage>68</lpage>
          . URL: http://jcgt.org/published/0004/01/03/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>SanJuan-Contreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Martínez-Prieto</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Seco, cbik: A space-eficient data structure for spatial keyword queries, IEEE Access PP (</article-title>
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          . doi:
          <volume>10</volume>
          .1109/ ACCESS.
          <year>2020</year>
          .
          <volume>2997258</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gaede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Günther</surname>
          </string-name>
          ,
          <article-title>Multidimensional access methods</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>30</volume>
          (
          <year>1998</year>
          )
          <fpage>170</fpage>
          -
          <lpage>231</lpage>
          . URL: https://doi.org/10.1145/280277.280279. doi:
          <volume>10</volume>
          .1145/280277. 280279.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Samet</surname>
          </string-name>
          ,
          <article-title>The quadtree and related hierarchical data structures, ACM Computing Surveys (CSUR) 16 (</article-title>
          <year>1984</year>
          )
          <fpage>187</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brisaboa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ladra</surname>
          </string-name>
          , G. Navarro,
          <article-title>Compact representation of web graphs with extended functionality</article-title>
          ,
          <source>Information Systems</source>
          <volume>39</volume>
          (
          <year>2014</year>
          )
          <fpage>152</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ladra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Paramá</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silva</surname>
          </string-name>
          <string-name>
            <surname>Coira</surname>
          </string-name>
          ,
          <article-title>Scalable and queryable compressed storage structure for raster data</article-title>
          ,
          <source>Information Systems</source>
          (
          <year>2017</year>
          )
          <fpage>179</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Romero</surname>
          </string-name>
          , G. Gutiérrez,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caniupán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quijada-Fuentes</surname>
          </string-name>
          ,
          <article-title>Eficient computation of the convex hull on sets of points stored in a k-tree compact data structure</article-title>
          ,
          <source>Knowledge and Information Systems</source>
          <volume>62</volume>
          (
          <year>2020</year>
          ).
          <source>doi:10.1007/s10115-020-01486-9.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brisaboa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ladra</surname>
          </string-name>
          , G. Navarro, Dacs:
          <article-title>Bringing direct access to variable-length codes</article-title>
          ,
          <source>Information Processing and Management</source>
          <volume>49</volume>
          (
          <year>2013</year>
          )
          <fpage>392</fpage>
          -
          <lpage>404</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gutiérrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Torres-Avilés</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Caniupán, ckd-tree: A compact kd-tree</article-title>
          ,
          <source>IEEE Access 12</source>
          (
          <year>2024</year>
          )
          <fpage>28666</fpage>
          -
          <lpage>28676</lpage>
          . doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2024</year>
          .
          <volume>3365054</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Beller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mofat</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Petri, From theory to practice: Plug and play with succinct data structures</article-title>
          ,
          <source>in: 13th International Symposium on Experimental Algorithms</source>
          ,
          <source>(SEA</source>
          <year>2014</year>
          ),
          <year>2014</year>
          , pp.
          <fpage>326</fpage>
          -
          <lpage>337</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>