<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Investigating Binary Partition Power in Metric Query</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Richard</forename><surname>Connor</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University of St Andrews</orgName>
								<address>
									<addrLine>St Andrews</addrLine>
									<postCode>KY16 9SS</postCode>
									<country key="GB">Scotland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alan</forename><surname>Dearle</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">University of St Andrews</orgName>
								<address>
									<addrLine>St Andrews</addrLine>
									<postCode>KY16 9SS</postCode>
									<country key="GB">Scotland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lucia</forename><surname>Vadicamo</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Institute of Information Science and Technologies</orgName>
								<orgName type="institution">CNR</orgName>
								<address>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Investigating Binary Partition Power in Metric Query</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">55609065D1658A445F49582815455FC5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:31+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>similarity search</term>
					<term>metric indexing</term>
					<term>metric search</term>
					<term>exclusion power</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>It is generally understood that, as dimensionality increases, the minimum cost of metric query tends from 𝑂(log 𝑛) to 𝑂(𝑛) in both space and time, where 𝑛 is the size of the data set. With low dimensionality, the former is easy to achieve; with very high dimensionality, the latter is inevitable.</p><p>We previously described BitPart as a novel mechanism suitable for performing exact metric search in "high(er)" dimensions. The essential tradeoff of BitPart is that its space cost is linear with respect to the size of the data, but the actual space required for each object may be small as log 2 𝑛 bits, which allows even very large data sets to be queried using only main memory. Potentially the time cost still scales with 𝑂(log 𝑛). Together these attributes give exact search which outperforms indexing structures if dimensionality is within a certain range.</p><p>In this article, we reiterate the design of BitPart in this context. The novel contribution is an indepth examination of what the notion of "high(er)" means in practical terms. To do this we introduce the notion of exclusion power, and show its application to some generated data sets across different dimensions.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Searching a database for objects that are most similar to a query object is a fundamental task in many database application domains, for example data mining, knowledge discovery, information extraction, and machine learning <ref type="bibr" target="#b0">[1]</ref>.</p><p>In <ref type="bibr" target="#b1">[2]</ref> we described BitPart as a mechanism suitable for performing exact metric search in "high(er)" dimensions. While we did not define "high(er)", the evidence in the paper shows that it gives its best relative performance for spaces with dimensionality of between 10 and 30. BitPart's space cost is linear with respect to the size of the data, but the actual space required for each object may be small as log 2 𝑛 bits, which allows representations of very large data sets to fit in main memory. Potentially the time cost still scales with 𝑂(log 𝑛) time. Together these attributes give exact search which outperforms indexing structures as dimensionality increases within a certain range 1 . The BitPart mechanism uses a set of binary partitions defined over the finite data set, where each partition is represented by a bitmap. If the total set of bitmaps is perfectly balanced and perfectly orthogonal, then the probability of a selection of 𝑘 bitmaps being able to exclude an arbitrary element of the data is 1− 1 2 𝑘 , and the expected number of non-excluded values becomes very small if 𝑘 ≈ log 2 𝑛. While we can come close to achieving this for low-dimensional data, this becomes increasingly challenging as dimensionality increases.</p><p>In <ref type="bibr" target="#b1">[2]</ref> two issues are noted as requiring further investigation. The first is how to select appropriate pivots in order to find orthogonal exclusion partitions. The second is the interaction between partition balance and the efficacy of exclusion; this is the subject of this article.</p><p>To do this we introduce the notion of exclusion power, and show its application to some generated data sets across different dimensions. Exclusion power provides the ability to: fundamentally understand the exclusions that are possible in different datasets, to understand when one approach will outperform another and, to tune the BitPart algorithm to optimally accommodate data sets of different dimensions.</p><p>The novel contribution of this article is the quantification of exclusion power, which can be measured for a given binary partition structure with respect to a balancing factor 𝜏 . We observe that, in low dimensions, partitions are best balanced evenly, approximately splitting the search space into two equal parts. In higher dimensions however such balanced partitions perform very badly, and highly skewed partitions have much greater power. We are not aware of any previous work which attempts to quantify this effect, although it is well known within the "folklore" of metric search researchers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Formal Context</head><p>Formally we are interested in searching a (large) finite set of objects 𝑆 which is a subset of an infinite set 𝑈 , where (𝑈, 𝑑) is a metric space <ref type="bibr" target="#b0">[1]</ref>. The general requirement is to efficiently find members of 𝑆 which are similar to an arbitrary member of 𝑈 given as a query, where the distance function 𝑑 gives the only way by which any two objects may be compared. The simplest type of similarity query is the range search query: for some threshold 𝑡, based on a query 𝑞 ∈ 𝑈 , the solution set is 𝒬(𝑞, 𝑡) = {𝑠 ∈ 𝑆| 𝑑(𝑞, 𝑠) ≤ 𝑡}.</p><p>The essence of metric search is to spend time pre-processing the finite set 𝑆 (i.e., indexing it) so that solutions to queries can be efficiently calculated. In all cases distances between members of 𝑆 and selected reference objects (pivot) are calculated during pre-processing. At query time the relative distances between the query and the same pivot objects can be used to make deductions about which data values may, or may not, be candidate solutions to the query.</p><p>Many metric indexes employ binary partitions of the space 𝑈 , i.e., partitions of the form 𝒫 = {𝒫 0 , 𝒫 1 } where 𝒫 0 ∪ 𝒫 1 = 𝑈 and 𝒫 0 ∩ 𝒫 1 = ∅. In the context of metric search, binary partitions are defined by partition rules that typically rely on computing the distance 𝑑 of the data objects to a set of pivots. Notable examples include ball and sheet partitioning (see, e.g., Figure <ref type="figure" target="#fig_1">1</ref>). A ball partition is defined by a pivot 𝑝 ∈ 𝑈 and a covering radius 𝜏 ∈ R + , so that</p><formula xml:id="formula_0">𝒫 0 = {𝑢 ∈ 𝑈 | 𝑑(𝑢, 𝑝) &gt; 𝜏 } 𝒫 1 = {𝑢 ∈ 𝑈 | 𝑑(𝑢, 𝑝) ≤ 𝜏 }</formula><p>A sheet partition is defined by two pivots 𝑝 1 , 𝑝 0 ∈ 𝑈 so that the boundary of the partition regions is the hyperplane equidistant from the two reference objects:</p><formula xml:id="formula_1">𝜏 𝒫 0 𝒫 1 𝑝 (a) Ball Partitioning 𝒫 0 𝒫 1 𝑝 1 𝑝 0 (b) Sheet Partitioning</formula><formula xml:id="formula_2">𝒫 0 = {𝑢 ∈ 𝑈 | 𝑑(𝑢, 𝑝 1 ) &gt; 𝑑(𝑢, 𝑝 0 )} 𝒫 1 = {𝑢 ∈ 𝑈 | 𝑑(𝑢, 𝑝 1 ) ≤ 𝑑(𝑢, 𝑝 0 )}</formula><p>Informally we refer to 𝒫 0 as "outside" and 𝒫 1 as "inside". Under certain circumstances it is possible to deduce that, for a given query, one of the subsets of the partition cannot (or conversely must) contain solutions to the query. Typically, the triangle inequality and the distances to the pivots are exploited to determine bounds on distances between data objects (distance constraints), which are used at query time for excluding certain regions from the searching process.</p><p>The exclusion of a region of a partition occurs when the query ball does not intersect the partition boundary. For a ball region, the condition for non-intersection of the query ball and the boundary is</p><formula xml:id="formula_3">|𝑑(𝑝, 𝑞) − 𝜏 | &gt; 𝑡<label>(1)</label></formula><p>For a sheet region, the non-intersection condition is</p><formula xml:id="formula_4">|𝑑(𝑝 1 , 𝑞) − 𝑑(𝑝 0 , 𝑞)| &gt; 2𝑡<label>(2)</label></formula><p>If the respective non-intersection condition occurs, the implication is that any possible solution to the query lies on one or other side of the partition boundary, and therefore either 𝒫 0 or 𝒫 1 may be excluded from the search for solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The BitPart Mechanism</head><p>The core of the technique is to define a large set of binary partitions over the original space; the data is stored as a set of bitmaps according to containment with these regions. At query time, the query is assessed against this containment information, but without reference to the original data representation.</p><p>For each region, one of three possible conditions may be determined: (a) the solution set 𝒬(𝑞, 𝑡) must be fully contained in the region, or (b) there is no intersection between the region and the solution set, or (c) neither of these is the case. In either case (a) or (b), the containment information stored may be useful with respect to solving the query, and is used as part of the query computation. In case (c) however the containment information is of no value, and is not accessed as a part of the computation. This approach maximises the effectiveness of memory used for calculation against the original data representation: the amount of memory required depends on the cardinality of the original data, but not on the size of its individual objects.</p><formula xml:id="formula_5">𝑞 𝑡 𝑝 1 𝑝 2 𝑝 3 𝑝 4 𝑠 1 𝑠 2 𝑠 3 𝑠 3 𝑠 5 𝑠 6 𝒜 1 ℬ 1 𝒞 1 𝒟 1 ℰ 1 ℱ 1 (a) 𝑞 𝑡 (b) Point Partition bitmaps 𝒜 ℬ 𝒞 𝒟 ℰ ℱ 𝑠 1 1 1 0 1 1 0 𝑠 2 1 0 0 0 1 1 𝑠 3 0 0 0 0 0 0 𝑠 4 0 0 0 1 0 1 𝑠 5 0 1 0 1 0 0 𝑠 6 1 1 0 1 0 0 𝒬(𝑞, 𝑡) - 1 0 1 - 0 (c)</formula><p>An Example. Figure <ref type="figure" target="#fig_2">2</ref> shows a simple example within the 2D plane, comprising four reference objects 𝑝 1 to 𝑝 4 and a set of six regions defined by them. The regions are respectively: 𝒜 1 , the area to the left of the line between 𝑝 1 and 𝑝 2 ; ℬ 1 , the area above the line between 𝑝 1 and 𝑝 3 ; and the areas within the variously sized circles drawn around each 𝑝 1 to 𝑝 4 , labeled 𝒞 1 , 𝒟 1 , ℰ 1 and ℱ 1 respectively. Note that each regional boundary defines a binary partition of the total space of the form 𝒫 = {𝒫 1 , 𝒫 0 }, where 𝒫 0 = 𝑈 ∖ 𝒫 1 . Thus in Figure <ref type="figure" target="#fig_2">2</ref>, six binary partitions</p><formula xml:id="formula_6">(𝒜 = {𝒜 1 , 𝒜 0 }, ℬ = {ℬ 1 , ℬ 0 }, 𝒞 = {𝒞 1 , 𝒞 0 }, etc) are considered.</formula><p>For each partition 𝒫 we generate a bitmap of 𝑛 bits, where 𝑛 is the number of data points, which stores the logical containment information of each data object 𝑠 𝑖 , that is 1 if 𝑠 𝑖 ∈ 𝒫 1 and 0 if 𝑠 𝑖 / ∈ 𝒫 1 (Figure <ref type="figure" target="#fig_2">2c</ref>). Thus a bit being set (to 1) corresponds to that object being in the region and a member of 𝑃 1 .</p><p>Figure <ref type="figure" target="#fig_2">2a</ref> also shows a range query 𝑞 drawn with a threshold 𝑡. It can be seen that all solutions to this query must lie within the area highlighted in Figure <ref type="figure" target="#fig_2">2b</ref>. The circle around the query intersects with two regional boundaries (𝒜 1 and ℰ 1 ), and so no information is available with respect to these; however it is completely contained within two of the defined regions (ℬ 1 and 𝒟 1 ), and fails to intersect with the final two (𝒞 1 and ℱ 1 ). Such containment and intersection is derivable only from the measurement of distances between the query and the reference objects, the definition of the regions, and the search radius. Here for example the possible solution area shown is determined using only the four distance calculations 𝑑(𝑞, 𝑝 1 ).. 𝑑(𝑞, 𝑝 4 ). It can be seen that the only possible solutions to the query are those which match on all non-intersecting fields; in this case, the objects 𝑠 1 , 𝑠 5 and 𝑠 6 depicted in Figure <ref type="figure" target="#fig_2">2</ref>. This is equivalent to the Conjunctive Normal Form (CNF) expression ℬ ∧ ¬𝒞 ∧ 𝒟 ∧ ¬ℱ. This expression covers the set of all possible solutions to the query and hints at how the data can be stored using bit vectors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data structures</head><p>Before describing the query algorithm, we describe the data structures used and how they are initialised. A set of enumerated pivots 𝑝 1 to 𝑝 𝑚 is selected from 𝑆. We define a set of binary partitions within 𝑈 , each of the form 𝒫 = {𝒫 0 , 𝒫 1 }, defined using the distance function 𝑑. There may be many more regions than reference objects; for example a set of 𝑚 reference objects defines at least (︀ 𝑚</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2</head><p>)︀ sheet regions, plus 𝑚 ball regions. We now define the notion of an exclusion zone (EZ) as a containment map of 𝑆 based on a given partition; this is the information we will use at query time to perform exclusions and derive a candidate set of solutions. We impose an ordering on 𝑆, then for each 𝑠 𝑖 map whether it is a member of the region 𝒫 1 or otherwise. This logical containment information is best stored in a bitmap of 𝑛 bits, where 𝑛 = |𝑆|. One such exclusion zone is generated per partition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Overview of the Query Algorithm</head><p>The query process comprises three distinct phases:</p><p>Phase 1 in which the query is checked against the regional definitions, and the two types of regions with non-intersecting boundaries are identified. Initially, the distance from the query 𝑞 to each pivot 𝑝 𝑖 is measured. For each partition, it can be established if the query ball intersects with the boundary of the partition. If there is no intersection, any solution to the query must lie on one or other side of the partition's boundary, so the exclusion zone is brought into the query calculation in one of two sets depending on whether the query solutions are fully contained within 𝒫 1 or 𝒫 0 . We name these sets of bitmaps 𝐵 1 and 𝐵 0 , which are carried forwards to the computation in Phase 2.</p><p>Phase 2 in which the regions identified are used to conduct a series of logical operations, thus identifying a set of candidate solutions for the query.</p><p>The second phase comprises the manipulation of the bitmaps deriving from the first phase to identify a set of candidate solutions. This may be efficiently achieved by a series of bitwise operations over these bitmaps <ref type="foot" target="#foot_0">2</ref> . It can be seen now that any solution is guaranteed to be identifiable from the bitmap deriving from the bitwise calculation:</p><formula xml:id="formula_7">⎛ ⎝ ⋀︁ 𝑏∈𝐵 1 𝑏 ⎞ ⎠ ∧ ⎛ ⎝ ¬ ⎛ ⎝ ⋁︁ 𝑏∈𝐵 0 𝑏 ⎞ ⎠ ⎞ ⎠<label>(3)</label></formula><p>Phase 3 in which the candidate solutions are checked against the original data set to identify the exact solution to the query. This requires a sequential scan of the single bitmap resulting from Phase 2 and checking the data corresponding to any remaining set bits using the original space and distance metric in order to produce an exact solution to the query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Partition analysis</head><p>To unify our treatment of different kinds of binary partitions with their associated distance constraints, we introduce the concept of a binary partition characterised by a single partition function 𝑓 : 𝑈 → R and a balancing factor 𝜏 ∈ R with the following properties: • if 𝑓 (𝑞) ≤ 𝜏 − 𝑡 then 𝒬(𝑞, 𝑡) ⊂ 𝒫 1 and 𝒫 0 can be excluded • if 𝑓 (𝑞) &gt; 𝜏 + 𝑡 then 𝒬(𝑞, 𝑡) ⊂ 𝒫 0 and 𝒫 1 can be excluded Note that different functions 𝑓 may be used to determine the same regions 𝒫 0 , 𝒫 1 but with different distance lower-bounds, and thus different exclusion rules. For example, let's consider the Euclidean space (R 𝑛 , ℓ 2 ), 𝜏 = 0, 𝑓 1 (𝑠) = 𝑑(𝑠,𝑝 1 )−𝑑(𝑠,𝑝 0 ) 2 and 𝑓 2 (𝑠) = 𝑑(𝑠,𝑝 1 ) 2 −𝑑(𝑠,𝑝 0 ) 2 2𝑑(𝑝 1 ,𝑝 0 ) for two fixed pivot 𝑝 1 and 𝑝 0 . Both the functions 𝑓 1 and 𝑓 2 produce the same sheet partitioning, i.e., 𝒫 1 = {𝑠 ∈ R 𝑛 |𝑑(𝑠, 𝑝 1 ) ≤ 𝑑(𝑠, 𝑝 0 )} and 𝒫 0 = {𝑠 ∈ R 𝑛 |𝑑(𝑠, 𝑝 1 ) &gt; 𝑑(𝑠, 𝑝 0 )}. However, in <ref type="bibr" target="#b6">[7]</ref> we proved that the exclusion rule determined by 𝑓 1 (Hyperbolic exclusion) is weaker than the one determined by 𝑓 2 (Hilbert Exclusion), and the latter is valid only on the large class of Supermetric Spaces <ref type="bibr" target="#b7">[8]</ref>.</p><p>In the following we show how general ball partitioning and sheet partitioning can be described within this formalism. Sheet Partitions Sheet partitions are defined with respect to two reference points 𝑝 1 and 𝑝 0 . Usually, a simple binary partition is defined according to whichever of 𝑝 1 and 𝑝 0 is the nearer value with respect to the metric 𝑑. However it has been noted <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref> that such partitions may also be subject to a balancing factor 𝛼, so that</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Ball Partitions</head><formula xml:id="formula_8">𝒫 1 = {𝑠 ∈ 𝑈 | 𝑑(𝑠, 𝑝 1 ) ≤ 𝑑(𝑠, 𝑝 0 ) + 𝛼},𝒫 0 = {𝑠 ∈ 𝑈 | 𝑑(𝑠, 𝑝 1 ) &gt; 𝑑(𝑠, 𝑝 0 ) + 𝛼}</formula><p>In this case, we can use 𝑓 (𝑠) = (𝑑(𝑠, 𝑝 1 ) − 𝑑(𝑠, 𝑝 0 ))/2 which gives property 1 above for the constant 𝜏 = 𝛼/2. Moreover from the triangle inequality of the space we have that 𝑑(𝑠, 𝑝 1 ) ≤ 𝑑(𝑠, 𝑠 ′ ) + 𝑑(𝑠 ′ , 𝑝 1 ) and 𝑑(𝑠 ′ , 𝑝 0 ) ≤ 𝑑(𝑠, 𝑠 ′ ) + 𝑑(𝑠, 𝑝 0 ) for any 𝑠, 𝑠 ′ ∈ 𝑈 , which gives 𝑑(𝑠, 𝑠 ′ ) ≥ 𝑑(𝑠,𝑝 1 )−𝑑(𝑠,𝑝 0 )−𝑑(𝑠 ′ ,𝑝 1 )+𝑑(𝑠 ′ ,𝑝 0 ) 2 = 𝑓 (𝑠) − 𝑓 (𝑠 ′ ). By symmetry, we also have 𝑑(𝑠, 𝑠 ′ ) &gt; 𝑓 (𝑠 ′ ) − 𝑓 (𝑠), therefore 𝑑(𝑠, 𝑠 ′ ) &gt; |𝑓 (𝑠) − 𝑓 (𝑠 ′ )|.</p><p>Like ball partitions, the balance of the partition is affected by selecting different values for 𝜏 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Exclusion power</head><p>The exclusion power of a partition gives a quantification of the effectiveness of that partition with respect to a given metric search task. It is clear that this is likely to be quite different depending on the partition type, the selection of reference objects, and the chosen value 𝜏 . The meaning of exclusion power is based on the probability, for arbitrarily selected objects 𝑞 and 𝑠, and a distance 𝑡, of being able to demonstrate that 𝑑(𝑞, 𝑠) &gt; 𝑡. As we will show, for low dimensional spaces the maximum exclusion power of any partition structure is likely to coincide with a value of 𝜏 which causes the partition to bisect the finite search space 𝑆, i.e. |𝒫 0 | ≈ |𝒫 1 |. However as we will see, in higher dimensional spaces this may be a very bad choice.</p><p>For a range query 𝒬(𝑞, 𝑡) we define the exclusion power of the partition 𝒫 = {𝒫 0 , 𝒫 1 } as the probability of excluding one generic element 𝑠 on the basis only of the data partition to which it belongs:</p><formula xml:id="formula_9">𝑃 (𝑠 ∈ 𝒫 0 ) • 𝑃 (𝒬(𝑞, 𝑡) ∩ 𝒫 0 = ∅) + 𝑃 (𝑠 ∈ 𝒫 1 ) • 𝑃 (𝒬(𝑞, 𝑡) ∩ 𝒫 1 = ∅) (4) = 𝑃 (𝑠 ∈ 𝒫 0 ) • 𝑃 (𝒬(𝑞, 𝑡) ⊂ 𝒫 1 ) + 𝑃 (𝑠 ∈ 𝒫 1 ) • 𝑃 (𝒬(𝑞, 𝑡) ⊂ 𝒫 0 )<label>(5)</label></formula><p>The exact calculation of the probabilities 𝑃 (𝒬(𝑞, 𝑡) ⊂ 𝒫 𝑖 ), for 𝑖 = 0 or 1, is not straightforward for a general binary partition in a metric space, however, if we know that the partition can be characterised by a function 𝑓 and a balancing factor 𝜏 , we can estimate a lower-bound of these probabilities <ref type="foot" target="#foot_1">3</ref> and thus obtain an approximation of the exclusion power as</p><formula xml:id="formula_10">𝑃 (𝑓 (𝑠) &gt; 𝜏 ) • 𝑃 (𝑓 (𝑞) ≤ 𝜏 − 𝑡) + 𝑃 (𝑓 (𝑠) ≤ 𝜏 ) • 𝑃 (𝑓 (𝑞) &gt; 𝜏 + 𝑡)<label>(6)</label></formula><p>Let CDF(𝑥) be the cumulative distribution function of 𝑓 (𝑠) for 𝑠 ∈ 𝑆 then the exclusion power can be expressed as</p><formula xml:id="formula_11">𝑔(𝜏, 𝑡) = (1 − CDF(𝜏 )) • CDF(𝜏 − 𝑡) + CDF(𝜏 ) • (1 − CDF(𝜏 + 𝑡))<label>(7)</label></formula><p>where 𝜏 is the partition balancing factor and 𝑡 in the query distance threshold Estimates of exclusion power can be made with respect to a representative sample of the data set, and indeed of the query set if that is different 4 . A large enough sample will give a good estimate of the probability of an individual datum falling within each of the subsets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Power graphs</head><p>To understand the effect of different values of 𝜏 an exclusion power graph may be constructed which is plotted across the range of 𝜏 for a fixed value of 𝑡. This allows the optimum value of 𝜏 to be deduced for a range query with threshold 𝑡.</p><p>We illustrate the construction of power graphs for 5 dimensional data in Section 5.1 and for 20 dimensional data in 5.2.  Figure <ref type="figure" target="#fig_5">3</ref> shows histograms of two sets of sampled distances from a randomly generated five-dimensional Euclidean space. The left-hand side of the figure shows the distribution of 5,000 witness points in terms of distance from a randomly selected pivot point, 𝑝. The right-hand side shows the distribution of the 5NN (fifth nearest-neighbour) distances for a set of 1,000 queries; this distance representing a query threshold necessary to return a small proportion of the data, and thus useful values for 𝑡.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Example</head><p>Based on this information, we will pick example values of 𝜏 ^= 0.89 and 𝑡 ^= 0.19, being the median values in the distributions of 𝜏 and 𝑡 values, respectively. We can now consider the exclusion power of these values applied to the partition thus formed, centered around 𝑝.</p><p>The left-hand side of Figure <ref type="figure">4</ref> shows the Cumulative Probability Density function (CDF) with the X-axis values of 𝜏 ^, 𝜏 ^− 𝑡 ^, and 𝜏 ^+ 𝑡 ^highlighted. Given that 𝜏 ^has been selected as the median distance from 𝑝, the value of the CDF at 𝜏 ^is 0.5. Therefore if an exclusion occurs, half of the data set will be excluded and its exclusion power is:</p><formula xml:id="formula_13">CDF(𝜏 ^− 𝑡 ^) • 0.5 + (1 − CDF(𝜏 ^+ 𝑡 ^)) • 0.5.</formula><p>The more general form of this equation (i.e., Equation <ref type="formula" target="#formula_11">7</ref>) hints how the exclusion power can be displayed for a single partition, as shown on the right-hand side of Figure <ref type="figure">4</ref>. Here, for a fixed value of 𝑡 (again 0.19), the power of the partition is calculated over the whole range of distances that have been measured between the witness points and the point 𝑝. As can be seen, the maximum power is achieved when the partition is split at the median distance, i.e. 𝜏 ^= 0.89 in this example. With this partition, and this value of 𝜏 , the probability of excluding an arbitrary datum for an arbitrary query is around 0.21.  left-hand side of Figure <ref type="figure" target="#fig_11">5</ref> shows a histogram of 5,000 distances measured with respect to randomly selected element 𝑝 of a 20-dimensional uniformly generated Euclidean space. The right-hand side shows a histogram (for 1,000 queries selected from the same set) of the distances to the 5th-nearest neighbour with respect to this same set of 5,000 values. Figure <ref type="figure" target="#fig_8">6</ref> shows the same information in the form of cumulative probability density functions. The left-hand side of Figure <ref type="figure" target="#fig_8">6</ref> shows the CDF function superimposed with a line corresponding to the medium value for 𝜏 . The value of t for this data is 1.07.</p><p>Finally, although we have no space here to provide in-depth analysis, Figure <ref type="figure" target="#fig_9">7</ref> shows exclusion power graphs using sheet partitions over both 5D and 20D spaces. It can be seen that the graph patterns are quite similar.   As can be seen in Figure <ref type="figure" target="#fig_8">6</ref> such a value is unlikely to result in any successful exclusions. These diagrams explain behaviour observed by many researchers into metric search that choosing unbalanced indexing structures often works better for higher dimensional data. The establishment of a unified partition interface using 𝑓 and 𝜏 allows similar graphs to be produced for sheet partitions (e.g., Figure <ref type="figure" target="#fig_9">7</ref>). The mechanistic value of these graphs is that the value or values of 𝜏 corresponding to the maxiumum power can be automatically extracted on a per-partition basis, and used in an implementation of BitPart. In the case of low dimensional data a single offset can be used for each ball where as for high(er) dimensional data two offsets can be used. Furthermore, as described earlier a particular implementation can use as few or as many exclusion zones that as necessary, as may be predicted by the power value at the optimum value.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Use of Exclusion Power analysis</head><p>In Section 6 these offsets are applied to a BitPart implementation to demonstrate their effect.  In order to validate the above results we ran experiments against a BitPart implementation<ref type="foot" target="#foot_2">5</ref> using uniformly generated data in 5 and 20 dimensions with Euclidean distance. Each contained one million data points, and 1000 queries were executed. In all experiments fixed threshold distances 𝑡 were used and set to values which correspond to one-millionth of the data volume. In order to measure exclusion power, the 1000 queries were run 25 times for differing values of 𝜏 for both sheet and ball partitions. The 5D data uses 20 pivot points and the 20D data 100.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Experimental Validation using BitPart</head><p>In <ref type="bibr" target="#b1">[2]</ref> we described how a selection of 𝑚 reference points allows for (︀ 𝑚</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2</head><p>)︀ sheet and 𝑚 ball partitions to be produced. After the above analysis, we used this number for the 5D space, but in the 20D space we generated twice that number, corresponding to the twin peaks in the power graphs. Note that this does not require any extra distance calculations at query time, as the number of reference points is not increased. Furthermore there is not necessarily any increase in memory usage, as partitions are typically stored in secondary memory and only fetched when the query ball does not intersect with the partition, which requires only the evaluation of the 𝑓 function to determine. For each experiment a suitable range is calculated for 𝜏 for both balls and sheets, ensuring that for all partitions both 𝒫 0 and 𝒫 1 are non-empty, and the whole calculation was executed with different values of 𝜏 across this range.</p><p>Results are shown in Figure <ref type="figure" target="#fig_12">8</ref>. In order to visualize these results, the number of residual (i.e. Phase 3) distance calculations per query was measured, this number giving a good proxy for the value of the exclusion mechanism. The plots show the inverse of these counts. Peaks similar to those shown in the power graphs in Figures <ref type="figure" target="#fig_10">4 and 6</ref> can be seen in these experimental results.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>SEBD 2022 :</head><label>2022</label><figDesc>The 30th Italian Symposium on Advanced Database Systems, June 19-22, 2022, Tirrenia (PI), Italy rchc@st-andrews.ac.uk (R. Connor); al@st-andrews.ac.uk (A. Dearle); lucia.vadicamo@isti.cnr.it (L. Vadicamo) 0000-0003-4734-8103 (R. Connor); 0000-0002-1157-2421 (A. Dearle); 0000-0001-7182-7038 (L. Vadicamo)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Examples of a Ball Partitioning defined by a pivot 𝑝 ∈ 𝑈 and a covering radius 𝜏 ∈ R + , and a Sheet Partitioning associated to two pivots 𝑝 1 , 𝑝 0 ∈ 𝑈 .</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: (a) A partitioning of a data space along with a query 𝑞 with threshold 𝑡; (b) the regions containing the solution set; (c) the data representation according to containment of regions.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Property 1 :Property 2 :</head><label>12</label><figDesc>𝒫 0 = {𝑠 ∈ 𝑈 |𝑓 (𝑠) &gt; 𝜏 } and 𝒫 1 = {𝑠 ∈ 𝑈 |𝑓 (𝑠) ≤ 𝜏 } 𝑑(𝑠, 𝑠 ′ ) ≥ |𝑓 (𝑠) − 𝑓 (𝑠 ′ )| for all 𝑠, 𝑠 ′ ∈ 𝑈 (the distance lower-bound property)Note that if 𝑓 (𝑠) = 𝜏 , then 𝑠 is on the partition boundary and by convention we include the partition boundary in 𝒫 1 . The function 𝑓 is used both for determining the regions 𝒫 0 , 𝒫 1 , and for deriving the exclusion rules to be used at query time. By exploiting the distance lower-bound provided by the function 𝑓 , we have that |𝑓 (𝑞) − 𝜏 | &gt; 𝑡 is a sufficient condition to exclude one of the two partitions from the search. Specifically,</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>A ball partition can be characterised by the function 𝑓 (𝑠) = 𝑑(𝑠, 𝑝) and the constant 𝜏 . In this case 𝒫 1 = {𝑠 ∈ 𝑈 |𝑓 (𝑠) = 𝑑(𝑠, 𝑝) ≤ 𝜏 }, and 𝒫 0 = {𝑠 ∈ 𝑈 |𝑓 (𝑠) = 𝑑(𝑠, 𝑝) &gt; 𝜏 }, so Property 1 is clear. Properties 2 follows from the triangle inequality: 𝑑(𝑠, 𝑝) ≤ 𝑑(𝑠, 𝑠 ′ ) + 𝑑(𝑠 ′ , 𝑝), and 𝑑(𝑠 ′ , 𝑝) ≤ 𝑑(𝑠 ′ , 𝑠) + 𝑑(𝑠, 𝑝) which give that 𝑑(𝑠, 𝑠 ′ ) ≥ |𝑑(𝑠, 𝑝) − 𝑑(𝑠 ′ , 𝑝)| = |𝑓 (𝑠) − 𝑓 (𝑠 ′ )| for any 𝑠, 𝑠 ′ ∈ 𝑈 .The balance of the partition is affected by selecting different values for 𝜏 which we refer to as different balances.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Typical inter-point and query threshold distances from 5-dimensional Euclidean data</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head></head><label></label><figDesc>CDF of 𝑓 (𝑠) = 𝑑(𝑠, 𝑝) varying 𝑠 Exclusion power graph over 𝜏 (fixing 𝑡 = 𝑡 ^). The exclusion power for 𝜏 = 𝜏 îs 0.23.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 4 :Figure 5 :</head><label>45</label><figDesc>Figure 4: CDF and Power Graph for 5-dimensional data</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: CDF and Power Graph for 20-dimensional data</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Exclusion Power graphs for sheet partitions in 5 and 20 dimensional Euclidean space</figDesc><graphic coords="10,168.22,278.60,145.85,118.96" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figures 4 and 6</head><label>6</label><figDesc>Figures4 and 6demonstrate the exclusion power in 5 and 20 Dimensions respectively. In the 5 dimensional case we can observe that setting a 𝜏 ^value of 0.89 (close to the mean of the distribution of measured distances) will work extremely well. With such a value the exclusion power is maximised. By contrast, choosing the mean in the case of 20D data will work very badly. As can be seen in Figure6such a value is unlikely to result in any successful exclusions. These diagrams explain behaviour observed by many researchers into metric search that choosing unbalanced indexing structures often works better for higher dimensional data.The establishment of a unified partition interface using 𝑓 and 𝜏 allows similar graphs to be produced for sheet partitions (e.g., Figure7). The mechanistic value of these graphs is that the value or values of 𝜏 corresponding to the maxiumum power can be automatically extracted on a per-partition basis, and used in an implementation of BitPart. In the case of low dimensional data a single offset can be used for each ball where as for high(er) dimensional data two offsets can be used. Furthermore, as described earlier a particular implementation can use as few or as many exclusion zones that as necessary, as may be predicted by the power value at the optimum</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>(a) 5</head><label>5</label><figDesc>dimensional space (b) 20 dimensional space</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Empirical Inverse distances per query for 5D and 20D data for different values of 𝜏 . These charts plot both ball and sheet partitions, which have different ranges of 𝜏 ; we have therefore normalised these within the offset axis. The Y axis shows the inverse of the number of residual distance computations, i.e. those not excluded by use of the BitPart mechanism (higher is better).</figDesc><graphic coords="11,114.01,168.43,179.17,134.34" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>: single-pivot ball partitioning over 5 dimensional data</head><label></label><figDesc></figDesc><table><row><cell></cell><cell>0.09</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>0.2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.08</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>0.18</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Frequency</cell><cell>0.05 0.06 0.07</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>Frequency</cell><cell>0.12 0.14 0.16</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Normalized</cell><cell>0.02 0.03 0.04</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>Normalized</cell><cell>0.04 0.06 0.08 0.1</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.01</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>0.02</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>0</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell>0</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>0.2</cell><cell>0.4</cell><cell>0.6</cell><cell>0.8</cell><cell>1</cell><cell>1.2</cell><cell>1.4</cell><cell>1.6</cell><cell>0.14</cell><cell>0.16</cell><cell>0.18</cell><cell>0.2</cell><cell>0.22</cell><cell>0.24</cell><cell>0.26</cell><cell>0.28</cell></row><row><cell></cell><cell></cell><cell cols="6">Distance from ball centre</cell><cell></cell><cell cols="8">Distance of a query from its 5NN</cell></row><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>(a)</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">On modern processors many of these bitwise operations can be performed in single instructions, each of which may be performed in parallel.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">In general, 𝑓 (𝑞) ≤ 𝜏 − 𝑡 ⇒ 𝒬(𝑞, 𝑡) ⊂ 𝒫1, therefore 𝑃 (𝑓 (𝑞) ≤ 𝜏 − 𝑡) ≤ 𝑃 (𝒬(𝑞, 𝑡) ⊂ 𝒫1). Similarly, 𝑓 (𝑞) &gt; 𝜏 + 𝑡 ⇒ 𝒬(𝑞, 𝑡) ⊂ 𝒫0, and thus 𝑃 (𝑓 (𝑞) &gt; 𝜏 + 𝑡) ≤ 𝑃 (𝒬(𝑞, 𝑡) ⊂ 𝒫0). The equality holds for some cases, e.g., the ball partitioning, where it can be proved that 𝑓 (𝑞) ≤ 𝜏 − 𝑡 ⇔ 𝒬(𝑞, 𝑡) ⊂ 𝒫1 and 𝑓 (𝑞) &gt; 𝜏 + 𝑡 ⇔ 𝒬(𝑞, 𝑡) ⊂ 𝒫𝑜.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">The experimental code may be found in https://bitbucket.org/richardconnor/metricbitblaster.git</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusions</head><p>In this paper, we have reiterated the design of the BitPart indexing and query mechanisms. The novel contribution of this paper is an exposition of how dimensionality effects the efficacy of this and other metric search mechanisms. We have demonstrated how the distribution of distances changes with the dimensionality of the data and how in turn this affects the ability to perform effective exclusions and thus efficient queries. In particular we have explored the notion of balance and how that affects exclusion. To do this we have introduced the notion of exclusion power, and shown its application to some Euclidean data sets in different dimensions.</p><p>This analysis gives fundamental insights into what data can be queried effectively and what cannot. Furthermore it demonstrates how index and search algorithms can be tuned to compensate for data sets of arbitrary dimensions.</p><p>We have demonstrated that the phenomena explored in mathematical terms are manifested in synthetically generated data sets and queries over them using an implementation of BitPart. We believe that the results in this paper may be applied to other metric search and indexing mechanisms. Our future plans include investigating these phenomena in real life data sets and exploring the outstanding issue of partition orthogonality.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Similarity search: the metric space approach</title>
		<author>
			<persName><forename type="first">P</forename><surname>Zezula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dohnal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Batko</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Springer Science &amp; Business Media</publisher>
			<biblScope unit="volume">32</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Bitpart: Exact metric search in high(er) dimensions</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dearle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Connor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">95</biblScope>
			<biblScope unit="page">101493</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Indexability, concentration, and vc theory</title>
		<author>
			<persName><forename type="first">V</forename><surname>Pestov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Discrete Algorithms</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="2" to="18" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A quantitative analysis and performance study for similaritysearch methods in high-dimensional spaces</title>
		<author>
			<persName><forename type="first">R</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-J</forename><surname>Schek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Blott</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of 24th VLDB</title>
				<meeting>of 24th VLDB</meeting>
		<imprint>
			<publisher>Morgan Kaufmann</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">98</biblScope>
			<biblScope unit="page" from="194" to="205" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">MI-File: Using inverted files for scalable approximate similarity search</title>
		<author>
			<persName><forename type="first">G</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gennaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Savino</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Multimedia Tools and Applications</title>
		<imprint>
			<biblScope unit="page" from="1333" to="1362" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Metric index: An efficient and scalable solution for precise and approximate similarity search</title>
		<author>
			<persName><forename type="first">D</forename><surname>Novak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Batko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zezula</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="721" to="733" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Hilbert exclusion: improved metric search through finite isometric embeddings</title>
		<author>
			<persName><forename type="first">R</forename><surname>Connor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Cardillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vadicamo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rabitti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="1" to="27" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Supermetric search</title>
		<author>
			<persName><forename type="first">R</forename><surname>Connor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vadicamo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Cardillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rabitti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">80</biblScope>
			<biblScope unit="page" from="108" to="123" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Reference point hyperplane trees</title>
		<author>
			<persName><forename type="first">R</forename><surname>Connor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Similarity Search and Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="65" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">On applications of parameterized hyperplane partitioning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lokoč</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Skopal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third International Conference on SImilarity Search and APplications</title>
				<meeting>the Third International Conference on SImilarity Search and APplications</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="131" to="132" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
