<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Practical Visual Search Engine Within Elasticsearch</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Matthew</forename><surname>Mu</surname></persName>
							<email>matthew.mu@jet.com</email>
						</author>
						<author>
							<persName><forename type="first">Raymond</forename><surname>Zhao</surname></persName>
							<email>raymond@jet.com</email>
						</author>
						<author>
							<persName><forename type="first">Guang</forename><surname>Yang</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Jing</forename><surname>Zhang</surname></persName>
						</author>
						<author>
							<persName><forename type="first">John</forename><surname>Yan</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">Jet.com</orgName>
								<orgName type="institution" key="instit2">Walmart Labs</orgName>
								<address>
									<settlement>Hoboken</settlement>
									<region>NJ</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution" key="instit1">Jet.com</orgName>
								<orgName type="institution" key="instit2">Walmart Labs</orgName>
								<address>
									<settlement>Hoboken</settlement>
									<region>NJ</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution" key="instit1">Jet.com</orgName>
								<orgName type="institution" key="instit2">Walmart Labs</orgName>
								<address>
									<settlement>Hoboken</settlement>
									<region>NJ</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution" key="instit1">Jet.com</orgName>
								<orgName type="institution" key="instit2">Walmart Labs</orgName>
								<address>
									<settlement>Hoboken</settlement>
									<region>NJ</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="institution" key="instit1">Jet.com</orgName>
								<orgName type="institution" key="instit2">Walmart Labs</orgName>
								<address>
									<settlement>Hoboken</settlement>
									<region>NJ</region>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff5">
								<address>
									<settlement>Ann Arbor</settlement>
									<region>Michigan</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Practical Visual Search Engine Within Elasticsearch</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3BB7BE483E142A1A0197063CF9E30B51</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Elasticsearch</term>
					<term>visual search</term>
					<term>content-based image retrieval</term>
					<term>multimodal search</term>
					<term>eCommerce</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we describe our end-to-end content-based image retrieval system built upon Elasticsearch, a well-known and popular textual search engine. As far as we know, this is the first time such a system has been implemented in eCommerce, and our efforts have turned out to be highly worthwhile. We end up with a novel and exciting visual search solution that is extremely easy to deploy, distribute, scale and monitor in a cost-friendly manner. Moreover, our platform is intrinsically flexible in supporting multimodal searches, where visual and textual information can be jointly leveraged in retrieval.</p><p>The core idea is to encode image feature vectors into a collection of string tokens in a way such that closer vectors will share more string tokens in common. By doing that, we can utilize Elasticsearch to efficiently retrieve similar images based on similarities within encoded string tokens. As part of the development, we propose a novel vector-to-string encoding method, which is shown to substantially outperform the previous ones in terms of both precision and latency.</p><p>First-hand experiences in implementing this Elasticsearch-based platform are extensively addressed, which should be valuable to practitioners also interested in building a visual search engine on top of Elasticsearch.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Elasticsearch <ref type="bibr">[22]</ref>, built on top of the Apache Lucene library <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b31">38]</ref>, is an open-source, real-time, distributed and multi-tenant textual search engine. Since its first release in February 2010, Elasticsearch has been widely adopted by eCommerce websites (e.g., Ebay, Etsy, Jet, Netflix, Grubhub) to successfully help customers discover products based on the textual queries they request <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b50">57]</ref>.</p><p>But a picture is, more often than not, worth a thousand words. With the explosive usage of phone cameras, content-based image retrieval <ref type="bibr" target="#b15">[16]</ref> is increasingly demanded by customers. Especially for categories like furniture, fashion and lifestyle (where buying decisions are largely influenced by products' visual appeal), uploading a picture of the product they like can be substantially more specific, expressive and straightforward than elaborating it into an abstract textual description.</p><p>Finding images relevant to the uploaded picture tends to be much more involved and ambiguous than retrieving documents matching keywords <ref type="bibr" target="#b38">[45,</ref><ref type="bibr" target="#b41">48,</ref><ref type="bibr" target="#b51">58]</ref> typed into the search box, as words (by themselves) are substantially more semantic and meaningful than image pixel values. Fortunately, modern AI techniques, especially the ones developed in the field of deep learning <ref type="bibr" target="#b2">[3,</ref><ref type="bibr">21]</ref>, have made incredible strides in image feature extraction <ref type="bibr">[17, 32, 39, 42-44, 59, 60]</ref> to embed images as points in high-dimensional Euclidean space, where similar images are located nearby. 
So, given a query image, we can simply retrieve its visually similar images by finding its nearest neighbors in this high-dimensional feature space. However, Elasticsearch, as an inverted-index-based search engine, is not well equipped to accomplish this mathematically straightforward operation in an efficient manner (though efforts <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr">19,</ref><ref type="bibr" target="#b23">30,</ref><ref type="bibr" target="#b29">36]</ref> have been made successfully in finding nearest neighbors over spaces of much lower dimension), which significantly limits the applicability of its nicely designed engineering system as well as the huge volume of product metadata already indexed into its database (for textual search). The gist of this paper is to conquer this difficulty, and thus make it feasible to conduct visual search within Elasticsearch.</p><p>In this paper, we describe our end-to-end visual search platform built upon Elasticsearch. As far as we know, this is the first attempt to achieve this goal, and our efforts have turned out to be quite worthwhile. By taking advantage of the mature engineering design of Elasticsearch, we end up with a visual search solution that is extremely easy to deploy, distribute, scale and monitor. Moreover, due to Elasticsearch's disk-based (and partially memory-cached) inverted index mechanism, our system is quite cost-effective. 
In contrast to many existing systems (using hashing-based <ref type="bibr" target="#b1">[2,</ref><ref type="bibr">20,</ref><ref type="bibr">23,</ref><ref type="bibr" target="#b26">33,</ref><ref type="bibr" target="#b47">[54]</ref><ref type="bibr" target="#b48">[55]</ref><ref type="bibr" target="#b49">[56]</ref> or quantization-based [18, <ref type="bibr" target="#b18">[25]</ref><ref type="bibr" target="#b19">[26]</ref><ref type="bibr" target="#b20">[27]</ref><ref type="bibr" target="#b22">29]</ref> approximate nearest neighbor (ANN) methods), we do not need to load those millions of (high-dimensional and dense) image feature vectors into RAM, one of the most expensive resources in large-scale computations. Furthermore, by integrating textual search and visual search into one engine, both types of product information can now be shared and utilized seamlessly in a single index. This paves a coherent way to support multimodal searches, allowing customers to express their interests in a variety of textual requests (e.g., keywords, brands, attributes, price ranges) jointly with visual queries, where most existing visual search systems fall short (if they support such queries at all).</p><p>Since the image preprocessing step and the image feature extraction step involved in our system are standard and independent of Elasticsearch, in this paper we focus on how we empower Elasticsearch to retrieve close image feature vectors, i.e., the Elasticsearch-related part of the visual system. Our nearest neighbor retrieval approach falls under the general framework recently proposed by Rygl et al. <ref type="bibr" target="#b40">[47]</ref>. The core idea is to create text documents from image feature vectors by encoding each vector into a collection of string tokens in a way such that closer vectors will share more string tokens in common. This enables Elasticsearch to approximately retrieve neighbors in image feature space based on their encoded textual similarities. 
The quality of the encoding procedure (as expected) is extremely critical to the success of this approach. In this paper, we propose a novel scheme called the subvector-wise clustering encoder, which substantially outperforms the element-wise rounding one proposed and examined by Rygl et al. <ref type="bibr" target="#b40">[47]</ref> and Ruzicka et al. <ref type="bibr" target="#b39">[46]</ref>, in terms of both precision and latency. Note that our methodology should be generally applicable to any full-text search engine (e.g., Solr <ref type="bibr" target="#b44">[51]</ref>, Sphinx <ref type="bibr" target="#b0">[1]</ref>) besides Elasticsearch, but in this paper we do share a number of Elasticsearch-specific implementation tips based on our first-hand experience, which should be valuable to practitioners interested in building their own visual search system on top of Elasticsearch.</p><p>The rest of the paper is organized as follows. In Section 2, we describe the general pipeline of our visual search system, and highlight a number of engineering tweaks we found useful when implementing the system on Elasticsearch. In Sections 3 and 4, we focus on how to encode an image feature vector into a collection of string tokens, the most crucial part of setting up the system. In Section 3, we first review the element-wise rounding encoder and address its drawbacks. As a remedy, we propose a new encoding scheme called the subvector-wise clustering encoder, which is empirically shown in Section 4 to substantially outperform the element-wise rounding one.</p></div>
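The core idea just described, retrieving candidates by encoded-token overlap and then re-ranking them by exact distance, can be sketched as a toy in-memory Python illustration (our own sketch, not the Elasticsearch implementation; any encoder E(·) can supply the token sets):

```python
def retrieve_then_rerank(query_vec, query_tokens, index, r, s):
    """index: list of (vector, token_set) pairs.
    Step 1 (retrieval): rank indexed items by token overlap with the
    query, keeping the top r; this is the only step the inverted index
    needs to perform.
    Step 2 (reranking): re-order those r candidates by exact Euclidean
    distance to the query vector and return the top s item ids."""
    overlap = lambda i: len(query_tokens & index[i][1])
    candidates = sorted(range(len(index)), key=lambda i: -overlap(i))[:r]
    dist2 = lambda i: sum((a - b) ** 2 for a, b in zip(query_vec, index[i][0]))
    return sorted(candidates, key=dist2)[:s]
```

The quality of the final result hinges entirely on how well token overlap approximates Euclidean proximity, which is exactly the role of the encoder studied in Sections 3 and 4.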
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">GENERAL FRAMEWORK OF VISUAL SEARCH WITHIN ELASTICSEARCH</head><p>The whole pipeline of our visual search engine is depicted in Figure <ref type="figure" target="#fig_0">1</ref>; it primarily consists of two phases: indexing and searching.</p><p>Indexing. Given image feature vectors</p><formula xml:id="formula_0">X := {x 1 , x 2 , . . . , x n } ⊆ R d ,<label>(2.1)</label></formula><p>we first encode them into string tokens</p><formula xml:id="formula_1">S := {s 1 , s 2 , . . . , s n } ,<label>(2.2)</label></formula><p>where s i := E(x i ) for some encoder E(•) converting a d-dimensional vector into a collection of string tokens of cardinality m. The original numerical vectors X and encoded tokens S, together with their textual metadata (e.g., product titles, prices, attributes), will all be indexed into the Elasticsearch database, ready to be searched.</p><p>Searching. Conceptually, the search phase consists of two steps: retrieval and reranking. Given a query vector x, we first encode it into ŝ := E(x) via the same encoder used in indexing, and retrieve the r (r ≪ n) most similar vectors R := {x i 1 , x i 2 , . . . , x i r } as candidates based on the overlap between the string token set ŝ and the ones in {s 1 , s 2 , . . . , s n }, i.e.,</p><formula xml:id="formula_2">{i 1 , i 2 , . . . , i r } = r-arg max i ∈ {1,2, . . . ,n} |ŝ ∩ s i |.<label>(2.3)</label></formula><p>We then re-rank the vectors in the candidate set R according to their exact Euclidean distances to the query vector x, and choose the top-s (s ≤ r) ones as the final visual search result to output, i.e.,</p><formula xml:id="formula_3">s-arg min i ∈ {i 1 ,i 2 , . . . ,i r } ∥x i − x∥ 2 .<label>(2.4)</label></formula><p>As expected, the choice of E(•) is extremely critical to the success of the above approach. 
A good encoder E(•) should encourage image feature vectors closer in Euclidean distance to share more string tokens in common, so that the retrieval set R obtained from the optimization problem (2.3) contains enough meaningful candidates to be fed into the exact search in (2.4). We will elaborate and compare different choices of encoders in the next two sections (Sections 3 &amp; 4).</p><p>Implementation. In this part, we address how we implement the retrieval and reranking steps of the searching phase efficiently within just one JSON-encoded request body (i.e., JSON 1), which instructs the Elasticsearch server to compute (2.3) and (2.4) and then return the visual search result in the desired order (via Elasticsearch's RESTful API over HTTP).</p><p>For the retrieval piece, we construct a function score query <ref type="bibr" target="#b8">[9]</ref> to rank database images based on (2.3). Specifically, our function score query (lines 3-29 in JSON 1) consists of m score functions, each of which is a term filter <ref type="bibr" target="#b13">[14]</ref> (e.g., lines 6-14 in JSON 1) checking whether the encoded feature token ŝ i from the query image is matched or not. With all m scores summed up (line 26 in JSON 1) using the same weight (e.g., lines 13 and 23 in JSON 1), the ranking score for a database image is calculated exactly as the number of feature tokens it shares with ŝ.</p><p>For the reranking piece, our initial attempt was to fetch the top-r image vectors from the retrieval step, and calculate (2.4) to re-rank them outside Elasticsearch. But this approach prevents our visual system from being an end-to-end one within Elasticsearch, and thus makes it hard to leverage many useful microservices (e.g., pagination) provided by Elasticsearch. 
More severely, this vanilla approach introduces substantial latency in communication, as thousands of high-dimensional and dense image embedding vectors have to be transported out of the Elasticsearch database. As a remedy, we design a query rescorer <ref type="bibr" target="#b11">[12]</ref> (lines 30-52 in JSON 1) within Elasticsearch to execute a second query on the top-r database image vectors returned from the function score query, to tweak their scores and re-rank them based on their exact Euclidean distances to the query image vector. Specifically, we implement a custom Elasticsearch plugin <ref type="bibr" target="#b9">[10]</ref> (lines 35-47 in JSON 1) to compute the negation of the Euclidean distance between the query image vector and the one from the database. As Elasticsearch ranks results by score from high to low, the output will be in the desired order, from the smallest distance to the largest.</p><p>Multimodal search. More often than not, scenarios more complicated than pure visual search will be encountered. For instance, a customer might be fascinated with the design and style of an armoire at her friend's house, but she might want to change its color to be better aligned with her own home design, or want the price to be within her budget (see Figure <ref type="figure" target="#fig_1">2</ref>). Searching using the snapped picture alone is most likely in vain. To better enhance customers' shopping experiences, a visual search engine should be capable of retrieving results as a joint outcome by taking both the visual and textual requests from customers into consideration. Fortunately, our Elasticsearch-based visual system can immediately achieve this with one or two lines of modification in JSON 1. In particular, filters can be inserted within the function score query to search only among products of customers' interest (e.g., within a certain price range <ref type="bibr" target="#b10">[11]</ref>, attributes, colors). 
Moreover, a general full-text query <ref type="bibr" target="#b7">[8]</ref> can also be handled, the score of which can be blended with the visual search score in a weighted manner.</p></div>
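As a rough illustration of the request body described above (the paper's JSON 1 is not reproduced here), the following Python sketch assembles a function score query with one term-filter score function per encoded token, plus a rescorer over the top-r hits. The field names and the custom script identifier `negative_euclidean_distance` are assumptions on our part, not the paper's exact JSON:

```python
def build_search_body(query_tokens, query_vector, r, s,
                      token_field="image_tokens", vector_field="image_vector"):
    """Sketch of an Elasticsearch request body for retrieve-then-rerank.
    Retrieval: one term-filter score function per token, summed with
    equal weight, so the score equals the number of matched tokens.
    Reranking: a rescore window of size r ordered by the negation of the
    exact Euclidean distance (computed by an assumed custom plugin)."""
    functions = [
        {"filter": {"term": {token_field: tok}}, "weight": 1}
        for tok in sorted(query_tokens)
    ]
    return {
        "size": s,
        "query": {
            "function_score": {
                "functions": functions,
                "score_mode": "sum",      # ranking score = #matched tokens
                "boost_mode": "replace",  # ignore the base query score
            }
        },
        "rescore": {
            "window_size": r,
            "query": {
                "rescore_query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            # assumed name of the custom plugin script
                            "source": "negative_euclidean_distance",
                            "params": {"query_vector": query_vector,
                                       "field": vector_field},
                        },
                    }
                },
                "query_weight": 0,         # drop the retrieval score...
                "rescore_query_weight": 1, # ...rank purely by exact distance
            },
        },
    }
```

Adding the multimodal filters mentioned above would amount to nesting this `function_score` under a `bool` query with extra `term`/`range` filters.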
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">VECTOR TO STRING ENCODING</head><p>The success of our approach hinges upon the quality of the encoder E(•), which ideally should encourage closer vectors to share more string tokens in common, so that the retrieval set R found based on token matching contains enough meaningful candidates. In the following, we first review the element-wise rounding encoder proposed by Rygl et al. <ref type="bibr" target="#b40">[47]</ref>, and discuss its potential drawbacks. As a remedy, we propose a novel encoding scheme called the subvector-wise clustering encoder.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Element-wise Rounding</head><p>Proposed and examined by Rygl et al. <ref type="bibr" target="#b40">[47]</ref> and Ruzicka et al. <ref type="bibr" target="#b39">[46]</ref>, the element-wise rounding encoder rounds each value in the numerical vector to p decimal places (where p ≥ 0 is a fixed integer), and then concatenates the positional information and rounded value of each entry into a string token.</p><p>Example 1. For a vector x = [0.1234, −0.2394, 0.0657], rounding to two decimal places (i.e., p = 2) produces the string tokens of x as s = {"pos1val0.12", "pos2val-0.24", "pos3val0.07"} .</p><p>The encoded positional information is essential for the inverted-index-based search system to match (rounded) values at the same position without confusion. Suppose, on the other hand, that positional information is ignored, and thus s = {"val0.12", "val-0.24", "val0.07"} .</p><p>Then the token "val0.12" could be mistakenly matched by another encoded token even when it is not produced from the first entry.</p><p>For a high-dimensional vector x ∈ R d , this vanilla version of the element-wise rounding encoder will generate a large collection of string tokens (essentially with |E(x)| = d), which makes it infeasible for Elasticsearch to compute (2.3) in real time.</p><p>Filtering. As a remedy, Rygl et al. <ref type="bibr" target="#b40">[47]</ref> present a useful filtering technique to sparsify the string tokens. Specifically, only the top-m entries in terms of magnitude are selected to create rounding tokens.</p><p>Example 2. For the same setting as Example 1, when m is set to 2, the string tokens will be produced as s = {"pos1val0.12", "pos2val-0.24"} with only the first and second entries being selected; and when m is set to 1, the string tokens will be produced as s = {"pos2val-0.24"} , with only the second entry being selected.</p><p>Drawbacks. 
Although the filtering strategy is suggested to maintain a good balance between feature sparsity and search quality <ref type="bibr" target="#b39">[46,</ref><ref type="bibr" target="#b40">47]</ref>, it might not be the best practice for reducing the number of string tokens with respect to finding nearest neighbors in general. First, for two points x̂, x ∈ R d , their squared Euclidean distance</p><formula xml:id="formula_4">∥x̂ − x∥ 2 2 = ∑ d i=1 (x̂ i − x i ) 2 ,<label>(3.1)</label></formula><p>is summed along each axis equally, rather than weighted by the magnitude of x̂ i (or x i ). Specifically, a mismatch/match on a (rounded) value 0.01 does not imply that it is less important than a mismatch/match on a 0.99, in terms of their contributions to the sum (3.1). What essentially matters is the deviation ∆ i := x̂ i − x i rather than the value of x̂ i (or x i ) by itself. Therefore, entries with small magnitude should not be considered less essential and be totally ignored. Second, the efficacy of the filtering strategy is vulnerable to data distributions. For example, when the embedding vectors are binary codes <ref type="bibr" target="#b17">[24,</ref><ref type="bibr" target="#b24">31,</ref><ref type="bibr" target="#b27">34,</ref><ref type="bibr" target="#b28">35,</ref><ref type="bibr" target="#b45">52]</ref>, choosing the top-m entries by magnitude immediately becomes problematic, as entries massively tie with one another.</p><p>In the next subsection, we propose an alternative encoder, which takes all value information into consideration and is also more robust with respect to the underlying data distribution.</p></div>
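For concreteness, Examples 1 and 2 above can be reproduced with a minimal sketch of the element-wise rounding encoder with top-m filtering (our own illustrative implementation, not code from [47]):

```python
def round_encode(x, p, m=None):
    """Element-wise rounding encoder with optional top-m filtering.
    p: number of decimal places to round to (p >= 0).
    m: if given, keep only the m entries with the largest magnitude."""
    positions = range(len(x))
    if m is not None:
        # the filtering of Rygl et al.: top-m entries by |value|
        positions = sorted(positions, key=lambda i: -abs(x[i]))[:m]
    tokens = set()
    for i in positions:
        v = round(x[i], p)
        if v == 0:
            v = 0.0  # avoid emitting a spurious "-0.00" token
        tokens.add(f"pos{i + 1}val{v:.{p}f}")
    return tokens
```

With x = [0.1234, -0.2394, 0.0657] and p = 2 this yields exactly the token sets of Examples 1 and 2.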
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Subvector-wise Clustering</head><p>Different from the element-wise rounding one, an encoder that operates on the subvector level is presented in this part. The idea is quite natural and straightforward. For any vector x ∈ R d , we divide it into m subvectors 1 ,</p><formula xml:id="formula_5">x = [ (x 1 , . . . , x d/m ), (x d/m+1 , . . . , x 2d/m ), . . . , (x d−d/m+1 , . . . , x d ) ] =: [ x 1 , x 2 , . . . , x m ],<label>(3.2)</label></formula><p>where x i denotes the i-th subvector. Denote X i := { x i 1 , x i 2 , . . . , x i n } as the collection of the i-th subvectors from X for i = 1, 2, . . . , m. We then separately apply the classical k-means algorithm <ref type="bibr" target="#b30">[37]</ref> to divide each X i into k clusters, with the learned assignment function</p><formula xml:id="formula_6">A i : R d/m → {1, 2, . . . , k }</formula><p>assigning each subvector to the index of the cluster it belongs to. Then for any x ∈ R d , we encode it into a collection of m string tokens "pos1cluster{A 1 (x 1 )}", "pos2cluster{A 2 (x 2 )}", . . . . <ref type="bibr">(3.3)</ref> The whole idea is illustrated in Figure <ref type="figure" target="#fig_2">3</ref>. The trade-off between search latency and quality is well controlled by the parameter m. Specifically, a larger m will tend to increase the search quality as well as the search latency, as more string tokens per vector will be indexed. 1 For simplicity, we assume m divides d.</p><p>In contrast with the element-wise rounding encoder, our subvector-wise clustering encoder obtains m string tokens without throwing away any entry of x, and will generate string tokens more adaptive to the data distribution, as the assignment function A i (•) for each subspace is learned from X i (or data points sampled from X i ).</p></div>
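A toy pure-Python sketch of the subvector-wise clustering encoder follows. A naive Lloyd's k-means (initialized from the first k points, assumed distinct) stands in for a production clustering implementation, and cluster indices here are 0-based for simplicity:

```python
def kmeans(points, k, iters=20):
    """Naive Lloyd's k-means; initialize centroids from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            buckets[j].append(p)
        for j, b in enumerate(buckets):
            if b:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(b) for col in zip(*b)]
    return centroids

def train_assigners(X, m, k):
    """Learn one codebook per subvector position (the X_i of the paper)."""
    d = len(X[0])
    w = d // m  # subvector width; assumes m divides d, as in the paper
    return [kmeans([x[i * w:(i + 1) * w] for x in X], k) for i in range(m)]

def encode(x, codebooks):
    """Emit one 'pos{i}cluster{A_i(x^i)}' token per subvector of x."""
    m = len(codebooks)
    w = len(x) // m
    tokens = set()
    for i, cents in enumerate(codebooks):
        sub = x[i * w:(i + 1) * w]
        c = min(range(len(cents)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, cents[j])))
        tokens.add(f"pos{i + 1}cluster{c}")
    return tokens
```

On well-separated data, two nearby vectors receive identical token sets while distant vectors share no tokens, which is exactly the property the retrieval step (2.3) relies on.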
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">EXPERIMENT</head><p>In this section, we compare the performance of the subvector-wise clustering encoder and the element-wise rounding one in terms of both precision and latency, when they are used in our content-based image retrieval system built upon Elasticsearch.</p><p>Settings. Our image dataset consists of around half a million images selected from Jet.com's furniture catalog <ref type="bibr" target="#b21">[28]</ref>. For each image, we extract its image feature vector using the pretrained Inception-ResNet-V2 model <ref type="bibr" target="#b46">[53]</ref>. Specifically, each image is embedded into a vector in R 1536 by taking the output from the penultimate layer (i.e., the last average pooling layer) of the neural network model. String tokens are produced respectively with the encoding schemes at different configurations. For the element-wise rounding encoder, we select p ∈ {0, 1, 2, 3} and m ∈ {32, 64, 128, 256}. For the subvector-wise clustering encoder, we experiment with k ∈ {32, 64, 128, 256} and m ∈ {32, 64, 128, 256}. Under each scenario, we index the image feature vectors and their string tokens into a single-node Elasticsearch cluster deployed on a Microsoft Azure virtual machine <ref type="bibr" target="#b33">[40]</ref> with 12 cores and 112 GiB of RAM. To focus on comparing the efficacy of the encoding schemes, only the vanilla setting of Elasticsearch (one shard and zero replicas) is used in creating each index.</p><p>Evaluation. To evaluate the two encoding schemes, we randomly select 1,000 images to act as our visual queries. For each query image, we find the set of its 24 nearest neighbors in Euclidean distance, which is treated as the gold standard. 
We use Precision@24 <ref type="bibr" target="#b42">[49]</ref>, which measures the overlap between the 24 images retrieved from Elasticsearch (with r ∈ {24, 48, 96, . . . , 6144} respectively) and the gold standard, to evaluate the retrieval efficacy of different encoding methods under various settings. We also record the latency for Elasticsearch to execute the retrieval and reranking steps of the searching phase.</p><p>Results. In Table <ref type="table">1</ref>, we report the Precision@24 and search latency averaged over the 1,000 randomly selected queries. Results corresponding to p ∈ {2, 3} or r ∈ {24, 48} are skipped, as they are largely outperformed by other settings. Configurations that achieve precision ≥ 80% and latency ≤ 0.5s are highlighted in bold. From Table <ref type="table">1</ref>, we can see that the subvector-wise encoder outperforms the element-wise one: for every result obtained by the element-wise encoder, we can find a better result from the subvector-wise one in both precision and latency. To better visualize this fact, we plot the Pareto frontier curve over the space of precision and latency in Figure <ref type="figure" target="#fig_3">4</ref>. Specifically, the dashed (resp. solid) curve in Figure <ref type="figure" target="#fig_3">4</ref> plots the best average Precision@24 achieved among all configurations we experimented with for the element-wise rounding (resp. subvector-wise clustering) encoder, under different latency constraints. From Figure <ref type="figure" target="#fig_3">4</ref>, we can even more clearly observe that the subvector-wise encoder surpasses the element-wise one. Notably, when we require the search latency to be smaller than 0.3 seconds, the subvector-wise encoder is able to achieve an average Precision@24 of 92.14%, an improvement of more than 11% over the best average Precision@24 obtainable by the element-wise one.</p></div>
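The evaluation metric can be sketched as follows (our own minimal implementation of the Precision@24 protocol described above):

```python
def precision_at_k(retrieved_ids, gold_ids, k=24):
    """Fraction of the top-k retrieved ids that appear among the k exact
    nearest neighbors (the gold standard); Precision@24 when k = 24."""
    return len(set(retrieved_ids[:k]) & set(gold_ids)) / k

def mean_precision(all_retrieved, all_gold, k=24):
    """Average Precision@k over a batch of queries, as reported in Table 1."""
    vals = [precision_at_k(r, g, k) for r, g in zip(all_retrieved, all_gold)]
    return sum(vals) / len(vals)
```

In the paper's setup, `retrieved_ids` would come from the Elasticsearch retrieve-and-rescore request with a given r, and `gold_ids` from an exact brute-force nearest neighbor search.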
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">FUTURE WORK</head><p>Although our subvector-wise clustering encoder outperforms the element-wise rounding one, it might still be restrictive to enforce a vector to be divided into subvectors exclusively using (3.2), which could potentially downgrade the performance of the encoder. Our next step is to preprocess the data (e.g., transform the data through some linear operation x → T [x] with T [•] learned from the data) before applying our subvector-wise clustering encoder. We believe this flexibility will make our encoding scheme more robust and adaptive with respect to different image feature vectors extracted from various image descriptors. Another interesting research direction is to evaluate the performance of different encoding schemes in other information retrieval contexts, e.g., neural-ranking-model-based textual searches <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b34">41,</ref><ref type="bibr" target="#b43">50]</ref>, where the relevance between user-issued queries and catalog products is modeled by their Euclidean distances in the embedding space to better match customers' intents with products.</p><p>Table <ref type="table">1</ref>: Mean Precision@24 | ES average latency. For each setting, we average the Precision@24 and the number of seconds used over the 1,000 query images randomly selected from the furniture dataset. Settings with mean precision ≥ 80% and latency ≤ 0.5s are highlighted in bold.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Pipeline of our visual search system within Elasticsearch. The image vectors and their encoded string tokens are indexed together into Elasticsearch. At search time, the query vector x will be first encoded into string tokens ŝ, based on which a small candidate set R is retrieved. 
We will then re-rank the vectors in R according to their exact Euclidean distances to x, and output the top ones as our final visual search outcome.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Illustration of multimodal search. Armoire is searched using image query jointly with color/price range specified by the customer. Our Elasticsearch-based visual search engine can be easily tailored to handle complicated business requests like the above by adding filters (e.g., term filter [14], range filter [11]) to JSON 1.</figDesc><graphic coords="4,470.11,142.45,70.45,70.45" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Illustration of the subvector-wise clustering encoder. The vector x ∈ R d is divided into m subvectors. Subvectors at the same position are considered together to be classified into k clusters. Then each subvector is encoded into a string token by combining its position in x and the cluster it belongs to, so exactly m string tokens will be produced.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Pareto frontier for the element-wise rounding and the subvector-wise clustering encoders in the space of latency and precision. It can be clearly seen that our subvector-wise encoding scheme is capable of achieving higher precision with smaller latency.</figDesc><graphic coords="6,72.00,72.00,226.77,170.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>[</head><label></label><figDesc>[18] T. Ge, K. He, Q. Ke, and J. Sun. 2013. Optimized product quantization for approximate nearest neighbor search. In Proceedings of CVPR. 2946-2953. [19] C. Gennaro, G. Amato, P. Bolettieri, and P. Savino. 2010. An approach to content-based image retrieval based on the Lucene search engine library. In Proceedings of TPDL. 55-66. [20] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin. 2013. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 12 (2013), 2916-2929. [21] I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep learning. Vol. 1. MIT Press, Cambridge. [22] C. Gormley and Z. Tong. 2015. Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine. O'Reilly Media, Inc. [23] K. He, F. Wen, and J. Sun. 2013. K-means hashing: An affinity-preserving quantization method for learning binary compact codes. In Proceedings of CVPR. 2938-2945.</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGEMENT</head><p>We are grateful to three anonymous reviewers for their helpful suggestions and comments that substantially improve the paper. We would also like to thank Eliot P. Brenner and Aliasgar Kutiyanawala for proofreading the first draft of the paper.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Aksyonoff</surname></persName>
		</author>
		<title level="m">Introduction to Search with Sphinx: From installation to relevance tuning</title>
				<imprint>
			<publisher>O&apos;Reilly Media, Inc</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions</title>
		<author>
			<persName><forename type="first">A</forename><surname>Andoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Indyk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of FOCS</title>
				<meeting>FOCS</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="459" to="468" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Learning deep architectures for AI</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and trends® in Machine Learning</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="127" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Apache lucene 4</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bialecki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Muir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ingersoll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Imagination</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR workshop on open source information retrieval</title>
				<meeting>SIGIR workshop on open source information retrieval</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">End-to-End Neural Ranking for eCommerce Product Search</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">P</forename><surname>Brenner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kutiyanawala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR eCom&apos;18</title>
				<meeting>SIGIR eCom&apos;18</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Multi-dimensional points</title>
		<ptr target="https://www.elastic.co/blog/lucene-points-6.0" />
	</analytic>
	<monogr>
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2016-06-16">2016. June 16, 2018</date>
		</imprint>
	</monogr>
	<note>coming in Apache Lucene 6.0</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Numeric and Date Ranges in Elasticsearch: Just Another Brick in the Wall</title>
		<ptr target="https://www.elastic.co/blog/numeric-and-date-ranges-in-elasticsearch-just-another-brick-in-the-wall" />
	</analytic>
	<monogr>
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2017-06-16">2017. June 16, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="https://www.elastic.co/guide/en/elasticsearch/reference/6.1/full-text-queries.html" />
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
	<note>Full text queries</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Function score query</title>
		<ptr target="https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-function-score-query.html" />
	</analytic>
	<monogr>
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<ptr target="https://www.elastic.co/guide/en/elasticsearch/reference/6.1/modules-plugins.html" />
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Range query</title>
		<ptr target="https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-range-query.html" />
	</analytic>
	<monogr>
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<ptr target="https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-request-rescore.html" />
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
	<note>Rescoring</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Stories from Users Like You</title>
		<ptr target="https://www.elastic.co/use-cases" />
		<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Term query</title>
		<ptr target="https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-term-query.html" />
	</analytic>
	<monogr>
		<title level="m">Elasticsearch contributors</title>
				<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Apache Lucene library</title>
		<ptr target="https://lucene.apache.org" />
	</analytic>
	<monogr>
		<title level="m">Lucene contributors</title>
				<imprint>
			<date type="published" when="2018-05-06">2018. May 06, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Image retrieval: Ideas, influences, and trends of the new age</title>
		<author>
			<persName><forename type="first">R</forename><surname>Datta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comput. Surveys</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">5</biblScope>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Decaf: A deep convolutional activation feature for generic visual recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Donahue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tzeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Darrell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICML</title>
				<meeting>ICML</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="647" to="655" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Comparative evaluation of binary features</title>
		<author>
			<persName><forename type="first">J</forename><surname>Heinly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dunn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Frahm</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECCV</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="759" to="773" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Product quantization for nearest neighbor search</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jegou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmid</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE transactions on pattern analysis and machine intelligence</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="117" to="128" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Aggregating local image descriptors into compact codes</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jegou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Perronnin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sánchez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmid</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE transactions on pattern analysis and machine intelligence</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="1704" to="1716" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Searching in one billion vectors: re-rank with source coding</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jegou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Tavenard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Amsaleg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICASSP. IEEE</title>
				<meeting>ICASSP. IEEE</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="861" to="864" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title/>
		<author>
			<persName><surname>Jet</surname></persName>
		</author>
		<ptr target="https://jet.com/search?category=18000000" />
		<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
			<publisher>Furniture</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Locally optimized product quantization for approximate nearest neighbor search</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kalantidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Avrithis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CVPR</title>
				<meeting>CVPR</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2321" to="2328" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Geo Capabilities in Elasticsearch</title>
		<author>
			<persName><forename type="first">Nicholas</forename><surname>Knize</surname></persName>
		</author>
		<ptr target="https://www.elastic.co/assets/blt827a0a9db0f2e04e/webinar-geo-capabilities.pdf" />
		<imprint>
			<date type="published" when="2018-06-16">2018. June 16, 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Simultaneous feature learning and hash coding with deep neural networks</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CVPR</title>
				<meeting>CVPR</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="3270" to="3278" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Building high-level features using large scale unsupervised learning</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Monga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Devin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICML</title>
				<meeting>ICML</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="507" to="514" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Discrete graph hashing</title>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Mu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in NIPS</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3419" to="3427" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">Convolutional Hashing for Automated Scene Matching</title>
		<author>
			<persName><forename type="first">M</forename><surname>Loncaric</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weber</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1802.03101</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Deep Binary Representation for Efficient Image Retrieval</title>
		<author>
			<persName><forename type="first">X</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Multimedia</title>
		<imprint>
			<biblScope unit="page">2017</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">LIRE: open source visual information retrieval</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Halvorsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Pogorelov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Anagnostopoulos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of MMSys</title>
				<meeting>MMSys</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Some methods for classification and analysis of multivariate observations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Macqueen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the fifth Berkeley symposium on mathematical statistics and probability</title>
				<meeting>the fifth Berkeley symposium on mathematical statistics and probability<address><addrLine>Oakland, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1967">1967</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="281" to="297" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Mccandless</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hatcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Gospodnetic</surname></persName>
		</author>
		<title level="m">Lucene in action: covers Apache Lucene 3</title>
				<imprint>
			<publisher>Manning Publications Co</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Unsupervised and transfer learning challenge: a deep learning approach</title>
		<author>
			<persName><forename type="first">G</forename><surname>Mesnil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dauphin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Glorot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rifai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Lavoie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Muller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Desjardins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Warde-Farley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bergstra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICML Workshop on Unsupervised and Transfer Learning. JMLR.org</title>
				<meeting>ICML Workshop on Unsupervised and Transfer Learning. JMLR.org</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="97" to="111" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<title level="m" type="main">Virtual machines</title>
		<ptr target="https://azure.microsoft.com/en-us/services/virtual-machines/" />
		<imprint>
			<date type="published" when="2018-05-01">2018. May 01, 2018</date>
		</imprint>
		<respStmt>
			<orgName>Microsoft Azure</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<title level="m" type="main">Neural Models for Information Retrieval</title>
		<author>
			<persName><forename type="first">B</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Craswell</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.01509</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Learning and transferring mid-level image representations using convolutional neural networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Oquab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Laptev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sivic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CVPR</title>
				<meeting>CVPR</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1717" to="1724" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Self-taught learning: transfer learning from unlabeled data</title>
		<author>
			<persName><forename type="first">R</forename><surname>Raina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Battle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Packer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICML</title>
				<meeting>ICML</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="759" to="766" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">CNN features off-the-shelf: an astounding baseline for recognition</title>
		<author>
			<persName><forename type="first">A</forename><surname>Razavian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Azizpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sullivan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Carlsson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CVPR workshop</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="512" to="519" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">The probabilistic relevance framework: BM25 and beyond</title>
		<author>
			<persName><forename type="first">S</forename><surname>Robertson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zaragoza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends® in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="333" to="389" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Ruzicka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Novotny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pomikalek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rehurek</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-1923/article-01.pdf" />
		<title level="m">Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rygl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pomikalek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rehurek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ruzicka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Novotny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Representation Learning for NLP</title>
				<meeting>the 2nd Workshop on Representation Learning for NLP</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="81" to="90" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">A vector space model for automatic indexing</title>
		<author>
			<persName><forename type="first">G</forename><surname>Salton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="613" to="620" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<title level="m">Introduction to information retrieval</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="volume">39</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">A latent semantic model with convolutional-pooling structure for information retrieval</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mesnil</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CIKM</title>
				<meeting>CIKM</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="101" to="110" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<title level="m" type="main">Apache Solr enterprise search server</title>
		<author>
			<persName><forename type="first">D</forename><surname>Smiley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Pugh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Parisa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mitchell</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>Packt Publishing Ltd</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b45">
	<monogr>
		<title level="m" type="main">Binary Generative Adversarial Networks for Image Retrieval</title>
		<author>
			<persName><forename type="first">J</forename><surname>Song</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1708.04150</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">Inception-v4, inception-resnet and the impact of residual connections on learning</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Alemi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of AAAI</title>
				<meeting>AAAI</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4278" to="4284" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<analytic>
		<title level="a" type="main">Small codes and large image databases for recognition</title>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Weiss</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CVPR</title>
				<meeting>CVPR</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">Semi-supervised hashing for scalable image retrieval</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CVPR</title>
				<meeting>CVPR</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="3424" to="3431" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<analytic>
		<title level="a" type="main">Spectral hashing</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in NIPS</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1753" to="1760" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b50">
	<monogr>
		<title level="m" type="main">Elasticsearch</title>
		<ptr target="https://en.wikipedia.org/wiki/Elasticsearch" />
		<imprint>
			<date type="published" when="2018-05-06">May 06, 2018</date>
		</imprint>
	</monogr>
	<note>Wikipedia contributors</note>
</biblStruct>

<biblStruct xml:id="b51">
	<monogr>
		<title level="m" type="main">Tf-idf</title>
		<ptr target="https://en.wikipedia.org/wiki/tf-idf" />
		<imprint>
			<date type="published" when="2018-05-06">May 06, 2018</date>
		</imprint>
	</monogr>
	<note>Wikipedia contributors</note>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">How transferable are features in deep neural networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yosinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lipson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in NIPS</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3320" to="3328" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b53">
	<analytic>
		<title level="a" type="main">Visualizing and understanding convolutional networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ECCV</title>
				<meeting>ECCV</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="818" to="833" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
