<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Realtime query completion via deep language models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Po-Wei</forename><surname>Wang</surname></persName>
							<email>poweiw@cs.cmu.edu</email>
						</author>
						<author>
							<persName><forename type="first">Huan</forename><surname>Zhang</surname></persName>
							<email>ecezhang@ucdavis.edu</email>
						</author>
						<author>
							<persName><forename type="first">Vijai</forename><surname>Mohan</surname></persName>
							<email>vijaim@amazon.com</email>
						</author>
						<author>
							<persName><forename type="first">Inderjit</forename><forename type="middle">S</forename><surname>Dhillon</surname></persName>
						</author>
						<author>
							<persName><forename type="first">J</forename><surname>Zico Kolter</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Machine Learning Dept</orgName>
								<orgName type="institution">Carnegie Mellon University Pittsburgh</orgName>
								<address>
									<region>PA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Electrical and Computer Engineering</orgName>
								<orgName type="institution">UC Davis Davis</orgName>
								<address>
									<region>CA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<address>
									<settlement>Palo Alto</settlement>
									<region>CA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="institution">Amazon &amp; UT Austin</orgName>
								<address>
									<settlement>Palo Alto</settlement>
									<region>CA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="department">School of Computer Science</orgName>
								<orgName type="institution">Carnegie Mellon University Pittsburgh</orgName>
								<address>
									<region>PA</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff5">
								<address>
									<settlement>Ann Arbor</settlement>
									<region>Michigan</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff6">
								<address>
									<settlement>Ann Arbor</settlement>
									<region>Michigan</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Realtime query completion via deep language models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">5BF4A8F1E6926146A5B36E334CCF18EA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:31+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Query completion</term>
					<term>query correction</term>
					<term>deep learning</term>
					<term>realtime</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Search engine users nowadays heavily depend on query completion and correction to shape their queries. Typically, the completion is done by database lookup which does not understand the context and cannot generalize to prefixes not in the database. In this paper, we propose to use unsupervised deep language models to complete and correct the queries given an arbitrary prefix. We address two main challenges that renders this method practical for large-scale deployment: 1) we propose a modified beam search process which integrates with a completion distance based error correction model, combining the error correction process (as a potential function) together with the language model; and 2) we show how to efficiently perform our modified beam search process on CPU to complete the queries with error correction in real time, by exploiting the greatly overlapped forward propagation process and conducting amortized dynamic programming on the search tree, along with both SIMD-level and thread level parallelism. We outperform the off-the-shelf Keras implementation by a factor of 50, thus allowing us to generate query suggestions in real time (generating top 16 completions within 16 ms). Experiments on two large scale datasets from AOL and Amazon.com show that the method substantially increases hit rate over standard approaches, reduces the memory footprint of database lookup based approach by over two orders of magnitude, and is capable of handling tail queries.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Search completion is the problem of taking the prefix of a search query from a user and generating several candidate completions. This problem has enormous potential utility and monetary value to any search provider: the more accurately an engine can find the desired completions for a user (or indeed, potentially steer the user towards high-value completions), the more quickly it can lead the user to their desired goal.</p><p>This paper proposes a realtime search completion architecture based upon deep character-level language models. The basic idea is that instead of looking up possible completions from a generic database, we perform search under a deep-network-based language model to find the most likely completions of the user's current input. This allows us to integrate the power of deep language models, that have been shown to perform extremely well on complex language modeling and prediction tasks, with the desired goal of finding a good completion. Although this is a conceptually simple strategy (and one which has been considered before, as we highlight below in the literature survey), there are two key elements required to make this of practical use for a search engine provider, which together make up the primary technical contributions of the paper: 1) The completion must be error correcting, able to handle small errors in the user's initial input and provide completions for the most likely "correct" input. We propose such an approach that combines a character-level language model with an edit-distance-based potential function, combining the two using a tree-based beam search algorithm; 2) The completion must be realtime, able to produce highquality potential completions in time that is not even perceivable to the user. We achieve this by developing an efficient tree-based version of beam search and an amortized dynamic programming algorithm for error correction based on completion distance (our proposed editing distance variant) along the search tree, exploiting thread-level and SIMD-level CPU-based computation for a single query, and through numerous optimizations to the implementation that we discuss below.</p><p>We evaluate the method on the AOL search dataset, a dataset consisting of over 36 million total search queries, as well as on an Amazon product search dataset containing over 100 million user search queries. Our proposed method substantially outperforms highly optimized standard search completion algorithms in terms of its hit rate (the benefit of the deep language model and the error correction), while being fast enough to execute in real time for search engines. Our approach is also very memory efficient and reduces the memory usage of database lookup based query completion system by at least two orders of magnitude; in addition, we can handle tail queries, which are the queries that are rarely seen and for which database lookup based approach cannot give any query completions. The experiments on AOL search dataset and code are publicly available online<ref type="foot" target="#foot_0">1</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK 2.1 Background on search completion</head><p>Here we review existing approaches to search query completion and error correction. Broadly speaking, two types of query completions are most relevant to our work, database lookup methods and learning-based approaches.</p><p>Database Lookup. One of the most intuitive ways to do query completion is to do a database lookup. That is, given a prefix, we can fetch all the known queries matching the prefix and return the most frequent candidates. This is called the "most popular completion" (MPC) <ref type="bibr" target="#b0">[1]</ref>, which corresponds to the maximum likelihood estimator for P (completion | prefix). The database lookup can be efficiently implemented by a trie <ref type="bibr" target="#b8">[9]</ref>. For instance, it takes only 15µs to give 16 suggestions for a query in our own trie-based implementation. However, due to the long-tail nature <ref type="bibr" target="#b19">[20]</ref> of the search queries, many prefixes might not exist in the database; for example, in the AOL search data, 28% of the queries are unique. An excellent survey of these current "classical" approaches is given in Cai et al. <ref type="bibr" target="#b2">[3]</ref>.</p><p>Learning-based. In addition to database lookup approaches, in recent years there have been a number of approaches that use learning-based methods for query completion. Sordoni et al. <ref type="bibr" target="#b18">[19]</ref> use a translation model at the word level to output single-word search query suggestions, and also model consecutive sessions of the same user. Liu et al. <ref type="bibr" target="#b11">[12]</ref> proposed a word-based method for code completion, but focused solely on greedy stochastic sampling for the prediction. Mitra and Craswell <ref type="bibr" target="#b13">[14]</ref> also used neural networks combined with a database-based model to handle tail queries, but focused on CNN approaches that just output the single most likely word-level completion. Shokouhi <ref type="bibr" target="#b17">[18]</ref> used logistic regression to learn a personalized query ranking model, specific to individual users. All these approaches are relevant but fairly orthogonal to our own, as we focus here on character-level modeling, beam search, and realtime completion. Finally, Park and Chiba <ref type="bibr" target="#b14">[15]</ref> very recently published an approach similar to ours, which uses a character-level language model for completion. But their approach focuses on the use of embeddings (such as word2vec) to produce "intelligent" completions that make use of additional context, and the approach does not handle error correction; they also do not report the prediction time of their completions, which is a key driver for our work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Error correction for queries</head><p>Our work also relates to methods on error and spelling correction approaches, which again are roughly divided into heuristic models and learning-based approaches.</p><p>Heuristic models. Whitelaw et al. <ref type="bibr" target="#b21">[22]</ref> proposed generating candidate sets that contain common errors for given prefixes, then searching these based upon the current query. Similarly, Martins and Silva <ref type="bibr" target="#b12">[13]</ref> use a ternary search tree to accelerate the search within candidate sets for spelling correction in general. The approaches are nice in that they are easily parallelizable at runtime, but are relatively "brute force", and cannot handle previously unseen permutations.</p><p>Learning-based Model. On the learning side, Duan and Hsu <ref type="bibr" target="#b5">[6]</ref> train an n-gram Markov model combined with A* search to determine candidate misspelling; this is similar to our approach except with a much richer language model replacing the simple n-gram model, which creates several challenges in the search procedure itself. Likewise, Xie et al. <ref type="bibr" target="#b22">[23]</ref> use a similar character-level model with attention, but do so in the context of error correcting an entire paragraph of text, and don't focus on the same realtime aspects that we do.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">BACKGROUND ON QUERY COMPLETION</head><p>When a user types any prefix string s in the search engine, the query completion function will start to recommend the best r completions, each denoted ŝ, according to certain metrics. For example, one might want to maximize the probability that a recommendation is clicked. The conditional probability can be formulated as</p><formula xml:id="formula_0">P (ŝ | s) := P (completion | prefix),<label>(1)</label></formula><p>and the goal of query completion in the setting is to find the top r most probable strings ŝ which potentially also maximize some additional metric, such as the click-through rate. Denote s 1:m as the first m characters in string s. We first discuss the query completion in a simplified setting, in which all completions must contain the prefix exactly, that is: ŝ1:m = s 1:m , and</p><formula xml:id="formula_1">P (ŝ 1:n | s 1:m ) = P (ŝ m+1:n | s 1:m ) = P (ŝ m+1:n | ŝ1:m ), (<label>2</label></formula><formula xml:id="formula_2">)</formula><p>where n is the total length of a completion. Note that the probability is defined in the sequence domain, which contains exponentially many candidate strings. To simplify the model, we can apply the conditional probability formula recursively and have</p><formula xml:id="formula_3">P (ŝ m+1:n | ŝ1:m ) = n−1 t =m P (ŝ t +1 | ŝ1:t ).<label>(3)</label></formula><p>This way, we only need to model P (ŝ t +1 | ŝ1:t ), that is, the probability of the next character under the current prefix. This is precisely a character-level language model, and we can learn it in an unsupervised manner using a variety of methods, though here we focus on the popular approach of using recurrent neural networks (RNNs) for this character-level language model. Character-level models are the right fidelity for the search completion task, because they satisfy the customer's expectation from the user interface, and additionally, word-level models or sequence-to-sequence probabilities would not be able to model probabilities under all partial strings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">The Unsupervised Language Model</head><p>We first focus on the language model term P (ŝ t +1 | ŝ1:t ), the probability of next character under the current prefix. RNNs in general, and variants like long short term memory networks (LSTMs) <ref type="bibr" target="#b7">[8]</ref>, are extremely popular for high-fidelity character level modeling, and achieve state-of-the-art performance for a number of datasets <ref type="bibr" target="#b4">[5]</ref>. Since they can be trained from unsupervised data (e.g., just datasets of many unannotated search queries), we can easily adapt the model to whatever terms users are actually searching for in the dataset, with the potential to adapt to new searches, products, etc, simply by occasionally retraining the model on all data collected up to the current point.</p><p>Although character-level language modeling is a fairly standard approach, we briefly highlight the model we use for completeness. Consider a recurrent neural network with hidden state h t at time t. We want to encode the prefix ŝ1:t and predict the next character using h t . We follow fairly standard approaches here and use an LSTM model, in particular the specific implementation from the Keras library [4]<ref type="foot" target="#foot_1">2</ref> , which is defined by the recurrences</p><formula xml:id="formula_4">i t = σ (W x i x t + W hi h t −1 + b i ) ,<label>(4)</label></formula><formula xml:id="formula_5">f t = σ W x f x t + W hf h t −1 + b f ,<label>(5)</label></formula><formula xml:id="formula_6">o t = σ (W xo x t + W ho h t −1 + b o ) ,<label>(6)</label></formula><formula xml:id="formula_7">c t = i t ⊙ tanh (W xc x t + W hc h t −1 + b c ) + f t ⊙ c t −1 ,<label>(7)</label></formula><formula xml:id="formula_8">h t = o t ⊙ tanh (c t ) ,<label>(8)</label></formula><formula xml:id="formula_9">in which h t , b ∈ R d , x t ∈ R |C | , ∀t, W x i ,W x f ,W xo ,W xc and</formula><p>W hi ,W hf ,W ho ,W hc are the forward kernel and recurrent kernel with corresponding dimensions, σ is the sigmoid activation function, and ⊙ is the element-wise product. We use a one-hot encoding of characters as input, a two-layer LSTM with 256 to 1024 hidden units (more discussion on these choices below), and for prediction of character ŝt+1 , we feed the hidden layer h t to a softmax function</p><formula xml:id="formula_10">P (ŝ t +1 = i | ŝ1:t ) = softmax (i; W softmax h t ) = exp(w T i h t ) |C | j=1 exp(w T j h t ) ,<label>(9)</label></formula><p>for all i in the character set C and train the language model to maximize the log likelihood (minimize the categorical cross-entropy loss), minimize</p><formula xml:id="formula_11">W − s ∈S n s |s | |s | t =1 log P (s t +1 | s 1:t ),<label>(10)</label></formula><p>where S denotes the set of queries, |s | is the length of query s and n s is the number of times query s appears in the dataset. Further, we pad all queries with an end-of-sequence symbol to predict whether the query is complete.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Stochastic Search and Beam Search</head><p>Once we have the language model, we can evaluate the probability P (ŝ m+1:n | ŝ1:m ) for any prefix ŝ1:m , but would ideally like to find the completion with the highest probability. Enumerating all the possible strings is not an option because we have exponentially many candidates. Indeed, finding the best sequence probability, which is called the "decoding problem", is NP-hard <ref type="bibr" target="#b6">[7]</ref>, so we have to rely on approximations.</p><p>The most naive way to do so is simply via sampling: we sample the next character (according to its probability of occurrence) given the current prefix, until we hit an end-of-sequence (EOS) symbol: For t = m; ; t++ :</p><formula xml:id="formula_12">ŝt+1 ∼ P (ŝ t +1 | ŝ1:t ); If ŝt+1 == EOS : break;</formula><p>This method produces output that looks intuitively reasonable. However, it is biased toward longer sequences (as we can possibly miss the EOS symbol even if it has a relatively large probability) with short-term dependencies and clearly does not generate the most probable sequences, because sampling in a greedy fashion is clearly not the same as sampling from the sequence space.</p><p>That is, we really need to do a better approximate search to get better results. One classic way to do this is to perform beam search, that is, perform breadth-first search while keeping the top-r candidates. We illustrate the algorithm as follows: cand := {s 1:m : 0}, result := { } For t = m; cand is not empty; t++:</p><formula xml:id="formula_13">cand new := s 1:t +1 : log P (s 1:t +1 | s 1:m )</formula><p>for all s t +1 ∈ C, for all s 1:t ∈ cand ;</p><p>cand := the most probable (r − |result|) candidates in cand new ;</p><p>Move s 1:t +1 from cand to result if s t +1 is EOS symbol;</p><p>By performing beam search we can consistently obtain a more probable set of completions compared to stochastic search. However, there are two issues with the above method. First, it does not handle error correction (which is necessary for any practical type of completion) since the completion always attempts to find sequences that fit the current prefix exactly. 3 Second, as we show below, a naive implementation of this model is extremely slow, often taking on the order of one second to produce 16 completions for a given prefix. Thus, in the next two sections, we present our primary technical contributions, which address both these issues.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">COMPLETION WITH ERROR CORRECTION</head><p>Most of the time, query completion is more than completing over a fixed prefix. The input prefix might contain mistakes and sometimes we would also like to insert keywords in the prefix. Traditionally, the database community handles the two features by first doing a pass of error correction by matching the input to a typo database generated by permuting characters, then matching the database again on the permuted terms for insertion completion <ref type="bibr">[17, chap. 14]</ref>. Our observation is that with a language-model-based approach, we can handle the spelling correction and insertion completion all in one model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Error correction via the noisy channel model</head><p>the beginning of completions to be identical to the prefix so that we can "correct" the user input. Thus, the problem of finding the most probable completion becomes arg max ŝ1:n</p><formula xml:id="formula_14">P (ŝ 1:n | s 1:m )<label>(11)</label></formula><p>Now let us derive the maximum-a-posteriori (MAP) estimate of the above problem. Using Bayes' theorem, <ref type="bibr" target="#b10">(11)</ref> can be rewritten as arg max ŝ1:n</p><formula xml:id="formula_15">P (s 1:m | ŝ1:n )P (ŝ 1:n ) P (s 1:m ) . (<label>12</label></formula><formula xml:id="formula_16">)</formula><p>Because s 1:m never changes, P (s 1:m ) can be considered as a constant. Thus, solving ( <ref type="formula" target="#formula_15">12</ref>) is equivalent to maximizing the following:</p><formula xml:id="formula_17">arg max ŝ1:n log P (s 1:m | ŝ1:n ) + log P (ŝ 1:n ). (<label>13</label></formula><formula xml:id="formula_18">)</formula><p>This MAP estimate is called the noisy channel model <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b9">10]</ref> in NLP, in which the first part log P (s 1:m | ŝ1:n ) models the noisy channel of user inputs, and the second part log P (ŝ 1:n ) models the prior. For example, when using the noisy channel model for error correction in a paragraph, we can assume that users have a constant probability to make a typo for each letter. Under such assumption, the noisy channel log P (s 1:m | ŝ1:n ) is proportional to the edit distance (Levenshtein distance). For the prior part, we can plug in whatever P (ŝ 1:n ) we have for the paragraph, like the n-gram transitional probability or the language model. However, error correction for queries is essentially different from that for paragraphs; in query completion the user inputs are always incomplete. Thus, we must perform the completion and error correction at the same time. One consequence of such a constraint is that we can no longer use the edit distance function directly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Edit Distance v.s. Completion Distance</head><p>The edit distance function, which returns the minimum changes (add/substitute/delete) to transform one string into another, is a natural candidate to measure the number of corrections between user inputs and completions. Assume that the probability by which users make an error is constant, like 2%. As we mentioned before, the noisy channel under such an assumption can be written as</p><formula xml:id="formula_19">log P (s 1:m | ŝ1:n ) = −α • edit distance(s 1:m , ŝ1:n ),<label>(14)</label></formula><p>where α = − log 2%. Note that to handle incomplete prefix and insertion completion, we should not incur penalties for the completions.</p><p>That is, we should not count the edit distance for adding words after the last character (of terms) from the user input. This can be done by modifying the transition function in the edit distance algorithm.</p><p>To be specific, we change the penalty to an indicator when dealing with the "add" operation in the edit distance algorithm; we define the new transition function to be dist new (j)= min</p><formula xml:id="formula_20">           dist new (j-1) + I (s j−1 last char) add; dist compl (j-1) + 1 substitute; dist compl (j) + 1 delete;<label>(15)</label></formula><p>We called the new edit distance function a "completion distance", in a way that the completion "pokemon go plus" for the prefix "poke go" would not incur unwanted penalties, because the added characters are proper completions (only append character after terms).</p><p>To perform error correction under the noisy channel model, we still need to integrate the noisy channel (distance function) with our LSTM-based language model (the prior), which can only be evaluated once in the forward direction because of the beam search procedure. Recall that the dynamic programming algorithm <ref type="bibr" target="#b20">[21]</ref> of edit (completion) distance costs O (m • t ) to compare two strings of length m and t. If we apply the algorithm to every candidate in the beam search for the incremental length t which ranges from 1 to n, it would add O (|C |rm • n 2 ) overhead to the beam search procedure, where |C | is the size of character set, r is the number of candidates we keep, and n is the length of the final completion. This overhead is not affordable, and we need to modify the dynamic programming algorithm for completion distance to amortize it on the search tree.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Amortized Dynamic Programming On the Search Tree</head><p>We can exploit the fact that every new candidate in the beam search procedure originates incrementally from a previous candidate. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Extensions</head><p>While we are using the simple assumption (2% user error rate) in this paper, the error correction algorithm can be generalized in various ways <ref type="bibr">[10, chap. 5]</ref>. For example, we can plug in the frequency statistics in the transition function of edit distance <ref type="bibr" target="#b10">[11]</ref> or learn it directly from the corpus <ref type="bibr" target="#b21">[22]</ref>. Finally, we note that this idea of inserting a noisy channel model naturally generalizes to contexts other than edit distance. For example, many product search engines wish to drive the user not simply to a high-probability completion, but to a completion that is likely to lead to an actual sale. By modifying the prior probability to more heavily weight high-value completions, we can effectively optimize metrics other than simple completion probability using this approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">REALTIME COMPLETION</head><p>Starting with the system as proposed previously, the key challenge that remains now is to perform such completions in real time. Response time is crucial for query completion because unless the user can see completions as they type the query, the results will likely have very little value. The bar we set for ourselves in this work is to provide 16 candidate completions in about 20 milliseconds on current hardware. <ref type="foot" target="#foot_3">4</ref> The 20 milliseconds budget, combined with typical network latency, is similar to the pace that a user types in the query; we need to complete faster than typing to make our realtime completion usable in practice. Unfortunately, a naive implementation of beam search with the model trained above (using off-the-shelf implementations), requires more than one second to complete forward propagation through the network and beam search.</p><p>In this section, we now provide a detailed breakdown of how we have empirically improved this performance by a factor of over 50x in order to achieve sub-20-ms completion times.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">LSTM over a Tree</head><p>First, we observe that all new candidates in the beam search process are extensions from the old candidates because of the BFS property. In this case, the forward propagations would greatly overlap. If we can maintain h t for every old candidate, extending one character for new candidates would require only one forward propagation step. That is, we amortize the LSTM forward propagation over the search tree. The algorithm is illustrated below. cand := {s 1:m : (h m , 0)}, result := { }; For t = m; cand is not empty; t++:</p><formula xml:id="formula_21">cand new :=      s 1:t +1 : (h t , log P (s 1:t | s 1:m ) + log P (s t +1 | s 1:t ))</formula><p>for every s t +1 ∈ C, for every s 1:t ∈ cand </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">CPU implementation and LSTM tweaks</head><p>Although GPUs appear to be most suitable for computation in deep learning, for this particular application we found that the CPU is actually better suited to the task. This is due to the need for branching and maintaining relatively complex data structures in the beam search process, along with the integration of the edit distance computation. Thus, implementation on a GPU requires a process that frequently shuffles very small amounts of data (each new character) between the CPU and GPU and can be very inefficient. We thus implemented the entire beam search, error correction and forward propagation in C on the CPU.</p><p>However, after moving to a pure CPU implementation, it is the case that initially about 90% of the time is spent on computing the matrix-vector product in the LSTM. By properly moving to batch matrix-matrix operations with a minibatch that contains all r candidates maintained by beam search, we can substantially speed this up; By grouping together the product between the W matrices and h t for all r candidates maintained by the beam search procedure, we can use matrix-matrix products, which have significantly better cache efficiency even on the CPU. We use the Intel MKL BLAS, and the total of these optimizations further reduces the running time to 75ms. By further parallelizing the updates via 8 OpenMP threads brings completion time down to 25 ms.</p><p>Finally, one of the most subtle but surprising speedups we attained was through a slightly tweaked LSTM implementation. With the optimizations above, computing the sigmoid terms in the LSTM actually took a surprisingly large 30% of the total computation time. This is due to the fact that 1) our LSTM implementation uses a hard sigmoid activation, which as a clipping operation requires branch prediction; and 2) the fact that the activations we need to apply the sigmoid to are not consecutive in the hidden state vector means we cannot perform fast vectorized operations. By simply grouping together the terms i t , f t , o t in the hidden state, and by using Intel SSE-based operations for the hard sigmoid, we further reduce the completion time down to 13.3ms, or 16.3ms if we include the error correction procedure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">EXPERIMENTAL RESULTS</head><p>We evaluate our method on the AOL search dataset <ref type="bibr" target="#b15">[16]</ref>, a public dataset of real-world searches from 2006, as well as an internal dataset of product search queries from Amazon.com. The AOL dataset contains 36 million total queries, with 10 million of these being unique, illustrating the long tail in these search domains. We set a maximum sequence length for the queries at 60 characters, as this contained 99.5% of all queries. The Amazon dataset contains a random sample of about 110 million product search queries that users typed on the Amazon.com web site during 2017 (we excluded sexually explicit and culturally insensitive or inappropriate queries).</p><p>Training and testing splits. For each example in the dataset, we choose a random cutting point (always after two characters in the string), and treat all characters beforehand as the prefix and all characters afterwards as the completion. For examples in the validation and test set, we use these prefixes and actual completions to evaluate the completions that our method predicts. In the training set, we discard the cutting points and just train on the entire queries.</p><p>For the AOL dataset, we use a test set size of 330K queries, and use the rest for training. For the Amazon dataset we use a test set size of 1 million queries. We create training and testing splits to evaluate our method using two different strategies:</p><p>• Prefix splitting: sort the queries according to the MD5 hash of the prefix, and then split. This ensures that data in the test set does not contain an exact prefix match in the training set. • Time splitting: For both the AOL and Amazon datasets, we sort the queries by timestamp and split. This mimics making predictions online as new data comes in.</p><p>Table <ref type="table">1</ref>: Character-level language model cross-entropy loss (see <ref type="bibr" target="#b9">(10)</ref>) for the LSTM on AOL search dataset and Amazon Product Search dataset. We explored both LSTM and GRU as the recurrent units and varied their dimensions from 256 to 1,024. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Training language model</head><p>We trained our character-level language model on the characters of all the queries in the training set. We trained each model for 3-4 epochs over the entire dataset, and applied early stopping if the validation loss did not improve for more than 50, 000 mini-batches. We used a 2-layer LSTM with 256, 512, 1024 hidden dimensions with dropout of 0.5 between the two LSTM layers (no dropout within a single layer), and used Adam optimizer to train with a minibatch size of 256. We use cross-entropy loss weighted by query length for training. For each model, we select the learning rate from {10 −2 , 10 −3 , 10 −4 }, weight decay from {10 −7 , 10 −8 , 10 −9 , 0} and gradient norm clipping from {0.01, 0.001, 0.0001, 0.00001}. We also conduct experiments on replacing LSTM with Gated Recurrent Unit (GRU), as GRU has lower computation cost compared to LSTM.</p><p>Training and validation losses for each datasets, under the two different splittings and varying LSTM/GRU dimensions, are shown in Table <ref type="table">1</ref>. We observe that with the same model size, GRU usually shows slightly worse performance than LSTM, and increasing the hidden dimension does help in improving model performance.</p><p>Our training time on a NVIDIA V100 GPU (on AWS P3 instance) for the AOL dataset is 17, 18 and 24 hours for LSTM with 256, 512, and 1024 neurons, respectively. For the Amazon dataset, we remove all duplicate queries and apply per instance weight as the number of occurrences of each query to reduce the number of training examples to iterate over. The training time for the Amazon dataset is roughly three times longer than the AOL dataset using a batch size of 256. However, we found that if we increase the batch size to 1024, we can speed up the training by a factor of 2.2 because of increased GPU utilization, without noticeable performance loss. As a result, we can run one epoch of the Amazon dataset in approximately 6 hours, and training on the entire dataset can be done within one day.</p><p>We evaluated relatively few other architectures for this model, as the goal here is to use the character-level language model for completion rather than attain state-of-the-art results on language modeling in general. It is worth noting that the validation loss is lower than the training loss in Table <ref type="table">1</ref>, and the two losses becomes closer when the LSTM/GRU size is increased, indicating that our models are still in the regime of under-fitting, and even larger LSTM sizes may be used for improving performance. However, the requirement of completing the queries in real-time forbids us from using a larger model, as we will show shortly in the next section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Runtime evaluation</head><p>Compared with the traditional database lookup based query completion system, one challenge of our deep learning based approach is its prediction time. Here we summarize the speedups achieved by the different optimizations discussed in Section 5 in Table <ref type="table" target="#tab_2">2</ref>, and report the time to give 16 suggestions for a prefix. A naive implementation in Keras would result in a prediction time of over one second per query, which is intolerable in the case of completing the user's query in real time, where a completion time close to typing speed is desired. With all the optimization techniques applied, we observe over 50X speedup comparing with a naive beam search implementation. These optimizations are crucial to make our deep learning based query completion practical as an online service.</p><p>One interesting point to note is that stochastic search in this setting actually takes three times longer with all the same optimization techniques applied than beam search, to generate the same number of completion candidates. This is due to the fact that stochastic search tends to generate completions that are much longer than those of beam search, interestingly making the "simpler" method here actually substantially slower while giving worse completions (which we will evaluate shortly). As an online service, it is important to pay attention to the worstcase performance, especially because the nature of user query prefixes have a long-tail nature. In Table <ref type="table" target="#tab_3">3</ref>, we measure the prediction time using 100,000 real user query prefixes from Amazon.com, and report the Top-Percentiles (TPS) performance. A TP99 of 12 ms indicates that 99% of user requests can be served under 12 milliseconds. In Figure <ref type="figure">1</ref>, we plot the cumulative distribution function (CDF) of prediction time. We observe that the distribution of prediction time given real user query prefixes does have a very long tail. We desire that the response time of our query completion service is close to user typing speed, thus LSTM with 1024 hidden neurons is unsuitable for our use case despite showing the best prediction performance. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Performance evaluation</head><p>Finally, we evaluate the actual performance of the completion approaches, both comparing the performance of our beam search method to stochastic search (evaluated by log likelihood under the model), and comparing our completion method to a heavily optimized in-memory trie-base completion model, the standard data structure for completion given string prefixes.</p><p>Stochastic Search vs. Beam Search. In Table <ref type="table">5</ref> we highlight the performance of beam search versus stochastic search for query completion, evaluated in terms of log likelihood under the model. Over all models, splitting methods and LSTM sizes, beam search produces substantially better results in terms of log likelihood; in addition, it is 3x faster as mentioned above. Thus we believe that beam search is necessary in our real-time query completion task, justifying our efforts on optimizing its runtime. Note that in this case we are not including any error correction, as it is not trivial to integrate this into the stochastic search setting, and we wanted a direct comparison on sample likelihood.</p><p>Our approach vs. database lookup. Finally, we compare our total approach (beam search with error correction) to a trie-based (i.e., prefix lookup) completion model. We compare the approach using a combination of two metrics: 1) probabilistic coverage, which is simply the empirical conditional probability of the predicted completion given the prefix:</p><formula xml:id="formula_22">i P (completion i | prefix), (<label>16</label></formula><formula xml:id="formula_23">)</formula><p>where P is the empirical probability for the whole dataset (counts of completion i over all other queries with the same prefix in the whole dataset); and 2) hit rate, which simply lists the number of times a completion appears in the entire dataset. Because the error correction model adjusts the prefix, it is not possible to compute probabilistic coverage exactly, but we can still get a sense of how likely the completions are based upon how often they occur using the hit rate metric. Table <ref type="table">4</ref> shows the performance of the trie-based approach, beam search, and beam search with error correction under these metrics. Our models generally outperform trie-based approaches in all settings, the one exception being probabilistic coverage on the time-based training/testing split. This is possibly due to some amount of shift over time in the search query terms. And although we cannot generate coverage numbers for the error-correction method, the significantly larger hit rate suggests that it is indeed giving better completions than all the alternative approaches.</p><p>Further, we note that in addition to these numbers, there are a few notable disadvantages with trie-based lookup. The trie data structure we compare to is very memory intensive (requires keeping prefixes for all relevant queries in memory), and takes a minimum of 11 GB of RAM for the entire AOL search data set, and over 50 GB of RAM for the Amazon dataset. Our deep learning based language model approach uses over two magnitudes less memory, even at its largest configuration (LSTM-1024), as shown in Table <ref type="table">6</ref>. It is worth mentioning that the Amazon dataset we used in experiments contains user queries that are sampled from one month's data on amazon.com; if we want to utilize complete data from the month, it can be memory intensive to use the trie-based approach, while our deep learning based approach does not have this limitation and in general, a learning based approach can benefit more from having a bigger dataset.</p><p>Additionally, if a prefix has not been seen before in the dataset, the trie-based approach will offer no completions. In our test set of the Amazon dataset, which contains user queries from a time period subsequent to the training data, we observe that there are a substantial fraction of queries which do not appear in the training data. A trie-based query approach cannot give these queries as suggestions, whereas our deep learning based approach can still make suggestions using its language model. Furthermore, the triebased approach is not amenable to error correction in isolation, as candidate corrections need to be proposed prior to lookup in the database; the process of repeatedly generating these candidates and performing the lookups will work for at most 2 edits, whereas our approach empirically easily handles completions that include 4-5 edits; this is reflected in the significantly higher hit rate in Table <ref type="table">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSIONS</head><p>In this paper, we have presented a search query completion approach based upon character-level deep language models. We proposed a method for integrating the approach with an error correction framework and showed that candidate completions with error correction can be efficiently generated using beam search. We further described several optimizations that enabled the system to deliver results in real time, including a CPU-based custom LSTM implementation. We demonstrated the effectiveness of our method on two large-scale datasets from AOL and Amazon, and showed that our proposed deep learning based query completion model is able to jointly produce better completions than simple prefix lookup, while simultaneously being able to generate the candidates in real time.</p><p>Acknowledgment. The authors thank Juzer Arsiwala for his help on preparing Amazon training dataset.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>the most probable r − |result| candidates in cand new Move s 1:t +1 from cand to result if s t +1 is EOS symbol Bump h t to h t +1 by one step of LSTM on s t +1 , ∀s 1:t +1 ∈cand Note that the initialization takes O (md 2), and the four lines in the loop cost O (r |C |d ), O (r |C |), O (r ), and O (rd 2 ), where m is the length of s 1:m , d is the hidden dimension of LSTM, |C | is the length of character set C, and r is the number of completions required. Using this approach, the complexity for computing r completions for ddimensional LSTM reduces from O (n 2 rd (d +|C |)) to O (nrd (d +|C |)) for sequence with maximum length n. A naive C implementation shows that the running time for such search drops to 250 ms from over 1 sec.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>That is, only one character is changed. Thus, if we can maintain the last column in the completion distance algorithm, that is dist compl , ∀j, for every candidate, we can save the repeated effort in building the edit distance table. The resulting algorithm is summarized below:</figDesc><table /><note>cand := {empty string " ": 0}, result := { } For t = 0; cand is not empty; t++: cand new := ŝ1:t+1 : log P (s 1:m | ŝ1:t+1 ) + log P (ŝ 1:t +1 ) for all ŝt+1 ∈ C, for all ŝ1:t ∈ cand ; cand := the most probable (r − |result|) candidates in cand new ; Move ŝ1:t+1 from cand to result if s t +1 is EOS symbol; Maintain the last col of dist new for P (s 1:m | ŝ1:t ) ∀ŝ 1:t ∈ cand; By such bookkeeping, we are able to amortize the completion distance algorithm over the beam search procedure, making it n times faster (from O (|C |rm • n 2 ) to O (|C |rm • n)).</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>The speedups from different optimizations, with an LSTM of dimension 256. Results were produced on a Xeon E5-2670 machine, running 8 threads.</figDesc><table><row><cell>Optimization</cell><cell>Resulting runtime</cell></row><row><cell>Naive beam search implementation</cell><cell>&gt;1sec</cell></row><row><cell>Tree-based beam search</cell><cell>250ms</cell></row><row><cell>Adding MKL BLAS</cell><cell>75ms</cell></row><row><cell>OpenMP parallelization</cell><cell>25ms</cell></row><row><cell>Custom LSTM implementation</cell><cell>13.3ms</cell></row><row><cell>Adding prefix edit distance</cell><cell>16.3 ms</cell></row><row><cell>Stochastic search</cell><cell>40 ms</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>Top-Percentile of completion time on the Amazon dataset with varying LSTM dimension. TPx is the minimum time (in milliseconds) under which x% of requests have been served. Results were produced on an Intel Xeon E5-2686 v4 machine, running with 8 threads.</figDesc><table><row><cell>Top-Percentile</cell><cell>256</cell><cell>LSTM Dimension 512 1024</cell></row><row><cell>TP50</cell><cell cols="2">5.388 ms 15.74 ms 67.19 ms</cell></row><row><cell>TP90</cell><cell cols="2">8.067 ms 24.17 ms 96.08 ms</cell></row><row><cell>TP99</cell><cell cols="2">11.14 ms 28.86 ms 114.74 ms</cell></row><row><cell>TP99.9</cell><cell cols="2">12.02 ms 30.93 ms 125.63 ms</cell></row><row><cell>TP99.99</cell><cell cols="2">12.30 ms 31.35 ms 131.84 ms</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/xflash96/query_completion</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Note that, as we describe below, we won't actually use the Keras library at prediction time, but we do use it for training</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2">Following the convention in the previous section, define s 1:m and ŝ1:n to be the prefix and completion string of length m and n, respectively. Different from the previous section, we no longer constrain<ref type="bibr" target="#b2">3</ref> A character-level LSTM alone only gives the probability of the input/completion sequence. It could not control the trade-off between the probability of user typos and the likelihood of a completion, thus an error model is necessary.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Experiments are carried on an Intel Xeon E5-2670 machine. We use up to 8 threads, and test it on the error-corrected query completion model with</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="512" xml:id="foot_4">hidden units. For each input, 16 suggestions are generated.</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Context-sensitive query auto-completion</title>
		<author>
			<persName><forename type="first">Ziv</forename><surname>Bar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">-Yossef</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Naama</forename><surname>Kraus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th international conference on World wide web</title>
				<meeting>the 20th international conference on World wide web</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="107" to="116" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An improved error model for noisy channel spelling correction</title>
		<author>
			<persName><forename type="first">Eric</forename><surname>Brill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><forename type="middle">C</forename><surname>Moore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics</title>
				<meeting>the 38th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="286" to="293" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A survey of query auto completion in information retrieval</title>
		<author>
			<persName><forename type="first">Fei</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maarten</forename><surname>De Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends® in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="273" to="363" />
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">François</forename><surname>Chollet</surname></persName>
		</author>
		<ptr target="https://github.com/fchollet/keras." />
		<title level="m">Keras</title>
				<imprint>
			<date type="published" when="2015">2015. 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Hierarchical multiscale recurrent neural networks</title>
		<author>
			<persName><forename type="first">Junyoung</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sungjin</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1609.01704</idno>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Online spelling correction for query completion</title>
		<author>
			<persName><forename type="first">Huizhong</forename><surname>Duan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bo-June Paul</forename><surname>Hsu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th international conference on World wide web</title>
				<meeting>the 20th international conference on World wide web</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="117" to="126" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The Viterbi algorithm</title>
		<author>
			<persName><forename type="first">David</forename><surname>Forney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE</title>
				<meeting>IEEE</meeting>
		<imprint>
			<date type="published" when="1973">1973. 1973</date>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="page" from="268" to="278" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">Sepp</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997. 1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Space-efficient data structures top-k completion</title>
		<author>
			<persName><forename type="first">Bo-June</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Paul</forename><surname>Hsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giuseppe</forename><surname>Ottaviano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd international conference on World Wide Web</title>
				<meeting>the 22nd international conference on World Wide Web</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="583" to="594" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Speech and language processing</title>
		<author>
			<persName><forename type="first">Dan</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><forename type="middle">H</forename><surname>Martin</surname></persName>
		</author>
		<imprint>
			<biblScope unit="volume">3</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A spelling correction program based on a noisy channel model</title>
		<author>
			<persName><forename type="first">Kenneth</forename><forename type="middle">W</forename><surname>Mark D Kernighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">William</forename><forename type="middle">A</forename><surname>Church</surname></persName>
		</author>
		<author>
			<persName><surname>Gale</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th conference on Computational linguistics-Volume 2</title>
				<meeting>the 13th conference on Computational linguistics-Volume 2</meeting>
		<imprint>
			<date type="published" when="1990">1990</date>
			<biblScope unit="page" from="205" to="210" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title/>
		<author>
			<persName><forename type="first">Chang</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xin</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Shin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joseph</forename><forename type="middle">E</forename><surname>Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dawn</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Code Completion</title>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Spelling correction for search engine queries</title>
		<author>
			<persName><forename type="first">Bruno</forename><surname>Martins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mário J</forename><surname>Silva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Natural Language Processing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="372" to="383" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Query auto-completion for rare prefixes</title>
		<author>
			<persName><forename type="first">Bhaskar</forename><surname>Mitra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nick</forename><surname>Craswell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1755" to="1758" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A Neural Language Model for Query Auto-Completion</title>
		<author>
			<persName><forename type="first">Dae</forename><surname>Hoon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Park</forename></persName>
		</author>
		<author>
			<persName><forename type="first">Rikio</forename><surname>Chiba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1189" to="1192" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A picture of search</title>
		<author>
			<persName><forename type="first">Greg</forename><surname>Pass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abdur</forename><surname>Chowdhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cayley</forename><surname>Torgeson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st international conference on Scalable information systems</title>
				<meeting>the 1st international conference on Scalable information systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">Toby</forename><surname>Segaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><surname>Hammerbacher</surname></persName>
		</author>
		<title level="m">Beautiful data: the stories behind elegant data solutions</title>
				<imprint>
			<publisher>O&apos;Reilly Media, Inc</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Learning to personalize query auto-completion</title>
		<author>
			<persName><forename type="first">Milad</forename><surname>Shokouhi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval</title>
				<meeting>the 36th international ACM SIGIR conference on Research and development in information retrieval</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="103" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A hierarchical recurrent encoder-decoder for generative context-aware query suggestion</title>
		<author>
			<persName><forename type="first">Alessandro</forename><surname>Sordoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hossein</forename><surname>Vahabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christina</forename><surname>Lioma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jakob</forename><forename type="middle">Grue</forename><surname>Simonsen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jian-Yun</forename><surname>Nie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="553" to="562" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Improving recommendation for long-tail queries via templates</title>
		<author>
			<persName><forename type="first">Idan</forename><surname>Szpektor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aristides</forename><surname>Gionis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoelle</forename><surname>Maarek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th international conference on World wide web</title>
				<meeting>the 20th international conference on World wide web</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="47" to="56" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The string-to-string correction problem</title>
		<author>
			<persName><forename type="first">A</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">J</forename><surname>Wagner</surname></persName>
		</author>
		<author>
			<persName><surname>Fischer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the ACM (JACM)</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="168" to="173" />
			<date type="published" when="1974">1974. 1974</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Using the web for language independent spellchecking and autocorrection</title>
		<author>
			<persName><forename type="first">Casey</forename><surname>Whitelaw</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ben</forename><surname>Hutchinson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Grace</forename><forename type="middle">Y</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gerard</forename><surname>Ellis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2009 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="890" to="899" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Neural language correction with character-based attention</title>
		<author>
			<persName><forename type="first">Ziang</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anand</forename><surname>Avati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Naveen</forename><surname>Arivazhagan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.09727</idno>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
