=Paper=
{{Paper
|id=Vol-2311/paper_8
|storemode=property
|title=Query Rewrite for Null and Low Search Results in eCommerce
|pdfUrl=https://ceur-ws.org/Vol-2311/paper_8.pdf
|volume=Vol-2311
|authors=Zehong Tan,Canran Xu,Mengjie Jiang,Hua Yang,Xiaoyuan Wu
|dblpUrl=https://dblp.org/rec/conf/sigir/TanXJYW17
}}
==Query Rewrite for Null and Low Search Results in eCommerce==
Zehong Tan, Canran Xu, Mengjie Jiang, Hua Yang, Xiaoyuan Wu
eBay Search Science
zetan@ebay.com, canxu@ebay.com, menjiang@ebay.com, hyang2@ebay.com, xiaowu@ebay.com
ABSTRACT

In most cases when users encounter null and low search queries, there is plenty of matching inventory on eBay, but it cannot be retrieved by regular query expansion. Therefore, retrieving more relevant items by performing more aggressive query rewrites will largely improve the user experience. This paper proposes a query rewrite system that reformulates the original query into multiple alternative queries that enlarge recall without altering the main search intent. We present a term dropping algorithm that identifies and keeps the important terms in the query, and a term replacement algorithm that learns from user behavior and a language model. Experiments on randomly sampled eBay queries show that the n-best rewrites produced by these two algorithms effectively recover more relevant items. The experiments further indicate that our algorithm predicts the exact user-refined query among its top-5 rewrite candidates for over 50% of the null and low queries.

Keywords

eCommerce, Search Experience, Query Rewrite, Language Model

Copyright © 2017 by the paper's authors. Copying permitted for private and academic purposes.
In: J. Degenhardt, S. Kallumadi, M. de Rijke, L. Si, A. Trotman, Y. Xu (eds.): Proceedings of the SIGIR 2017 eCom workshop, August 2017, Tokyo, Japan, published at http://ceur-ws.org

1. INTRODUCTION

When the number of items retrieved for a query is lower than some threshold (e.g. 10), the search results page makes for a bad customer experience. Recovering more items for these null and low queries will therefore dramatically increase conversion and click-through rates. Causes of this issue range from a misspelled word to an overly specific query, so with a few exceptions this task can be tackled by simple rule-based solutions. In this scenario, dropping or replacing some terms in these queries may recover more items. The current null and low recovery algorithm randomly drops tokens in the user query up to some fixed percentage and produces a collection of subqueries; a relevance model is then used to boost the relevant items in the ranking phase. However, randomly dropping or replacing terms may alter the original search intent, which can result in retrieving irrelevant items. For example, when the null query “led brake light nano made in taiwan” is sent for term dropping under the current algorithm, the top subqueries may preserve unimportant segments like “made in taiwan”. As a result, even though the retrieved items contain enough of the tokens in the user query, their relevance is low and the user experience is bad. To address this issue, we implement separate algorithms that drop or replace terms using techniques originating from natural language processing. These algorithms attempt to recognize the important segments of the query so that dropping or replacing terms does not distort its intent.

The task of query term dropping is assisted by tagging part-of-speech (POS), entities, units of measure (UOM) and phrases. In the eCommerce domain, entities are item properties such as brand name, material, color, shape, gender, etc. On the other hand, for null and low queries that are not verbose, we reformulate the query by replacing some of the tokens. This approach avoids accidentally turning a specific seed query into a broad query. To achieve this, we build a language model from large amounts of user query logs. On a set of randomly selected null and low queries, we focus on evaluating the n-best rewrites produced by the two algorithms.

To summarize, our main contributions are:

• We propose an effective algorithm to drop unimportant terms without distorting the intent of the query.
• We apply a language model to estimate the probability of candidate queries for term replacement.

2. RELATED WORKS

Information retrieval with verbose queries has attracted a lot of interest in recent years. The redundancies that reside in these queries suggest that generating subqueries by dropping unimportant terms from the original query using natural language processing (NLP) techniques [1, 2, 3] would be helpful. To extract patterns for reducing long queries, researchers have tried features such as tf-idf, POS, entities and similarity scores to the original queries. When the retrieved documents are available, more sophisticated work employs a random walk from each token in the original query over a graph of retrieved documents, and the score of each suggestion is ranked by the Hadamard product of each term [4].
Query rewrite systems can be traced back to 2001, when Google started to provide a dynamic spelling checking system. Almost simultaneously, Yahoo started to track common misspellings of frequent queries such as movie stars. Technical details of these systems are not available, but they appear to be based on spelling algorithms and statistical frequencies. Traditional query rewriting systems use rule-based techniques such as finding synonyms of words extracted from WordNet [5]. In recent years, techniques that utilize NLP and statistical models have been widely used for query rewrite tasks. Mays et al. [6] showed that a noisy channel model can be used for spelling correction of both real-word and non-word errors. In these works, either a set of real-word confusion pairs or a lexicon indexed by edit distance is prepared [7]. [8, 9] proposed using a large set of ordered query pairs scored by their semantic similarities. These algorithms are not specifically designed for null and low search queries. On the other hand, [10] added taxa constraints to subqueries and retrieved items from a prebuilt database of historical purchases for null and low queries; distributions of taxa constraints inferred from the historical data are then added to each subquery for secondary retrieval. Furthermore, using query-snippet pairs from user query logs enlarges the scope of query expansion in an improved contextual way [11, 12]. Some works [13] build a query rewrite system without requiring a predefined dictionary. Several other lines of research are relevant to query rewrite, such as using a large human-labeled data set to build a conditional random fields discriminative model [14].

3. QUERY TERM DROPPING

In this section, we describe the proposed approach to query term dropping. The aim of term dropping is to reduce a verbose query to multiple short query candidates that increase recall while keeping the query intent.

[Figure 1: Workflow of query term dropping. (a) Query understanding phase: each token of the query is tagged by part-of-speech or entities. (b) Rewrites generation phase: drop unimportant tokens to generate subqueries. (c) Rewrites ordering phase: each candidate is assigned a score according to its tags and pass (see text), and the candidates are ranked accordingly.]

To understand users' intent, we tag the possible products and aspects in the user query, including brands (such as Nike), product names (such as Macbook and Play Station 2), colors, materials, product models (such as hd-650), UOM, etc. Entities and attributes are tagged by matching against dictionaries mined from eBay's structured catalog. As null and low queries are mostly tail queries, most of their tokens cannot be covered by structured data; the syntactic roles of these tokens are labeled by POS tagging. Moreover, phrases are detected in this phase by mining the user query logs, to make sure the tokens of a phrase are dropped or kept as a whole. When there are multiple possible tagging sequences, we select the sequence that tags the most tokens. This query tagging procedure provides the context for extracting the core concept and its attributives in term dropping.

The selection of terms to drop is set by predetermined rules. Brands, product names and the first noun phrase are considered the Tier 1 group: they are the likely core product of the query. Aspects (colors, materials, etc.), UOM, adjectives and the like are considered the Tier 2 group, as they are attributes of the core product. Stop words, conjunctive words, etc. are considered the Tier 3 group. A term dropping suggestion is then generated from the core product terms in combination with attribute terms. Under a requirement on the minimum number of tokens in the alternative query, the n-best suggestions are generated by a scoring algorithm that considers both the alternative query length and the quality of the tokens.
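To make the tier-based generation concrete, the following is a minimal sketch under our reading of the rules above; it is not the production system, and the dictionary-based `tag` helper and the token threshold are illustrative assumptions (the real tagger also uses POS and phrase detection).

```python
from itertools import combinations

def tag(token, brand_dict, product_dict, aspect_dict):
    """Assign a tier by dictionary lookup (POS tagging and phrases omitted here).
    Tier 1 = core product terms, Tier 2 = attributes, Tier 3 = everything else."""
    if token in brand_dict or token in product_dict:
        return 1
    if token in aspect_dict:
        return 2
    return 3

def generate_subqueries(tokens, tiers, min_tokens=2):
    """Keep all Tier 1 terms, add subsets of Tier 2 terms, drop Tier 3 terms.
    Candidates keeping more attribute terms come first (earlier passes)."""
    core = {t for t, tier in zip(tokens, tiers) if tier == 1}
    attrs = [t for t, tier in zip(tokens, tiers) if tier == 2]
    suggestions = []
    for k in range(len(attrs), -1, -1):
        for kept in combinations(attrs, k):
            candidate = [t for t in tokens if t in core or t in kept]
            if len(candidate) >= min_tokens and candidate != list(tokens):
                suggestions.append(" ".join(candidate))
    return suggestions
```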
3.1 Brand Disambiguation

Because some brand names are derived from ordinary words, recognizing brands such as “white” or “1928” requires an additional disambiguation algorithm. We therefore build a discriminative model P(B|q) that predicts whether the term B is a brand in the context of query q. This is achieved by computing

    P(B|q) = \sum_i P(C_i|q) P(B|C_i),    (1)

where C_i is the i-th shopping category. In Eq. (1), P(C_i|q) is the model that projects the context of the query onto the shopping categories, and P(B|C_i) is extracted from eBay's inventory data. The model that learns P(C_i|q) is evaluated against inferred category demand, and the optimal threshold for classification is determined from user click data. The brand tagging procedure is performed once Eq. (1) has been run over the entire token list of the query.
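The following is a minimal sketch of Eq. (1), assuming the category classifier and the brand-by-category statistics are supplied; `category_model`, `brand_given_category` and the threshold are illustrative names, not the actual eBay components.

```python
def brand_probability(term, query, category_model, brand_given_category):
    """Eq. (1): P(B|q) = sum_i P(C_i|q) * P(B|C_i).

    category_model(query) is assumed to return a dict {category: P(C_i|q)};
    brand_given_category[cat].get(term, 0.0) plays the role of P(B|C_i),
    e.g. estimated from inventory counts per category.
    """
    return sum(p_cat * brand_given_category.get(cat, {}).get(term, 0.0)
               for cat, p_cat in category_model(query).items())

# Usage sketch: tag "watt" as a brand only when the query context suggests
# pottery & crafts, e.g. brand_probability("watt", query, clf, stats) > threshold.
```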
3.2 Scoring Algorithm

Alternative query generation is implemented as a multi-pass process. Earlier passes keep more of the original tokens in the alternative query, resulting in more precise queries with smaller recall; later passes keep fewer tokens to enlarge the recall size. The search engine issues the term dropping queries pass by pass, and stops whenever it has obtained enough effective alternative queries and items to show to users. An offline experiment suggests that for 70% of the null and low queries, only one pass is needed to complete the whole process. Suggestions from an earlier pass always rank higher than those from later passes, as queries retaining more of the original tokens are believed to be closer to the original user intent.

To score the suggestions within one pass, we basically prefer suggestions with more brands or nouns over suggestions with more adjectives. The baseline tagging score is given by a step function over the tags. After tagging, we found that 54% of the remaining tokens are nouns. Since many null and low queries contain misspelled words of low document frequency, we are inspired to rank these nouns by the cross-entropy of their click probabilities conditioned on their document frequencies:

    P_{click}(r) = -P(click|r) \log(1 - P(click|r)),    (2)

where r is the document frequency of the token and P(click|r) is estimated from a large sample of null queries. The click score is added to the tagging score with an initially undetermined coefficient, which is later optimized by maximizing the discounted cumulative gain (DCG).
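A sketch of this combined candidate score is given below. The step-function tag weights and the mixing coefficient `alpha` are illustrative assumptions; the paper only states that the coefficient is tuned by maximizing DCG.

```python
import math

# Hypothetical step-function weights: brands and nouns preferred over adjectives.
TAG_WEIGHTS = {"brand": 3.0, "noun": 2.0, "adjective": 1.0, "other": 0.5}

def click_score(p_click):
    """Eq. (2): P_click(r) = -P(click|r) * log(1 - P(click|r)), with 0 <= p_click < 1."""
    return -p_click * math.log(1.0 - p_click)

def candidate_score(tags, p_clicks, alpha=0.5):
    """Tagging score plus alpha-weighted click score; alpha would be tuned on DCG."""
    tagging = sum(TAG_WEIGHTS.get(t, 0.5) for t in tags)
    clicking = sum(click_score(p) for p in p_clicks)
    return tagging + alpha * clicking
```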
4. QUERY TERM REPLACEMENT

In this section we describe the proposed approach to term replacement. The input is a query q_0 consisting of a sequence of m words (w_1, ..., w_m). Following the formalism in [6], the algorithm generates a set of candidates C(q_0), and the best suggestion is the one with the highest language model probability. To generate the candidates, we start by generating a set of candidate words for each input word w_i, mined from the user behavior logs. The probability of each candidate query q_1 ∈ C(q_0) is then proportional to P(q_0|q_1) P(q_1), where P(q_1) is the probability of the candidate query under the n-gram language model. Term replacement is therefore formulated as

    q_1^* = \arg\max_{q_1 \in C(q_0)} P(q_0|q_1) P(q_1),    (3)

where P(q_0|q_1) and P(q_1) are the noisy channel model and the language model, respectively. Furthermore, in practice, considering that the number of candidates in the noisy channel model of a particular token scales differently from the cardinality of the bigrams, we raise the noisy channel term to a power γ:

    q_1^* = \arg\max_{q_1 \in C(q_0)} P(q_0|q_1)^\gamma P(q_1).    (4)
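A minimal sketch of the candidate search in Eq. (4), assuming single-token replacements as in Section 4.1; `candidates_for`, `channel_prob`, `lm_prob` and the value of `gamma` are illustrative stand-ins for the mined candidate sets and the two models.

```python
def best_replacement(q0_tokens, candidates_for, channel_prob, lm_prob, gamma=0.8):
    """Eq. (4): pick q1 maximizing P(q0|q1)^gamma * P(q1) over single-token edits.

    candidates_for(token) yields replacement words mined from behavior logs;
    channel_prob(w, r) is the noisy channel probability P(w_i | w_i' = r);
    lm_prob(tokens) is the language model probability of the candidate query.
    """
    best, best_score = list(q0_tokens), 0.0
    for i, w in enumerate(q0_tokens):
        for r in candidates_for(w):
            q1 = list(q0_tokens[:i]) + [r] + list(q0_tokens[i + 1:])
            score = channel_prob(w, r) ** gamma * lm_prob(q1)
            if score > best_score:
                best, best_score = q1, score
    return best
```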
4.1 Noisy Channel Model

The noisy channel model P(q_0|q_1) describes the emission probability of the original query q_0 given its replacement q_1. We collect query pairs arising from term replacement within the same session from the user query logs. For each original query, we assume that only one token w_i is replaced by w_i', so that P(q_0|q_1) = P(w_i|w_i').

Note that for each observed token w_i, the probability P(w_i|w_i' = w_i) of the token remaining unchanged should be counted as well. Unlike Mays et al. [6], where the probability of w_i not being replaced is assigned a tuning parameter, here it is estimated by the effective non-replacement count:

    P(w_i \mid w_i' = w_i) = \mathrm{Count}(w_i \to w_i) / \mathrm{Count}(w_i \to w_i'),    (5)

where \mathrm{Count}(w_i \to w_i) = (\mathrm{Count}(w_i) - \mathrm{Count}(w_i \to w_i' \neq w_i)) / 2.
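The sketch below shows one way to turn session query-pair counts into these channel probabilities; the data structures and the normalization are our reading of Eq. (5), not code from the paper.

```python
from collections import Counter, defaultdict

def estimate_channel(replacement_pairs, token_counts):
    """Estimate P(w_i | w_i') from observed single-token replacements.

    replacement_pairs: iterable of (original_token, replacement_token) pairs
    mined from same-session query rewrites; token_counts: Counter of token
    occurrences in the query log. The non-replacement mass uses the effective
    count of Eq. (5): Count(w -> w) = (Count(w) - Count(w -> w' != w)) / 2.
    """
    pair_counts = Counter(replacement_pairs)
    replaced_total = defaultdict(float)
    channel = defaultdict(dict)
    for (w, r), c in pair_counts.items():
        replaced_total[w] += c
        channel[w][r] = float(c)
    for w in channel:
        channel[w][w] = max((token_counts[w] - replaced_total[w]) / 2.0, 0.0)
        norm = sum(channel[w].values())
        for r in channel[w]:
            channel[w][r] /= norm
    return channel
```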
4.2 Language Model

Our approach utilizes an n-gram language model that assigns the following probability to a string of words:

    P(w_1 \ldots w_m) = \prod_{i=1}^{m} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}^{i-1}).    (6)

As shown in [15], language models built solely from queries, instead of a mixture of queries and other text, are the most predictive. Unlike grammatically strict natural language in long text, user queries have even sparser features, so we only incorporate bigram features in the language model. The sparsity of the bigram features is remedied by the Kneser-Ney smoothing technique [16].

In eCommerce web search, the terms in a query do not necessarily follow a canonical order: if a user types “case iphone”, the intent is not far from the query “iphone case”. For this reason, we also incorporate backward bigrams in the model without increasing the number of parameters. Since the backward-bigram probability is not commensurate with the normal bigrams, these probabilities are weighted by a free parameter λ:

    P(w_1 \ldots w_n) \approx \prod_{i=1}^{n-1} P(w_{i+1} \mid w_i) \prod_{i=2}^{n} P(w_{i-1} \mid w_i)^\lambda.    (7)

Finally, instead of simply choosing the candidate term replacement with the highest probability, we only accept candidates whose probability estimated by the language model is higher than that of the original query. This remedies the over-replacement issue, i.e., it leaves alone null and low queries that are not suitable for term replacement.
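A sketch of the forward-plus-backward bigram score of Eq. (7); the bigram lookups (which would be Kneser-Ney smoothed in the system described above) are passed in as functions, and the computation runs in log space for numerical stability.

```python
import math

def query_log_prob(tokens, fwd_bigram, bwd_bigram, lam=0.5):
    """Eq. (7) in log space: sum of log P(w_{i+1}|w_i) plus lam * sum of log P(w_{i-1}|w_i).

    fwd_bigram(prev, word) returns the smoothed forward probability P(word|prev);
    bwd_bigram(nxt, word) returns the backward probability P(word|nxt);
    lam is the free weight on the backward bigrams.
    """
    forward = sum(math.log(fwd_bigram(tokens[i], tokens[i + 1]))
                  for i in range(len(tokens) - 1))
    backward = sum(math.log(bwd_bigram(tokens[i], tokens[i - 1]))
                   for i in range(1, len(tokens)))
    return forward + lam * backward
```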
5. EXPERIMENT

We test how well our alternative query service performs on the eBay search engine. The evaluation scenario measures the ability of the algorithms to recover null and low search result pages and retrieve relevant items.

5.1 Data Preparation

We sampled two query sets of null and low queries from the search log, for term dropping and term replacement separately. The training set for term dropping consists of one week of users' search and click data. The training data for query term replacement consists of query pairs taken from 11 weeks of query logs. The sampled test set contains 3,000 null and low queries from the week following the training data.

5.1.1 Query Term Dropping

In the term dropping experiment, we leverage eBay's structured data dictionary for product and attribute recognition in the query tagging phase. For POS tagging of search queries we employ the Stanford bidirectional model. However, since in web search queries the prior probability of verbs is much lower than in ordinary news articles, we perform an offline correction on top of the Stanford POS tagging. With unigrams and bigrams mined from 9 days of search queries, the tagging for phrases is obtained from the normalized mutual information:

    MI(w_i, w_{i+1}) = P(w_i, w_{i+1})^2 / (P(w_i) P(w_{i+1})).    (8)
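A sketch of phrase detection with Eq. (8), where the probabilities are estimated from the mined unigram and bigram counts; the normalization and the decision threshold are illustrative assumptions.

```python
def normalized_mi(bigram_count, left_count, right_count, total):
    """Eq. (8): MI(w_i, w_{i+1}) = P(w_i, w_{i+1})^2 / (P(w_i) * P(w_{i+1}))."""
    p_bigram = bigram_count / total
    return p_bigram ** 2 / ((left_count / total) * (right_count / total))

def is_phrase(w1, w2, unigrams, bigrams, total, threshold=1e-4):
    """Treat (w1, w2), e.g. ('sport', 'bra'), as unbreakable above the threshold."""
    if (w1, w2) not in bigrams:
        return False
    return normalized_mi(bigrams[(w1, w2)], unigrams[w1], unigrams[w2], total) >= threshold
```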
5.1.2 Query Term Replacement

We remove all non-alphanumeric characters from the query and spelling-correction pairs. After filtering, there are 380 million query pairs. Statistics of the training data are collected in Table 1.

Table 1: Statistics of unique n-grams in the language model.

              1-gram       2-grams      Noisy Channel
Parameters    2.8 million  29 million   18 million
Count         300 million  300 million  126 million

5.2 Results on Search Metrics

Our algorithm is fine-tuned to balance the capability of retrieving more items against their relevance. In this section, we present the results of our experiments on search metrics.

5.2.1 Recovery of user behaviors

Users' intent is reflected by their own query refinements when they are not satisfied with the search results returned for the original query. We first evaluate the probability that one of the n-best query rewrite candidates exactly matches the user's refinement. The results are summarized in Table 2.

Table 2: Exact match between users' refinements and n-best query rewrite candidates.

              Top-1   Top-5
Dropping      23.3%   54.6%
Replacement   31.9%   66.4%

5.2.2 Recall Enhancement

In Table 3, we show the number of retrieved items, R, for the original query and the best rewritten queries. For comparison, we also show the numbers for users' own rewritten queries. In general, we are recovering the null and low search result pages.

Table 3: Median number of retrieved items, R, for the original query (Or), the user's rewritten query (Re) and the 3-best query rewrite candidates.

                       Or   Re   1st   2nd   3rd
Dropping (R = 0)       0    8    8     5     12
Dropping (R ≤ 25)      2    42   75    45    75
Replacement (R = 0)    0    41   3     365   0
Replacement (R ≤ 25)   4    70   44    362   7
5.2.3 Category Overlap

The distribution of queries over shopping categories is a key indicator of high-level relevance for query rewrite systems in eCommerce [10, 17]. A change in the inferred level-2 categories is a sufficiently strong signal that the alternative query distorts the user's search intent. The category overlap is computed as the Jaccard similarity between the categories C_A^{50} of up to the top 50 items retrieved by the alternative query and the dominant category C_O of the original query:

    J(A, O) = |C_A^{50} \cap C_O| / |C_A^{50} \cup C_O|.    (9)
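The overlap computation amounts to a set Jaccard similarity; a minimal sketch follows, assuming a retrieval helper has already produced the level-2 category labels of the top items (function and argument names are illustrative).

```python
def category_overlap(alt_item_categories, original_dominant_categories):
    """Eq. (9): Jaccard similarity between the categories of the alternative
    query's top-50 retrieved items and the dominant categories of the original query.
    """
    cat_a = set(alt_item_categories[:50])
    cat_o = set(original_dominant_categories)
    if not (cat_a | cat_o):
        return 0.0
    return len(cat_a & cat_o) / len(cat_a | cat_o)
```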
In Table 4, we show the category overlap of the top 3 suggestions by term dropping and term replacement. Overall, 71.5% (68.6%) of the items retrieved by the top-1 term dropping (replacement) suggestion share the category of the original query. Short queries with only 2-3 tokens are more troublesome for term dropping, with an overlap rate of 60.4%, as dropping even one term may change the user's intent considerably. For example, in the query “boden shoes 30”, dropping the token “30” (size 30) alters the category from “kid's clothing and shoes” to “women's shoes”. Such issues can be relieved by term replacement: our result achieves an overall 74.3% category overlap, compared to the 74% reported in [10].¹

Table 4: Category overlap between the original query and the n-best query rewrite candidates.

                   1st     2nd     3rd     overall
Term dropping      71.5%   65.5%   63.3%   66.8%
Term replacement   68.6%   93.1%   83.9%   81.9%

¹ To our knowledge, the most relevant work to ours is Ref. [10], in which the authors reported an algorithm that generates search subqueries with additional taxa constraints in the null search page scenario. However, this approach is not feasible to reproduce, because the expense of inferring exact taxa constraints for each query is maintaining a dynamic database as large as the active inventory. For this reason, the baseline is compared against the number reported in the original literature, instead of reproducing its result on the same query set.

5.3 Qualitative Results

One important feature of our algorithms that leads to better search experiences is the explicit mechanism for rewriting search queries based on context. In Table 5, we show several examples of top query rewrites generated by the algorithms.

Table 5: Examples of top query rewrites from the algorithms.

              Or                                            1st
Dropping      tommy hilfiger prep sport bra nwt free shipp  tommy hilfiger sport bra nwt
              10w high quality speaker 6ohm                 10w high quality speaker
              watt spagehetti platter                       watt platter
Replacement   iphone watch                                  apple watch
              amber ball insect                             amber sphere insect

As for term dropping, in the first query “tommy hilfiger” is detected as a brand and “sport bra” as an unbreakable noun phrase; our term dropping rewrites the query by dropping “prep”, “free” and the misspelled word “shipp”. In the query “10w high quality speaker 6ohm”, our algorithm recognizes that it contains two UOM constraints, “10w” and “6ohm”, and relaxes the query by dropping one of them. As mentioned in Section 3.1, brand names are disambiguated by considering the shopping category of the query: though the word “watt” refers to a unit most of the time, it is a high-confidence brand when it occurs in queries in the pottery & crafts category. In the query “watt spagehetti platter”, our algorithm therefore recognizes “watt” as a china brand and keeps it in the query rewrite. For term replacement, the query “iphone watch” is rewritten as “apple watch”, instead of “iphone clock”, because the bigram probability P_b(watch|iphone) is much greater than the emission probability P_e(clock|watch). On the other hand, “amber ball insect” is replaced by “amber sphere insect” because the Kneser-Ney smoothing correctly compensates the probability of “amber sphere”, instead of recognizing “amber” as a color.

6. CONCLUSION AND OUTLOOK

In this work, we have implemented algorithms that aim to cure null and low search results. In particular, we showed that tagging techniques developed in NLP can be used to generate more context-relevant subqueries, and that statistical language models additionally provide a promising avenue for progress on the path to building full natural language understanding systems. Our numerical experiments indicate that users' shopping experience can be improved, with a higher level of engagement.

The improvements achieved by our algorithms are not unique to vanilla null and low search pages, and are readily applicable to broader search engine tasks, such as extracting structured data from item titles, general query reformulation and anomalous query synopsis. Similar models could lead to significant improvements in shopping experiences, especially on text data with less grammatical strictness.

7. REFERENCES

[1] Kumaran, G., and Carvalho, V. R. Reducing long queries using query quality predictors. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2009), ACM, pp. 564–571.
[2] Chen, Y., and Zhang, Y.-Q. A query substitution-search result refinement approach for long query web searches. In Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01 (Washington, DC, USA, 2009), WI-IAT '09, IEEE Computer Society, pp. 245–251.
[3] Duan, H., and Hsu, B.-J. P. Online spelling correction for query completion. In Proceedings of the 20th International Conference on World Wide Web (New York, NY, USA, 2011), WWW '11, ACM, pp. 117–126.
[4] Maxwell, K. T., and Croft, W. B. Compact query term selection using topically related text. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (2013).
[5] Zhang, J., Deng, B., and Li, X. Concept based query expansion using WordNet. In AST '09: Proceedings of the 2009 International e-Conference on Advanced Science and Technology (2009), pp. 52–55.
[6] Mays, E., Damerau, F. J., and Mercer, R. L. Context based spelling correction. Information Processing and Management: An International Journal (1991), vol. 27, pp. 517–522.
[7] Ahmad, F., and Kondrak, G. Learning a spelling error model from search query logs. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (Stroudsburg, PA, USA, 2005), HLT '05, Association for Computational Linguistics, pp. 955–962.
[8] Jones, R., Rey, B., Madani, O., and Greiner, W. Generating query substitutions. In Proceedings of the 15th International Conference on World Wide Web (New York, NY, USA, 2006), WWW '06, ACM, pp. 387–396.
[9] Hasan, M. A., Parikh, N., Singh, G., and Sundaresan, N. Query suggestion for e-commerce sites. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (New York, NY, USA, 2011), WSDM '11, ACM, pp. 765–774.
[10] Singh, G., Parikh, N., and Sundaresan, N. Rewriting null e-commerce queries to recommend products. In Proceedings of the 21st International Conference on World Wide Web (New York, NY, USA, 2012), WWW '12 Companion, ACM, pp. 73–82.
[11] Riezler, S., and Liu, Y. Query rewriting using monolingual statistical machine translation. Computational Linguistics 36, 3 (Sep. 2010), 569–582.
[12] Chen, Q., Li, M., and Zhou, M. Improving query spelling correction using web search results. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (2007), pp. 181–189.
[13] Gao, J., Li, X., Micol, D., Quirk, C., and Sun, X. A large scale ranker-based system for search query spelling correction. In Proceedings of the 23rd International Conference on Computational Linguistics (2010), Association for Computational Linguistics, pp. 358–366.
[14] Guo, J., Xu, G., Li, H., and Cheng, X. A unified and discriminative model for query refinement. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2008), pp. 379–386.
[15] Huang, J., Gao, J., Miao, J., Li, X., Wang, K., Behr, F., and Giles, C. L. Exploring web scale language models for search query processing. In Proceedings of the 19th International Conference on World Wide Web (2010), pp. 451–460.
[16] Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95) (1995), vol. 1, IEEE, pp. 181–184.
[17] Hasan, S., Heger, C., and Mansour, S. Spelling correction of user search queries through statistical machine translation. In EMNLP (2015), L. Màrquez, C. Callison-Burch, J. Su, D. Pighin, and Y. Marton, Eds., The Association for Computational Linguistics, pp. 451–460.