pairs the
SIGIR 2017 eCom, August 2017, Tokyo, Japan Andrew Trotman, Jon Degenhardt, and Surya Kallumadi
difficulty of resolving self-containing regions (e.g. a element
Term Postings
inside a element) is diminished. Another reason is efficiency.
When a query contains a field restricted phrase it can be resolved by all <1,1>
first resolving the phrase query then resolving the field restriction.
blue <1,1>
Region algebra, although provably correct, does not scale well for
semi-structured document collections. For a field that occurs many converse <1,1>
times in a document (such as in an HTML document) the lists
that contain the start and end positions of that field can become low <2,1>
very long – longer than the postings lists for even the most frequent red <2,1>
terms. Processing these long lists takes time and can overwhelm
the overall processing time of a query. This does not happen within star <1,1>
eBay because documents are structured (not semi-structured) and
tops <2,1>
so structures occur at most once per document.
An obvious and far simpler approach to structured information title:all <1,1>
retrieval is to build an inverted index over each separate field in the
title:converse <1,1>
collection. This approach also does not scale, this time if there are
a large number of fields within a document. If the user is able to title:low <2,1>
choose any combination of fields to search over then each inverted
list for each field must be examined – which is computationally title:star <1,1>
expensive. title:tops <2,1>
The eBay indexer builds separate inverted indexes over a small
number of fields as well as an index over the entire document, description:blue <1,1>
this time without word positions. A user is then able to restrict a
description:red <2.1>
query term to one of these fields or to the entire collection, but
never to more than one of these (and they cannot use phrases).
This approach has the advantage of decreasing the complexity of Figure 3: The fielded index for document collection in Fig-
a field restricted search because the number of postings in a field ure 2, assuming each document is separated by - tags.
is typically smaller than in the entire collection, and smaller again Postings are represented .
because word positions are not stored.
Figure 3 illustrates the two documents from Figure 2 each with
two fields (title and description) sharing a single vocabulary. If
into the result retrieval process. For example, for the query “2006
the user searches for “star” anywhere in the collection then the
Golf”, the search system should understand that the search intent
postings for “star” are examined. If the user searches for “star”
is more likely to be cars and less likely to be sports.
in the title then the postings for “title:star” are examined. If the
The objective of the rewrite is to encapsulate user intent so as to
user searches for “star” in the description then, since there is no
retrieve a different set of relevant results to rank. This is especially
“description:star” in the vocabulary, there is a vocabulary mismatch
useful when there is a long tail of items. Rewrite is achieved in two
and no documents will be found. Recall that field restricted phrase
ways 1) Query expansion and 2) Query relaxation. Query expan-
search can be performed using region algebra.
sion broadens the intent of the query by adding additional relevant
tokens. Query relaxation removes tokens from the query, the ob-
5 QUERY TRANSFORMATION jective of query relaxation is to make the queries less restrictive
When a query arrives at the eBay search engine it is immediately thus increasing recall. Of course, a query might have all its terms
sent to a number of subsystems that are responsible for re-writing relaxed and then be expanded with a completely new set of terms.
it in several different ways. These query transformations can be Query suggestion is another important aspect of all modern
either explicit or implicit. In explicit query transformation, such as search engines. At eBay, the objective of query suggestion is to
spelling correction, the user sees the transformed query. In implicit explicitly nudge the user to construct a query that is aligned with
query transformation, the user does not. Search engines use query their intent. Query suggestions are created by mining the query logs
transformation as a way to increase the recall of queries while and looking at reformulations. The transient nature of products at
trying to minimize any negative effect on precision. But it is an eBay adds additional complexity. We do not want to suggest queries
especially critical aspect of eCommerce search as it affects the for items that are no longer available because doing so results in a
purchasing behavior of the user. If the transformation results in negative user experience.
items that are not relevant then it reflects badly on the site. Equally,
if the transformation results in retrieving too few items then it 5.1 Automatic Category Navigation (DSBE)
affects revenues and reduces the purchasing options for the user. The behavioral data generated by users searching and buying items
Query transformation often happens through a process of query on eBay is a valuable source of query disambiguation data (amongst
understanding then query rewriting. The objective of query under- other things). For example, if a user enters “bread” as a query, eBay
standing is to identify the meaning of a query and incorporate that knows, from that behavioral data, that the user is more likely to be
The Architecture of eBay Search SIGIR 2017 eCom, August 2017, Tokyo, Japan
interested in bakeware than music. This knowledge is mined from boosted decision trees is learned and used at the top level of a multi-
the query logs. stage ranking (i.e. learning to rank) function.
When a user enters a query it is logged, when they click on items, There is no reason why these trees should be constant across
that too is logged. Eventually, when the user purchases an item queries, users, or locations – and they are not. There are commonly
that is logged. In an eCommerce environment it is more difficult to many different ranking functions in the system simultaneously.
determine this click chain that in a web environment. A web session They may serve different search applications or different query
is often considered to be a continuous sequence of requests termi- segments within the same search application. A/B tests are an
nating after a period of about 10-15 minutes of user inactivity [4]. especially important case. To perform an A/B test of a ranking
However, in an eCommerce environment a user might search for function its necessary to “tie” the user to a particular function for
an item (for example, a car), watch that item for a period of time, the duration of the test.
periodically returning to examine characteristics of the item (for These capabilities are provided by the Profile Lookup Service.
example, the type of stereo), compare that item to several others, It allows eBay search to define and house many different ranking
then eventually purchase it (or not). The determination of a session functions. It allows them to be retrieved by search application, user
on an eCommerce site is beyond the scope of this contribution. (for A/B tests), and other dimensions of the search environment. It
Periodically the query and click streams are mined. One purpose also serves to isolate front-end applications from the details of the
of the mining task is to determine, for the most frequent queries, recall and ranking algorithms.
how to disambiguate those queries. The difficulty within eBay is
that any user can list almost any item for sale, and many of these are
6 RANKING
rare or one-off items – so it is often not feasible to associate frequent
queries with items via a product catalog alone. However, it is often Although we are not at liberty to release details of the eBay rank-
feasible to use associations with structured data components, such ing function, some details are either standard practice or already
as category, brand, etc. that can be further associated with items. released.
The eBay query disambiguation engine takes this approach. By default, only the titles of the documents are searched. How-
One of the more valuable associations is the most likely cate- ever, the queries are re-written to include two parts: first, those
gory that will satisfy the query. For the example, “bread” is more terms useful in determining a suitable set of documents to rank
often bakeware than music so the results list is limited to just the (the recall base); and second, those terms used in ranking. In this
bakeware section of eBay. Another is any special considerations on way terms that are necessary to identify a relevant document (but
how to rank the query. That is, the query itself carries details of frequent in that set of documents) can be ignored in ranking, and
the ranking function to be applied by the search engine. terms that are frequent in the collection but not in the recall base
can be used for ranking. These terms are mined from user logs.
Multi-stage raking is used. The final stage uses a forest of gradi-
5.2 Query Rewrite (SIBE) ent boosted decision trees to perform the final ordering. These trees
A more complex mining task is that of determining user intent (and lower levels too) draw from textural features of the document
rather than simply disambiguating the query. This, too, is mined (such as term proximity), as well as features the user has supplied
from user behavior logs and the document collection. For each (such as price). They are deduced from the customer and seller (lo-
frequent query a set of purchases and a set of non-purchases forms cation information used for postage computation), seller reputation,
a dataset that is analogous to a relevance feedback [9] dataset. and over 500 features eBay stores about an item (ranging from color
It is clear from relevance feedback experiments [14] that a new to condition to size). Much of this feature information is stored in
query can be formed from the original and the labeled data, and a forward index (or docdata).
that query will better fulfills the user’s information need than the Unusual in academic search but common in industrial search,
original query. At eBay these rewritten queries are stored keyed on eBay search avoids returning no results whenever possible. If a
the user’s original query and a quick lookup returns the replacement query is about to return no results, or a small number of results, then
query. the null-and-low subsystem is notified. There are several reasons
Examination of these re-written queries shows that they tend recall might be low; for example it might be due to the absence of
to include boosting of synonyms and stemming variants, but also products in the repository (the query is highly specific and the item
down-ranking of ambiguous and noise terms in order to clarify does not exist) or due to a misalignment of vocabulary between
the query. For some queries these re-writes can be large in size – the user and the product (for example, “apple tablet” rather than
not infrequently exceeding hundreds of kilobytes. Unfortunately, “iPad”).
these long queries take a considerable amount of the computational The null-and-low component takes the query, re-writes it to a
resources to resolve, a necessary trade off in order to successfully new query with similar semantics, and sends it back to the search
fulfill the user’s needs. engine. This process repeats until either a suitable number of results
is identified, or the process gives up. We cannot release the details
of the null-and-low subsystems, however a user can observe the
5.3 Profile Lookup Service (PLS) eBay spelling checker at work if a single nonsense word is entered.
The eBay ranking functions are stored externally from the main They can also observe the dropping of query terms if a nonsense
source code of the search engine. They are often generated through word and a non-nonsense word are entered as the query. Section 5.1
data mining and machine learning. For example, a forest of gradient introduced automatic category restriction for queries. Experiments
SIGIR 2017 eCom, August 2017, Tokyo, Japan Andrew Trotman, Jon Degenhardt, and Surya Kallumadi
showed that this has several effects on the search engine. First, by serving node, that node is defragmented as the new bulk index lies
reducing the size of the recall base the query latency goes down. consecutively in memory and all old patches can be purged. Other
Second, by removing noise from the results list the precision goes approaches are possible, for example, a periodic process could move
up. postings lists about in memory in a de-fragmentation pass.
Another important and pragmatic role of bulk indexes is to
provide a vehicle for large updates affecting many documents, but
7 INDEXING
that do not have freshness requirements offered by mini index.
Maintaining an inverted index over a rapidly changing document Distributing large updates via bulk indexes has strong efficiency
collection has received little attention in the past. Indeed, it is often benefits.
assumed that indexing efficiency is unimportant as indexing only The 8-hour bulk reload also ensures that three times a day all
occurs once. This is not the case at eBay or with any other rapidly machines in the grid are synchronised. When a new machine is
changing collection that must be kept up-to-date (for example a added to the grid it simply needs to wait for the bulk to arrive to
news site). be in sync.
Lester et al. [7] offer three update strategies for maintaining If a machine fails (for whatever reason) and comes back on-line,
an inverted index, in-place, re-merge, and re-build. The in-place it can simply compare time stamps of which indexes are stored
approach accumulates changes to postings lists in memory then locally against a global registry of indexes to work out what needs
merges those into a master index, keeping the postings list at the to be done to re-sync – and that registry need not be concerned
same location if it still fits – and if it does not then it appends it to with indexes prior to the previous bulk.
the end of the index. Re-merge simply merges the existing index An alternative strategy of keeping up-to-date is to build the
with a set of diffs to create a new index. Re-build discards the old index on the query serving node in real-time as changes occur. This
index and re-indexes everything from scratch. Their experiments approach results in a substantial increase in the amount of network
were on-disk and so the efficiency results are not applicable to an traffic seen at the query serving nodes. It is also difficult to re-sync
in-memory index. in the case of node failure. That is, even if real-time changes are
At eBay a combination of approaches is used. Every 8 hours maintained at each node, a master index of that node must be build
a complete re-build of the gold-standard document collection is and maintained so that it can be distributed for a restart.
performed. This “bulk” is then shipped to the index serving grid, The index building process is, itself, relatively straightforward.
trickling from row to row so that, over time, the entire grid has A large HBase cluster is used as the gold-standard document collec-
a fresh index. On a more frequent basis (minutes), all changes to tion. That no-sql database is built and maintained through a series
the gold-standard document collection are computed and shipped of processes that listen to the eBay internal item messaging pipeline
to the grid as a “mini” index. These minis contain newly indexed for messages about changes to items and sellers, and updates are
documents and a list of deleted documents (document changes are applied to HBase accordingly. Although not the topic of this contri-
deletes followed by adds). There is, clearly, a short period when both bution, different processes in different parts of eBay are responsible
bulks and minis are being built at the same time. This is resolved at for determining different characteristics of an item; for example, it
the query nodes and in distribution where it is ensured that indexes is automatically classified, spam detection algorithms are applied,
arrive in the correct locations and in the correct order. language translation occurs, and more. Updates to a document can
The in-place index merging approach of Lester et al. is used to occur at any time and HBase ensures these updates are applied
combine bulks and minis, except that the index resides in memory in the correct order even if they arrive out of order. This HBase
not on disk. One at a time each postings list that requires append- database is not the eBay-wide gold-standard (that is elsewhere), it
ing is pulled from memory. If it contains a posting for a deleted is the search department’s gold-standard.
document then that posting is removed from the list, then the HBase is built on top of Hadoop, the Apache open-source map-
new postings are added to the end. The postings list is put back reduce platform. Periodically a map-reduce process kicks-off, the
in memory in-place if it fits and if not then elsewhere in mem- indexer is shipped to the data, indexing occurs in a distributed
ory. The consequence of this approach is that postings lists that manner, then the index parts are reduced to form a number of
are not added to are also not purged of deleted documents. There indexes that matches the topology of the query serving grid. These
are several reasons for choosing this approach. One is that it is are then picked up by the index distribution agent and moved to
not necessary to exactly re-parse the document in order to deter- the query serving nodes.
mine which postings lists need purging. A second is that it reduces
fragmentation because fewer lists are modified.
As a postings list might contain a posting for a document that 8 SELLERS, ITEMS, INDEXES, RANKING
has been deleted it is also necessary to keep track of deleted items. The ranking function draws from many hundreds of different fea-
Documents that have been deleted between the construction of the tures. Those features are derived from the item, the seller, and the
mini and the time of the search are removed from the results list buyer. Features of the buyer are determined high in the protocol
further up the protocol stack. stack and are constant for each item. Features of the item are stored
Each merge of a mini results in a little more memory fragmenta- in the inverted index and forward index. It is not clear where to
tion as some postings lists no-longer fit back in memory where they store the features of the seller in this eCommerce search problem.
came from. This is one of the key reasons for an 8-hour bulk re- If the seller features are stored in HBase along with the item
indexing. Each time a bulk index is loaded the memory on the query then any change to a seller would require a change to all items
The Architecture of eBay Search SIGIR 2017 eCom, August 2017, Tokyo, Japan
Seller Forward Index
Feature Value
feedback_score 146
Item Forward Index
followers 1
Feature Value Seller Lookup
seller 3 id ptr
Feature Value
price 3.00 1
feedback_score 1170
2
followers 10
Feature Value
3
seller 2
Feature Value
price 9.00
feedback_score 3898
followers 90
Figure 4: Seller data is stored in a forward index pointed to indirectly from the item forward index.
that seller is selling – which can be many hundreds of thousands REFERENCES
of items (especially for sellers of Compact Disks). If the seller data [1] Luiz André Barroso, Jeffrey Dean, and Urs Hölzle. 2003. Web Search for a Planet:
is “crossed” with the item data at indexing time then a change to The Google Cluster Architecture. IEEE Micro 23, 2 (March 2003), 22–28.
[2] Charles L. A. Clarke, G. V. Cormack, and F. J. Burkowski. 1995. An Algebra for
the seller data would require a substantial update to the index with Structured Text Search and a Framework for its Implementation. Comput. J. 38,
a mini. Such a change in seller data might occur if, for example, 1 (1995), 43. https://doi.org/10.1093/comjnl/38.1.43
[3] O. de Kretser, A. Moffat, T. Shimmin, and J. Zobel. 1998. Methodologies for
the seller declares that they are on holiday and so the expected distributed information retrieval. In Proceedings. 18th International Conference on
shipping date of all their items is to be postponed a short time. Distributed Computing Systems (Cat. No.98CB36183). 66–73.
As part of the indexing process eBay builds two indexes, one for [4] Ayse Goker and Daqing He. 2000. Analysing Web Search Logs to Determine
Session Boundaries for User-Oriented Learning. In AH.
the items and one for the sellers. Each item has a seller id associated [5] Lin Guo, Feng Shao, Chavdar Botev, and Jayavel Shanmugasundaram. 2003.
with it, but the reverse it not true (to find all items of a given seller, XRANK: Ranked Keyword Search over XML Documents. In Proceedings of the
an item search for the seller’s id is performed). To extract the seller 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD
’03). ACM, New York, NY, USA, 16–27.
features during ranking a seller lookup is performed on the seller’s [6] Yubin Kim, Jamie Callan, J. Shane Culpepper, and Alistair Moffat. 2017. Efficient
id and the seller’s doc-data is examined. Distributed Selective Search. Inf. Retr. 20, 3 (June 2017), 221–252.
[7] Nicholas Lester, Justin Zobel, and Hugh E. Williams. 2004. In-place Versus Re-
Figure 4 illustrates this process. The document ids from the build Versus Re-merge: Index Maintenance Strategies for Text Retrieval Systems.
inverted index point to the item doc-data (forward index) which In Proceedings of the 27th Australasian Conference on Computer Science - Volume 26
contains a seller id which is used to find the seller forward index (ACSC ’04). Australian Computer Society, Inc., Darlinghurst, Australia, Australia,
15–23.
which contains the seller’s features used in ranking. [8] Jimmy Lin and Andrew Trotman. 2015. Anytime Ranking for Impact-Ordered
The seller index only contains data for sellers who have items Indexes. In Proceedings of the 2015 International Conference on The Theory of
for sale at the moment the index is constructed. The alternative Information Retrieval (ICTIR ’15). ACM, New York, NY, USA, 301–304.
[9] J. J. Rocchio. 1971. Relevance Feedback in Information Retrieval.
is to include all potential sellers, but since all users are potential [10] Milad Shokouhi, Falk Scholer, and Justin Zobel. 2006. Sample Sizes for Query
sellers this table would be large and contain many unused entries Probing in Uncooperative Distributed Information Retrieval. In Proceedings of the
8th Asia-Pacific Web Conference on Frontiers of WWW Research and Development
(which is space inefficient). It is clear that the closure of the set of (APWeb’06). Springer-Verlag, Berlin, Heidelberg, 63–75.
current sellers can be constructed from the closure of the set of [11] Luo Si and Jamie Callan. 2003. Relevant Document Distribution Estimation
items, and indeed that is how it is constructed. Method for Resource Selection. In Proceedings of the 26th Annual International
ACM SIGIR Conference on Research and Development in Informaion Retrieval (SIGIR
’03). ACM, New York, NY, USA, 298–305.
[12] Andrew Trotman. 2004. Searching Structured Documents. Inf. Process. Manage.
9 CONCLUSION 40, 4 (May 2004), 619–632.
[13] Andrew Trotman, Nils Pharo, and Miro Lehtonen. 2007. XML-IR Users and Use
In this position paper we outlined much of how the eBay search Cases. Springer Berlin Heidelberg, Berlin, Heidelberg, 400–412.
engine currently functions along with the engineering reasons for [14] Andrew Trotman, Antti Puurula, and Blake Burgess. 2014. Improvements to
it working in this way. BM25 and Language Models Examined. In Proceedings of the 2014 Australasian
Document Computing Symposium (ADCS ’14). ACM, New York, NY, USA, Article
Although many standard techniques are used (inverted indexes, 58, 8 pages.
learning-to-rank, etc.), it is not possible to simply take an off-the-
shelf web search engine and apply it to eCommerce as other stan-
dard techniques (caching, index-once philosophy) are simply un-
tenable. This, along with the rapid rate of change are the reasons
eBay chose to build a search engine – and this paper outlines many
of the engineering decisions taken.