=Paper=
{{Paper
|id=Vol-3878/34_main_long
|storemode=property
|title=Scalable Query Understanding for E-commerce: An Ensemble Architecture with Graph-based Optimization
|pdfUrl=https://ceur-ws.org/Vol-3878/34_main_long.pdf
|volume=Vol-3878
|authors=Giuseppe Di Fabbrizio,Evgeny Stepanov,Ludovico Frizziero,Filippo Tessaro
|dblpUrl=https://dblp.org/rec/conf/clic-it/FabbrizioSFT24
}}
==Scalable Query Understanding for E-commerce: An Ensemble Architecture with Graph-based Optimization==
Giuseppe Di Fabbrizio†, Evgeny Stepanov†, Ludovico Frizziero† and Filippo Tessaro†
Abstract
Query understanding is a critical component in e-commerce platforms, facilitating accurate interpretation of user intent
and efficient retrieval of relevant products. This study investigates scalable query understanding techniques applied to a
real-world use case in the e-commerce grocery domain. We propose a novel architecture that integrates deep learning models
with traditional machine learning approaches to capture query nuances and deliver robust performance across diverse query
types and categories. Experimental evaluations conducted on real-life datasets demonstrate the efficacy of our proposed
solution in terms of both accuracy and scalability. The implementation of an optimized graph-based architecture utilizing
the Ray framework enables efficient processing of high-volume traffic. Our ensemble approach achieves an absolute 2%
improvement in accuracy over the best individual model. The findings underscore the advantages of combining diverse
models in addressing the complexities of e-commerce query understanding.
Keywords
Query classification, Query understanding, Distributed and scalable machine learning.
1. Introduction

Accurately understanding and classifying user queries is crucial for providing a seamless shopping experience by boosting the relevance of product search results in e-commerce [1]. Query understanding enables e-commerce platforms to interpret users’ intents, retrieve relevant products, and personalize the user’s journey through the shopping experience. However, query understanding in e-commerce presents several challenges due to the diverse nature of queries, the large-scale product catalogs, and the need for efficient processing of high-volume traffic with noisy behavioral signals [2, 3].

Query understanding in e-commerce involves multiple sub-tasks, such as query classification, entity recognition, and intent detection. Query classification aims to categorize user queries into predefined product categories, facilitating improved product retrieval and ranking [4, 5]. Entity recognition identifies key information within the query, such as brand names, product attributes, and numerical values, which can be used to refine the search results [6]. Intent detection focuses on understanding the user’s underlying goal, such as product discovery, comparison, or purchase [7].

One of the primary challenges in query understanding is the inherent ambiguity and diversity of user queries. E-commerce queries are often short, lack context, and can have multiple interpretations [8]. Moreover, the large-scale product catalogs in e-commerce platforms, spanning thousands of categories and millions of products, pose a significant challenge in accurately mapping queries to relevant categories and products.

Various approaches have been proposed to address these challenges, leveraging traditional machine learning techniques and deep learning models. Rule-based systems and keyword matching have been widely used for query classification and entity recognition [9]. However, these approaches often struggle with the variability and complexity of natural language queries. Different query intents require different algorithms to yield optimal results [10]. Queries can be classified into navigational (e.g., product category, brand, title) and informational (e.g., product-related questions). While navigational queries require exact matching to catalog products, informational queries necessitate more complex understanding techniques.

Another critical aspect of query understanding in e-commerce is efficiently processing high-volume traffic. E-commerce platforms receive millions of queries daily, requiring scalable, real-time query understanding systems. Distributed computing frameworks, such as Apache Spark and Ray, have been employed to parallelize query processing and handle the massive scale of e-commerce data [11, 12].

In this paper, we propose an ensemble approach for query understanding in e-commerce, combining deep learning models and traditional techniques. Our approach leverages the strengths of both deep learning, such as DistilBERT [13], and traditional models, including logistic regression and rule-based systems. By integrating these diverse models, we aim to capture the nuances of user queries and provide robust performance across various query types and categories.

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy
† Work done when at VUI, Inc.
difabbrizio@gmail.com (G. Di Fabbrizio); stepanov.evgeny.a@gmail.com (E. Stepanov); ludovico.frizziero@gmail.com (L. Frizziero); filippotessaro96@gmail.com (F. Tessaro)
https://difabbrizio.com/ (G. Di Fabbrizio)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Figure 1: Query understanding parsing example with search results leveraging the query understanding signals. (a) The query “Pacific chicken broth organic gluten free” is parsed into entities (Pacific: Brand; chicken broth: Product; organic, gluten free: Nutrition) and categorized as pantry >> soup; (b) the corresponding search results.
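The shallow parse in Figure 1 can be represented as a simple structure; the field names below are illustrative, not the system’s actual schema:

```python
# Illustrative shallow parse for the Figure 1 query.
# Field names are hypothetical, not the production schema.
query = "Pacific chicken broth organic gluten free"

parse = {
    "query": query,
    "category": ["pantry", "soup"],  # L1 >> L2 taxonomy path
    "entities": [
        {"span": "Pacific", "label": "Brand"},
        {"span": "chicken broth", "label": "Product"},
        {"span": "organic", "label": "Nutrition"},
        {"span": "gluten free", "label": "Nutrition"},
    ],
}

# A consolidated parse contains only non-overlapping spans of the query.
assert all(e["span"] in query for e in parse["entities"])
```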
We introduce an optimized graph-based architecture based on the Ray framework [12], enabling efficient processing of high-volume traffic and ensuring scalability.

2. Query understanding ensemble architecture

In this paper, we focus on navigational queries and classify them into product taxonomy categories while applying named entity recognition (NER) to capture relevant product attributes, such as Brand, Nutrition, and Flavor, and numeric attributes like quantities and measurements. Figure 1 shows a typical example of a navigational search query in the e-commerce grocery domain, where the query “Pacific chicken broth organic gluten free” is parsed into its attributes and categorized into its taxonomy label.

Classifying user queries into product taxonomy categories is a typical document classification problem that is complex and actively researched. The problem is complicated by the nature of the available data, which can be either product descriptions with user-provided categories or user queries associated with catalog categories from user click-stream data. Products in the catalog are described in terms of attributes with associated values, and a subset of this mapping constitutes a set of entities that should be identified to build a search query and provide better search results.

Due to the rate of change in e-commerce, the classical approach of query annotation and model training is prohibitive. Consequently, the query understanding problem is cast as a document classification problem for matching user queries to the product taxonomy tree (categories) and a sequence labeling problem for entities of interest. For each problem, we propose an ensemble approach with multiple models having different label sets and relations. Specifically, we predict two levels of the product taxonomy tree (L1 and L2) and extract the corresponding entities mentioned in the queries. Each level is predicted by an ensemble of models composed of business rules and machine learning models. Similarly, different machine learning and rule-based models are used to extract entities of interest.

2.1. Query understanding pipeline and ensemble components

The query understanding pipeline’s classification and entity extraction components are trained and tested on pre-processed user queries. Common text pre-processing steps are applied, including spaCy’s tokenization, lowercasing, and number normalization [14].

The classification ensemble consists of business rules, implemented as a lookup table, and two machine learning models: logistic regression and DistilBERT. DistilBERT is a compressed version of BERT [15] that retains 97% of the original model’s performance while being 40% smaller and 60% faster at inference time. The key idea is to leverage knowledge distillation during the pre-training phase to learn a compact model that can be fine-tuned for downstream tasks. Integrating DistilBERT into the query understanding pipeline, alongside business rules and logistic regression, enhances the system’s accuracy and robustness.

The entity extraction ensemble comprises: (1) a conditional random fields model; (2) a catalog-based lookup table to extract Brand, Flavor, and Nutrition; and (3) the rule-based Duckling library¹ to extract numerical entities such as Price and Quantity.

2.2. Classification decision fusion

In our ensemble learning scenario, the models are trained on different data and have different, potentially overlapping label spaces, unlike typical ensemble learning, where the same data is used to train all models. Due to the label space differences, decision fusion is performed on the predictor-by-label matrix of confidence scores rather than with a simple majority voting strategy. Rule-triggered hypotheses are assigned a confidence score of 1.0, taking priority over model-based predictions.

The decision fusion process takes a matrix of confidence scores as input and outputs a vector of aggregated confidence scores. The label space difference is addressed by applying a max operation over the column of prediction scores per label, ignoring label space membership. Taking the maximum score per prediction approximates the product rule [16]. The final label is decided as the argmax of this confidence score vector. Unlike voting-based decision fusion, this approach allows aggregating decisions from rules and any number of predictors.

2.3. Entity span consolidation

Span consolidation aggregates entity extraction hypotheses from one or several entity extractors into a shallow parse containing only non-overlapping spans. By default, this process is performed for spans from the same model, but it can also be enabled for an ensemble of extractors.

Inspired by [17], span consolidation is performed in three steps: (1) Identity consolidation resolves identical spans by keeping the span with the higher confidence, or a random one if confidences are equal; (2) Containment consolidation resolves spans contained within each other by keeping the longer span, i.e., the one that contains the other; (3) Overlap consolidation resolves overlapping spans by keeping the longer span, or alternatively merging them and assigning the label of the longest span. Priority consolidation can be used to give higher weights to predictions from extractors with higher confidence.

Decision fusion and span consolidation are generally applied as the final step of the query understanding pipeline to yield hypotheses containing only a non-overlapping set of entities and a single classification prediction per level, as described in Section 4.

3. Models and ensemble evaluation

The engine’s configuration represents the ensemble as a sequence of operations, called nodes, organized into a graph. The edges of this graph represent the interdependencies between nodes. The engine organizes and dispatches computations to maximize parallelism. Machine learning models for query classification are trained on product catalog data and tested on user queries, ensuring equal representation of head, torso, and tail queries in terms of frequency. Table 1 shows the sizes of the training and testing data and the output categories. We predict two levels of the product taxonomy: L1 with 17 categories and L2 with 169 categories. However, not all L1 categories have L2 labels, making the L2 sets subsets of the L1 data. The NER test set is a subset of the manually annotated test data for non-numerical entities.

The performance evaluation of the component models and the ensemble uses precision, recall, and F1-score metrics. For multi-class classification tasks, we report accuracy along with macro-averaged precision, recall, and F1-score to account for dataset imbalance. Entity extraction performance is assessed using micro-averaged metrics and token-level accuracy, adhering to CoNLL-style evaluation protocols.

To quantify the efficacy of the model ensemble, we conducted a comparative analysis against logistic regression and DistilBERT for level-one predictions, with results presented in Table 2. DistilBERT demonstrates superior performance compared to logistic regression across all metrics. The ensemble model, however, consistently outperforms both individual models. Consequently, the query understanding system adopts the ensemble approach in lieu of individual models. Rule-based components are excluded from this evaluation due to their limited data coverage and restricted label subsets. Level-two models show performance patterns similar to level one, though with lower scores due to the larger label space and fewer training documents per label. Entity ensemble performance aligns with the other ensembles, favoring precision.

While the ensemble approach demonstrates improved performance, it faces challenges with certain query types. Extremely short queries (e.g., “chips” can refer to potato, tortilla, or chocolate) can be ambiguous without context. Highly ambiguous queries (e.g., “greens”) may span multiple categories within the grocery domain. Novel products or brands not present in the training data pose difficulties. Complex, multi-intent queries (e.g., “organic gluten-free pasta sauce and whole grain spaghetti”) can lead to misclassifications or incomplete entity extraction. Future work could explore incorporating user session data or personalization techniques to provide additional context for ambiguous queries and improve handling of out-of-vocabulary terms and multi-intent queries.

¹ https://github.com/facebook/duckling
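As a concrete sketch of the decision fusion described in Section 2.2, the per-label max and final argmax can be written in a few lines; the labels and confidence values below are illustrative:

```python
# Sketch of max-based decision fusion over heterogeneous label spaces
# (Section 2.2). Each predictor scores only the labels it knows about;
# labels and confidence values are illustrative, not production data.

def fuse(predictions: list[dict[str, float]]) -> str:
    fused: dict[str, float] = {}
    for pred in predictions:
        for label, score in pred.items():
            # Per-label max across predictors, ignoring label-space
            # membership: labels absent from a predictor contribute nothing.
            fused[label] = max(fused.get(label, 0.0), score)
    # The final decision is the argmax of the aggregated score vector.
    return max(fused, key=fused.get)

distilbert = {"pantry": 0.55, "produce": 0.30, "dairy": 0.15}
logreg = {"produce": 0.60, "dairy": 0.40}   # different label subset
rule_hit = {"pantry": 1.0}                  # rule hypotheses get 1.0

assert fuse([distilbert, logreg]) == "produce"
assert fuse([distilbert, logreg, rule_hit]) == "pantry"
```

Because a rule-triggered hypothesis carries confidence 1.0, it dominates any model score under the max operation, matching the priority described in the text.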
Table 1
Dataset sizes used to train and test components of the ensemble
Training Testing Labels
Level 1 230,463 5,445 17
Level 2 212,087 4,486 169
NER 17,862 544 3
Brands Lookup 9,924 – 1
Table 2
Models and ensemble performance
Model precision recall f1-score accuracy
L1 DistilBERT 0.77 0.77 0.77 0.81
L1 Logistic Regression 0.76 0.70 0.73 0.75
L1 Model Ensemble 0.79 0.79 0.79 0.82
L2 Model Ensemble 0.68 0.67 0.66 0.70
Entity Ensemble 0.83 0.59 0.69 0.74
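The Entity Ensemble figures above rely on the span consolidation of Section 2.3. A toy sketch over token offsets (spans, labels, and confidences are illustrative):

```python
# Toy sketch of the span consolidation of Section 2.3. Spans are
# (start, end, label, confidence) tuples over token offsets; the
# hypotheses below are illustrative, not system output.

def consolidate(spans):
    """Keep a non-overlapping subset: longer spans win over contained
    or overlapping ones, and higher confidence breaks length ties."""
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], s[3]), reverse=True)
    kept = []
    for start, end, label, conf in ordered:
        # Accept the span only if it does not overlap anything kept so far.
        if all(end <= ks or start >= ke for ks, ke, _, _ in kept):
            kept.append((start, end, label, conf))
    return sorted(kept)

hyps = [
    (0, 1, "Brand", 0.9),      # "Pacific"
    (1, 3, "Product", 0.8),    # "chicken broth"
    (2, 3, "Product", 0.6),    # "broth" (contained, dropped)
    (3, 4, "Nutrition", 0.7),  # "organic"
]
assert consolidate(hyps) == [
    (0, 1, "Brand", 0.9), (1, 3, "Product", 0.8), (3, 4, "Nutrition", 0.7),
]
```

This collapses the identity, containment, and overlap steps into one greedy pass; the paper’s pipeline treats them as distinct stages with an optional merge variant.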
4. Graph-based architecture for scalable processing

Query understanding systems in e-commerce search engines must generate real-time responses within strict service level agreements (SLAs). They execute complex logic involving different models interacting both in series and in parallel.

Our engine is constructed as a sequence of operations (nodes) arranged in a graph showing their interdependencies (edges). Like neural networks, the graph-based engine organizes and dispatches each computation to maximize parallelism. Parallelization occurs at multiple levels, including inter-operation parallelism and entire graph replicas, depending on deployment requirements. Each operation within the graph is a complex model component requiring specific optimization strategies, such as data vectorization and memory sharing, to optimize the overall graph structure.

We represent the graph using the notation node: [arg1, arg2, ..., argN], where node requires incoming edges from arg1 through argN. The full configuration of the graph can be seen in Appendix A.

The engine processes the notation in the following steps. First, it optimizes the graph by joining (inlining) nodes based on certain criteria, which increases parallel operations as much as possible. Next, it decides how many replicas of the graph to run on a single physical server. Each node is then mapped to a separate system process using the Actor model [18] for inter-process communication, with message passing between processes handled by Ray [12].

Each node is initialized by loading its models into memory, leveraging shared memory and copy-on-write primitives provided by the server’s operating system. Each node is loaded only once, and subsequent processes assigned the same node reference the original memory. Since the models are used for inference, not training, there are no write operations, reducing the memory footprint and improving loading times. Finally, the batching service handles the backpressure control system and the REST API for listening to incoming requests.

At startup, the engine performs several optimizations on the graph topology. The simplest is graph culling, which removes nodes that do not interact with others. Each node’s expected computational burden can be specified: simple nodes (e.g., string regex preprocessors) are less resource-intensive than full neural network nodes. The engine modifies the graph by combining or inlining nodes to facilitate parallel operations and minimize costly inter-process communications. This results in lighter nodes being replicated multiple times and fused into heavier nodes, each mapped to a single system process.

After inlining, the engine performs graph linearization, converting the graph into a linear sequence in which each node depends only on preceding nodes, not subsequent ones. The engine dispatches nodes in order, synchronizing results only when necessary. This strategy minimizes pauses and maximizes parallelism. Nodes with a higher computational burden are prioritized, reducing the need for the backpressure control system and leveraging the fact that CPU and data transmission tasks are handled by separate CPU circuitry.
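The node: [arg1, ..., argN] notation and the linearization step can be sketched with a topological sort; this simplified illustration omits the fusing, replication, and Ray-actor machinery of the real engine:

```python
from graphlib import TopologicalSorter

# Simplified illustration of the node: [dependencies] notation and
# graph linearization. Keys are nodes; values are the nodes that feed
# them (ancestors). A subset of the real configuration is shown.
graph = {
    "preprocessor": ["user_query"],
    "tfidf_l1": ["preprocessor"],
    "sklearn_l1": ["tfidf_l1"],
    "distilbert_l1": ["user_query"],
    "fusor_l1": ["distilbert_l1", "sklearn_l1"],
}

# Linearization: every node appears after all of its inputs, so the
# engine can dispatch nodes in order and synchronize only when needed.
order = list(TopologicalSorter(graph).static_order())

assert order.index("fusor_l1") > order.index("sklearn_l1")
assert order.index("sklearn_l1") > order.index("tfidf_l1")
assert order.index("tfidf_l1") > order.index("preprocessor")
```

`TopologicalSorter` conveniently takes exactly the ancestor-list form used by the configuration, since its mapping is node-to-predecessors.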
Figure 2: Visualization of the ensemble as a computational graph. (Top) The full graph as defined in Appendix A: the user query feeds the preprocessor (followed by the TF-IDF/Sklearn L1 and L2, spaCy/CRF, and Duckling branches) and the DistilBERT L1/L2 models, whose outputs are combined by Fusor L1, the rules, and the final Fuse ALL node. (Bottom) The graph after optimization by the engine: light nodes are inlined into fused nodes (e.g., Preproc | TF-IDF L1 | Sklearn L1) while heavy nodes (DistilBERT L1/L2) remain separate; nodes are annotated with their computational burden (light, normal, heavy).
Query understanding systems receive hundreds of individual requests per second. Processing a single request is expensive due to inter-node communications. Batching multiple requests reduces overhead and enables vectorization, leveraging hardware primitives for efficient processing. The batching algorithm uses two thresholds, batch size and waiting time for further samples, balancing server resource utilization and processing time.

Lastly, the engine addresses CPU oversubscription [19], which occurs when parallel execution threads exceed the available CPU cores, leading to overhead from context switching. The backpressure control system ensures that no more than $N$ nodes run in parallel, enhancing performance by reducing oversubscription. The number $N$ depends on the available CPU resources and the code executed within each node. A simple formula for determining $N$ is:

$$N = \left\lfloor \frac{M_{cpu}}{\max_{i \in nodes}(threads_i)} \right\rfloor + 1 \qquad (1)$$

where $threads_i$ is the number of threads or processes that an individual node can utilize independently, and $M_{cpu}$ denotes the available CPU cores on the server.

5. Performance analysis at scale

Multiple tests were conducted using different AWS² EC2 instances on the engine described in Section 4 and the ensemble configuration in Appendix A. The optimal balance between cost, latency, and throughput was achieved with the m6i.2xlarge instance, which features 8 Intel Xeon vCPUs @ 3.5GHz, for which we report the results. The test’s target SLA stipulated that response times for 99% of requests should remain below 100ms.

All tests initiate a single instance of the engine with a graph replication factor of one³. Another server, hosting the client simulator implemented with the Python package Locust, is instantiated; both servers share the same AWS network. The simulator issues multiple queries to the engine’s server, each randomly sampled from a dataset of actual queries, over a sustained duration of 30 seconds. Request arrivals follow an exponential inter-arrival distribution with a rate of $T$ requests per second, mimicking a Poisson process, a common model for traffic patterns.

Table 3 reports the execution times of each node, along with the main engine loop responsible for scheduling them and the outer REST API handling incoming requests and facilitating the connection between the engine and the outside world.

² https://aws.amazon.com/
³ Replication factors greater than one were also tested, but they caused immediate CPU oversubscription problems, as anticipated. The SLA targets were unattainable without resorting to costly GPUs.
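Equation (1) can be checked with a small worked example; the per-node thread counts below are illustrative:

```python
from math import floor

# Worked example of Eq. (1): the backpressure limit N on concurrently
# running nodes. Per-node thread counts are illustrative assumptions,
# not measurements from the paper.
def parallel_limit(m_cpu: int, node_threads: dict[str, int]) -> int:
    return floor(m_cpu / max(node_threads.values())) + 1

# An 8-vCPU server where the heaviest node can use 2 threads:
assert parallel_limit(8, {"distilbert_l1": 2, "spacy": 1, "crf": 1}) == 5
# If every node is single-threaded, up to M_cpu + 1 nodes may run:
assert parallel_limit(8, {"rules": 1, "duckling": 1}) == 9
```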
Table 3
Quantiles for T = 30 tps; times are in milliseconds. Nodes are sorted from fastest to slowest. Left: batching disabled; right: batching enabled.

Batching disabled                        Batching enabled
name            50%     95%     99%      name            50%     95%     99%
preprocessor   0.05    0.06    0.07      preprocessor   0.05    0.06    0.07
fusor l1       0.38    0.53    0.58      fusor l1       0.40    0.66    0.83
fuse all       0.58    0.95    1.11      fuse all       0.59    1.16    1.65
rules          0.87    1.29    1.56      rules          0.84    1.33    1.70
crf            0.94    1.35    1.96      crf            0.96    1.62    2.01
tfidf l2       1.36    2.39    5.22      tfidf l2       1.39    2.03    4.88
tfidf l1       1.41    2.29    4.75      tfidf l1       1.43    2.05    3.88
sklearn l1     1.62    2.68    4.86      sklearn l1     1.64    2.53    5.81
duckling       2.80   11.95   35.71      duckling       3.33   18.59   30.84
spacy         12.87   24.70   33.68      spacy         12.33   18.68   25.42
distilbert l1 14.77   27.27   39.20      sklearn l2    15.71   18.99   22.87
distilbert l2 14.90   27.75   37.58      distilbert l2 15.44   27.49   36.29
sklearn l2    17.40   29.53   39.93      distilbert l1 15.60   27.34   37.23
main loop     53.23  142.63  206.22      main loop     30.51   41.18   51.74
rest api      58.57  154.50  219.90      rest api      55.85   84.35   92.82
The runtime of each individual node must be strictly shorter than the main engine loop, which represents the actual time taken for parallel graph execution. Node runtimes do not include inter-process communication, which is accounted for in the main loop. The REST API, in turn, adds to the main loop the time required to handle the HTTP connection with the requesting client. The outer REST API time must stay below 100ms at the 99th percentile to comply with the target SLA.

When batching is disabled, at the given rate $T$, new requests arrive while the server is still processing previous ones. These requests are immediately dispatched, leading to CPU oversubscription, which slows down all requests. This effect tends to cascade, as the increased processing time makes it more likely that other requests will arrive, further slowing the system.

When batching is enabled, the engine pauses to accumulate requests into a batch until a threshold of 5 samples or 50ms is met. Given that each request arrives every $1/T \approx 30ms$, the average batch size is around 1.5 samples; therefore, vectorization alone cannot explain the server’s ability to meet the target SLA. The process unfolds as follows: (1) the first batch is dispatched for processing; (2) for the next 50ms, new requests are queued into a new batch while (3) the engine likely completes the first batch within 51.7ms (with 99% probability); (4) the second batch is then dispatched, utilizing the just-released resources. Thus, batching acts as backpressure control on cheaper hardware without a GPU and at low rates of $T$. In production, multiple instances would handle fluctuating traffic, making batching efficient for scaling while meeting the SLA. The optimal batching period should match the main loop time at the 99th percentile, which is around 50ms in this case.

From a single request’s perspective, with $T = 30$ tps, batches are dispatched every 50ms, meaning requests encounter a uniform distribution over this interval with an average wait of 25ms in the batch queue. The entire batch is then processed, typically taking $X$ time to complete, before the response is extracted and forwarded through the HTTP channel, taking an additional $Y$. Empirically, $X$ corresponds to the main loop runtime, averaging around 30ms at the 50th percentile. The REST API, implemented using FastAPI⁴, has been benchmarked to add $Y \approx 2$–$5ms$, giving us

REST API @ 50% ≈ 25ms + 30ms + 2ms ≈ 57ms,

in line with the 55.85ms measured in Table 3. At the 99th percentile, the queue wait is still 25ms on average, but $X$ and $Y$ grow accordingly, giving approximately 90–95ms.

⁴ https://fastapi.tiangolo.com/
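The back-of-the-envelope latency breakdown above can be written out explicitly (values taken from the text and Table 3):

```python
# Median-latency model for a single request when batching is enabled.
# The constants come from the analysis above: a 50 ms batching window,
# the measured main-loop runtime, and the benchmarked FastAPI overhead.
batch_period_ms = 50
avg_queue_wait = batch_period_ms / 2  # uniform arrivals over the window
main_loop_p50 = 30                    # X: main engine loop @ 50%
http_overhead = 2                     # Y: FastAPI HTTP handling, ~2-5 ms

p50_estimate = avg_queue_wait + main_loop_p50 + http_overhead
assert p50_estimate == 57.0  # close to the measured 55.85 ms @ 50%
```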
6. Conclusion and future work

This paper proposed a novel ensemble approach for query understanding in e-commerce, combining deep learning models like DistilBERT with traditional techniques like logistic regression and rule-based systems. The ensemble architecture aimed to capture the nuances of user queries and provide robust performance across query types and categories. Data augmentation techniques were employed to improve the DistilBERT model’s handling of brands, misspellings, and short queries. An optimized graph-based architecture using the Ray framework enabled efficient, scalable processing of high-volume traffic.

While the ensemble performed well, there are limitations to address in future work. The system focused only on navigational queries for product categorization and entity extraction; extending it to handle informational and other query types could further improve relevance. Exploring more advanced data augmentation, model compression, and hardware acceleration techniques could enhance accuracy and efficiency.

The query understanding ensemble demonstrated the value of combining diverse models and leveraging distributed computing frameworks for scalability in e-commerce search engines. E-commerce platforms can benefit from adopting similar ensemble-based approaches customized to their query traffic and product data. The architecture enables efficient real-time query processing while meeting strict latency requirements, critical for delivering a seamless shopping experience.

A. Graph configuration

In our query understanding system, the relationships between the various models and preprocessing components are organized within a graph-based architecture. This architecture plays a crucial role in managing the interdependencies between different models, ensuring efficient computation and scalability.

The graph representation is designed to handle the integration of multiple machine learning and rule-based models while facilitating optimized parallel processing. Each key in the graph corresponds to a node, which indicates a component or model, and the associated value is a list of other nodes that provide input to it. This differs from traditional adjacency lists, where the focus is on child nodes. Instead, in our graph, the value lists contain ancestor nodes, indicating which components feed information into the current node.

A key aspect of this architecture is that certain elements, such as user_query, are implicit nodes representing external inputs to the system. These external inputs initiate the data flow throughout the graph. The architecture is also designed to handle multiple outputs, listed within the outputs key. This is not a graph node but serves as an indicator to the engine of what to select as the final result. The outputs key is also vital for the graph topology optimization and linearization process described in Section 4. This representation not only makes it easier to track data flow but also helps optimize the query understanding ensemble for real-time processing in e-commerce environments.

Figure 3: Graph representation of the query understanding ensemble

execution_graph:
  preprocessor: [ user_query ]
  distilbert_l1: [ user_query ]
  distilbert_l2: [ user_query ]
  tfidf_l1: [ preprocessor ]
  tfidf_l2: [ preprocessor ]
  vui_duckling: [ preprocessor ]
  spacy: [ preprocessor ]
  crf: [ spacy ]
  sklearn_l1: [ tfidf_l1 ]
  sklearn_l2: [ tfidf_l2 ]
  fusor_l1: [ distilbert_l1, sklearn_l1 ]
  rules: [ spacy, fusor_l1 ]
  fuse_all: [
    rules, crf, distilbert_l1, sklearn_l1,
    distilbert_l2, sklearn_l2, vui_duckling
  ]
  outputs: [ user_query, preprocessor, parse ]

Figure 3 illustrates the graph structure that defines the query understanding ensemble. The nodes represent components that work together to process user queries and extract meaningful insights. The graph starts with preprocessing steps that normalize and clean the user input. Subsequently, components such as DistilBERT and TF-IDF are leveraged to extract semantic features and contextual information. Additional models like the CRF (conditional random fields) and vui_duckling focus on identifying specific entities such as brands, quantities, and attributes.

The outputs of these models are fused through dedicated nodes such as fusor_l1 and fuse_all, which combine signals from the intermediate models based on confidence scores and rule-based decisions. The final outputs represent the processed user query, refined and enriched through multiple layers of analysis and ready for downstream tasks such as categorization and search relevance adjustments.

This architecture’s flexibility and efficiency enable it to handle the complexities of e-commerce queries in real time while supporting high-volume traffic and diverse query types. It also lays the groundwork for the performance optimizations and parallel processing strategies outlined in Section 4.
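The ancestor-list convention above can be inverted into a conventional child-adjacency view with a few lines of Python (only a subset of the configuration is shown):

```python
# Sketch of inverting the ancestor-list representation of Figure 3
# into a child-adjacency view. Only a subset of execution_graph is
# shown; outputs is metadata, not a node, and is omitted here.
execution_graph = {
    "preprocessor": ["user_query"],
    "distilbert_l1": ["user_query"],
    "tfidf_l1": ["preprocessor"],
    "sklearn_l1": ["tfidf_l1"],
    "fusor_l1": ["distilbert_l1", "sklearn_l1"],
}

children: dict[str, list[str]] = {}
for node, ancestors in execution_graph.items():
    for parent in ancestors:
        children.setdefault(parent, []).append(node)

# user_query is an implicit external-input node: it feeds other nodes
# but never appears as a key of the configuration itself.
assert children["user_query"] == ["preprocessor", "distilbert_l1"]
assert "user_query" not in execution_graph
```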
References

[1] H. Deng, Y. Zhang (Eds.), Query Understanding for Search Engines, 1st ed., Springer, 2020. doi:10.1007/978-3-030-58334-7.
[2] S. Jiang, Y. Hu, C. Kang, T. Daly, D. Yin, Y. Chang, C. Zhai, Learning query and document relevance from a web-scale click graph, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 185–194. doi:10.1145/2911451.2911531.
[3] P. Nigam, Y. Song, V. Mohan, V. Lakshman, W. A. Ding, A. Shingavi, C. H. Teo, H. Gu, B. Yin, Semantic product search, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2876–2885. doi:10.1145/3292500.3330759.
[4] Y.-C. Lin, A. Datta, G. Di Fabbrizio, E-commerce Product Query Classification Using Implicit User’s Feedback from Clicks, in: 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 1955–1959. doi:10.1109/BigData.2018.8622008.
[5] G. Di Fabbrizio, E. Stepanov, F. Tessaro, Extreme Multi-label Query Classification for E-commerce, in: eCom’24: ACM SIGIR Workshop on eCommerce, July 18, 2024, USA, 2024.
[6] J.-W. Ha, H. Pyo, J. Kim, Large-scale item categorization in e-commerce using multiple recurrent neural networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 107–115. doi:10.1145/2939672.2939678.
[7] Y. Qiu, C. Zhao, H. Zhang, J. Zhuo, T. Li, X. Zhang, S. Wang, S. Xu, B. Long, W.-Y. Yang, Pre-training Tasks for User Intent Detection and Embedding Retrieval in E-commerce Search, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, CIKM ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 4424–4428. doi:10.1145/3511808.3557670.
[8] D. Shen, Y. Li, X. Li, D. Zhou, Product query classification, in: Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 741–750. doi:10.1145/1645953.1646047.
[9] B. Ramesh Bhange, X. Cheng, M. Bowden, P. Goyal, T. Packer, F. Javed, Named Entity Recognition for E-Commerce Search Queries, in: IEEE International Conference on Big Data (Big Data), 2020. URL: https://api.semanticscholar.org/CorpusID:219530417.
[10] M. Tsagkias, T. H. King, S. Kallumadi, V. Murdock, M. de Rijke, Challenges and research opportunities in ecommerce search and recommendations, SIGIR Forum (2020).
[11] E. Shaikh, I. Mohiuddin, Y. Alufaisan, I. Nahvi, Apache Spark: A big data processing engine, in: 2019 2nd IEEE Middle East and North Africa COMMunications Conference (MENACOMM), 2019, pp. 1–6. URL: https://api.semanticscholar.org/CorpusID:211120979.
[12] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, I. Stoica, Ray: a distributed framework for emerging AI applications, in: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI ’18, USENIX Association, USA, 2018, pp. 561–577.
[13] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, ArXiv abs/1910.01108 (2019). URL: https://api.semanticscholar.org/CorpusID:203626972.
[14] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python (2020).
[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.
[16] J. Kittler, M. Hatef, R. P. W. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 226–239.
[17] F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, S. Vaithyanathan, An algebraic approach to rule-based information extraction, in: 2008 IEEE 24th International Conference on Data Engineering, IEEE, 2008, pp. 933–942.
[18] C. Hewitt, Actor model of computation: Scalable robust information systems, 2015. arXiv:1008.1459.
[19] C. Iancu, S. Hofmeyr, F. Blagojević, Y. Zheng, Oversubscription on multicore processors, in: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1–11. doi:10.1109/IPDPS.2010.5470434.