=Paper= {{Paper |id=Vol-3878/34_main_long |storemode=property |title=Scalable Query Understanding for E-commerce: An Ensemble Architecture with Graph-based Optimization |pdfUrl=https://ceur-ws.org/Vol-3878/34_main_long.pdf |volume=Vol-3878 |authors=Giuseppe Di Fabbrizio,Evgeny Stepanov,Ludovico Frizziero,Filippo Tessaro |dblpUrl=https://dblp.org/rec/conf/clic-it/FabbrizioSFT24 }} ==Scalable Query Understanding for E-commerce: An Ensemble Architecture with Graph-based Optimization== https://ceur-ws.org/Vol-3878/34_main_long.pdf
                                Scalable Query Understanding for E-commerce: An
                                Ensemble Architecture with Graph-based Optimization
                                Giuseppe Di Fabbrizio† , Evgeny Stepanov† , Ludovico Frizziero† and Filippo Tessaro†


                                               Abstract
                                               Query understanding is a critical component in e-commerce platforms, facilitating accurate interpretation of user intent
                                               and efficient retrieval of relevant products. This study investigates scalable query understanding techniques applied to a
                                               real-world use case in the e-commerce grocery domain. We propose a novel architecture that integrates deep learning models
                                               with traditional machine learning approaches to capture query nuances and deliver robust performance across diverse query
                                               types and categories. Experimental evaluations conducted on real-life datasets demonstrate the efficacy of our proposed
                                               solution in terms of both accuracy and scalability. The implementation of an optimized graph-based architecture utilizing
                                               the Ray framework enables efficient processing of high-volume traffic. Our ensemble approach achieves an absolute 2%
                                               improvement in accuracy over the best individual model. The findings underscore the advantages of combining diverse
                                               models in addressing the complexities of e-commerce query understanding.

                                               Keywords
                                               Query classification, Query understanding, Distributed and scalable machine learning.



1. Introduction

Accurately understanding and classifying user queries is crucial for providing a seamless shopping experience by boosting the relevance of product search results in e-commerce [1]. Query understanding enables e-commerce platforms to interpret users' intents, retrieve relevant products, and personalize the user's journey through the shopping experience. However, the task of query understanding in e-commerce presents several challenges due to the diverse nature of queries, the large-scale product catalogs, and the need for efficient processing of high-volume traffic with noisy behavioral signals [2, 3].

Query understanding in e-commerce involves multiple sub-tasks, such as query classification, entity recognition, and intent detection. Query classification aims to categorize user queries into predefined product categories, facilitating improved product retrieval and ranking [4, 5]. Entity recognition identifies key information within the query, such as brand names, product attributes, and numerical values, which can be used to refine the search results [6]. Intent detection focuses on understanding the user's underlying goal, such as product discovery, comparison, or purchase [7].

One of the primary challenges in query understanding is the inherent ambiguity and diversity of user queries. E-commerce queries are often short, lacking context, and can have multiple interpretations [8]. Moreover, the large-scale product catalogs in e-commerce platforms, spanning thousands of categories and millions of products, pose a significant challenge in accurately mapping queries to relevant categories and products.

Various approaches have been proposed to address these challenges, leveraging traditional machine learning techniques and deep learning models. Rule-based systems and keyword matching have been widely used for query classification and entity recognition [9]. However, these approaches often struggle with the variability and complexity of natural language queries. Different query intents require different algorithms to yield optimum results [10]. Queries can be classified into navigational (e.g., product category, brand, title) and informational (e.g., product-related questions). While navigational queries require exact matching to catalog products, informational queries necessitate applying more complex understanding techniques.

Another critical aspect of query understanding in e-commerce is efficiently processing high-volume traffic. E-commerce platforms receive millions of queries daily, requiring scalable and real-time query understanding systems. Distributed computing frameworks, such as Apache Spark and Ray, have been employed to parallelize query processing and handle the massive scale of e-commerce data [11, 12].

In this paper, we propose an ensemble approach for query understanding in e-commerce, combining deep learning models and traditional techniques. Our approach leverages the strengths of both deep learning, such as DistilBERT [13], and traditional models, including logistic regression and rule-based systems. By integrating these diverse models, we aim to capture the

CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04 — 06, 2024, Pisa, Italy
† Work done when at VUI, Inc.
Email: difabbrizio@gmail.com (G. Di Fabbrizio); stepanov.evgeny.a@gmail.com (E. Stepanov); ludovico.frizziero@gmail.com (L. Frizziero); filippotessaro96@gmail.com (F. Tessaro)
Web: https://difabbrizio.com/ (G. Di Fabbrizio)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




[Figure 1: Query understanding parsing example with search results leveraging the query understanding signals. (a) Query understanding parsing: the query "Pacific chicken broth organic gluten free" is parsed into the category pantry>>soup and the entities Brand ("Pacific"), Product ("chicken broth"), and Nutrition ("organic", "gluten free"). (b) Search results.]


nuances of user queries and provide robust performance across various query types and categories.

We introduce an optimized graph-based architecture based on the Ray framework [12], enabling efficient processing of high-volume traffic and ensuring scalability.


2. Query understanding ensemble architecture

In this paper, we focus on navigational queries and classify them into product taxonomy categories while applying named entity recognition (NER) to capture relevant product attributes, such as Brand, Nutrition, Flavor, and numeric attributes like quantities and measurements. Figure 1 shows a typical example of a navigational search query in an e-commerce grocery domain where the query "Pacific chicken broth organic gluten free" is parsed into its attributes and categorized into its taxonomy label.

Classifying user queries into product taxonomy categories is a typical document classification problem that is complex and actively researched. The problem is complicated by the nature of the available data, which can be either product descriptions with user-provided categories or user queries associated with catalog categories from user click-stream data. Products in the catalog are described in terms of attributes with associated values, and a subset of this mapping constitutes a set of entities that should be identified to build a search query and provide better search results.

Due to the rate of change in e-commerce, the classical approach of query annotation and model training is prohibitive. Consequently, the query understanding problem is cast as a document classification problem for matching user queries to the product taxonomy tree (categories) and a sequence labeling problem for entities of interest. For each problem, we propose using an ensemble approach with multiple models having different label sets and relations. Specifically, we predict two levels of the product taxonomy tree (L1 and L2) and extract the corresponding entities mentioned in the queries. Each level is predicted by an ensemble of models composed of business rules and machine learning models. Similarly, different machine learning and rule-based models are used to extract entities of interest.

2.1. Query understanding pipeline and ensemble components

The query understanding pipeline's classification and entity extraction components are trained and tested on pre-processed user queries. Common text pre-processing steps are applied, including spaCy's tokenization, lowercasing, and number normalization [14].

The classification ensemble consists of business rules, implemented as a lookup table, and two machine learning models: logistic regression and DistilBERT. DistilBERT is a compressed version of BERT [15] that retains 97% of the original model's performance while being 40% smaller and 60% faster at inference time. The key idea is to leverage knowledge distillation during the pre-training phase to learn a compact model that can be fine-tuned for downstream tasks. Integrating DistilBERT into the query understanding pipeline, alongside business rules and logistic regression, enhances the system's accuracy and robustness.

The entity extraction ensemble comprises: (1) a conditional random fields model; (2) a catalog-based lookup table to extract Brand, Flavor, and Nutrition; and (3) the rule-based Duckling library¹ to extract numerical entities such as Price and Quantity.

2.2. Classification decision fusion

In our ensemble learning scenario, the models are trained on different data and have different, potentially overlapping label spaces, unlike typical ensemble learning, where the same data is used to train all models. Due to the label space differences, decision fusion is performed on the predictor-by-label matrix of confidence scores rather than using a simple majority voting strategy. Rule-triggered hypotheses are assigned a confidence score of 1.0, taking priority over model-based predictions.

The decision fusion process takes a matrix of confidence scores as input and outputs a vector of aggregated confidence scores. The label space difference is addressed by applying a max operation over the column of prediction scores per label, ignoring label space membership. Taking the maximum score per prediction approximates the product rule [16]. The final label is decided as the argmax of this confidence score vector. Unlike voting-based decision fusion, this approach allows aggregation of decisions from rules and any number of predictors.

2.3. Entity span consolidation

Span consolidation aggregates entity extraction hypotheses from one or several entity extractors into a shallow parse containing only non-overlapping spans. By default, this process is performed for spans from the same model, but it can also be enabled for an ensemble of extractors.

Inspired by [17], span consolidation is performed in three steps: (1) Identity consolidation: resolves identical spans by keeping the span with the higher confidence, or randomly if confidences are equal; (2) Containment consolidation: resolves spans contained within each other by keeping the longer span, i.e., the one that contains the other; (3) Overlap consolidation: resolves overlapping spans by keeping the longer span, or alternatively merging them and assigning the label of the longest span. Priority consolidation can be used to give higher weights to predictions from extractors with higher confidence.

Decision fusion and span consolidation are generally applied as the final step of the query understanding pipeline to yield hypotheses containing only a non-overlapping set of entities and a single classification prediction per level, as described in Section 4.


3. Models and ensemble evaluation

The engine's configuration represents the ensemble as a sequence of operations, called nodes, organized into a graph. The edges of this graph represent the interdependencies between nodes. The engine organizes and dispatches computations to maximize parallelism. Machine learning models for query classification are trained on product catalog data and tested on user queries, ensuring equal representation of head, torso, and tail queries in terms of frequency. Table 1 shows the sizes of the training and testing data, and the output categories. We predict two levels of product taxonomy: L1 with 17 categories and L2 with 169 categories. However, not all L1 categories have L2 labels, making the L2 sets subsets of the L1 data. The NER test set is a subset of the manually annotated test data for non-numerical entities.

The performance evaluation of the component models and the ensemble utilizes precision, recall, and F1-score metrics. For multi-class classification tasks, we report accuracy along with macro-averaged precision, recall, and F1-score to account for dataset imbalance. Entity extraction performance is assessed using micro-averaged metrics and token-level accuracy, adhering to CoNLL-style evaluation protocols.

To quantify the efficacy of the model ensemble, we conducted a comparative analysis against logistic regression and DistilBERT for level one predictions, with results presented in Table 2. DistilBERT demonstrates superior performance compared to logistic regression across all metrics. The ensemble model, however, consistently outperforms both individual models.

Consequently, the query understanding system adopts the ensemble approach in lieu of individual models. Rule-based components are excluded from this evaluation due to their limited data coverage and restricted label subsets.

Level two models show similar performance patterns to level one, though with lower performance due to the larger label space and fewer training documents per label. Entity ensemble performance aligns with other ensembles, favoring precision.

While the ensemble approach demonstrates improved performance, it faces challenges with certain query types. Extremely short queries (e.g., "chips" can refer to potato, tortilla, or chocolate) can be ambiguous without context. Highly ambiguous queries (e.g., "greens") may span multiple categories within the grocery domain. Novel products or brands not present in the training data pose difficulties. Complex, multi-intent queries (e.g., "organic gluten-free pasta sauce and whole grain spaghetti") can lead to misclassifications or incomplete entity extraction.

Future work could explore incorporating user session data or personalization techniques to provide additional

¹ https://github.com/facebook/duckling
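To make the decision fusion of Section 2.2 concrete, here is a minimal NumPy sketch. Rows are predictors, columns are labels, and NaN marks labels outside a predictor's label space; the label names and scores are illustrative values, not the production implementation:

```python
import numpy as np

def fuse(score_matrix, labels):
    """Aggregate a (predictors x labels) confidence matrix into one label.

    NaN entries mark labels outside a predictor's label space and are
    ignored by the per-label max; rule-triggered hypotheses carry a
    confidence of 1.0 and therefore dominate the max.
    """
    fused = np.nanmax(score_matrix, axis=0)          # max score per label
    return labels[int(np.nanargmax(fused))], fused   # argmax over labels

labels = ["pantry>>soup", "beverages", "snacks"]
scores = np.array([
    [0.60, 0.30, np.nan],    # logistic regression (no "snacks" label)
    [0.20, 0.70, 0.10],      # DistilBERT
    [1.00, np.nan, np.nan],  # business rule fired on "pantry>>soup"
])
best, fused = fuse(scores, labels)
```

Because rules emit a confidence of 1.0, a triggered rule always wins the per-label max, matching the priority over model-based predictions described above.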
Table 1
Dataset sizes used to train and test components of the ensemble

                      Training    Testing    Labels
    Level 1            230,463      5,445        17
    Level 2            212,087      4,486       169
    NER                 17,862        544         3
    Brands Lookup        9,924          –         1
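The three consolidation steps of Section 2.3 can be approximated by a single greedy pass that prefers longer spans, then higher confidence. This is a simplified sketch over hypothetical (start, end, label, confidence) tuples, not the paper's implementation, and it omits the merging variant of overlap consolidation:

```python
def consolidate(spans):
    """Greedy span consolidation over (start, end, label, confidence).

    Identical spans: the higher confidence survives; contained or
    overlapping spans: the longer one survives.
    """
    # longer first, then more confident, so each kept span suppresses
    # any identical, contained, or overlapping later candidate
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], s[3]), reverse=True)
    kept = []
    for s in ordered:
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):  # disjoint?
            kept.append(s)
    return sorted(kept, key=lambda s: s[0])

hyps = [
    (0, 1, "Brand", 0.9),      # "Pacific"
    (1, 3, "Product", 0.8),    # "chicken broth"
    (2, 3, "Product", 0.4),    # "broth" (contained, dropped)
    (3, 4, "Nutrition", 0.7),  # "organic"
]
result = consolidate(hyps)
```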


Table 2
Models and ensemble performance

                    Model                          precision       recall     f1-score       accuracy
                    L1 DistilBERT                         0.77       0.77         0.77           0.81
                    L1 Logistic Regression                0.76       0.70         0.73           0.75
                    L1 Model Ensemble                     0.79       0.79         0.79           0.82
                    L2 Model Ensemble                     0.68       0.67         0.66           0.70
                    Entity Ensemble                       0.83       0.59         0.69           0.74
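The macro-averaged metrics reported in the evaluation can be computed with scikit-learn. The labels below are toy values chosen for illustration, not the paper's data; macro averaging weights every category equally, which is why it is preferred under dataset imbalance:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["pantry", "dairy", "pantry", "snacks"]
y_pred = ["pantry", "pantry", "pantry", "snacks"]

# macro averaging: per-class precision/recall/F1, then an unweighted mean,
# so a class the model never predicts (here "dairy") drags the score down
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
acc = accuracy_score(y_true, y_pred)
```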


context for ambiguous queries and improve handling of out-of-vocabulary terms and multi-intent queries.


4. Graph-based architecture for scalable processing

Query understanding systems in e-commerce search engines must generate real-time responses within strict service level agreements (SLA). They execute complex logic involving different models interacting both in series and in parallel.

Our engine is constructed as a sequence of operations (nodes) arranged in a graph showing their interdependencies (edges). Like neural networks, the graph-based engine organizes and dispatches each computation to maximize parallelism.

Parallelization occurs at multiple levels, including inter-operation parallelism and entire graph replicas, depending on deployment requirements. Each operation within the graph is a complex model component, requiring specific optimization strategies, such as data vectorization and memory sharing, to optimize the overall graph structure.

We represent the graph using the notation node: [arg1, arg2, ..., argN], where node requires incoming edges from arg1 through argN. The full configuration of the graph can be seen in Appendix A.

The engine processes the notation by following these steps: First, it optimizes the graph by joining (inlining) nodes based on certain criteria, which increases parallel operations as much as possible. Next, it decides how many replicas of the graph to run on a single physical server. Each node is then mapped to a separate system process using the Actor model [18] for inter-process communication, with message passing between processes handled using Ray [12].

Each node is initialized by loading the models into memory, leveraging shared memory and copy-on-write primitives provided by the server's operating system. Each node is loaded only once, and subsequent processes assigned the same node reference the original memory. Since the models are used for inference, not training, there are no write operations, reducing the memory footprint and improving loading times. Finally, the batching service handles the backpressure control system and the REST API for listening to incoming requests.

At startup, the engine performs several optimizations on the graph topology. The simplest is graph culling, removing nodes that do not interact with others. Each node's expected computational burden can be specified. Simple nodes (e.g., string regex preprocessors) are less resource-intensive than full neural network nodes. The engine modifies the graph by combining nodes or inlining to facilitate parallel operations and minimize costly inter-process communications. This results in lighter nodes being replicated multiple times and fused into heavier nodes, each mapped to a single system process.

After inlining, the engine performs graph linearization, converting the graph into a linear sequence, where each node depends only on preceding nodes, not subsequent ones. The engine dispatches nodes in order, synchronizing results only when necessary. This strategy minimizes pauses and maximizes parallelism. Nodes with a higher computational burden are prioritized, reducing the need for the backpressure control system, leveraging the fact
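Graph linearization amounts to a topological ordering of the node graph. A minimal sketch using Python's standard graphlib on a toy subset of the ensemble graph (the node names follow Figure 2; the dependency lists are illustrative, not the full Appendix A configuration):

```python
from graphlib import TopologicalSorter

# the paper's notation node: [arg1, ..., argN] maps naturally to a dict
# from each node to its predecessors
graph = {
    "preproc": [],
    "tfidf_l1": ["preproc"],
    "sklearn_l1": ["tfidf_l1"],
    "distilbert_l1": ["preproc"],
    "fusor_l1": ["sklearn_l1", "distilbert_l1"],
}
# static_order yields every node after all of its predecessors,
# so the engine can dispatch in order and synchronize only when needed
order = list(TopologicalSorter(graph).static_order())
```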
[Figure 2: Visualization of the ensemble as a computational graph. (Top) The full graph as defined in Appendix A: User Query → Preproc, which feeds Distilbert L1, Distilbert L2, TF-IDF L1 → Sklearn L1, TF-IDF L2 → Sklearn L2, Spacy → CRF, and Duckling; the branches converge through Fusor L1 → Rules → Fuse ALL → END. (Bottom) The graph after optimization by the engine: Preproc is inlined into the lighter branches (Preproc | TF-IDF L1 | Sklearn L1; Preproc | Duckling; Preproc | Spacy → CRF; Preproc | TF-IDF L2 | Sklearn L2), Distilbert L1 and Distilbert L2 remain standalone heavy nodes, and the tail is fused into Fusor L1 | Rules | Fuse ALL → END. Shading indicates each node's computational burden: light, normal, or heavy.]



that CPU and data transmission tasks are handled by separate CPU circuitry.

Query understanding systems receive hundreds of individual requests per second. Processing a single request is expensive due to inter-node communications. Batching multiple requests reduces overhead and enables vectorization, leveraging hardware primitives for efficient processing. The batching algorithm uses two thresholds: batch size and waiting time for further samples. This balances server resource utilization and processing time.

Lastly, the engine addresses CPU oversubscription [19], which occurs when parallel execution threads exceed the available CPU cores, leading to overhead from context switching. The backpressure control system ensures no more than N nodes run in parallel, enhancing performance by reducing oversubscription. The number N depends on the available CPU resources and the code executed within each node. A simple formula for determining N is:

    N = ⌊ M_cpu / max_{i ∈ nodes}(threads_i) ⌋ + 1        (1)

where threads_i is the number of threads or processes that an individual node can utilize independently, and M_cpu denotes the available CPU cores on the server.


5. Performance analysis at scale

Multiple tests were conducted using different AWS² EC2 instances on the engine described in Section 4 and the ensemble configuration in Appendix A. The optimal balance between cost, latency, and throughput was achieved with the m6i.2xlarge instance, which features an 8-core Intel Xeon vCPU @ 3.5 GHz, for which we report the results.

The test's target SLA stipulated that response times for 99% of requests should remain below 100 ms.

All tests initiate a single instance of the engine with a graph replication factor of one³. Another server, which hosts the client simulator implemented using the Python package Locust, is instantiated. Both servers share the same AWS network. The simulator issues multiple queries to the engine's server, each randomly sampled from a dataset of actual queries, over a sustained duration of 30 seconds. Request inter-arrival times follow an exponential distribution with a rate of T requests per second, mimicking a Poisson process, a common model for traffic patterns.

Table 3 reports the execution times of each node, along

² https://aws.amazon.com/
³ Replication factors greater than one were also tested, but they caused immediate CPU oversubscription problems, as anticipated. The SLA targets were unattainable without resorting to costly GPUs.
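Equation (1) translates directly into code. The values below are hypothetical, chosen only to exercise the formula:

```python
def max_parallel_nodes(m_cpu: int, node_threads: list[int]) -> int:
    """Eq. (1): N = floor(M_cpu / max_i(threads_i)) + 1.

    m_cpu: available CPU cores on the server.
    node_threads: threads/processes each node can use independently.
    """
    return m_cpu // max(node_threads) + 1

# e.g., 8 cores and nodes using at most 4 threads each -> N = 3
n = max_parallel_nodes(8, [1, 2, 4])
```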
Table 3
Quantiles for T = 30 tps, times are in milliseconds. Nodes are sorted from fastest to slowest.

                Batching disabled                        Batching enabled
    name              50%      95%      99%      name              50%     95%     99%
    preprocessor     0.05     0.06     0.07      preprocessor     0.05    0.06    0.07
    fusor l1         0.38     0.53     0.58      fusor l1         0.40    0.66    0.83
    fuse all         0.58     0.95     1.11      fuse all         0.59    1.16    1.65
    rules            0.87     1.29     1.56      rules            0.84    1.33    1.70
    crf              0.94     1.35     1.96      crf              0.96    1.62    2.01
    tfidf l2         1.36     2.39     5.22      tfidf l2         1.39    2.03    4.88
    tfidf l1         1.41     2.29     4.75      tfidf l1         1.43    2.05    3.88
    sklearn l1       1.62     2.68     4.86      sklearn l1       1.64    2.53    5.81
    duckling         2.80    11.95    35.71      duckling         3.33   18.59   30.84
    spacy           12.87    24.70    33.68      spacy           12.33   18.68   25.42
    distilbert l1   14.77    27.27    39.20      sklearn l2      15.71   18.99   22.87
    distilbert l2   14.90    27.75    37.58      distilbert l2   15.44   27.49   36.29
    sklearn l2      17.40    29.53    39.93      distilbert l1   15.60   27.34   37.23
    main loop       53.23   142.63   206.22      main loop       30.51   41.18   51.74
    rest api        58.57   154.50   219.90      rest api        55.85   84.35   92.82


with the main engine loop responsible for scheduling them and the outer REST API handling incoming requests and connecting the engine to the outside world. The runtime of each individual node must be strictly shorter than the main engine loop, which represents the actual time taken by the parallel graph execution. Node runtimes do not include inter-process communication, which is accounted for in the main loop. The REST API, in turn, adds to the main loop the time required to handle the HTTP connection with the requesting client. The outer REST API time must stay below 100 ms at the 99th percentile to comply with the target SLA.
   When batching is disabled, at the given rate T, new requests arrive while the server is still processing previous ones. These requests are dispatched immediately, leading to CPU oversubscription, which slows down all requests. The effect tends to cascade: the increased processing time makes it more likely that further requests arrive before the current ones complete, slowing the system even more.
   When batching is enabled, the engine pauses to accumulate requests into a batch until a threshold of 5 samples or 50 ms is reached. Since each request arrives every 1/T ≈ 30 ms, the average batch size is only around 1.5 samples, so vectorization alone cannot explain the server's ability to meet the target SLA. The process unfolds as follows: (1) the first batch is dispatched for processing; (2) for the next 50 ms, new requests are queued into a new batch, while (3) the engine most likely completes the first batch within 51.7 ms (with 99% probability); (4) the second batch is then dispatched, utilizing the just-released resources. Batching thus acts as backpressure control on cheaper hardware without a GPU and at low rates of T. In production, multiple instances would handle fluctuating traffic, making batching an efficient way to scale while meeting the SLA. The optimal batching period should match the main loop time at the 99th percentile, which is around 50 ms in this case.
   From a single request's perspective, with T = 30 tps, batches are dispatched every 50 ms, so a request's arrival time is uniformly distributed over this interval, giving an average wait of 25 ms in the batch queue. The entire batch is then processed, taking a time X, before the response is extracted and forwarded through the HTTP channel, taking an additional Y. Empirically, X is the main loop runtime, averaging around 30 ms at the 50th percentile. The REST API, implemented using FastAPI4, has been benchmarked at Y ≈ 2-5 ms, giving us

    REST API @ 50% ≈ 25 ms + 30 ms + 2 ms ≈ 57 ms

For the REST API at the 99th percentile, the average wait is still 25 ms, but X and Y grow accordingly, giving approximately 90-95 ms.

4 https://fastapi.tiangolo.com/
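The latency model above (a batch-queue wait uniform over the 50 ms batching period, plus the main loop time X ≈ 30 ms and HTTP handling Y ≈ 2 ms) can be checked with a small Monte-Carlo sketch. The function and parameter names are ours; the values are the ones reported in the text.

```python
import random

def simulate_rest_latency(n_requests=100_000, batch_period_ms=50.0,
                          main_loop_ms=30.0, http_ms=2.0, seed=0):
    """Monte-Carlo estimate of the mean per-request REST latency.

    Each request waits in the batch queue for a time uniformly
    distributed over the batching period, then pays the main-loop
    runtime X and the HTTP handling time Y.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_requests):
        wait = rng.uniform(0.0, batch_period_ms)  # queueing delay
        total += wait + main_loop_ms + http_ms    # wait + X + Y
    return total / n_requests

mean_ms = simulate_rest_latency()
# Expected value: 25 ms (mean wait) + 30 ms (X) + 2 ms (Y) = 57 ms
```

The simulated mean agrees with the analytical median-case estimate, since the only random term is the uniform queueing delay.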
6. Conclusion and future work

This paper proposed a novel ensemble approach for query understanding in e-commerce, combining deep learning models like DistilBERT with traditional techniques like logistic regression and rule-based systems. The ensemble architecture aimed to capture the nuances of user queries and provide robust performance across query types and categories. Data augmentation techniques were employed to improve the DistilBERT model's handling of brands, misspellings, and short queries. An optimized graph-based architecture using the Ray framework enabled efficient, scalable processing of high-volume traffic.
   While the ensemble performed well, there are limitations to address in future work. The system focused only on navigational queries for product categorization and entity extraction; extending it to informational and other query types could further improve relevance. Exploring more advanced data augmentation, model compression, and hardware acceleration techniques could enhance accuracy and efficiency.
   The query understanding ensemble demonstrated the value of combining diverse models and leveraging distributed computing frameworks for scalability in e-commerce search engines. E-commerce platforms can benefit from adopting similar ensemble-based approaches customized to their query traffic and product data. The architecture enables efficient real-time query processing while meeting strict latency requirements, critical for delivering a seamless shopping experience.

7. Appendix

A. Graph configuration

In our query understanding system, the relationships between the various models and preprocessing components are organized within a graph-based architecture. This architecture plays a crucial role in managing the interdependencies between models, ensuring efficient computation and scalability.
   The graph representation is designed to integrate multiple machine learning and rule-based models while facilitating optimized parallel processing. Each key in the graph corresponds to a node, representing a component or model, and the associated value is a list of the nodes that provide input to it. This differs from traditional adjacency lists, which focus on child nodes: here the value lists contain ancestor nodes, indicating which components feed information into the current node.
   A key aspect of this architecture is that certain elements, such as user_query, are implicit nodes representing external inputs to the system; these inputs initiate the data flow through the graph. The architecture also supports multiple outputs, listed under the outputs key. This key is not a graph node but an indicator to the engine of what to select as the final result. The outputs key is also central to the graph topology optimization and linearization described in Section 4. This representation not only makes it easier to track data flow but also helps optimize the query understanding ensemble for real-time processing in e-commerce environments.

Figure 3: Graph Representation of Query Understanding Ensemble

execution_graph:
  preprocessor: [ user_query ]
  distilbert_l1: [ user_query ]
  distilbert_l2: [ user_query ]
  tfidf_l1: [ preprocessor ]
  tfidf_l2: [ preprocessor ]
  vui_duckling: [ preprocessor ]
  spacy: [ preprocessor ]
  crf: [ spacy ]
  sklearn_l1: [ tfidf_l1 ]
  sklearn_l2: [ tfidf_l2 ]
  fusor_l1: [ distilbert_l1, sklearn_l1 ]
  rules: [ spacy, fusor_l1 ]
  fuse_all: [
    rules, crf, distilbert_l1, sklearn_l1,
    distilbert_l2, sklearn_l2, vui_duckling
  ]
outputs: [ user_query, preprocessor, parse ]

   Figure 3 illustrates the graph structure that defines the query understanding ensemble. The nodes represent components that work together to process user queries and extract meaningful insights. The graph starts with preprocessing steps that normalize and clean the user input. Subsequently, components such as DistilBERT and TF-IDF extract semantic features and contextual information, while models like the CRF (Conditional Random Field) and vui_duckling identify specific entities such as brands, quantities, and attributes.
   The outputs of these models are fused through dedicated nodes such as fusor_l1 and fuse_all, which combine the signals from the intermediate models based on confidence scores and rule-based decisions. The final outputs represent the processed user query, refined and enriched through multiple layers of analysis, ready for downstream tasks such as categorization and search relevance adjustments.
   This architecture's flexibility and efficiency enable it to handle the complexities of e-commerce queries in real time while supporting high-volume traffic and diverse query types. It also lays the groundwork for the performance optimizations and parallel processing strategies outlined in Section 4.
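Because the configuration in Figure 3 maps each node to its ancestors, it can be linearized with a standard topological sort. The sketch below uses Python's standard-library graphlib, whose predecessor-map input matches this format exactly; the grouping into parallel "waves" is our illustration of the linearization idea, not necessarily the engine's actual scheduler.

```python
from graphlib import TopologicalSorter

# Ancestor-list representation from Figure 3: each key names a node,
# and its value lists the nodes whose outputs it consumes.
execution_graph = {
    "preprocessor": ["user_query"],
    "distilbert_l1": ["user_query"],
    "distilbert_l2": ["user_query"],
    "tfidf_l1": ["preprocessor"],
    "tfidf_l2": ["preprocessor"],
    "vui_duckling": ["preprocessor"],
    "spacy": ["preprocessor"],
    "crf": ["spacy"],
    "sklearn_l1": ["tfidf_l1"],
    "sklearn_l2": ["tfidf_l2"],
    "fusor_l1": ["distilbert_l1", "sklearn_l1"],
    "rules": ["spacy", "fusor_l1"],
    "fuse_all": ["rules", "crf", "distilbert_l1", "sklearn_l1",
                 "distilbert_l2", "sklearn_l2", "vui_duckling"],
}

def linearize(graph):
    """Group nodes into waves: a node joins a wave once all of its
    ancestors appear in earlier waves, so each wave can run in parallel."""
    ts = TopologicalSorter(graph)  # accepts a node -> predecessors map
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all nodes whose inputs are done
        waves.append(ready)
        ts.done(*ready)
    return waves

waves = linearize(execution_graph)
```

Note that user_query appears only as an ancestor, so the sorter treats it as the implicit external input and schedules it first, while fuse_all, which consumes every intermediate signal, ends up alone in the final wave.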