<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable Query Understanding for E-commerce: An Ensemble Architecture with Graph-based Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Di Fabbrizio</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Stepanov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludovico Frizziero</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Tessaro</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>Query understanding is a critical component in e-commerce platforms, facilitating accurate interpretation of user intent and efficient retrieval of relevant products. This study investigates scalable query understanding techniques applied to a real-world use case in the e-commerce grocery domain. We propose a novel architecture that integrates deep learning models with traditional machine learning approaches to capture query nuances and deliver robust performance across diverse query types and categories. Experimental evaluations conducted on real-life datasets demonstrate the efficacy of our proposed solution in terms of both accuracy and scalability. The implementation of an optimized graph-based architecture utilizing the Ray framework enables efficient processing of high-volume traffic. Our ensemble approach achieves an absolute 2% improvement in accuracy over the best individual model. The findings underscore the advantages of combining diverse models in addressing the complexities of e-commerce query understanding.</p>
      </abstract>
      <kwd-group>
<kwd>Query classification</kwd>
        <kwd>Query understanding</kwd>
        <kwd>Distributed and scalable machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Accurately understanding and classifying user queries is crucial for providing a seamless shopping experience by boosting the relevance of product search results in e-commerce [1]. Query understanding enables e-commerce platforms to interpret users’ intents, retrieve relevant products, and personalize the user’s journey through the shopping experience. However, the task of query understanding in e-commerce presents several challenges due to the diverse nature of queries, the large-scale product catalogs, and the need for efficient processing of high-volume traffic with noisy behavioral signals [2, 3].</p>
      <p>Query understanding in e-commerce involves multiple sub-tasks, such as query classification, entity recognition, and intent detection. Query classification aims to categorize user queries into predefined product categories, facilitating improved product retrieval and ranking [4, 5]. Entity recognition identifies key information within the query, such as brand names, product attributes, and numerical values, which can be used to refine the search results [6]. Intent detection focuses on understanding the user’s underlying goal, such as product discovery, comparison, or purchase [7].</p>
      <p>One of the primary challenges in query understanding is the inherent ambiguity and diversity of user queries. E-commerce queries are often short, lacking context, and can have multiple interpretations [8]. Moreover, the large-scale product catalogs in e-commerce platforms, spanning thousands of categories and millions of products, pose a significant challenge in accurately mapping queries to relevant categories and products.</p>
      <p>Various approaches have been proposed to address these challenges, leveraging traditional machine learning techniques and deep learning models. Rule-based systems and keyword matching have been widely used for query classification and entity recognition [9]. However, these approaches often struggle with the variability and complexity of natural language queries. Different query intents require different algorithms to yield optimum results [10]. Queries can be classified into navigational (e.g., product category, brand, title) and informational (e.g., product-related questions). While navigational queries require exact matching to catalog products, informational queries necessitate applying more complex understanding techniques.</p>
      <p>Another critical aspect of query understanding in e-commerce is efficiently processing high-volume traffic. E-commerce platforms receive millions of queries daily, requiring scalable and real-time query understanding systems. Distributed computing frameworks, such as Apache Spark and Ray, have been employed to parallelize query processing and handle the massive scale of e-commerce data [11, 12].</p>
      <p>In this paper, we propose an ensemble approach for query understanding in e-commerce, combining deep learning models and traditional techniques. Our approach leverages the strengths of both deep learning, such as DistilBERT [13], and traditional models, including logistic regression and rule-based systems. By integrating these diverse models, we aim to capture the nuances of user queries and provide robust performance across various query types and categories. We introduce an optimized graph-based architecture based on the Ray framework [12], enabling efficient processing of high-volume traffic and ensuring scalability.</p>
      <p>Figure 1: (a) Query understanding parsing of the query “Pacific chicken broth organic gluten free” into Brand, Product, and Nutrition entities, with the category label pantry&gt;&gt;soup; (b) the corresponding search results.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy. Work done when the authors were at VUI, Inc. Copyright for this paper by its authors; use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Query understanding ensemble architecture</title>
      <p>In this paper, we focus on navigational queries and classify them into product taxonomy categories while applying named entity recognition (NER) to capture relevant product attributes, such as Brand, Nutrition, and Flavor, and numeric attributes like quantities and measurements. Figure 1 shows a typical example of a navigational search query in the e-commerce grocery domain, where the query “Pacific chicken broth organic gluten free” is parsed into its attributes and categorized into its taxonomy label.</p>
      <p>Classifying user queries into product taxonomy categories is a typical document classification problem that is complex and actively researched. The problem is complicated by the nature of the available data, which can be either product descriptions with user-provided categories or user queries associated with catalog categories from user click-stream data. Products in the catalog are described in terms of attributes with associated values, and a subset of this mapping constitutes a set of entities that should be identified to build a search query and provide better search results.</p>
      <p>Due to the rate of change in e-commerce, the classical approach of query annotation and model training is prohibitive. Consequently, the query understanding problem is cast as a document classification problem for matching user queries to the product taxonomy tree (categories) and a sequence labeling problem for entities of interest. For each problem, we propose using an ensemble approach with multiple models having different label sets and relations. Specifically, we predict two levels of the product taxonomy tree (L1 and L2) and extract the corresponding entities mentioned in the queries. Each level is predicted by an ensemble of models composed of business rules and machine learning models. Similarly, different machine learning and rule-based models are used to extract entities of interest.</p>
      <sec id="sec-2-1">
        <title>2.1. Query understanding pipeline and ensemble components</title>
        <p>The query understanding pipeline’s classification and entity extraction components are trained and tested on pre-processed user queries. Common text pre-processing steps are applied, including spaCy’s tokenization, lowercasing, and number normalization [14].</p>
        <p>The classification ensemble consists of business rules, implemented as a lookup table, and two machine learning models: logistic regression and DistilBERT. DistilBERT is a compressed version of BERT [15] that retains 97% of the original model’s performance while being 40% smaller and 60% faster at inference time. The key idea is to leverage knowledge distillation during the pre-training phase to learn a compact model that can be fine-tuned for downstream tasks. Integrating DistilBERT into a query understanding pipeline, alongside business rules and logistic regression, enhances the system’s accuracy and robustness.</p>
        <p>The entity extraction ensemble comprises: (1) a conditional random fields model; (2) a catalog-based lookup table to extract Brand, Flavor, and Nutrition; and (3) the rule-based Duckling library to extract numerical entities such as Price and Quantity.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Classification decision fusion</title>
        <p>In our ensemble learning scenario, the models are trained on different data and have different, potentially overlapping label spaces, unlike typical ensemble learning, where the same data is used to train all models. Due to the label space differences, decision fusion is performed on the predictor-by-label prediction matrix of confidence scores rather than using a simple majority voting strategy. Rule-triggered hypotheses are assigned a confidence score of 1.0, taking priority over model-based predictions.</p>
        <p>The decision fusion process takes a matrix of confidence scores as input and outputs a vector of aggregated confidence scores. The label space difference is addressed by applying a max operation on the column of prediction scores per label, ignoring the values with respect to the label space membership. Taking the maximum score per prediction approximates the product rule [16]. The final label is decided as the arg max of this confidence score vector. Unlike voting-based decision fusion, such an approach allows aggregation of decisions from rules and any number of predictors.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Entity span consolidation</title>
        <p>Span consolidation aggregates entity extraction hypotheses from one or several entity extractors into a shallow parse containing only non-overlapping spans. By default, this process is performed for spans from the same model, but it can also be enabled for an ensemble of extractors.</p>
        <p>Inspired by [17], the span consolidation is performed in three steps: (1) Identity consolidation resolves identical spans by keeping the span with higher confidence, or randomly if confidences are equal; (2) Containment consolidation resolves spans contained within each other by keeping the longer span, i.e., the one that contains the other; (3) Overlap consolidation resolves overlapping spans by keeping the longer span, or alternatively merging them and assigning the label of the longest span. Priority consolidation can be used to give higher weights to predictions from extractors with higher confidence.</p>
        <p>The decision fusion and span consolidation are generally applied as the final step of the query understanding pipeline to yield hypotheses containing only a non-overlapping set of entities and a single classification prediction per level, as described in Section 4.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Models and ensemble evaluation</title>
      <p>The engine’s configuration represents the ensemble as a sequence of operations, called nodes, organized into a graph; the edges of this graph represent the interdependencies between nodes, and the engine organizes and dispatches computations to maximize parallelism (Section 4).</p>
      <p>Machine learning models for query classification are trained on product catalog data and tested on user queries, ensuring equal representation of head, torso, and tail queries in terms of frequency. Table 1 shows the sizes of the training and testing data and the output categories. We predict two levels of the product taxonomy: L1 with 17 categories and L2 with 169 categories. However, not all L1 categories have L2 labels, making the L2 sets subsets of the L1 data. The NER test set is a subset of the manually annotated test data for non-numerical entities.</p>
      <p>The performance evaluation of the component models and the ensemble utilizes precision, recall, and F1-score metrics. For multi-class classification tasks, we report accuracy along with macro-averaged precision, recall, and F1-score to account for dataset imbalance. Entity extraction performance is assessed using micro-averaged metrics and token-level accuracy, adhering to CoNLL-style evaluation protocols.</p>
      <p>To quantify the efficacy of the model ensemble, we conducted a comparative analysis against logistic regression and DistilBERT for level-one predictions, with results presented in Table 2. DistilBERT demonstrates superior performance compared to logistic regression across all metrics. The ensemble model, however, consistently outperforms both individual models. Consequently, the query understanding system adopts the ensemble approach in lieu of individual models. Rule-based components are excluded from this evaluation due to their limited data coverage and restricted label subsets.</p>
      <p>Level-two models show similar performance patterns to level one, though with lower performance due to the larger label space and fewer training documents per label. Entity ensemble performance aligns with the other ensembles, favoring precision.</p>
      <p>While the ensemble approach demonstrates improved performance, it faces challenges with certain query types. Extremely short queries (e.g., “chips” can refer to potato, tortilla, or chocolate) can be ambiguous without context. Highly ambiguous queries (e.g., “greens”) may span multiple categories within the grocery domain. Novel products or brands not present in the training data pose difficulties. Complex, multi-intent queries (e.g., “organic gluten-free pasta sauce and whole grain spaghetti”) can lead to misclassifications or incomplete entity extraction. Future work could explore incorporating user session data or personalization techniques to provide additional context for ambiguous queries and improve handling of out-of-vocabulary terms and multi-intent queries.</p>
    </sec>
    <sec id="sec-4a">
      <title>4. Graph-based architecture for scalable processing</title>
      <p>Query understanding systems in e-commerce search engines must generate real-time responses within strict service level agreements (SLAs). They execute complex logic involving different models interacting both in series and in parallel.</p>
      <p>Our engine is constructed as a sequence of operations (nodes) arranged in a graph showing their interdependencies (edges). Like neural networks, the graph-based engine organizes and dispatches each computation to maximize parallelism.</p>
      <p>Parallelization occurs at multiple levels, including inter-operation parallelism and entire graph replicas, depending on deployment requirements. Each operation within the graph is a complex model component, requiring specific optimization strategies, such as data vectorization and memory sharing, to optimize the overall graph structure.</p>
      <p>We represent the graph using the notation node: [arg1, arg2, ..., argN], where node requires incoming edges from arg1 through argN. The full configuration of the graph can be seen in Appendix A.</p>
      <p>The engine processes the notation by following these steps. First, it optimizes the graph by joining (inlining) nodes based on certain criteria, which increases parallel operations as much as possible. Next, it decides how many replicas of the graph to run on a single physical server. Each node is then mapped to a separate system process using the Actor model [18] for inter-process communication, with message passing between processes handled using Ray [12].</p>
      <p>Each node is initialized by loading the models into memory, leveraging shared memory and copy-on-write primitives provided by the server’s operating system. Each node is loaded only once, and subsequent processes assigned the same node reference the original memory. Since the models are used for inference, not training, there are no write operations, reducing the memory footprint and improving loading times. Finally, the batching service handles the backpressure control system and the REST API for listening to incoming requests.</p>
      <p>At startup, the engine performs several optimizations on the graph topology. The simplest is graph culling, removing nodes that do not interact with others. Each node’s expected computational burden can be specified: simple nodes (e.g., string regex preprocessors) are less resource-intensive than full neural network nodes. The engine modifies the graph by combining nodes or inlining to facilitate parallel operations and minimize costly inter-process communications. This results in lighter nodes being replicated multiple times and fused into heavier nodes, each mapped to a single system process.</p>
      <p>After inlining, the engine performs graph linearization, converting the graph into a linear sequence, where each node depends only on preceding nodes, not subsequent ones. The engine dispatches nodes in order, synchronizing results only when necessary. This strategy minimizes pauses and maximizes parallelism. Nodes with a higher computational burden are prioritized, reducing the need for the backpressure control system and leveraging the fact that CPU and data transmission tasks are handled by separate CPU circuitry.</p>
      <p>Figure 2: The full ensemble graph (User Query, Preproc, Spacy, TF-IDF L1/L2, Sklearn L1/L2, Distilbert L1/L2, CRF, Duckling, Rules, Fusor L1, Fuse ALL, END) and the optimized graph after inlining (Preproc | TF-IDF L1 | Sklearn L1; Preproc | Duckling; Preproc | Spacy; Preproc | TF-IDF L2 | Sklearn L2; Fusor L1 | Rules | Fuse ALL), with each node’s computational burden marked as light, normal, or heavy.</p>
      <p>Query understanding systems receive hundreds of individual requests per second. Processing a single request is expensive due to inter-node communications. Batching multiple requests reduces overhead and enables vectorization, leveraging hardware primitives for efficient processing. The batching algorithm uses two thresholds, batch size and waiting time for further samples, which balances server resource utilization and processing time.</p>
      <p>Lastly, the engine addresses CPU oversubscription [19], which occurs when parallel execution threads exceed available CPU cores, leading to overhead from context switching. The backpressure control system ensures no more than k nodes run in parallel, enhancing performance by reducing oversubscription. The number k depends on the available CPU resources and the code executed within each node. A simple formula for determining k is: k = ⌊cores / max_{n∈N}(threads_n)⌋ + 1, (1) where threads_n is the number of threads or processes that an individual node can utilize independently, and cores denotes the available CPU cores on the server.</p>
    </sec>
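    <p>The parallelism bound of Equation (1) can be sketched in a few lines (a minimal illustration under our reading of the reconstructed formula; the function and variable names are ours, not from the paper’s implementation):</p>

```python
def max_parallel_nodes(available_cores: int, node_threads: list[int]) -> int:
    """Backpressure bound k: floor(cores / threads of the heaviest node) + 1."""
    heaviest = max(node_threads)  # most threads any single node can use
    return available_cores // heaviest + 1

# Example: an 8-core server where the heaviest node (e.g., a transformer
# model) uses 4 threads and lighter nodes use 1-2 threads each.
print(max_parallel_nodes(8, [1, 2, 4]))  # prints 3
```

    <p>The +1 term allows a slight oversubscription, consistent with the observation that compute and data-transmission work occupy separate CPU circuitry.</p>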
    <sec id="sec-4">
      <title>5. Performance analysis at scale</title>
      <p>Multiple tests were conducted using different AWS2 EC2 instances on the engine described in Section 4 and the ensemble configuration given in Appendix A. The optimal balance between cost, latency, and throughput was achieved with the m6i.2xlarge instance, which features 8 Intel Xeon vCPU cores @ 3.5GHz, for which we report the results. The test’s target SLA stipulated that response times for 99% of requests should remain below 100ms.</p>
      <p>All tests initiate a single instance of the engine with a graph replication factor of one3. Another server, which hosts the client simulator implemented using the Python package Locust, is instantiated; both servers share the same AWS network. The simulator issues multiple queries to the engine’s server, each randomly sampled from a dataset of actual queries, over a sustained duration of 30 seconds. Request inter-arrival times follow an exponential distribution with a rate of λ requests per second, mimicking a Poisson process, a common model for traffic patterns.</p>
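      <p>The traffic model described above can be reproduced in a few lines of Python (a standalone sketch of Poisson load generation, not the actual Locust configuration; names are ours):</p>

```python
import random

def poisson_arrival_times(rate_per_s: float, duration_s: float, seed: int = 7):
    """Sample request timestamps whose inter-arrival gaps are exponential
    with mean 1/rate, i.e., a Poisson process over [0, duration_s]."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # exponential inter-arrival gap
        if t > duration_s:
            return times
        times.append(t)

# 30 requests/s sustained for 30 s, as in the tests above.
arrivals = poisson_arrival_times(rate_per_s=30, duration_s=30)
print(len(arrivals))  # close to 30 * 30 = 900 requests on average
```

      <p>Replaying such timestamps against the engine exercises the same bursty arrival pattern the batching and backpressure mechanisms are designed to absorb.</p>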
      <p>2 https://aws.amazon.com/</p>
      <p>3 Replication factors greater than one were also tested, but they caused immediate CPU oversubscription problems, as anticipated; the SLA targets were unattainable without resorting to costly GPUs.</p>
      <p>Table 3 reports the execution times of each node, along with the main engine loop responsible for scheduling them and the outer REST API handling incoming requests and facilitating the connection between the engine and the outside world. The runtime of each individual node must be strictly shorter than the main engine loop, which represents the actual time taken for parallel graph execution.</p>
      <p>Node runtimes do not consider inter-process communication, which is accounted for in the main loop. On the other hand, the REST API contributes to the main loop by including the time required to handle the HTTP connection with the requesting client. The outer REST API time must stay below 100ms at the 99th percentile to comply with the target SLA.</p>
      <p>When batching is disabled, at the given rate λ, new requests arrive while the server is still processing previous ones. These requests are immediately dispatched, leading to CPU oversubscription, which slows down all requests. This effect tends to cascade, as the increased processing time makes it more likely that other requests will arrive, further slowing the system.</p>
      <p>When batching is enabled, the engine pauses to accumulate requests into a batch until a threshold of 5 samples or 50ms is met. Given that each request arrives every 1/λ ≈ 30ms, the average batch size is around 1.5 samples; therefore, vectorization alone cannot explain the server’s ability to meet the target SLA. The process unfolds as follows: (1) the first batch is dispatched for processing; (2) for the next 50ms, new requests are queued into a new batch, while (3) the engine likely completes the first batch within 51.7ms (with 99% probability); (4) the second batch is then dispatched, utilizing the just-released resources. Thus, batching acts as backpressure control on cheaper hardware without a GPU and at low rates of λ. In production, multiple instances would handle fluctuating traffic, making batching efficient for scaling while meeting the SLA. The optimal batching period should match the main loop time at the 99th percentile, which is around 50ms in this case.</p>
      <p>From a single request’s perspective, with λ = 30, batches are dispatched every 50ms, meaning requests encounter a uniform distribution over this interval, with an average wait of 25ms in the batch queue. The entire batch is then processed, typically taking the batch processing time to complete, before the response is extracted and forwarded through the HTTP channel, which takes a few additional milliseconds. Empirically, the batch processing time corresponds to the main loop runtime, averaging around 30ms at the 50th percentile. The REST API, implemented using FastAPI4, has been benchmarked at approximately 2–5ms, giving us REST API latency @ 50% ≈ 25ms + 30ms + 2ms ≈ 57ms.</p>
      <p>For the REST API latency @ 99%, the wait time is still 25ms on average, but the processing and HTTP times grow accordingly, giving approximately 90–95ms.</p>
    </sec>
    <sec id="sec-4b">
      <title>6. Conclusion and future work</title>
      <p>This paper proposed a novel ensemble approach for query understanding in e-commerce, combining deep learning models like DistilBERT with traditional techniques like logistic regression and rule-based systems. The ensemble architecture aimed to capture the nuances of user queries and provide robust performance across query types and categories. Data augmentation techniques were employed to improve the DistilBERT model’s handling of brands, misspellings, and short queries. An optimized graph-based architecture using the Ray framework enabled efficient, scalable processing of high-volume traffic.</p>
      <p>While the ensemble performed well, there are limitations to address in future work. The system focused only on navigational queries for product categorization and entity extraction; extending it to handle informational and other query types could further improve relevance. Exploring more advanced data augmentation, model compression, and hardware acceleration techniques could enhance accuracy and efficiency.</p>
      <p>The query understanding ensemble demonstrated the value of combining diverse models and leveraging distributed computing frameworks for scalability in e-commerce search engines. E-commerce platforms can benefit from adopting similar ensemble-based approaches customized to their query traffic and product data. The architecture enables efficient real-time query processing while meeting strict latency requirements, critical for delivering a seamless shopping experience.</p>
    </sec>
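    <p>Returning to the latency budget of Section 5, the median-case figure can be checked with a back-of-the-envelope computation (all numbers come from the text; the helper function and its name are ours):</p>

```python
def rest_api_latency_ms(queue_wait_ms: float, loop_ms: float, http_ms: float) -> float:
    """Median response time = batch-queue wait + main-loop runtime + HTTP handling."""
    return queue_wait_ms + loop_ms + http_ms

# Median case: 25ms average wait inside the 50ms batching window,
# ~30ms main engine loop @ 50th percentile, ~2ms FastAPI overhead.
print(rest_api_latency_ms(25, 30, 2))  # prints 57.0, well under the 100ms SLA
```
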
    <sec id="sec-5">
      <title>7. Appendix</title>
    </sec>
    <sec id="sec-6">
      <title>A. Graph configuration</title>
      <p>In our query understanding system, the relationships between the various models and preprocessing components are organized within a graph-based architecture. This architecture plays a crucial role in managing the interdependencies between different models, ensuring efficient computation and scalability.</p>
      <p>The graph representation is designed to handle the integration of multiple machine learning and rule-based models while facilitating optimized parallel processing.</p>
      <p>Each key in the graph corresponds to a node, which indicates a component or model, and the associated value is a list of other nodes that provide input to it. This differs from traditional adjacency lists, where the focus is on child nodes. Instead, in our graph, the value lists contain ancestor nodes, indicating which components feed information into the current node.</p>
      <p>A key aspect of this architecture is that certain elements, such as user_query, are considered implicit nodes representing external inputs to the system. These external inputs play a foundational role in initiating the data flow throughout the graph. The architecture is designed to handle multiple outputs, listed within the outputs key. This is not a graph node but serves as an indicator to the engine of what to select as the final result. The outputs key is also vital for the process of graph topology optimization and linearization described in Section 4. This representation not only makes it easier to track data flow but also helps optimize the query understanding ensemble for real-time processing in e-commerce environments.</p>
      <p>Figure 3: The execution graph configuration defining the Query Understanding Ensemble:</p>
      <p>execution_graph:
  preprocessor: [ user_query ]
  distilbert_l1: [ user_query ]
  distilbert_l2: [ user_query ]
  tfidf_l1: [ preprocessor ]
  tfidf_l2: [ preprocessor ]
  vui_duckling: [ preprocessor ]
  spacy: [ preprocessor ]
  crf: [ spacy ]
  sklearn_l1: [ tfidf_l1 ]
  sklearn_l2: [ tfidf_l2 ]
  fusor_l1: [ distilbert_l1, sklearn_l1 ]
  rules: [ spacy, fusor_l1 ]
  fuse_all: [ rules, crf, distilbert_l1, sklearn_l1, distilbert_l2, sklearn_l2, vui_duckling ]
outputs: [ user_query, preprocessor, parse ]</p>
      <p>Figure 3 illustrates the graph structure that defines the Query Understanding Ensemble. The nodes represent components that work together to process user queries and extract meaningful insights. The graph starts with preprocessing steps that normalize and clean the user input. Subsequently, components such as DistilBERT and TF-IDF are leveraged to extract semantic features and contextual information. Additional models like the CRF (Conditional Random Fields) and vui_duckling focus on identifying specific entities such as brands, quantities, and attributes.</p>
      <p>The outputs from these models are fused together through dedicated nodes such as fusor_l1 and fuse_all, which combine signals from the intermediate models based on confidence scores and rule-based decisions. The final outputs represent the processed user query, refined and enriched through multiple layers of analysis, ready for downstream tasks such as categorization and search relevance adjustments.</p>
      <p>This architecture’s flexibility and efficiency enable it to handle the complexities of e-commerce queries in real time while supporting high-volume traffic and diverse query types. It also lays the groundwork for the performance optimizations and parallel processing strategies outlined in Section 4.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[1] H. Deng, Y. Zhang (Eds.), Query Understanding for Search Engines, 1st ed., Springer, 2020. doi:10.1007/978-3-030-58334-7.</p>
      <p>[2] S. Jiang, Y. Hu, C. Kang, T. Daly, D. Yin, Y. Chang, C. Zhai, Learning query and document relevance from a web-scale click graph, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 185–194. doi:10.1145/2911451.2911531.</p>
      <p>[3] P. Nigam, Y. Song, V. Mohan, V. Lakshman, W. A. Ding, A. Shingavi, C. H. Teo, H. Gu, B. Yin, Semantic product search, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2876–2885. doi:10.1145/3292500.3330759.</p>
      <p>[4] Y.-C. Lin, A. Datta, G. Di Fabbrizio, E-commerce Product Query Classification Using Implicit User’s Feedback from Clicks, in: 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 1955–1959. doi:10.1109/BigData.2018.8622008.</p>
      <p>[5] G. Di Fabbrizio, E. Stepanov, F. Tessaro, Extreme Multi-label Query Classification for E-commerce, in: eCom’24: ACM SIGIR Workshop on eCommerce, July 18, 2024, USA, 2024.</p>
      <p>[6] J.-W. Ha, H. Pyo, J. Kim, Large-scale item categorization in e-commerce using multiple recurrent neural networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 107–115. doi:10.1145/2939672.2939678.</p>
      <p>[7] Y. Qiu, C. Zhao, H. Zhang, J. Zhuo, T. Li, X. Zhang, S. Wang, S. Xu, B. Long, W.-Y. Yang, Pre-training Tasks for User Intent Detection and Embedding Retrieval in E-commerce Search, in: Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management, CIKM ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 4424–4428. doi:10.1145/3511808.3557670.</p>
      <p>[8] D. Shen, Y. Li, X. Li, D. Zhou, Product query classification, in: Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 741–750. doi:10.1145/1645953.1646047.</p>
      <p>[9] B. Ramesh, Bhange, X. Cheng, M. Bowden, P. Goyal, T. Packer, F. Javed, Named Entity Recognition for E-Commerce Search Queries, in: 2018 IEEE International Conference on Big Data (Big Data), 2020. URL: https://api.semanticscholar.org/CorpusID:219530417.</p>
      <p>[10] M. Tsagkias, T. H. King, S. Kallumadi, V. Murdock, M. de Rijke, Challenges and research opportunities in ecommerce search and recommendations, SIGIR Forum (2020).</p>
      <p>[11] E. Shaikh, I. Mohiuddin, Y. Alufaisan, I. Nahvi, Apache Spark: A big data processing engine, in: 2019 2nd IEEE Middle East and North Africa COMMunications Conference (MENACOMM), 2019, pp. 1–6. URL: https://api.semanticscholar.org/CorpusID:211120979.</p>
      <p>[12] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, I. Stoica, Ray: a distributed framework for emerging AI applications, in: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, USENIX Association, USA, 2018, pp. 561–577.</p>
      <p>[13] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, ArXiv abs/1910.01108 (2019). URL: https://api.semanticscholar.org/CorpusID:203626972.</p>
      <p>[14] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python (2020).</p>
      <p>[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.</p>
      <p>[16] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (2002) 226–239.</p>
      <p>[17] F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, S. Vaithyanathan, An algebraic approach to rule-based information extraction, in: 2008 IEEE 24th International Conference on Data Engineering, IEEE, 2008, pp. 933–942.</p>
      <p>[18] C. Hewitt, Actor model of computation: Scalable robust information systems, 2015. arXiv:1008.1459.</p>
      <p>[19] C. Iancu, S. Hofmeyr, F. Blagojević, Y. Zheng, Oversubscription on multicore processors, in: 2010 IEEE International Symposium on Parallel &amp; Distributed Processing (IPDPS), 2010, pp. 1–11. doi:10.1109/IPDPS.2010.5470434.</p>
    </sec>
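    <p>As an illustration of how Appendix A’s ancestor-list representation supports the linearization step of Section 4, a topological ordering can be derived directly from it (a simplified sketch; the production engine additionally performs inlining, replication, and burden-aware prioritization):</p>

```python
from graphlib import TopologicalSorter

# Appendix A's execution graph: each node maps to the nodes feeding into it.
EXECUTION_GRAPH = {
    "preprocessor": ["user_query"],
    "distilbert_l1": ["user_query"],
    "distilbert_l2": ["user_query"],
    "tfidf_l1": ["preprocessor"],
    "tfidf_l2": ["preprocessor"],
    "vui_duckling": ["preprocessor"],
    "spacy": ["preprocessor"],
    "crf": ["spacy"],
    "sklearn_l1": ["tfidf_l1"],
    "sklearn_l2": ["tfidf_l2"],
    "fusor_l1": ["distilbert_l1", "sklearn_l1"],
    "rules": ["spacy", "fusor_l1"],
    "fuse_all": ["rules", "crf", "distilbert_l1", "sklearn_l1",
                 "distilbert_l2", "sklearn_l2", "vui_duckling"],
}

# TopologicalSorter expects predecessor lists, which matches the
# ancestor-list format: every node appears after all nodes that feed it.
order = list(TopologicalSorter(EXECUTION_GRAPH).static_order())
print(order[0], order[-1])  # user_query comes first, fuse_all last
```

    <p>Any such ordering is a valid linearization: dispatching nodes in this sequence guarantees that each node’s inputs are already available, so the engine only synchronizes when a result is actually consumed.</p>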
  </body>
  <back>
    <ref-list />
  </back>
</article>