<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable Query Understanding for E-commerce: An Ensemble Architecture with Graph-based Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Di Fabbrizio</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Stepanov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludovico Frizziero</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Tessaro</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>Query understanding is a critical component in e-commerce platforms, facilitating accurate interpretation of user intent and efficient retrieval of relevant products. This study investigates scalable query understanding techniques applied to a real-world use case in the e-commerce grocery domain. We propose a novel architecture that integrates deep learning models with traditional machine learning approaches to capture query nuances and deliver robust performance across diverse query types and categories. Experimental evaluations conducted on real-life datasets demonstrate the efficacy of our proposed solution in terms of both accuracy and scalability. The implementation of an optimized graph-based architecture utilizing the Ray framework enables efficient processing of high-volume traffic. Our ensemble approach achieves an absolute 2% improvement in accuracy over the best individual model. The findings underscore the advantages of combining diverse models in addressing the complexities of e-commerce query understanding.</p>
      </abstract>
      <kwd-group>
<kwd>Query classification</kwd>
        <kwd>Query understanding</kwd>
        <kwd>Distributed and scalable machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Accurately understanding and classifying user queries is crucial for providing a seamless shopping experience by boosting the relevance of product search results in e-commerce [1]. Query understanding enables e-commerce platforms to interpret users’ intents, retrieve relevant products, and personalize the user’s journey through the shopping experience. However, the task of query understanding in e-commerce presents several challenges due to the diverse nature of queries, the large-scale product catalogs, and the need for efficient processing of high-volume traffic with noisy behavioral signals [2, 3].</p>
      <p>Query understanding in e-commerce involves multiple sub-tasks, such as query classification, entity recognition, and intent detection. Query classification aims to categorize user queries into predefined product categories, facilitating improved product retrieval and ranking [4, 5]. Entity recognition identifies key information within the query, such as brand names, product attributes, and numerical values, which can be used to refine the search results [6]. Intent detection focuses on understanding the user’s underlying goal, such as product discovery, comparison, or purchase [7].</p>
      <p>One of the primary challenges in query understanding is the inherent ambiguity and diversity of user queries. E-commerce queries are often short, lacking context, and can have multiple interpretations [8]. Moreover, the large-scale product catalogs in e-commerce platforms, spanning thousands of categories and millions of products, pose a significant challenge in accurately mapping queries to relevant categories and products.</p>
      <p>Various approaches have been proposed to address these challenges, leveraging traditional machine learning techniques and deep learning models. Rule-based systems and keyword matching have been widely used for query classification and entity recognition [9]. However, these approaches often struggle with the variability and complexity of natural language queries. Different query intents require different algorithms to yield optimum results [10]. Queries can be classified into navigational (e.g., product category, brand, title) and informational (e.g., product-related questions). While navigational queries require exact matching to catalog products, informational queries necessitate applying more complex understanding techniques.</p>
      <p>Another critical aspect of query understanding in e-commerce is efficiently processing high-volume traffic. E-commerce platforms receive millions of queries daily, requiring scalable and real-time query understanding systems. Distributed computing frameworks, such as Apache Spark and Ray, have been employed to parallelize query processing and handle the massive scale of e-commerce data [11, 12].</p>
      <p>In this paper, we propose an ensemble approach for query understanding in e-commerce, combining deep learning models and traditional techniques. Our approach leverages the strengths of both deep learning, such as DistilBERT [13], and traditional models, including logistic regression and rule-based systems. By integrating these diverse models, we aim to capture the nuances of user queries and provide robust performance across various query types and categories. We introduce an optimized graph-based architecture based on the Ray framework [12], enabling efficient processing of high-volume traffic and ensuring scalability.</p>
      <p>Figure 1: (a) Query understanding parsing of the query “Pacific chicken broth organic gluten free” into Brand, Product, and Nutrition entities, with the category label pantry&gt;&gt;soup; (b) the corresponding search results.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04–06, 2024, Pisa, Italy. Work done when the authors were at VUI, Inc. Copyright for this paper by its authors; use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Query understanding ensemble architecture</title>
      <p>In this paper, we focus on navigational queries and classify them into product taxonomy categories while applying named entity recognition (NER) to capture relevant product attributes, such as Brand, Nutrition, and Flavor, and numeric attributes like quantities and measurements. Figure 1 shows a typical example of a navigational search query in the e-commerce grocery domain, where the query “Pacific chicken broth organic gluten free” is parsed into its attributes and categorized into its taxonomy label.</p>
      <p>Classifying user queries into product taxonomy categories is a typical document classification problem that is complex and actively researched. The problem is complicated by the nature of the available data, which can be either product descriptions with user-provided categories or user queries associated with catalog categories from user click-stream data. Products in the catalog are described in terms of attributes with associated values, and a subset of this mapping constitutes a set of entities that should be identified to build a search query and provide better search results.</p>
      <p>Due to the rate of change in e-commerce, the classical approach of query annotation and model training is prohibitive. Consequently, the query understanding problem is cast as a document classification problem for matching user queries to the product taxonomy tree (categories) and a sequence labeling problem for entities of interest. For each problem, we propose using an ensemble approach with multiple models having different label sets and relations. Specifically, we predict two levels of the product taxonomy tree (L1 and L2) and extract the corresponding entities mentioned in the queries. Each level is predicted by an ensemble of models composed of business rules and machine learning models. Similarly, different machine learning and rule-based models are used to extract entities of interest.</p>
      <sec id="sec-2-1">
        <title>2.1. Query understanding pipeline and ensemble components</title>
        <p>The query understanding pipeline’s classification and entity extraction components are trained and tested on pre-processed user queries. Common text pre-processing steps are applied, including spaCy’s tokenization, lowercasing, and number normalization [14].</p>
        <p>The classification ensemble consists of business rules, implemented as a lookup table, and two machine learning models: logistic regression and DistilBERT. DistilBERT is a compressed version of BERT [15] that retains 97% of the original model’s performance while being 40% smaller and 60% faster at inference time. The key idea is to leverage knowledge distillation during the pre-training phase to learn a compact model that can be fine-tuned for downstream tasks. Integrating DistilBERT into a query understanding pipeline, alongside business rules and logistic regression, enhances the system’s accuracy and robustness.</p>
        <p>The entity extraction ensemble comprises: (1) a conditional random fields model; (2) a catalog-based lookup table to extract Brand, Flavor, and Nutrition; and (3) the rule-based Duckling library to extract numerical entities such as Price and Quantity.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Classification decision fusion</title>
        <p>In our ensemble learning scenario, the models are trained on different data and have different, potentially overlapping label spaces, unlike typical ensemble learning, where the same data is used to train all models. Due to the label space differences, decision fusion is performed on the predictor-by-label prediction matrix of confidence scores rather than using a simple majority voting strategy. Rule-triggered hypotheses are assigned a confidence score of 1.0, taking priority over model-based predictions.</p>
        <p>The decision fusion process takes a matrix of confidence scores as input and outputs a vector of aggregated confidence scores. The label space difference is addressed by applying a max operation on the column of prediction scores per label, ignoring the values with respect to the label space membership. Taking the maximum score per prediction approximates the product rule [16]. The final label is decided as the arg max of this confidence score vector. Unlike voting-based decision fusion, such an approach allows aggregation of decisions from rules and any number of predictors.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Entity span consolidation</title>
        <p>Span consolidation aggregates entity extraction hypotheses from one or several entity extractors into a shallow parse containing only non-overlapping spans. By default, this process is performed for spans from the same model, but it can also be enabled for an ensemble of extractors.</p>
        <p>Inspired by [17], the span consolidation is performed in three steps: (1) Identity consolidation resolves identical spans by keeping the span with higher confidence, or randomly if confidences are equal; (2) Containment consolidation resolves spans contained within each other by keeping the longer span, i.e., the one that contains the other; (3) Overlap consolidation resolves overlapping spans by keeping the longer span, or alternatively merging them and assigning the label of the longest span. Priority consolidation can be used to give higher weights to predictions from extractors with higher confidence.</p>
        <p>The decision fusion and span consolidation are generally applied as the final step of the query understanding pipeline to yield hypotheses containing only a non-overlapping set of entities and a single classification prediction per level, as described in Section 4.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Models and ensemble evaluation</title>
      <p>The engine’s configuration represents the ensemble as a sequence of operations, called nodes, organized into a graph; the edges of this graph represent the interdependencies between nodes, and the engine organizes and dispatches computations to maximize parallelism (Section 4).</p>
      <p>Machine learning models for query classification are trained on product catalog data and tested on user queries, ensuring equal representation of head, torso, and tail queries in terms of frequency. Table 1 shows the sizes of the training and testing data and the output categories. We predict two levels of the product taxonomy: L1 with 17 categories and L2 with 169 categories. However, not all L1 categories have L2 labels, making the L2 sets subsets of the L1 data. The NER test set is a subset of the manually annotated test data for non-numerical entities.</p>
      <p>The performance evaluation of the component models and the ensemble utilizes precision, recall, and F1-score metrics. For multi-class classification tasks, we report accuracy along with macro-averaged precision, recall, and F1-score to account for dataset imbalance. Entity extraction performance is assessed using micro-averaged metrics and token-level accuracy, adhering to CoNLL-style evaluation protocols.</p>
      <p>To quantify the efficacy of the model ensemble, we conducted a comparative analysis against logistic regression and DistilBERT for level-one predictions, with results presented in Table 2. DistilBERT demonstrates superior performance compared to logistic regression across all metrics. The ensemble model, however, consistently outperforms both individual models. Consequently, the query understanding system adopts the ensemble approach in lieu of individual models. Rule-based components are excluded from this evaluation due to their limited data coverage and restricted label subsets.</p>
      <p>Level-two models show similar performance patterns to level one, though with lower performance due to the larger label space and fewer training documents per label. Entity ensemble performance aligns with the other ensembles, favoring precision.</p>
      <p>While the ensemble approach demonstrates improved performance, it faces challenges with certain query types. Extremely short queries (e.g., “chips” can refer to potato, tortilla, or chocolate) can be ambiguous without context. Highly ambiguous queries (e.g., “greens”) may span multiple categories within the grocery domain. Novel products or brands not present in the training data pose difficulties. Complex, multi-intent queries (e.g., “organic gluten-free pasta sauce and whole grain spaghetti”) can lead to misclassifications or incomplete entity extraction. Future work could explore incorporating user session data or personalization techniques to provide additional context for ambiguous queries and improve handling of out-of-vocabulary terms and multi-intent queries.</p>
    </sec>
    <sec id="sec-4a">
      <title>4. Graph-based architecture for scalable processing</title>
      <p>Query understanding systems in e-commerce search engines must generate real-time responses within strict service level agreements (SLAs). They execute complex logic involving different models interacting both in series and in parallel.</p>
      <p>Our engine is constructed as a sequence of operations (nodes) arranged in a graph showing their interdependencies (edges). Like neural networks, the graph-based engine organizes and dispatches each computation to maximize parallelism.</p>
      <p>Parallelization occurs at multiple levels, including inter-operation parallelism and entire graph replicas, depending on deployment requirements. Each operation within the graph is a complex model component, requiring specific optimization strategies, such as data vectorization and memory sharing, to optimize the overall graph structure.</p>
      <p>We represent the graph using the notation node: [arg1, arg2, ..., argN], where node requires incoming edges from arg1 through argN. The full configuration of the graph can be seen in Appendix A.</p>
      <p>The engine processes the notation by following these steps. First, it optimizes the graph by joining (inlining) nodes based on certain criteria, which increases parallel operations as much as possible. Next, it decides how many replicas of the graph to run on a single physical server. Each node is then mapped to a separate system process using the Actor model [18] for inter-process communication, with message passing between processes handled using Ray [12].</p>
      <p>Each node is initialized by loading the models into memory, leveraging shared memory and copy-on-write primitives provided by the server’s operating system. Each node is loaded only once, and subsequent processes assigned the same node reference the original memory. Since the models are used for inference, not training, there are no write operations, reducing the memory footprint and improving loading times. Finally, the batching service handles the backpressure control system and the REST API for listening to incoming requests.</p>
      <p>At startup, the engine performs several optimizations on the graph topology. The simplest is graph culling, removing nodes that do not interact with others. Each node’s expected computational burden can be specified: simple nodes (e.g., string regex preprocessors) are less resource-intensive than full neural network nodes. The engine modifies the graph by combining nodes or inlining to facilitate parallel operations and minimize costly inter-process communications. This results in lighter nodes being replicated multiple times and fused into heavier nodes, each mapped to a single system process.</p>
      <p>After inlining, the engine performs graph linearization, converting the graph into a linear sequence, where each node depends only on preceding nodes, not subsequent ones. The engine dispatches nodes in order, synchronizing results only when necessary. This strategy minimizes pauses and maximizes parallelism. Nodes with a higher computational burden are prioritized, reducing the need for the backpressure control system and leveraging the fact that CPU and data transmission tasks are handled by separate CPU circuitry.</p>
      <p>Figure 2: The full ensemble graph (User Query, Preproc, Spacy, TF-IDF L1/L2, Sklearn L1/L2, Distilbert L1/L2, CRF, Duckling, Rules, Fusor L1, Fuse ALL, END) and the optimized graph after inlining (Preproc | TF-IDF L1 | Sklearn L1; Preproc | Duckling; Preproc | Spacy; Preproc | TF-IDF L2 | Sklearn L2; Fusor L1 | Rules | Fuse ALL), with each node’s computational burden marked as light, normal, or heavy.</p>
      <p>Query understanding systems receive hundreds of individual requests per second. Processing a single request is expensive due to inter-node communications. Batching multiple requests reduces overhead and enables vectorization, leveraging hardware primitives for efficient processing. The batching algorithm uses two thresholds, batch size and waiting time for further samples, which balances server resource utilization and processing time.</p>
      <p>Lastly, the engine addresses CPU oversubscription [19], which occurs when parallel execution threads exceed available CPU cores, leading to overhead from context switching. The backpressure control system ensures no more than k nodes run in parallel, enhancing performance by reducing oversubscription. The number k depends on the available CPU resources and the code executed within each node. A simple formula for determining k is: k = ⌊cores / max_{n∈N}(threads_n)⌋ + 1, (1) where threads_n is the number of threads or processes that an individual node can utilize independently, and cores denotes the available CPU cores on the server.</p>
    </sec>
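    <p>The parallelism bound of Equation (1) can be sketched in a few lines (a minimal illustration under our reading of the reconstructed formula; the function and variable names are ours, not from the paper’s implementation):</p>

```python
def max_parallel_nodes(available_cores: int, node_threads: list[int]) -> int:
    """Backpressure bound k: floor(cores / threads of the heaviest node) + 1."""
    heaviest = max(node_threads)  # most threads any single node can use
    return available_cores // heaviest + 1

# Example: an 8-core server where the heaviest node (e.g., a transformer
# model) uses 4 threads and lighter nodes use 1-2 threads each.
print(max_parallel_nodes(8, [1, 2, 4]))  # prints 3
```

    <p>The +1 term allows a slight oversubscription, consistent with the observation that compute and data-transmission work occupy separate CPU circuitry.</p>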
    <sec id="sec-4">
      <title>5. Performance analysis at scale</title>
      <p>Multiple tests were conducted using different AWS2 EC2 instances on the engine described in Section 4 and the ensemble configuration given in Appendix A. The optimal balance between cost, latency, and throughput was achieved with the m6i.2xlarge instance, which features 8 Intel Xeon vCPU cores @ 3.5GHz, for which we report the results. The test’s target SLA stipulated that response times for 99% of requests should remain below 100ms.</p>
      <p>All tests initiate a single instance of the engine with a graph replication factor of one3. Another server, which hosts the client simulator implemented using the Python package Locust, is instantiated; both servers share the same AWS network. The simulator issues multiple queries to the engine’s server, each randomly sampled from a dataset of actual queries, over a sustained duration of 30 seconds. Request inter-arrival times follow an exponential distribution with a rate of λ requests per second, mimicking a Poisson process, a common model for traffic patterns.</p>
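      <p>The traffic model described above can be reproduced in a few lines of Python (a standalone sketch of Poisson load generation, not the actual Locust configuration; names are ours):</p>

```python
import random

def poisson_arrival_times(rate_per_s: float, duration_s: float, seed: int = 7):
    """Sample request timestamps whose inter-arrival gaps are exponential
    with mean 1/rate, i.e., a Poisson process over [0, duration_s]."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)  # exponential inter-arrival gap
        if t > duration_s:
            return times
        times.append(t)

# 30 requests/s sustained for 30 s, as in the tests above.
arrivals = poisson_arrival_times(rate_per_s=30, duration_s=30)
print(len(arrivals))  # close to 30 * 30 = 900 requests on average
```

      <p>Replaying such timestamps against the engine exercises the same bursty arrival pattern the batching and backpressure mechanisms are designed to absorb.</p>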
      <p>2 https://aws.amazon.com/</p>
      <p>3 Replication factors greater than one were also tested, but they caused immediate CPU oversubscription problems, as anticipated; the SLA targets were unattainable without resorting to costly GPUs.</p>
      <p>Table 3 reports the execution times of each node, along with the main engine loop responsible for scheduling them and the outer REST API handling incoming requests and facilitating the connection between the engine and the outside world. The runtime of each individual node must be strictly shorter than the main engine loop, which represents the actual time taken for parallel graph execution.</p>
      <p>Node runtimes do not consider inter-process communication, which is accounted for in the main loop. On the other hand, the REST API contributes to the main loop by including the time required to handle the HTTP connection with the requesting client. The outer REST API time must stay below 100ms at the 99th percentile to comply with the target SLA.</p>
      <p>When batching is disabled, at the given rate λ, new requests arrive while the server is still processing previous ones. These requests are immediately dispatched, leading to CPU oversubscription, which slows down all requests. This effect tends to cascade, as the increased processing time makes it more likely that other requests will arrive, further slowing the system.</p>
      <p>When batching is enabled, the engine pauses to accumulate requests into a batch until a threshold of 5 samples or 50ms is met. Given that each request arrives every 1/λ ≈ 30ms, the average batch size is around 1.5 samples; therefore, vectorization alone cannot explain the server’s ability to meet the target SLA. The process unfolds as follows: (1) the first batch is dispatched for processing; (2) for the next 50ms, new requests are queued into a new batch, while (3) the engine likely completes the first batch within 51.7ms (with 99% probability); (4) the second batch is then dispatched, utilizing the just-released resources. Thus, batching acts as backpressure control on cheaper hardware without a GPU and at low rates of λ. In production, multiple instances would handle fluctuating traffic, making batching efficient for scaling while meeting the SLA. The optimal batching period should match the main loop time at the 99th percentile, which is around 50ms in this case.</p>
      <p>From a single request’s perspective, with λ = 30, batches are dispatched every 50ms, meaning requests encounter a uniform distribution over this interval, with an average wait of 25ms in the batch queue. The entire batch is then processed, typically taking the batch processing time to complete, before the response is extracted and forwarded through the HTTP channel, which takes a few additional milliseconds. Empirically, the batch processing time corresponds to the main loop runtime, averaging around 30ms at the 50th percentile. The REST API, implemented using FastAPI4, has been benchmarked at approximately 2–5ms, giving us REST API latency @ 50% ≈ 25ms + 30ms + 2ms ≈ 57ms.</p>
      <p>For the REST API latency @ 99%, the wait time is still 25ms on average, but the processing and HTTP times grow accordingly, giving approximately 90–95ms.</p>
    </sec>
    <sec id="sec-4b">
      <title>6. Conclusion and future work</title>
      <p>This paper proposed a novel ensemble approach for query understanding in e-commerce, combining deep learning models like DistilBERT with traditional techniques like logistic regression and rule-based systems. The ensemble architecture aimed to capture the nuances of user queries and provide robust performance across query types and categories. Data augmentation techniques were employed to improve the DistilBERT model’s handling of brands, misspellings, and short queries. An optimized graph-based architecture using the Ray framework enabled efficient, scalable processing of high-volume traffic.</p>
      <p>While the ensemble performed well, there are limitations to address in future work. The system focused only on navigational queries for product categorization and entity extraction; extending it to handle informational and other query types could further improve relevance. Exploring more advanced data augmentation, model compression, and hardware acceleration techniques could enhance accuracy and efficiency.</p>
      <p>The query understanding ensemble demonstrated the value of combining diverse models and leveraging distributed computing frameworks for scalability in e-commerce search engines. E-commerce platforms can benefit from adopting similar ensemble-based approaches customized to their query traffic and product data. The architecture enables efficient real-time query processing while meeting strict latency requirements, critical for delivering a seamless shopping experience.</p>
    </sec>
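    <p>Returning to the latency budget of Section 5, the median-case figure can be checked with a back-of-the-envelope computation (all numbers come from the text; the helper function and its name are ours):</p>

```python
def rest_api_latency_ms(queue_wait_ms: float, loop_ms: float, http_ms: float) -> float:
    """Median response time = batch-queue wait + main-loop runtime + HTTP handling."""
    return queue_wait_ms + loop_ms + http_ms

# Median case: 25ms average wait inside the 50ms batching window,
# ~30ms main engine loop @ 50th percentile, ~2ms FastAPI overhead.
print(rest_api_latency_ms(25, 30, 2))  # prints 57.0, well under the 100ms SLA
```
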
    <sec id="sec-5">
      <title>7. Appendix</title>
    </sec>
    <sec id="sec-6">
      <title>A. Graph configuration</title>
      <p>In our query understanding system, the relationships between the various models and preprocessing components are organized within a graph-based architecture. This architecture plays a crucial role in managing the interdependencies between different models, ensuring efficient computation and scalability.</p>
      <p>The graph representation is designed to handle the integration of multiple machine learning and rule-based models while facilitating optimized parallel processing.</p>
      <p>Each key in the graph corresponds to a node, which indicates a component or model, and the associated value is a list of other nodes that provide input to it. This differs from traditional adjacency lists, where the focus is on child nodes. Instead, in our graph, the value lists contain ancestor nodes, indicating which components feed information into the current node.</p>
      <p>A key aspect of this architecture is that certain elements, such as user_query, are considered implicit nodes representing external inputs to the system. These external inputs play a foundational role in initiating the data flow throughout the graph. The architecture is designed to handle multiple outputs, listed within the outputs key. This is not a graph node but serves as an indicator to the engine of what to select as the final result. The outputs key is also vital for the process of graph topology optimization and linearization described in Section 4. This representation not only makes it easier to track data flow but also helps optimize the query understanding ensemble for real-time processing in e-commerce environments.</p>
      <p>Figure 3: The execution graph configuration defining the Query Understanding Ensemble:</p>
      <p>execution_graph:
  preprocessor: [ user_query ]
  distilbert_l1: [ user_query ]
  distilbert_l2: [ user_query ]
  tfidf_l1: [ preprocessor ]
  tfidf_l2: [ preprocessor ]
  vui_duckling: [ preprocessor ]
  spacy: [ preprocessor ]
  crf: [ spacy ]
  sklearn_l1: [ tfidf_l1 ]
  sklearn_l2: [ tfidf_l2 ]
  fusor_l1: [ distilbert_l1, sklearn_l1 ]
  rules: [ spacy, fusor_l1 ]
  fuse_all: [ rules, crf, distilbert_l1, sklearn_l1, distilbert_l2, sklearn_l2, vui_duckling ]
outputs: [ user_query, preprocessor, parse ]</p>
      <p>Figure 3 illustrates the graph structure that defines the Query Understanding Ensemble. The nodes represent components that work together to process user queries and extract meaningful insights. The graph starts with preprocessing steps that normalize and clean the user input. Subsequently, components such as DistilBERT and TF-IDF are leveraged to extract semantic features and contextual information. Additional models like the CRF (Conditional Random Fields) and vui_duckling focus on identifying specific entities such as brands, quantities, and attributes.</p>
      <p>The outputs from these models are fused together through dedicated nodes such as fusor_l1 and fuse_all, which combine signals from the intermediate models based on confidence scores and rule-based decisions. The final outputs represent the processed user query, refined and enriched through multiple layers of analysis, ready for downstream tasks such as categorization and search relevance adjustments.</p>
      <p>This architecture’s flexibility and efficiency enable it to handle the complexities of e-commerce queries in real time while supporting high-volume traffic and diverse query types. It also lays the groundwork for the performance optimizations and parallel processing strategies outlined in Section 4.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[1] H. Deng, Y. Zhang (Eds.), Query Understanding for Search Engines, 1st ed., Springer, 2020. doi:10.1007/978-3-030-58334-7.</p>
      <p>[2] S. Jiang, Y. Hu, C. Kang, T. Daly, D. Yin, Y. Chang, C. Zhai, Learning query and document relevance from a web-scale click graph, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 185–194. doi:10.1145/2911451.2911531.</p>
      <p>[3] P. Nigam, Y. Song, V. Mohan, V. Lakshman, W. A. Ding, A. Shingavi, C. H. Teo, H. Gu, B. Yin, Semantic product search, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, KDD ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 2876–2885. doi:10.1145/3292500.3330759.</p>
      <p>[4] Y.-C. Lin, A. Datta, G. Di Fabbrizio, E-commerce Product Query Classification Using Implicit User’s Feedback from Clicks, in: 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 1955–1959. doi:10.1109/BigData.2018.8622008.</p>
      <p>[5] G. Di Fabbrizio, E. Stepanov, F. Tessaro, Extreme Multi-label Query Classification for E-commerce, in: eCom’24: ACM SIGIR Workshop on eCommerce, July 18, 2024, USA, 2024.</p>
      <p>[6] J.-W. Ha, H. Pyo, J. Kim, Large-scale item categorization in e-commerce using multiple recurrent neural networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, Association for Computing Machinery, New York, NY, USA, 2016, pp. 107–115. doi:10.1145/2939672.2939678.</p>
      <p>[7] Y. Qiu, C. Zhao, H. Zhang, J. Zhuo, T. Li, X. Zhang, S. Wang, S. Xu, B. Long, W.-Y. Yang, Pre-training Tasks for User Intent Detection and Embedding Retrieval in E-commerce Search, in: Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management, CIKM ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 4424–4428. doi:10.1145/3511808.3557670.</p>
      <p>[8] D. Shen, Y. Li, X. Li, D. Zhou, Product query classification, in: Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM ’09, Association for Computing Machinery, New York, NY, USA, 2009, pp. 741–750. doi:10.1145/1645953.1646047.</p>
      <p>[9] B. Ramesh, Bhange, X. Cheng, M. Bowden, P. Goyal, T. Packer, F. Javed, Named Entity Recognition for E-Commerce Search Queries, in: 2018 IEEE International Conference on Big Data (Big Data), 2020. URL: https://api.semanticscholar.org/CorpusID:219530417.</p>
      <p>[10] M. Tsagkias, T. H. King, S. Kallumadi, V. Murdock, M. de Rijke, Challenges and research opportunities in ecommerce search and recommendations, SIGIR Forum (2020).</p>
      <p>[11] E. Shaikh, I. Mohiuddin, Y. Alufaisan, I. Nahvi, Apache Spark: A big data processing engine, in: 2019 2nd IEEE Middle East and North Africa COMMunications Conference (MENACOMM), 2019, pp. 1–6. URL: https://api.semanticscholar.org/CorpusID:211120979.</p>
      <p>[12] P. Moritz, R. Nishihara, S. Wang, A. Tumanov, R. Liaw, E. Liang, M. Elibol, Z. Yang, W. Paul, M. I. Jordan, I. Stoica, Ray: a distributed framework for emerging AI applications, in: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, USENIX Association, USA, 2018, pp. 561–577.</p>
      <p>[13] V. Sanh, L. Debut, J. Chaumond, T. Wolf, DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, ArXiv abs/1910.01108 (2019). URL: https://api.semanticscholar.org/CorpusID:203626972.</p>
      <p>[14] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python (2020).</p>
      <p>[15] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. doi:10.18653/v1/N19-1423.</p>
      <p>[16] J. Kittler, M. Hatef, R. Duin, J. Matas, On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (2002) 226–239.</p>
      <p>[17] F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, S. Vaithyanathan, An algebraic approach to rule-based information extraction, in: 2008 IEEE 24th International Conference on Data Engineering, IEEE, 2008, pp. 933–942.</p>
      <p>[18] C. Hewitt, Actor model of computation: Scalable robust information systems, 2015. arXiv:1008.1459.</p>
      <p>[19] C. Iancu, S. Hofmeyr, F. Blagojević, Y. Zheng, Oversubscription on multicore processors, in: 2010 IEEE International Symposium on Parallel &amp; Distributed Processing (IPDPS), 2010, pp. 1–11. doi:10.1109/IPDPS.2010.5470434.</p>
    </sec>
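    <p>As an illustration of how Appendix A’s ancestor-list representation supports the linearization step of Section 4, a topological ordering can be derived directly from it (a simplified sketch; the production engine additionally performs inlining, replication, and burden-aware prioritization):</p>

```python
from graphlib import TopologicalSorter

# Appendix A's execution graph: each node maps to the nodes feeding into it.
EXECUTION_GRAPH = {
    "preprocessor": ["user_query"],
    "distilbert_l1": ["user_query"],
    "distilbert_l2": ["user_query"],
    "tfidf_l1": ["preprocessor"],
    "tfidf_l2": ["preprocessor"],
    "vui_duckling": ["preprocessor"],
    "spacy": ["preprocessor"],
    "crf": ["spacy"],
    "sklearn_l1": ["tfidf_l1"],
    "sklearn_l2": ["tfidf_l2"],
    "fusor_l1": ["distilbert_l1", "sklearn_l1"],
    "rules": ["spacy", "fusor_l1"],
    "fuse_all": ["rules", "crf", "distilbert_l1", "sklearn_l1",
                 "distilbert_l2", "sklearn_l2", "vui_duckling"],
}

# TopologicalSorter expects predecessor lists, which matches the
# ancestor-list format: every node appears after all nodes that feed it.
order = list(TopologicalSorter(EXECUTION_GRAPH).static_order())
print(order[0], order[-1])  # user_query comes first, fuse_all last
```

    <p>Any such ordering is a valid linearization: dispatching nodes in this sequence guarantees that each node’s inputs are already available, so the engine only synchronizes when a result is actually consumed.</p>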
  </body>
  <back>
    <ref-list />
  </back>
</article>