Leveraging Knowledge Graph and DeepNER to Improve UoM Handling in Search

Qunzhi Zhou, Zhe Wu, Jon Degenhardt, Ethan Hart, Petar Ristoski, Aritra Mandal, Julie Netzloff, and Anu Mandalam

eBay Inc, San Jose, CA, 95125, USA
{qunzhou,zwu1,jdegenhardt,ejhart,pristoski,arimandal,jnetzloff,amandalam}@ebay.com

Abstract. Understanding Units of Measurement (UoMs) is critical to e-commerce search engines. It is a challenging problem because numeric values and unit symbols are typically treated as regular text tokens. In this paper, we introduce a framework that uses a Knowledge Graph (KG) and deep-learning-based Named Entity Recognition (NER) to provide a better semantic understanding of UoMs.

Keywords: E-commerce, Knowledge Graph, Information Retrieval

1 Introduction

Query understanding that identifies relevant items among millions of products is fundamental to the success of any e-commerce platform. However, UoMs have not been extensively studied in the context of search queries. A UoM usually consists of two parts: a numerical quantifier and a unit symbol. Identifying UoMs in text is challenging for several reasons: (i) UoMs can be expressed in multiple measurement systems, e.g., the metric and imperial systems; (ii) UoM abbreviations can be ambiguous, e.g., g for gram or gigabyte; and (iii) user input does not always follow a standard representation, e.g., 64.5” TV, 64 1/2 inch TV, and 164cm TV all express the same query intent and should return identical results.

Several UoM ontologies have been introduced in the literature [1], but most are tailored to a specific application or domain, which makes them difficult to reuse [2], or they provide only a catalog of abstract UoM concepts. Furthermore, most existing work focuses on extracting UoMs from semi-structured data, such as tables, while identifying UoMs in natural language, especially in short texts like search queries, remains a challenge.
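To make challenge (iii) concrete, the following Python sketch normalizes a length mention to metres so that the three TV queries from the introduction can be compared. It is a hypothetical, rule-based illustration, not the authors' method (which relies on NER and the KG): the regular expression, the `LENGTH_UNITS_TO_M` table, the `normalize_length` and `same_intent` helpers, and the 1% tolerance are all assumptions. Only the conversion factors are standard (1 inch = 0.0254 m exactly).

```python
import math
import re
from fractions import Fraction

# Hypothetical table of unit label variants -> metres per unit.
# In the paper's system such variants live in the KG; here they are hard-coded.
LENGTH_UNITS_TO_M = {
    "inches": 0.0254,
    "inch": 0.0254,
    "in": 0.0254,
    '"': 0.0254,        # straight double quote used as an inch sign
    "\u201d": 0.0254,   # right double quotation mark used as an inch sign
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
}

# Try longer unit labels first so "inch" wins over "in" and "mm" over "m".
_UNIT_ALT = "|".join(
    re.escape(u) for u in sorted(LENGTH_UNITS_TO_M, key=len, reverse=True)
)
# Number forms: mixed fraction ("64 1/2"), plain fraction ("1/2"), integer or decimal.
_PATTERN = re.compile(
    r"(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*(" + _UNIT_ALT + r")(?![a-z])",
    re.IGNORECASE,
)


def normalize_length(query):
    """Return the first length mention in `query` converted to metres, or None."""
    match = _PATTERN.search(query)
    if match is None:
        return None
    # Fraction parses "64", "64.5" and "1/2"; summing the parts handles mixed fractions.
    value = float(sum(Fraction(part) for part in match.group(1).split()))
    return value * LENGTH_UNITS_TO_M[match.group(2).lower()]


def same_intent(a, b, rel_tol=0.01):
    """Treat two queries as the same intent if their lengths agree within rel_tol."""
    va, vb = normalize_length(a), normalize_length(b)
    return va is not None and vb is not None and math.isclose(va, vb, rel_tol=rel_tol)


for q in ["64.5\u201d TV", "64 1/2 inch TV", "164cm TV"]:
    print(q, "->", round(normalize_length(q), 4), "m")
```

All three queries map to roughly 1.64 m (64.5 inches is 1.6383 m), so they fall within the tolerance. A pattern this naive also fires on non-measurements such as "2 in 1 laptop", which illustrates why context from NER is needed to disambiguate UoM mentions.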
To address these challenges, we introduce a novel approach that supports semantic query matching for UoMs based on entity linking over a KG containing a comprehensive set of unit entities and relations.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Approach

Our approach for matching UoMs in text consists of three main components: (i) a knowledge graph, (ii) entity resolution, and (iii) query rewriting. Our KG integrates product data with real-world knowledge domains, including brands, colors, materials, and UoMs. The UoM sub-graph captures around 800 entities and the relations between them, including label variants, physical quantities, SI base units, and conversion rules.

Fig. 1: UoM entity resolution for product data and search queries

A generic entity resolution service was developed to link text mentions in product item data and query data to KG entities. As shown in Figure 1, a deep hybrid neural network NER model [3] and the knowledge graph are used together to incorporate context-based features that recognize and disambiguate UoM entities. NER tags are mapped to ontology classes and used as features to rank the relevance of entity candidates sampled from the KG.

To support UoM entity matching, the search engine uses the resolution service in both the item data and the user query processing flows. For each recognized UoM phrase, two UoM URIs are generated: the original URI and a canonical URI, which represents the equivalent measurement in the SI base unit. Both URIs are indexed for items and included in query rewrites.

3 Results

The proposed approach was evaluated in both offline and online settings. In the offline setting, a set of 10,000 random search queries was used to measure the change in recall and precision. Human evaluation by independent annotators showed a considerable increase in recall, while relevance remained the same.
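One way to see how the dual-URI scheme described in Section 2 could raise recall: an item listed as 64.5” and a query for 164 cm share no tokens, but they can share a canonical URI. The Python sketch below is a toy illustration of that idea, not the authors' implementation. The `UnitEntity` records, the `uom:` URI scheme, the three-significant-figure rounding of canonical values, and the helpers `resolve`, `uom_uris`, `index_item`, and `search` are all hypothetical; in particular, `quantity_hint` is a crude stand-in for the NER tags that the paper maps to ontology classes to rank KG candidates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitEntity:
    uri: str        # hypothetical KG URI of the unit
    quantity: str   # physical quantity, i.e. the ontology class
    base_uri: str   # canonical base unit for that quantity
    factor: float   # conversion rule: 1 unit = factor base units


# Toy stand-in for the UoM sub-graph; all URIs are hypothetical.
INCH = UnitEntity("uom:inch", "Length", "uom:metre", 0.0254)
CENTIMETRE = UnitEntity("uom:centimetre", "Length", "uom:metre", 0.01)
GRAM = UnitEntity("uom:gram", "Mass", "uom:kilogram", 0.001)
# Bytes are not an SI unit; decimal gigabytes (1e9 bytes) are assumed here.
GIGABYTE = UnitEntity("uom:gigabyte", "DataSize", "uom:byte", 1e9)

# Label variants -> candidate entities; "g" is deliberately ambiguous.
LABELS = {
    "in": [INCH], "inch": [INCH], "\u201d": [INCH],
    "cm": [CENTIMETRE],
    "g": [GRAM, GIGABYTE],
    "gb": [GIGABYTE],
}


def resolve(label, quantity_hint=None):
    """Return the unique candidate entity, or None if unknown or still ambiguous.

    quantity_hint stands in for an NER tag mapped to an ontology class.
    """
    candidates = LABELS.get(label.lower(), [])
    if quantity_hint is not None:
        candidates = [c for c in candidates if c.quantity == quantity_hint]
    return candidates[0] if len(candidates) == 1 else None


def uom_uris(value, label, quantity_hint=None):
    """Return the original URI and the canonical (base-unit) URI of a UoM phrase."""
    unit = resolve(label, quantity_hint)
    if unit is None:
        return set()
    canonical = value * unit.factor
    # Rounding to 3 significant figures lets near-equal measurements share a URI.
    return {f"{unit.uri}/{value:g}", f"{unit.base_uri}/{canonical:.3g}"}


index = defaultdict(set)  # URI -> ids of items indexed under it


def index_item(item_id, value, label, quantity_hint=None):
    """Index an item under both URIs of its UoM phrase."""
    for uri in uom_uris(value, label, quantity_hint):
        index[uri].add(item_id)


def search(value, label, quantity_hint=None):
    """Rewrite a query UoM into both URIs and return items matching either."""
    hits = set()
    for uri in uom_uris(value, label, quantity_hint):
        hits |= index.get(uri, set())
    return hits


index_item("tv-1", 64.5, "\u201d", "Length")
index_item("tv-2", 164, "cm", "Length")
index_item("phone-1", 64, "g", "DataSize")
index_item("scale-1", 500, "g", "Mass")

print(search(164, "cm"))            # both TVs, via the shared canonical URI
print(search(64, "g", "DataSize"))  # only the phone
print(search(64, "g"))              # ambiguous without context: no match
```

Fixed-precision rounding is only one possible canonicalization: values that straddle a rounding boundary would land on different URIs, so a production system would likely need a more careful matching rule. Keeping the original URI alongside the canonical one preserves exact matches on the unit the user actually typed.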
Second, we conducted an online A/B test that exposed the new experience to millions of users. For the treated traffic, we observed a statistically significant drop in the search abandonment rate and a decrease in low-recall search sessions. This is consistent with the offline recall increase of over 5% observed for random queries containing UoMs.

4 Conclusion

We have shown that, through the use of a KG and NER, we can improve UoM handling for query understanding and, as a result, improve item ranking. Given the generality of the solution, our next step is to extend it to other domains such as sizes, colors, and materials.

Acknowledgement. We thank Nadia V., Sathish K., Sneha K., Steven X., Simon F., and Alan P. for their contributions.

References

1. Markus D. Steinberg, Sirko Schindler, and Jan Martin Keil. Use cases and suitability metrics for unit ontologies. In OWL: Experiences and Directions, 2016.
2. Foppiano et al. Automatic identification and normalisation of physical measurements in scientific literature. In ACM Symposium on Document Engineering, 2019.
3. Yingwei Xin, Ethan Hart, Vibhuti Mahajan, and Jean-David Ruvini. Learning better internal structure of words for sequence labeling. 2018.