Anu Question Answering System

Balaji Ganesan1, Avirup Saha2, Jaydeep Sen1, Matheen Ahmed Pasha3, Sumit Bhatia1, and Arvind Agarwal1

1 IBM Research, {bganesa1, jaydesen, sumitbhatia, arvagarw}@in.ibm.com
2 IIT Kharagpur, India, avirupsaha@iitkgp.ac.in
3 IBM Data and AI, matpasha@in.ibm.com

AnuQA is a question answering system built on top of a search index and an enterprise knowledge graph. In this work, we describe five semantic technologies that have helped us address real-world challenges in deploying this system. These challenges include bias in knowledge base population, entity re-resolution on streaming data, ontology alignment across data sources, explaining relationships, and providing a single unified query interface for business analytics.

The Anu Cognitive Compliance platform was introduced in [2]. It has enabled research in a number of fields, including Search Index Optimization, Answer Sentence Selection, Document Similarity, Hypernym Discovery, Fine Grained Entity Classification, Ontology Creation and Link Prediction. We now present five semantic technologies that we have implemented to enable real-world deployments of the AnuQA system.

Data Augmentation for Knowledge Base Population

Data augmentation is the process of increasing the diversity of the training data without necessarily having to acquire more data. In the context of knowledge bases, we have found data augmentation using IBM's rule-based SystemT to be effective in increasing the diversity of the populated knowledge graphs, and also in making downstream tasks like Link Prediction less dependent on gender, ethnicity, religion and other protected attributes.

Entity Re-resolution using Temporal Point Processes

We define Entity Re-resolution as the localized creation, update and elimination of entities in a knowledge graph based on streaming updates. We have used Dirichlet Hawkes Processes (DHPs) to model both the textual similarity and the temporal closeness of the updates to the graph.
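To make the combination of textual similarity and temporal closeness concrete, the following is a minimal sketch and not the DHP used in AnuQA. It assumes an exponential Hawkes kernel for the temporal part and token-level Jaccard overlap for the textual part, and it replaces the probabilistic cluster assignment of a DHP with a simple highest-score rule. All function names, parameters (`alpha`, `beta`, `new_entity_weight`) and sample records are hypothetical.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two strings (assumed text model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def hawkes_intensity(t, past_times, alpha=1.0, beta=0.5, mu=0.0):
    """Hawkes intensity with an exponential kernel (an assumption):
    mu + sum over earlier events t_i of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in past_times if ti < t)


def assign(update_text, update_time, clusters, new_entity_weight=0.3):
    """Pick the entity whose recent, textually similar updates best match.

    clusters maps a hypothetical entity id to a list of (time, text) updates.
    Returns the chosen entity id, or None when no score beats
    new_entity_weight, meaning a new entity should be created.
    """
    best_id, best_score = None, new_entity_weight
    for entity_id, events in clusters.items():
        if not events:
            continue
        times = [t for t, _ in events]
        text_sim = max(jaccard(update_text, txt) for _, txt in events)
        # Temporal closeness and textual similarity are multiplied together.
        score = hawkes_intensity(update_time, times) * text_sim
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id


clusters = {
    "e1": [(0.0, "acme corp new york"), (1.0, "acme corporation new york")],
    "e2": [(0.5, "globex ltd london")],
}
print(assign("acme corp ny office new york", 1.5, clusters))  # e1
print(assign("initech inc austin", 2.0, clusters))            # None
print(assign("acme corp new york", 20.0, clusters))           # None
```

The last call shows the effect of temporal decay: an update whose text matches an existing entity exactly is still treated as a new entity when it arrives long after that entity's previous updates. Because each score is a product of two readable quantities, a decision of this kind is easy to show to an end user.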
We find that DHPs, scaled using IBM's Master Data Management platform, are a suitable substitute for neural model predictions, which are harder to explain to end users of the system.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Unified Hierarchical Label Set model for Ontology Alignment

AnuQA requires fusing information from different data sources to enable natural language querying. This is typically handled by manual processes, which become cumbersome as the number of sources increases. We use the Unified Hierarchical Label Set (UHLS) model [1], based on collective learning of entity types, to integrate labels from different data sources and standard ontologies.

Explainable Link Prediction

While a number of interpretability solutions have been proposed for link prediction by graph neural networks, real-world applications call for explanations that humans can understand. Based on [3], we extract supporting text from unstructured documents, logs, lineage data and relational tables. We also use existing paths in our knowledge graph to explain new links predicted between its nodes.

Reasoning for Natural Language Interpretation

Natural Language Query [4] interfaces allow end users to ask questions without knowing any specialized query language or the details of data storage and schema. We use logical reasoning over domain semantics and knowledge to support a wide variety of domain-specific queries in natural language. Domain reasoning helps us better interpret implicit intents in natural language queries, especially the analytic queries typically posed to information access systems.

Deployments

Different parts of this question answering system have been deployed in various customer engagements and product offerings of IBM, especially in the financial services domain.
http://covid19-india-qa.mybluemix.net is a sample instance of the AnuQA system for answering questions on COVID-19.

References

1. Abhishek, A., Azad, A.P., Ganesan, B., Anand, A., Awekar, A.: Collective learning from diverse datasets for entity typing in the wild. In: Proceedings of the 2nd International Workshop on EntitY REtrieval. pp. 16–23. CEUR-WS (2019)
2. Agarwal, A., Ganesan, B., Gupta, A., Jain, N., Karanam, H.P., Kumar, A., Madaan, N., Munigala, V., Tamilselvam, S.G.: Cognitive compliance for financial regulations. IT Professional 19(4), 28–35 (2017)
3. Bhatia, S., Dwivedi, P., Kaur, A.: That's interesting, tell me more! Finding descriptive support passages for knowledge graph relationships. In: International Semantic Web Conference. pp. 250–267. Springer (2018)
4. Sen, J., Ozcan, F., Quamar, A., Stager, G., Mittal, A., Jammi, M., Lei, C., Saha, D., Sankaranarayanan, K.: Natural language querying of complex business intelligence queries. In: Proceedings of the 2019 International Conference on Management of Data. pp. 1997–2000 (2019)