Did you mean A or B? Supporting Clarification Dialog for Entity Disambiguation

Anni Coden, Daniel Gruhl, Neal Lewis, Pablo N. Mendes
IBM Research, USA

Abstract. When interacting with a system, users often request information about an entity by specifying a string that may have multiple possible interpretations. Humans are quite good at recognizing when an ambiguity exists and resolving it given contextual cues. This disambiguation task is more complicated in automated systems. As systems have more and more entities and entity types available to them, they may better detect potential ambiguities (e.g., 'orange' as a color or fruit). However, it becomes harder to resolve entities automatically and effectively. In this position paper we discuss challenges in interacting with users to ask clarifying questions for entity identification. We propose three approaches, illustrating their strengths and weaknesses.

1 Introduction

Let us assume we are executing a Natural Language Processing (NLP) task with the help of a system which uses a knowledge base containing multiple entities and entity types. For example, assume that a user asks for information about an entity by entering a keyword (such as 'orange'). The meaning of this string is ambiguous: the system may have multiple entities known as 'orange' (e.g., a fruit, a company, a color and possibly others). In some cases, the meaning of 'orange' may be disambiguated by qualifying it with the entity type (e.g., fruit or company). Other times the entity type alone may not be sufficient to disambiguate a term like 'apple'. Consider the sentence "Apple is quite popular for these case studies", which could be found, for instance, in a business management course. Here one wants not only to differentiate between apple (a fruit) and apple (a company), but also to differentiate between Apple Inc. (the computer company) and Apple Corps (the multi-media company founded by the Beatles).

Knowledge sources such as DBpedia [5] provide hundreds of thousands of entity types in the form of ontology classes, Wikipedia categories, etc. An entity type is often comprised of thousands of entities – e.g. there are more than 60,000 entities of type company. Meanwhile, a given string may have multiple possible meanings belonging to different entity types – e.g. the string 'apple' may refer to tens of entity types (e.g. plant, company, band, place, etc.) and even to several different entities of the same type in DBpedia (e.g. Apple Inc. and Apple Corps). This richness of knowledge sources comes at the cost of increasing the likelihood of ambiguity, both with respect to entity types and entities.

Previous work has demonstrated effective ways to select candidate entities from DBpedia for a given string [9]. Often, it is possible to automatically decide on the correct interpretation of an ambiguous string [10, 3]. In other cases (e.g. for lack of context), either an automatic decision cannot be made or it is more advantageous to interact with the user to clarify the intended meaning. To effectively communicate with users, a system needs the ability to summarize its knowledge about the possible entities for a given string and to present it concisely to the user in the form of a clarifying question. In this paper we present three different methods for automatically generating such questions. We highlight the main challenges and discuss the pros and cons of each approach.
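To make the ambiguity problem concrete, the following sketch (ours, not part of any system described in this paper) shows how a single surface form can map to several candidate entities in a toy in-memory knowledge base; detecting more than one candidate is what triggers a clarifying question. The entity identifiers, data layout and function names are purely illustrative.

```python
# Minimal sketch: a surface form such as 'orange' maps to several candidate
# entities, which is the trigger for asking a clarifying question.
from collections import defaultdict

# surface form -> list of (entity_id, entity_type) candidates (toy data)
KB = defaultdict(list)
for entity_id, entity_type, surface in [
    ("Orange_(fruit)",  "fruit",   "orange"),
    ("Orange_(colour)", "color",   "orange"),
    ("Orange_S.A.",     "company", "orange"),
    ("Apple_Inc.",      "company", "apple"),
    ("Apple_Corps",     "company", "apple"),
    ("Apple_(fruit)",   "fruit",   "apple"),
]:
    KB[surface].append((entity_id, entity_type))

def candidates(mention: str):
    """Return all known interpretations of a mention (lower-cased lookup)."""
    return KB.get(mention.lower(), [])

def is_ambiguous(mention: str) -> bool:
    return len(candidates(mention)) > 1

print(candidates("apple"))     # three candidates, two sharing the same type
print(is_ambiguous("orange"))  # True -> a clarifying question is warranted
```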
2 Related Work

Schlangen [14] studied causes and strategies for requesting clarification in spoken dialog. The resulting fine-grained model of problems that may arise during the processing of a question includes lexical ambiguity, but does not go into details of how to clarify ambiguity. Stoyanchev et al. [15] built a rule-based system to generate clarification questions. In their experiments, the automatically generated questions performed better than a set of human-generated questions. Rule-based systems have the disadvantage that they have to be manually adapted to new domains and new language patterns. Loos and Porzel [6] explore the use of hand-labeled data and ontological distance as methods of performing word sense disambiguation in speech recognition, and find that both do well, although they require a fair bit of human effort to develop relevant ontologies or score domain-relevant training data. De Boni and Manandhar [4] developed an algorithm for clarification dialog recognition, in other words, for determining when a set of given questions are related and hence provide context for each other. They establish that clarification dialogues simplify the task of answer retrieval. In our case, we are trying to generate the clarification questions themselves, for the specific case of ambiguity in one or more entity types in the original question.

There is a rich body of work on word sense disambiguation. Given the possible senses of a word, such systems assign the appropriate sense within the context. In general, such disambiguation methods are machine-learning based, with features chosen for the task. In the medical domain, the Unified Medical Language System [8] provides a rich ontology of terms and phrases that assigns different possible meanings to a word. For example, the word 'ms' can refer to at least twelve quite different meanings. Previous work showed that different types of documents (e.g. clinical text and biomedical literature) require different features in the machine learning algorithms to achieve the best disambiguation accuracy [13]. Such disambiguation systems are therefore costly to develop. Furthermore, even the best machine learning systems for a particular domain may find certain entities difficult to disambiguate, and therefore generate a low-confidence decision. For use cases that require high levels of precision (e.g. medical coding), our method can be applied to request human clarification when disambiguation systems are not very confident.

3 Entity Presentation in Natural Language Clarification Dialog

We investigated three classes of clarifying query (CQ) approaches: type-based, example-based and usage-based.

Type-based CQ – In some respects this is the most straightforward method. Suppose there are two entity types that have 'orange' as a member, and suppose furthermore that the types have semantically distinguishable labels (e.g., a fruit type and a color type). The system can then use these labels to construct a clarifying query:

Do you mean 'orange the fruit' or 'orange the color'?

Naturally, this method hinges on the availability of understandable and descriptive labels for the types. It would be less helpful if the type labels were "noun" or "thing" or "type1234". Furthermore, such questions do not help to disambiguate entities within the same entity type.
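As a rough illustration of the type-based approach, the sketch below assembles a clarifying question from human-readable type labels, under the assumption that such labels exist; the function and data shapes are our own, not the paper's implementation. When all candidates share one type label (e.g., Apple Inc. vs. Apple Corps), it returns nothing, mirroring the limitation noted above.

```python
# Illustrative sketch of a type-based clarifying question, assuming each
# candidate interpretation carries a human-readable type label.
def type_based_question(mention, candidates):
    """candidates: list of (entity_id, type_label) pairs for the mention."""
    type_labels = sorted({t for _, t in candidates})
    if len(type_labels) < 2:
        return None  # type labels alone cannot distinguish the candidates
    options = " or ".join(f"'{mention} the {t}'" for t in type_labels)
    return f"Do you mean {options}?"

print(type_based_question("orange", [("Orange_(fruit)", "fruit"),
                                     ("Orange_(colour)", "color")]))
# Do you mean 'orange the color' or 'orange the fruit'?
```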
Example-based CQ – In cases where the entity class labels are not available (or useful), the next easiest way to clarify the meaning is to use other example entities that are similar to the possible meanings. Suppose the system is trying to clarify whether an entity of type fruit or color is meant. It might propose clarifying examples using other exemplars of each possible meaning:

Do you mean 'orange' like banana and apple, or 'orange' like yellow?

To do so the system needs to choose how many and which examples to provide. Our experience is that just showing the alphabetically first few members (or a random subset) of a group (e.g. an entity type) does not work well, especially in cases where a list of type members may run to the thousands.

Usage-based CQ – The last method we consider is to provide examples of natural language usage of a term in context. For example:

Do you mean 'orange' as in 'I'd like an orange juice.' or 'I love the 4G speed Orange now offers.'?

We can find such contexts by analyzing a large corpus in which the entities of the target type occur many times. We look at these occurrences to identify contexts that are fairly specific to the type in question (e.g., fruits often occur in the pattern '* juice'). This is difficult to do well, but can be a very effective way to describe an entity: appropriate context snippets can capture nuances of meaning that even several examples cannot.

4 Qualitative Analysis

There are many methods one might use to "clarify" which entity is meant among entities of potential interest. We will focus on the three methods mentioned above and provide some qualitative analysis of the challenges and opportunities of each approach.

4.1 Materials: Data Sets

For our preliminary analysis, we restrict ourselves to entities of three entity types which have clearly different meanings but contain overlapping terms: companies, colors and fruit. While our techniques also apply when the semantic meanings of the labels are quite similar and subjective (e.g., "warm colors" vs. "sunset colors"), illustrating their pros and cons is simpler with pronounced differences in entity types.

The text for usage-based clarifying questions was (in general) extracted from the ukWaC corpus [1]. Documents in this corpus come from a large web crawl focused on UK sites, so there are some regional nuances (e.g., some companies such as Orange S.A. are more commonly discussed in the UK than in the US).

The lists of entities used for example-based clarifying questions were created by starting with a small seed set of members and expanding it over the ukWaC corpus using Concept Expansion (http://ibm.biz/WatsonCE). This resulted in 131 companies, 124 colors and 66 fruits. These entity sets are clearly not "complete", but they are large enough to illustrate the approaches we are discussing. In practice, no entity type list can be complete, as new entities are always being added to the knowledge base.

4.2 Results

For the analysis in this section we concentrate on a user query for 'orange' as our running example, as it can be a fruit, a company, a color, etc. Questions are generated so that the user needs to choose between two alternatives: "Did you mean A or B?" The user answers 'A' if the first option is correct and 'B' if the second option is correct. If neither alternative is correct, the user answers 'None' and the system tries again with another question.
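The interaction just described can be summarized as a small loop: present two alternatives, accept 'A', 'B' or 'None', and move on to another question if neither is chosen. The sketch below is a hypothetical outline of that loop; ask_user and question_for are placeholders standing in for the dialog front end and for one of the three CQ generators discussed in the following subsections.

```python
# A minimal sketch of the two-alternative clarification loop: present pairs of
# candidate interpretations until the user picks 'A' or 'B', or the pairs are
# exhausted. ask_user() and question_for() are assumed, illustrative hooks.
from itertools import combinations

def clarify(mention, candidates, question_for, ask_user):
    for a, b in combinations(candidates, 2):
        answer = ask_user(question_for(mention, a, b))  # expects 'A', 'B' or 'None'
        if answer == "A":
            return a
        if answer == "B":
            return b
    return None  # no pair was confirmed; fall back to another CQ strategy
```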
Type-based clarification

The generated clarifying question would be of the form:

Do you mean 'orange' as in a fruit or a company?

In cases such as this, where clear and semantically meaningful entity type labels are available, the method is straightforward. However, this is not always the case, especially if the type lists are federated from multiple sources. For example, the label of a list derived from an optics textbook might be "590-620nm" (the wavelength of orange light). A list obtained from a design source might have technical names such as "warm colors" and "cool colors". If the user does not know the meaning of the type names, the clarifying question will not be helpful. Lastly, some lists are generated from taxonomic downward closures or automatic clustering, and thus many have no names at all. In these cases a naïve type-based clarification question scheme breaks down, and alternatives have to be considered.

It is worth noting that when a single phrase is legitimately ambiguous within a type (e.g., 'apple' as a company being the name of both a technology company and a multi-media company), the system will be challenged to differentiate the candidates. That being said, given more fine-grained entity types (e.g., technology companies and recording companies), it can work to separate these. One of the challenges for this method is therefore to determine the granularity level that best trades off ease of understanding against the distinguishability of entity types.

Example-based clarification

For this approach we wish to generate a small number (say, three or four) of examples from each potential interpretation to present to the user for clarification. The question would be of the form:

Do you mean 'orange' like mauve, lilac and pink, or 'orange' like IBM, Compaq and Hewlett Packard?

We would like to choose examples that make the question easy to understand and apt to distinguish between the competing interpretations. The challenge is how to choose appropriate examples.

One effective way to select examples is to compute the distance between all term pairs in a type using a vector space model such as word2vec [11], and then for each member compute the sum of its distances to all other terms. The members with the smallest total sum are, in a sense, the "median" of the terms of that entity type. We can think of this as finding stereotypical terms for each type, which reflects our intuition of what would make good example terms for a clarifying question. In the above example, the stereotypical terms are mauve, lilac and pink – three colors that have less ambiguity than others such as coffee or orange. The same holds true for IBM, Compaq and Hewlett Packard with respect to companies. We refer to these "median" terms as being most central to the "meaning" of a type (e.g., apricots, cherries and plums are central to fruit). Terms that have multiple common meanings besides the one under consideration end up with a larger total distance and are thus more "peripheral" (e.g., Sun and Brother, while companies, have other common usages). See Table 1 for more examples. In most of the peripheral cases one can see why the entity string might be a member of multiple types (e.g., brother is both a company and a family member, coffee is both a beverage and a color, etc.).

A simpler (and perhaps more intuitive) approach would be to select just the members of the type closest to the target term (e.g., by cosine distance in word2vec vector space). The examples generated in this manner turn out to be weighted towards terms that are ambiguous in the same way the target term is. That is, if 'blackberry' belongs to the company, color and fruit types, it would end up close to 'orange' due to their shared ambiguity. Instead, we want terms that highlight the differences between the two possible interpretations.

Type       Central                                Peripheral
Colors     mauve, lilac, pink, taupe              coal, sage, bordeaux, coffee
Fruit      apricots, cherries, plums, melon       mulberry, berry, mandarin, orange
Companies  IBM, Compaq, Hewlett Packard, Lucent   Sun, Brother, Hughes, Myspace

Table 1. Examples of central (i.e. good) and peripheral (i.e. bad) terms from entity types. Note the "bads" have been filtered slightly to remove misspellings, etc.
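A minimal sketch of this "median term" heuristic follows, assuming pre-trained word2vec-style vectors have already been loaded into a dictionary; the helper names are ours and the vector source is left open.

```python
# Sketch: rank the members of an entity type by the sum of their cosine
# distances to all other members, so the most central (stereotypical) terms
# surface first. `vectors` maps each term to a pre-trained embedding; loading
# those vectors is outside the scope of this sketch.
import numpy as np

def cosine_distance(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def central_terms(members, vectors, k=3):
    """Return the k members with the smallest total distance to the rest."""
    members = [m for m in members if m in vectors]
    totals = {
        m: sum(cosine_distance(vectors[m], vectors[o]) for o in members if o != m)
        for m in members
    }
    return sorted(totals, key=totals.get)[:k]

# e.g. central_terms(colors, vectors) might yield ['mauve', 'lilac', 'pink'],
# while ambiguous members such as 'coffee' or 'orange' rank near the bottom.
```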
Usage-based clarification

Examples of sentence snippets including the ambiguous term provide an elegant solution to clarifying which entity is desired. The challenge is to automatically generate snippets that are both self-contained (i.e. contain enough information to be understandable) and helpful in distinguishing between two competing interpretations of a term.

We have developed a system that generates sentence snippets by first finding "appropriate" sentences in a source corpus and then extracting the clarifying phrases. We start by leveraging Concept Expansion's pattern generation [2, 12]. By doing so we identify hundreds of patterns that are highly likely to contain an entity of a given type. For example, for the type color:

– 'black / *'
– 'and * in colour.'
– 'of * vellum'
– 'red, * and blue'
– 'shades of * and'
– '* and purple'

Patterns are scored by how specific the context is to the given type, and in the most selective patterns the * (in the list above) is replaced with the word of interest ('orange' in this case). We then query a corpus such as ukWaC for occurrences of such patterns (e.g., starting with 'black / orange'). If we get a match, the sentence in which it appears is selected; otherwise the process is repeated with the next best pattern, and so forth, until a match is found. This algorithm yields the following sentence:

Orange and purple crayon streaks rainbowed across his briefcase surface.

While in this case the sentence can be used as is as a clarifying statement, for others (e.g., run-on sentences that can go on for half a page) it is helpful to select just the clause of interest. In addition, a sentence can contain multiple mentions of words of interest, with a different meaning for each of them, e.g. "I like the orange shirt; it reminds me to eat an orange." To determine an appropriate snippet we apply the Stanford dependency parser [7] to the sentence and examine all "paths" in the dependency tree containing the word of interest (WI). Depending on the position of the WI in the path, the path is either shortened or augmented with another branch of the dependency tree. The resulting sentence snippet is then:

Do you mean 'orange' like "orange and purple crayon streaks"?
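The pattern-instantiation and corpus-search step can be sketched as follows, assuming the patterns are already ordered from most to least selective and the corpus has been split into sentences; the dependency-tree trimming step is omitted here, and all names are illustrative rather than the actual implementation.

```python
# Sketch of the usage-based step: instantiate type-specific patterns with the
# word of interest and scan a pre-sentence-split corpus for the first match.
# Pattern scoring is assumed to have been done already (most selective first).
import re

COLOR_PATTERNS = ["black / *", "and * in colour.", "red, * and blue",
                  "shades of * and", "* and purple"]

def find_usage_sentence(word, patterns, sentences):
    for pattern in patterns:                      # most selective pattern first
        phrase = pattern.replace("*", word)
        regex = re.compile(re.escape(phrase), re.IGNORECASE)
        for sentence in sentences:
            if regex.search(sentence):
                return phrase, sentence
    return None, None

corpus = ["Orange and purple crayon streaks rainbowed across his briefcase surface."]
word = "orange"
phrase, sentence = find_usage_sentence(word, COLOR_PATTERNS, corpus)
if phrase:
    print(f"Do you mean '{word}' like \"{phrase}\"?")
```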
This approach does have some interesting failure modes – sometimes the question does not help as much as one might hope, especially if the context is shortened too much, e.g.:

SENTENCE = The orange prize for fiction celebrates 10 years with best of the best.
QUESTION = Do you mean 'orange' like 'orange prize'?

Unless you know that the Orange Prize is the name of a prize for novels sponsored by Orange S.A., it is hard to know that this means "Orange like the company" (indeed, after inspection we can see that '* prize' is a good but not great spotter for organizations).

SENTENCE = "thunder iv gx cards are designed to work with apple's implementation of the nubus slot."
QUESTION = Do you mean apple like "work with apple's implementation"?

The above example shows how a sentence snippet clearly identifies the meaning of the word 'apple' as referring to the computer company and not the multi-media company. More examples are in Table 2. In some cases no good example phrase was found in the ukWaC corpus.

Type     Suggestion
Fruit    Do you mean orange like "coffee or orange juice"?
Fruit    Do you mean blackberry like "blackberry juice"?
Fruit    Do you mean apple like "orange and apple juice"?
Company  Do you mean blackberry like "explore the possibilities of blackberry beyond email"?
Company  Do you mean orange like "orange prize"?
Company  Do you mean apple like "works with apple's implementation"?
Company  Do you mean sage like "register with sage"?
Color    Do you mean orange like "orange and purple crayon streaks"?
Color    Do you mean apple like "blue and apple red"?
Color    Do you mean sage like "cut from colorbok sage stripe"?

Table 2. Examples of clarifying questions using phrases obtained from the corpus.

5 Conclusion

Finding the intended entity for an ambiguous mention can be challenging in cases where there is not enough context for automatic disambiguation. As knowledge bases become more complete and more broadly applicable, this problem will only increase. In cases where systems interact with users, it may be advantageous to summarize the ambiguity in a question that presents alternative entities in natural language for the user to select from. We present three methods for implementing clarifying question generation to interact with users of such systems. It is clear that no one method works all the time, and that this task may best be done with an ensemble of methods. We hope that the discussion in this paper will inspire the creation of many more methods and improve the interaction of knowledge-based systems for complex tasks.

Rigorous evaluation of our preliminary work is the focus of our ongoing effort. One of the challenges is that the "goodness" of a clarifying question is inherently subjective. We are also investigating more sophisticated methods that may mitigate some of the limitations we discussed here. Another research topic is extending our approaches to determine clarifying questions for entities which are not members of any known type – in effect, building a system to augment and extend existing taxonomies and ontologies.

Acknowledgements

Authors are listed in alphabetical order by last name. We would like to thank Alfredo Alba, Clemens Drews, Linda Kato, Chris Kau, Steve Welch and others who have helped in the development of Concept Expansion.

References

1. Baroni, M., Bernardini, S., Ferraresi, A., Zanchetta, E.: The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation 43(3), 209–226 (2009), http://wacky.sslmit.unibo.it/lib/exe/fetch.php?media=papers:wacky_2008.pdf
2. Coden, A., Gruhl, D., Lewis, N., Tanenblatt, M.A., Terdiman, J.: SPOT the drug! An unsupervised pattern matching method to extract drug names from very large clinical corpora. In: IEEE HISB 2012. pp. 33–39 (2012)
3. Daiber, J., Jakob, M., Hokamp, C., Mendes, P.N.: Improving efficiency and accuracy in multilingual entity extraction. In: I-SEMANTICS. pp. 121–124 (2013)
4. De Boni, M., Manandhar, S.: An analysis of clarification dialogue for question answering. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1. pp. 48–55. NAACL '03, Association for Computational Linguistics, Stroudsburg, PA, USA (2003), http://dx.doi.org/10.3115/1073445.1073452
5. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web Journal (2014)
6. Loos, B., Porzel, R.: Resolution of Lexical Ambiguities in Spoken Dialogue Systems. In: Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004 (2004), http://aclweb.org/anthology/W04-2312
7. de Marneffe, M.C., MacCartney, B., Manning, C.D.: Generating typed dependency parses from phrase structure parses. In: Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06) (2006)
8. McCray, A., Aronson, A., Browne, A., Rindflesh, T., Razi, A., Srinivasan, S.: UMLS knowledge for biomedical language processing. Bulletin of the Medical Library Association 81, 184–194 (1993)
9. Mendes, P.N., Jakob, M., Bizer, C.: DBpedia for NLP: A Multilingual Cross-domain Knowledge Base. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). pp. 23–25. Istanbul, Turkey (2012)
10. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, I-SEMANTICS 2011. pp. 1–8 (2011)
11. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
12. Qadir, A., Mendes, P.N., Gruhl, D., Lewis, N.: Semantic lexicon induction from Twitter with pattern relatedness and flexible term length. In: AAAI 2015. pp. 2432–2439 (2015)
13. Savova, G.K., Coden, A.R., Sominsky, I.L., Johnson, R., Ogren, P.V., de Groen, P.C., Chute, C.G.: Word sense disambiguation across two domains: Biomedical literature and clinical notes. Journal of Biomedical Informatics 41(6), 1088–1100 (2008), http://dx.doi.org/10.1016/j.jbi.2008.02.003
14. Schlangen, D.: Causes and strategies for requesting clarification in dialogue. In: Proceedings of the 5th Workshop of the ACL SIG on Discourse and Dialogue (2004)
15. Stoyanchev, S., Liu, A., Hirschberg, J.: Towards Natural Clarification Questions in Dialogue Systems. In: AISB Symposium on "Questions, discourse and dialogue: 20 years after Making it Explicit", AISB-50. Goldsmiths, London, UK (2014)