=Paper=
{{Paper
|id=Vol-2601/kars2019_paper_03
|storemode=property
|title=A Distributed Semantic Model based Method for Instance Disambiguation in User-generated Short Texts
|pdfUrl=https://ceur-ws.org/Vol-2601/kars2019_paper_03.pdf
|volume=Vol-2601
|authors=Jiaqi Yang,Yongjun Li,Congjie Gao
|dblpUrl=https://dblp.org/rec/conf/cikm/YangLG19
}}
==A Distributed Semantic Model based Method for Instance Disambiguation in User-generated Short Texts==
Jiaqi Yang, Yongjun Li*, Congjie Gao
School of Computer, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
1468608569@qq.com, lyj@nwpu.edu.cn, 2451408761@qq.com
* Yongjun Li is the corresponding author.

ABSTRACT
Instance disambiguation is the task of obtaining the concept of a target instance in context, and it has been attracting much attention from academia. Existing methods depend heavily on similar or related instances in the context. However, the number of instances that can be extracted from a user-generated short text is limited. To tackle this problem, we propose a distributed semantic model (DSM) based method, which consists of three parts: 1) measuring the correlation between contextual terms and each concept of the ambiguous instance based on DSMs; 2) filtering out uninformative terms based on the distribution of correlations over the concepts, which reduces noise interference; and 3) prioritizing the informative terms to highlight their discriminating capabilities. The concept with the maximum correlation score is taken as the meaning of the target instance. Experimental results demonstrate that the proposed method outperforms baseline methods.

KEYWORDS
instance disambiguation; distributed semantic model; user-generated short text

ACM Reference Format:
Jiaqi Yang, Yongjun Li, and Congjie Gao. 2020. A distributed semantic model based method for instance disambiguation in user-generated short texts. In Proceedings of KaRS 2019 Second Workshop on Knowledge-Aware and Conversational Recommender Systems (KaRS 2019). ACM, New York, NY, USA, 4 pages.

KaRS 2019, November 3rd-7th, 2019, Beijing, China.
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
In recent years, user-generated short texts (UGSTs) have swept the world at a remarkable rate. The study of these data could bring tremendous value to business organizations, and to fully exploit them, we need to understand them better. However, UGSTs contain ambiguous instances, which greatly hinder understanding. Therefore, instance disambiguation has been attracting much attention from academia.

Many scholars attempt to eliminate ambiguity based on instances [6] in the context. However, an inevitable challenge is that the number of instances contained in a UGST is limited. Recently, some efforts have been made to learn knowledge from the context of the target instance to improve disambiguation performance [1-3]. Generally, there are two strategies. The first is to use statistical models to obtain the topic of the UGST, and then determine the meaning of the ambiguous instance based on that topic [3]. Due to the sparsity of textual content, building an effective statistical model may not be easy. The second strategy is to use other types of terms for help. Wen et al. [1] found that verbs and adjectives are also helpful for disambiguation; thus, they constructed a co-occurrence network of typed terms and then chose the most related contextual term for disambiguation. However, the co-occurrence networks are word-based and cannot handle multi-word expressions (MWEs).

In this paper, we propose an Instance Disambiguation method with Context Awareness (IDwCA), which focuses on utilizing various types of contextual terms for disambiguation. Generally, some contextual terms cannot provide useful disambiguation information; for convenience, we call them uninformative terms. Otherwise, they are informative terms. To avoid noise interference, we calculate the correlation between contextual terms and each concept of the target instance to filter out uninformative terms. An important basis is the measurement of correlation: DSMs and Probase are used in this measurement, which is effective and lightweight. Further, for the remaining contextual terms (the informative terms), we prioritize each term to highlight its discriminating capability. Finally, we recalculate the correlation between informative terms and each concept of the target instance. The concept with the maximum score is taken as the meaning of the target instance. Experiments on ground-truth datasets illustrate the superiority of IDwCA over state-of-the-art methods.

2 INSTANCE DISAMBIGUATION
2.1 Problem definition
A term t is a word or an MWE. In this paper, we only consider noun terms, verb (v) terms, and adjective (adj) terms, which are very helpful for disambiguation. In addition, we refine noun terms into instances and concepts: an instance e is a concrete object, while a concept c is a general and abstract description of a set of instances. For example, "banana" and "grape" are instances, and they can be explained by the concept "fruit".

Problem Formulation 1 (Instance disambiguation). Given a UGST T = {t_1, t_2, ..., t_m}, wherein t_i denotes a term, assume term t_k is ambiguous and its candidate concept set is denoted by C = {c_j | j = 1, 2, ..., l}. We define t_k as the target instance and the other terms in T as contextual terms for t_k. The task of IDwCA is to identify the most appropriate concept of t_k from C.

The key issue of Problem 1 is to select related terms that have high discriminating capabilities for disambiguation.
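At its core, Problem 1 is an argmax over candidate concepts driven by a term-concept correlation function. The following toy sketch illustrates the selection step only; the correlation table is made up for illustration and stands in for the real DSM/Probase measure defined later:

```python
# Toy sketch of Problem 1: pick the candidate concept of the target
# instance that maximizes an (assumed) term-concept correlation score.
# The scores below are made-up illustrations, not real DSM/Probase values.

TOY_SCORES = {
    ("delicious", "fruit"): 0.8,
    ("delicious", "company"): 0.1,
    ("eating", "fruit"): 0.6,
    ("eating", "company"): 0.2,
}

def correlation(term, concept):
    """Stand-in for R(t, c); returns 0 for unseen pairs."""
    return TOY_SCORES.get((term, concept), 0.0)

def disambiguate(ugst_terms, target, candidate_concepts):
    """Return the concept of `target` best supported by its context."""
    context = [t for t in ugst_terms if t != target]
    return max(
        candidate_concepts,
        key=lambda c: sum(correlation(t, c) for t in context),
    )

print(disambiguate(["apple", "delicious"], "apple", ["fruit", "company"]))
# prints "fruit" (0.8 vs. 0.1)
```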
The main difference from existing work is that we use corpus and knowledge information together to measure the semantic correlation of terms, and then choose more types of contextual terms for disambiguation rather than relying solely on instances.

2.2 Proposed approach
In IDwCA, DSMs and Probase are first used to measure the correlation between all contextual terms and each concept of the target instance. Second, the Kullback-Leibler (KL) divergence is employed to filter out uninformative terms. Then, the remaining informative terms are prioritized to highlight their discriminating capabilities. Finally, based on these informative terms, we obtain the concept of the target instance.

2.2.1 Correlation calculation between terms and concepts. We could easily determine the most appropriate concept of the target instance if we had knowledge of the semantic correlation between contextual terms and concepts. We use DSMs for help, which focus on the surrounding context of a word and are ideal for calculating correlation. However, they cannot deal with MWEs; we use semantic composition to solve this problem. Given an MWE, denoted p, assume there are N words in p. Given the semantic vector of each word, the vector of p can be calculated by Eq.(1):

v(p) = \sum_{c=1}^{N} v(w_c)    (1)

That is, the vector of p is the sum of the vectors of all the words in it. However, this ignores the syntactic relations between words and may introduce too much noise. To solve this problem, we assign weights to words based on their part of speech in p, where the weights of nouns, verbs, and adjectives are set to 1 and the rest to 0. Eq.(1) can then be further expressed as Eq.(2):

v(p) = \sum_{c=1}^{N} a_c \cdot v(w_c)    (2)

where a_c denotes the weight of w_c, a_c \in \{0, 1\}. Finally, the cosine metric is used to calculate the correlation, as shown in Eq.(3):

R_D(t, c) = \cos(v(t), v(c))    (3)

Preliminary evaluation shows that the DSM-based method works reasonably well for many pairs of terms, but for some noun terms the results are less satisfactory. We use Probase to fill this gap; it provides isA knowledge for concepts and instances, and two typicality scores for a concept/instance pair: P(e|c) = n(c, e)/n(c) and P(c|e) = n(c, e)/n(e), where n(\cdot) refers to the number of occurrences of a given term or pair of terms in Probase. Following [5], we use the corresponding context of terms to calculate correlation. Given a term t, we first extract its context S_t from Probase according to its type:
- If t is a concept, its context is all the instances that can be explained by it.
- If t is an instance, its context is all the concepts it belongs to.
- If t is a verb or an adjective, it has no hypernyms [7] in Probase, so its context is empty.

We then transfer the context S_t into a vector I_t as shown in Eq.(4), where each element is the typicality score between t and a term in its context:

I_t = \begin{cases} \{P(c_{i_1}|t) \mid i_1 = 1, \ldots, m_1\}, & t.type = e \\ \{P(e_{i_2}|t) \mid i_2 = 1, \ldots, m_2\}, & t.type = c \end{cases}    (4)

The measurement of correlation based on Probase can then be expressed as Eq.(5):

R_P(t, c) = \begin{cases} \dfrac{\sum_{e_{i_2} \in S_t \cap S_c} P(e_{i_2}|c) \cdot P(e_{i_2}|t)}{\|I_t\| \cdot \|I_c\|}, & t.type = c \\ \sum_{c_{i_1} \in S_t} P(c_{i_1}|t) \cdot R_P(c_{i_1}, c), & t.type = e \end{cases}    (5)

where \|\cdot\| denotes the norm of a vector. Finally, we integrate the two parts linearly. In summary, the semantic correlation between terms and concepts can be calculated by Eq.(6):

R(t, c) = \begin{cases} R_D(t, c), & t.type \in \{v, adj\} \\ \theta \cdot R_D(t, c) + (1 - \theta) \cdot R_P(t, c), & t.type \in \{e, c\} \end{cases}    (6)

where \theta is a tuning parameter.

2.2.2 Contextual term filtering. Normally, some contextual terms do not contain useful disambiguation information, so we filter them out to avoid noise interference. For clarity, take "the apple is really delicious" as an example. Based on "delicious", we know "apple" is a kind of fruit, because "delicious" is more related to "fruit" than to "company". However, if we filtered out uninformative terms directly according to the correlation scores, we would need to set a threshold dynamically, which poses a big challenge. Following [1], we employ the KL divergence. First, we assume the probabilities of the concepts of the target instance are identical, i.e., they fit a uniform distribution. Second, we calculate the correlation between each contextual term and each concept, and normalize the scores to get a new distribution. The KL divergence is then used to measure the divergence between the two distributions: the greater the divergence, the more important the role of the term. Finally, based on the KL divergence, we set a threshold to filter out uninformative terms and obtain a set of informative terms, denoted ICT.

2.2.3 Weights of informative terms. Generally, the concept of the target instance depends heavily on the choice of contextual terms. Take "the engineer is eating the apple" as an example, where ICT = {"engineer", "eating"}: the concept of "apple" is "company" according to "engineer", while it is "fruit" if based on "eating". However, an ambiguous instance cannot have different concepts simultaneously. To solve this problem, we prioritize each informative term to highlight its contribution. The intuition is that the closer an informative term is to the target instance, the greater its contribution. We propose a weighting function based on the sigmoid, described in Eq.(7):

weight(t_i) = 1.5 - \frac{1}{1 + e^{-x}}    (7)

where x represents the context distance, i.e., the number of terms between t_i and the target instance. Based on Eq.(6) and Eq.(7), we define the semantic correlation between all informative terms and a concept of the target instance, R(ICT, c), as described in Eq.(8):

R(ICT, c) = \sum_{t_p \in ICT} weight(t_p) \cdot R(t_p, c)    (8)

The concept with the maximum score is the result of IDwCA.
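Taken together, the method's steps (semantic composition as in Eq.(2), cosine correlation as in Eq.(3), KL-divergence filtering against a uniform concept distribution, and the sigmoid distance weighting of Eq.(7)-(8)) can be sketched in plain Python. This is a minimal illustration under made-up correlation scores, not the authors' implementation:

```python
import math

def cosine(u, v):
    """Eq.(3): cosine correlation between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def compose(word_vectors, weights):
    """Eq.(2): weighted sum of word vectors for an MWE (weights in {0, 1})."""
    dim = len(word_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, word_vectors))
            for i in range(dim)]

def kl_from_uniform(scores):
    """KL divergence of the normalized score distribution from uniform."""
    total = sum(scores)
    if total == 0:
        return 0.0
    p = [s / total for s in scores]
    q = 1.0 / len(scores)
    return sum(pi * math.log(pi / q) for pi in p if pi > 0)

def sigmoid_weight(distance):
    """Eq.(7): weight(t_i) = 1.5 - 1 / (1 + e^{-x})."""
    return 1.5 - 1.0 / (1.0 + math.exp(-distance))

def score_concepts(context, concepts, corr, kl_threshold=0.01):
    """Eq.(8): drop terms whose concept-score distribution is near uniform
    (uninformative), then sum distance-weighted correlations per concept.
    `context` is a list of (term, distance-to-target) pairs."""
    informative = [
        (term, dist) for term, dist in context
        if kl_from_uniform([corr(term, c) for c in concepts]) > kl_threshold
    ]
    return {
        c: sum(sigmoid_weight(d) * corr(t, c) for t, d in informative)
        for c in concepts
    }

# Usage with made-up correlation scores (not real DSM/Probase values):
TOY = {("delicious", "fruit"): 0.8, ("delicious", "company"): 0.1,
       ("the", "fruit"): 0.3, ("the", "company"): 0.3}

def corr(t, c):
    return TOY.get((t, c), 0.0)

scores = score_concepts([("the", 1), ("delicious", 0)],
                        ["fruit", "company"], corr)
print(max(scores, key=scores.get))  # prints "fruit": "the" is filtered
```

Note that "the" scores equally against both concepts, so its normalized distribution is uniform, its KL divergence is 0, and it is filtered out before aggregation.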
3 EXPERIMENTS
3.1 Datasets and baseline algorithms
To the best of our knowledge, there is no gold-standard metric for evaluating instance disambiguation methods. Therefore, we evaluate our method in terms of classification. To verify the validity and generality of the method, we chose Foursquare, Twitter and Facebook as data sources; these are popular social networking sites that provide open data-acquisition APIs. We then randomly selected UGSTs from the acquired data containing the ambiguous instances "apple", "Harry Potter" and "python", and classified the data manually. For convenience, the three datasets are abbreviated FS, FB and TW, respectively. Table 1 shows the statistics of the ambiguous instance "apple" on the three datasets. The continuous Bag-of-Words model, one of the most commonly used DSMs, is used in our experiments to obtain the semantic vectors of words; the wiki dataset (https://dumps.wikimedia.org/enwiki/latest/) is used for training the model. We compare our approach with the following representative methods: STC-NB [6] and TD [4].

Table 1: Details of FS, FB and TW
category | FS  | FB  | TW
fruit    | 134 | 10  | 131
company  | 42  | 674 | 19

3.2 Performance comparison between IDwCA and existing work
We illustrate the results on the three datasets in Figure 1 (Figure 1: Results on TW, FS and FB). From the results, we reach the following conclusions. IDwCA outperforms all baselines, which validates its effectiveness. This is reasonable since IDwCA 1) utilizes information from DSMs and Probase to measure semantic correlation, and then chooses various types of contextual terms for disambiguation rather than relying only on instances; and 2) assigns weights to informative terms based on their context distances, which reduces noise interference.

STC-NB performs worse than the other methods because it only considers similar instances, and the correlations between terms are calculated from their co-occurrence counts in Probase. Compared with IDwCA, TD also achieves worse performance. This is because it divides terms into only two types, instances and concepts, which may lead to wrong judgements, and because its correlation calculation method does not work well on oral expressions.

3.3 Performance of the correlation calculation method
Further, we explore the performance of our correlation calculation method. We utilize two datasets in the following experiments: the well-known WordSim353 (WS) dataset (http://alfonseca.org/eng/research/wordsim353.html) for words, and a labeled dataset WP for MWEs created by [5]. We compare our method with the baseline algorithms. To evaluate the experiment, we computed the Pearson Correlation Coefficient (PCC) between the machine ratings and the human ratings over the two datasets. From the results shown in Figure 3 (Figure 3: Results on WP, WS), we observe that IDwCA performs best on both datasets. This is because knowledge bases are more suitable for noun-based terms than for other types of terms, and IDwCA combines them with DSMs to solve this problem. Meanwhile, as shown in Eq.(6), the parameter θ tunes the importance of each part. To study the effect of θ, we conduct experiments with different values of θ on the WP dataset. As shown in Figure 2 (Figure 2: Results w.r.t. θ), DSMs contribute more to the correlation, mainly because DSMs are more suitable for oral expressions. In our experiments, we select θ = 0.75 as the optimal value.

4 CONCLUSIONS
In this paper, we use DSMs and Probase to measure the correlation of terms and then choose various types of contextual terms for disambiguation. Experiments on ground-truth datasets validate the effectiveness of the proposed method.
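The PCC used to score the correlation experiments in Section 3.3 is straightforward to reproduce. A minimal plain-Python version follows; the machine/human ratings here are made-up illustrations, not values from the WS or WP datasets:

```python
import math

def pearson(xs, ys):
    """Pearson Correlation Coefficient between two equal-length rating lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: machine ratings vs. human ratings for five term pairs.
machine = [0.9, 0.1, 0.5, 0.7, 0.3]
human = [0.8, 0.2, 0.4, 0.9, 0.1]
print(round(pearson(machine, human), 3))
```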
REFERENCES
[1] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. 2017. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Trans. Knowl. Data Eng. 29, 3 (2017), 499-512.
[2] Heyan Huang, Yashen Wang, Chong Feng, Zhirun Liu, and Qiang Zhou. 2018. Leveraging Conceptualization for Short-Text Embedding. IEEE Trans. Knowl. Data Eng. 30, 7 (2018), 1282-1295.
[3] Dongwoo Kim, Haixun Wang, and Alice H. Oh. 2013. Context-Dependent Conceptualization. In IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, Francesca Rossi (Ed.). IJCAI/AAAI, Palo Alto, CA, USA, 2654-2661.
[4] Pei-Pei Li, Lu He, Haiyan Wang, Xuegang Hu, Yuhong Zhang, Lei Li, and Xindong Wu. 2018. Learning From Short Text Streams With Topic Drifts. IEEE Trans. Cybernetics 48, 9 (2018), 2697-2711.
[5] Pei-Pei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xuegang Hu, and Xindong Wu. 2015. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity. IEEE Trans. Knowl. Data Eng. 27, 10 (2015), 2604-2617.
[6] Yangqiu Song, Haixun Wang, Zhongyuan Wang, Hongsong Li, and Weizhu Chen. 2011. Short Text Conceptualization Using a Probabilistic Knowledgebase. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, Toby Walsh (Ed.). IJCAI/AAAI, Palo Alto, CA, USA, 2330-2336.
[7] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Qili Zhu. 2012. Probase: a probabilistic taxonomy for text understanding. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, Scottsdale, AZ, USA, May 20-24, 2012. ACM, New York, NY, USA, 481-492.