CEUR-WS Vol-2601, paper kars2019_paper_03: https://ceur-ws.org/Vol-2601/kars2019_paper_03.pdf (dblp: https://dblp.org/rec/conf/cikm/YangLG19)
A Distributed Semantic Model based Method for Instance Disambiguation in User-generated Short Texts

Jiaqi Yang, Yongjun Li∗, Congjie Gao
School of Computer, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
1468608569@qq.com, lyj@nwpu.edu.cn, 2451408761@qq.com

ABSTRACT

Instance disambiguation aims to determine the concept of a target instance in context, a task that has attracted much attention from academia. Existing methods depend heavily on similar or related instances in the context. However, the number of instances that can be extracted from a user-generated short text is limited. To tackle this problem, we propose a distributed semantic model (DSM) based method, which consists of three parts: 1) measuring the correlation between contextual terms and each concept of the ambiguous instance based on DSMs; 2) filtering out uninformative terms based on the distribution of their correlations over the concepts, which reduces noise interference; 3) prioritizing the informative terms to highlight their discriminating capabilities. The concept with the maximum correlation score is taken as the meaning of the target instance. Experimental results demonstrate that the proposed method outperforms baseline methods.

KEYWORDS

instance disambiguation; distributed semantic model; user-generated short text

ACM Reference Format:
Jiaqi Yang, Yongjun Li, and Congjie Gao. 2020. A distributed semantic model based method for instance disambiguation in user-generated short texts. In Proceedings of KaRS 2019 Second Workshop on Knowledge-Aware and Conversational Recommender Systems (KaRS 2019). ACM, New York, NY, USA, 4 pages.

1 INTRODUCTION

In recent years, user-generated short texts (UGSTs) have swept the world at an astonishing rate. The study of these data could bring tremendous value to business organizations. To fully exploit these data, we need to understand them better. However, UGSTs contain ambiguous instances, which greatly hinder understanding. Therefore, instance disambiguation has been attracting much attention from academia.

Many scholars attempt to eliminate ambiguity based on instances in context [6]. However, an inevitable challenge is that the number of instances contained in a UGST is limited. Recently, some efforts have been made to learn knowledge from the context of the target instance to improve the performance of disambiguation [1-3]. Generally, there are two strategies. The first is to use statistical models to obtain the topic of the UGST, and then determine the meaning of the ambiguous instance based on that topic [3]. Due to the sparsity of textual content, building an effective statistical model may not be easy. The second strategy is to draw on other types of terms for help. Hua et al. [1] found that verbs and adjectives are also helpful for disambiguation. Thus, they constructed a co-occurrence network of typed terms, and then chose the most related contextual term for disambiguation. However, such co-occurrence networks are word-based and cannot handle multi-word expressions (MWEs).

In this paper, we propose an Instance Disambiguation method with Context Awareness (IDwCA), which focuses on utilizing various types of contextual terms for disambiguation. Generally, some contextual terms cannot provide useful disambiguation information; for convenience, we call them uninformative terms, and the rest informative terms. To avoid noise interference, we calculate the correlation between contextual terms and each concept of the target instance, and filter out the uninformative terms. An important basis for this is the measurement of correlation; DSMs and Probase are used together for this measurement, which is effective and lightweight. Further, for the remaining contextual terms (the informative terms), we prioritize each term to highlight its discriminating capability. Finally, we recalculate the correlation between the informative terms and each concept of the target instance. The concept with the maximum score is taken as the meaning of the target instance. Experiments on ground-truth datasets illustrate the superiority of IDwCA over state-of-the-art methods.

∗ Yongjun Li is the corresponding author.

KaRS 2019, November 3rd-7th, 2019, Beijing, China.
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 INSTANCE DISAMBIGUATION

2.1 Problem definition

A term t is a word or an MWE. In this paper, we only consider noun terms, verb (v) terms and adjective (adj) terms, which are very helpful for disambiguation. In addition, we refine noun terms into instances and concepts: an instance e is a concrete object, while a concept c is a general, abstract description of a set of instances. For example, "banana" and "grape" are instances, and they can be explained by the concept "fruit".

Problem Formulation 1. Instance disambiguation. Given a UGST T = {t_1, t_2, ..., t_m}, wherein t_i denotes a term, assume term t_k is ambiguous, and its candidate concept set is denoted by C = {c_j | j = 1, 2, ..., l}. We define t_k as the target instance and the other


terms in T as contextual terms for t_k. The task of IDwCA is to identify the most appropriate concept of t_k from C.

The key issue of Problem 1 is to select related terms that have high discriminating capabilities for disambiguation. The main difference from existing work is that we use corpus and knowledge information together to measure the semantic correlation of terms, and then choose more types of contextual terms for disambiguation, rather than relying solely on instances.

2.2 Proposed approach

In IDwCA, first, DSMs and Probase are used to measure the correlation between all contextual terms and each concept of the target instance. Second, the Kullback-Leibler (KL) divergence is employed to filter out uninformative terms. Then, for the remaining informative terms, we prioritize them to highlight their discriminating capabilities. Finally, based on these informative terms, we obtain the concept of the target instance.

2.2.1 Correlation calculation between terms and concepts. We could easily determine the most appropriate concept of the target instance if we had knowledge of the semantic correlation between contextual terms and concepts. We use DSMs for this, as they focus on the surrounding context of a word and are ideal for calculating correlation. However, they cannot deal with MWEs. We use semantic composition to solve this problem. Given an MWE, denoted p, assume there are N words in p. Given the semantic vector of each word, the vector of p can be calculated by Eq.(1):

$$v(p) = \sum_{c=1}^{N} v(w_c) \qquad (1)$$

That is, the vector of p is the sum of the vectors of all the words in it. However, this ignores the syntactic relations between words and may introduce too much noise. To solve this problem, we assign weights to words based on their part of speech in p, where the weights of nouns, verbs and adjectives are set to 1, and the rest are set to 0. Then, Eq.(1) can be further expressed as Eq.(2):

$$v(p) = \sum_{c=1}^{N} a_c \cdot v(w_c) \qquad (2)$$

where a_c denotes the weight of w_c, a_c ∈ {0, 1}. Finally, the cosine metric is used to calculate the correlation, as shown in Eq.(3):

$$R_D(t, c) = \cos(v(t), v(c)) \qquad (3)$$

Preliminary evaluation shows that the DSM-based method works reasonably well for many pairs of terms, but for some noun terms the results are less satisfactory. We use Probase to fill this gap. Probase provides isA knowledge for concepts and instances, and two typicality scores for a concept/instance pair: P(e|c) = n(c, e)/n(c) and P(c|e) = n(c, e)/n(e), where n(•) refers to the number of occurrences of a given term or pair of terms in Probase. Following [5], we use the corresponding context of terms to calculate correlation.

Given a term t, we first extract its context S_t from Probase according to its type. The context of term t is detailed as follows:
- If t is a concept, its context is all the instances that can be explained by it.
- If t is an instance, its context is all the concepts it belongs to.
- If t is a verb or an adjective, its context is empty, because it has no hypernyms in Probase [7].

We then transfer the context S_t into a vector I_t, as shown in Eq.(4), where each element is the typicality score between t and a term in its context:

$$I_t = \begin{cases} \{P(c_{i_1}|t) \mid i_1 = 1, \dots, m_1\}, & t.type = e \\ \{P(e_{i_2}|t) \mid i_2 = 1, \dots, m_2\}, & t.type = c \end{cases} \qquad (4)$$

The measurement of correlation based on Probase can then be expressed as Eq.(5):

$$R_P(t, c) = \begin{cases} \dfrac{\sum_{e_{i_2} \in S_t \cap S_c} P(e_{i_2}|c) \cdot P(e_{i_2}|t)}{\|I_t\| \cdot \|I_c\|}, & t.type = c \\ \sum_{c_{i_1} \in S_t} P(c_{i_1}|t) \cdot R_P(c_{i_1}, c), & t.type = e \end{cases} \qquad (5)$$

where ||•|| denotes the norm of a vector.

Finally, we integrate the two parts linearly. In summary, the semantic correlation between terms and concepts can be calculated by Eq.(6):

$$R(t, c) = \begin{cases} R_D(t, c), & t.type \in \{v, adj\} \\ \theta \cdot R_D(t, c) + (1 - \theta) \cdot R_P(t, c), & t.type \in \{e, c\} \end{cases} \qquad (6)$$

where θ is a tuning parameter.

2.2.2 Contextual term filtering. Normally, some contextual terms do not contain useful disambiguation information, so we filter them out to avoid noise interference. For clarity, take "the apple is really delicious" as an example. Based on "delicious", we know "apple" is "a kind of fruit", because "delicious" is more related to "fruit" than to "company". However, if we filtered out the uninformative terms directly according to the correlation scores, we would need to set a threshold dynamically, which poses a big challenge. Following [1], we employ the KL divergence. First, we assume that the probabilities of the concepts of the target instance are equal, i.e., they follow a uniform distribution. Second, we calculate the correlation between contextual terms and each concept, and normalize the scores to get a new distribution. Then, the KL divergence is used to measure the divergence between the two distributions: the greater the divergence, the more important the role of the term. Finally, based on the KL divergence, we set a threshold to filter out uninformative terms and obtain a new set of informative terms, denoted ICT.

2.2.3 Weights of informative terms. Generally, the concept of the target instance depends heavily on the choice of contextual terms. Take "the engineer is eating the apple" as an example: the ICT is {"engineer", "eating"}; the concept of "apple" is "company" according to "engineer", while it is "fruit" according to "eating". However, an ambiguous instance cannot have different concepts simultaneously. To solve this problem, we prioritize each informative term to highlight its contribution. The intuition is that the closer an informative term is to the target instance, the greater its contribution. We propose a weighting function based on the sigmoid, described in Eq.(7):

$$weight(t_i) = 1.5 - \frac{1}{1 + e^{-x}} \qquad (7)$$


where x represents the context distance, i.e., the number of terms between t_i and the target instance.
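To make the decay behavior of Eq.(7) concrete, here is a minimal Python sketch (the function name `context_weight` is ours, not the paper's): the weight is 1.0 for a term adjacent to the target instance (x = 0) and decays toward 0.5 as the context distance grows, so distant terms still contribute but never dominate.

```python
import math

def context_weight(distance: int) -> float:
    """Eq.(7): weight(t_i) = 1.5 - sigmoid(x), where x is the number
    of terms between t_i and the target instance."""
    return 1.5 - 1.0 / (1.0 + math.exp(-distance))

# Adjacent terms get full weight; far terms approach 0.5.
print(context_weight(0))            # 1.0
print(round(context_weight(1), 3))  # 0.769
print(round(context_weight(5), 3))  # 0.507
```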
   Based on Eq.(6) and Eq.(7), we define the semantic correlation
between all informative terms and a concept of the target instance,
R(ICT , c), as described in Eq.(8).
$$R(ICT, c) = \sum_{t_p \in ICT} weight(t_p) \cdot R(t_p, c) \qquad (8)$$

The concept with the maximum score is the result of IDwCA.
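The filtering and scoring steps can be sketched end to end. This is an illustrative toy, not the paper's implementation: the correlation scores below are invented, the helper names (`kl_from_uniform`, `disambiguate`) and the KL threshold 0.05 are our assumptions, and real inputs would come from Eq.(6) over DSM and Probase scores.

```python
import math

def kl_from_uniform(scores):
    """Section 2.2.2: KL divergence between a term's normalized correlation
    scores over the candidate concepts and the uniform distribution.
    Larger divergence -> more discriminative term."""
    total = sum(scores)
    probs = [s / total for s in scores]
    uniform = 1.0 / len(scores)
    return sum(p * math.log(p / uniform) for p in probs if p > 0)

def weight(distance):
    """Eq.(7): sigmoid-based distance weight."""
    return 1.5 - 1.0 / (1.0 + math.exp(-distance))

def disambiguate(term_scores, distances, concepts, kl_threshold=0.05):
    """term_scores: {term: [R(term, c) for c in concepts]} (Eq.(6) values).
    distances: {term: context distance to the target instance}."""
    # Keep only informative terms (KL divergence above the threshold).
    ict = [t for t, s in term_scores.items()
           if kl_from_uniform(s) > kl_threshold]
    # Eq.(8): weighted sum of correlations per concept, then argmax.
    totals = [sum(weight(distances[t]) * term_scores[t][j] for t in ict)
              for j in range(len(concepts))]
    return concepts[totals.index(max(totals))]

# Toy run for "the engineer is eating the apple" (scores are made up):
concepts = ["fruit", "company"]
scores = {"engineer": [0.1, 0.5], "eating": [0.6, 0.1], "is": [0.3, 0.3]}
dists = {"engineer": 3, "eating": 1, "is": 2}
print(disambiguate(scores, dists, concepts))  # -> fruit
```

Here "is" scores equally on both concepts (zero KL divergence) and is filtered out, while "eating", being closer to "apple", outweighs "engineer", yielding "fruit".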

3 EXPERIMENTS

3.1 Datasets and baseline algorithms

As far as we know, there is no gold-standard metric for evaluating instance disambiguation methods. Therefore, we evaluate our method in terms of classification. To verify the validity and generality of the method, we chose Foursquare, Twitter and Facebook as data sources; these popular social networking sites provide open data-acquisition APIs. We then randomly selected UGSTs from the acquired data containing the ambiguous instances "apple", "Harry Potter" and "python", and classified the data manually. For convenience, the three datasets are abbreviated as FS, FB and TW, respectively. Table 1 shows the statistics of the ambiguous instance "apple" on the three datasets. The continuous Bag-of-Words model, one of the most commonly used DSMs, is used in our experiments to obtain the semantic vectors of words; the wiki^1 dataset is used for training the model. We compare our approach with the following representative methods: STC-NB [6] and TD [4].

Table 1: Details of FS, FB and TW

  category    FS    FB    TW
  fruit      134    10   131
  company     42   674    19

3.2 Performance comparison between IDwCA and existing work

[Figure 1: Results on TW, FS and FB]

We illustrate the results on the three datasets in Figure 1. From the results, we reach the following conclusions. IDwCA outperforms all baselines, which validates its effectiveness. This is reasonable, since IDwCA 1) utilizes information from DSMs and Probase to measure semantic correlation, and then chooses various types of contextual terms for disambiguation, not just relying on instances; and 2) assigns weights to informative terms based on their context distances, which reduces noise interference.

STC-NB performs worse than the other methods because it only considers similar instances, and the correlations between terms are calculated from their co-occurrence counts in Probase. Compared with IDwCA, TD also achieves worse performance. This is because it divides terms into only two types, instances and concepts, which may lead to wrong judgements, and its correlation calculation method does not work well on colloquial expressions.

3.3 Performance of the correlation calculation method

Further, we explore the performance of our correlation calculation method. We utilize two datasets in the following experiments: the well-known dataset WordSim353^2 (WS) for words, and a labeled dataset WP for MWEs created by [5]. We compare our method with the baseline algorithms. For evaluation, we computed the Pearson Correlation Coefficient (PCC) between the machine ratings and the human ratings over the two datasets.

[Figure 2: Results w.r.t. θ (PCC vs. θ)]  [Figure 3: Results on WP, WS]

From the results shown in Figure 3, we observe that IDwCA performs best on all datasets. This is because knowledge bases are more suitable for noun-based terms than for other types of terms, and IDwCA uses a combination with DSMs to solve this problem. Meanwhile, as shown in Eq.(6), the parameter θ tunes the importance of each part. To study the effect of θ, we conduct experiments with different values of θ on the WP dataset. As shown in Figure 2, DSMs contribute more to the correlation. This is mainly because DSMs are more suitable for colloquial expressions. In our experiments, we select θ = 0.75 as the optimal value.

4 CONCLUSIONS

In this paper, we use DSMs and Probase to measure the correlation of terms and then choose various types of contextual terms for disambiguation. Experiments on ground-truth datasets validate the effectiveness of the proposed method.

^1 https://dumps.wikimedia.org/enwiki/latest/
^2 http://alfonseca.org/eng/research/wordsim353.html


REFERENCES

[1] Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, and Xiaofang Zhou. 2017. Understand Short Texts by Harvesting and Analyzing Semantic Knowledge. IEEE Trans. Knowl. Data Eng. 29, 3 (2017), 499-512.
[2] Heyan Huang, Yashen Wang, Chong Feng, Zhirun Liu, and Qiang Zhou. 2018. Leveraging Conceptualization for Short-Text Embedding. IEEE Trans. Knowl. Data Eng. 30, 7 (2018), 1282-1295.
[3] Dongwoo Kim, Haixun Wang, and Alice H. Oh. 2013. Context-Dependent Conceptualization. In IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013, Francesca Rossi (Ed.). IJCAI/AAAI, Palo Alto, CA, USA, 2654-2661.
[4] Pei-Pei Li, Lu He, Haiyan Wang, Xuegang Hu, Yuhong Zhang, Lei Li, and Xindong Wu. 2018. Learning From Short Text Streams With Topic Drifts. IEEE Trans. Cybernetics 48, 9 (2018), 2697-2711.
[5] Pei-Pei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, Xuegang Hu, and Xindong Wu. 2015. A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity. IEEE Trans. Knowl. Data Eng. 27, 10 (2015), 2604-2617.
[6] Yangqiu Song, Haixun Wang, Zhongyuan Wang, Hongsong Li, and Weizhu Chen. 2011. Short Text Conceptualization Using a Probabilistic Knowledgebase. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, Toby Walsh (Ed.). IJCAI/AAAI, Palo Alto, CA, USA, 2330-2336.
[7] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Qili Zhu. 2012. Probase: A Probabilistic Taxonomy for Text Understanding. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, Scottsdale, AZ, USA, May 20-24, 2012. ACM, New York, NY, USA, 481-492.