=Paper=
{{Paper
|id=Vol-1986/SML17_paper_4
|storemode=property
|title=Enhancing Topical Word Semantic for Relevance Feature Selection
|pdfUrl=https://ceur-ws.org/Vol-1986/SML17_paper_4.pdf
|volume=Vol-1986
|authors=Abdullah Semran Alharbi,Yuefeng Li,Yue Xu
|dblpUrl=https://dblp.org/rec/conf/ijcai/AlharbiL017
}}
==Enhancing Topical Word Semantic for Relevance Feature Selection==
Abdullah Semran Alharbi 1,2 (asaharbi@uqu.edu.sa)
Yuefeng Li 1 (y2.li@qut.edu.au)
Yue Xu 1 (yue.xu@qut.edu.au)

1 School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Australia
2 Department of Computer Science, Umm Al-Qura University, Makkah, Saudi Arabia
Abstract

Unsupervised topic models, such as Latent Dirichlet Allocation (LDA), are widely used as automated feature engineering tools for textual data. They model word semantics based on some latent topics, on the basis that semantically related words occur in similar documents. However, the word weights that are assigned by these topic models do not represent the semantic meaning of these words to user information needs. In this paper, we present an innovative and effective extended random sets (ERS) model to enhance the semantics of topical words. The proposed model is used as a word weighting scheme for relevance feature selection (FS). It accurately weights words based on their appearance in the LDA latent topics and the relevant documents. The experimental results, based on 50 collections of the standard RCV1 dataset and TREC topics for information filtering, show that the proposed model significantly outperforms eight state-of-the-art baseline models in five standard performance measures.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: Proceedings of the IJCAI Workshop on Semantic Machine Learning (SML 2017), Aug 19-25 2017, Melbourne, Australia, published at http://ceur-ws.org

1 Introduction

LDA [BNJ03] is currently the most common probabilistic topic model compared to similar models, such as probabilistic Latent Semantic Analysis (pLSA) [Hof01], with a wide range of applications [Ble12]. LDA statistically discovers hidden topics from documents as features to be used for different tasks in information retrieval (IR) [WC06, WMW07], information filtering (IF) [GXL15] and for many other text mining and machine learning applications. LDA represents documents by a set of topics, and each topic is a set of semantically related terms¹. Thus, it is capable of clustering related words in a document collection, which can reduce the impact of common problems like polysemy, synonymy and information overload [AZ12].

¹ In this paper, terms, words, keywords or unigrams are used interchangeably.

The core and critical part of any text FS method is the weighting function. It assigns a numerical value (usually a real number) to each feature, which specifies how informative the feature is to the user's information needs [ALA13]. In the context of probabilistic topic modelling in general, and LDA specifically, calculating a term weight is done locally at the document level based on two components: the term's local document-topics distributions and the global term-topics assignment. Therefore, in a set of similar documents, a specific term might receive a different weight in each single document even though this term is semantically identical across all these documents. Such an approach does not accurately reflect the semantic meaning and usefulness of this term to the entire user's information needs. It badly influences the performance of LDA
for FS, as it is uncertain and difficult to know which weight is more representative and should be assigned to the intended term. Would it be the average weight? The highest? The lowest? The aggregated? Several experiments in various studies confirm that the local-global weighting approach of the LDA is ineffective for relevance FS [GXL15].

Given a document set that describes user information needs, global statistics, such as document frequency (df), reveal the discriminatory power of terms [LTSL09]. However, in IR, selecting terms based on global weighting schemes did not show better retrieval performance [MO10], because global statistics cannot describe the local importance of terms [MC13].

From the LDA perspective, it is challenging and still uncertain how to use LDA's local-global term weighting function in a global context, due to the complex relationships between terms and the many entities that represent the entire collection. A term, for example, might appear in multiple documents and LDA topics, and each topic may also cover many documents or paragraphs that contain the same term. Therefore, the hard question this research tries to answer is: how can the local topic weight (at the document level) be generalised and combined with global topical statistics, such as the term frequency in both topics and relevant documents, to obtain a more discriminative and semantically representative global term weighting scheme?

The aim of this research is to develop an effective topic-based FS model for relevance discovery. The model uses a hierarchical framework based on ERS theory to assign a more representative weight to terms based on their appearance in LDA topics and all relevant documents. Therefore, two major contributions have been made in this paper to the fields of text FS and IF: (a) a new theoretical model based on multiple ERS [Mol06] to represent and interpret the complex relationships between long documents, their paragraphs, LDA topics and all terms in the collection, where a function describes each relationship; (b) a new and effective term weighting formula that assigns a more discriminately accurate weight to topical terms that represents their relevance to the user information needs. The formula generalises LDA's local topic weight to a global one using the proposed ERS theory and then combines it with the frequency ratio of words in both documents and topics to answer the question asked by the authors. To test the effectiveness of our model, we conducted extensive experiments on the RCV1 dataset and the assessors' relevance judgements of the TREC filtering track. The results show that our model significantly outperforms all used baseline FS models for IF, regardless of the type of text features they use (terms, phrases, patterns, topics or even a different combination of them).

2 Related Works

In the literature, there is a significant amount of work that extends and improves LDA to suit different needs, including text FS [ZPH08, TG09]. However, our model is intended for IF, and, to the best of our knowledge, it is the first attempt to extend random sets [Mol06] to functionally describe and interpret complex relationships that involve topical terms and other entities in a document collection to enhance the semantics of topical words for relevance FS. Relevance is a fundamental concept in both IR and IF. IR mainly concerns a document's relevance to a query for a specific subject, whereas IF discusses the document's relevance to user information needs [LAZ10]. In relevance discovery, FS is a method that selects a subset of features that are relevant to the user's needs and thus removes those that are irrelevant, redundant and noisy. Existing methods adopt different types of text features, such as terms [LTSL09], phrases (n-grams) [ALA13], patterns (a pattern is a set of associated terms) [LAA+15], topics [DDF+90, Hof01, BNJ03] or a combination of them for better performance [WMW07, LAZ10, GXL15].

The most efficient FS methods for relevance are the ones developed based on a weighting function, which is the core and critical part of the selection algorithm [LAA+15]. Using the LDA word weighting function for relevance is still limited and does not show encouraging results [GXL15], including similar topic-based models such as the pLSA [Hof01]. For better performance, Gao et al. (2015) [GXL15] integrate pattern mining techniques into topic models to discover discriminative features. Such work is expensive and susceptible to the feature-loss problem, and also might be impacted by the uncertainty of the probabilistic topic model. ERS is proven to be effective in describing complex relations between different entities and interpreting them as a function (a weighting function) [Li03]. Thus, ERS-based models can be used to weight closed sequential patterns more accurately and thus facilitate the discovery of specific ones, as appears in [ALX14]. However, selecting the most useful patterns is challenging due to the large number of patterns generated from relevant documents using various minimum supports (min_sup), and may also lead to feature loss.

3 Background Overview

For a given corpus C, the relevant long documents set D ⊆ C represents the user's information needs, which might cover multiple subjects. The proposed model uses D for training, where each document d_x ∈ D has a set of paragraphs PS and each paragraph has a set of terms T. Θ is the set of all paragraphs in D and PS ⊆ Θ. The set of terms Ω is the set of all unique words in D.
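To make this notation concrete, the following is a minimal sketch, with invented placeholder documents rather than the paper's RCV1 data, of how D, Θ and Ω could be represented:

```python
# Illustrative sketch only: the notation of Section 3 mapped onto plain
# Python structures. The document contents are invented placeholders.
from typing import List, Set

# A document is a list of paragraphs; each paragraph is a list of terms.
Document = List[List[str]]

# D: the set of relevant long documents.
D: List[Document] = [
    [["topic", "model", "weight"], ["term", "weight", "relevance"]],
    [["relevance", "feature", "selection"], ["topic", "term"]],
]

# Theta: the set of all paragraphs in D.
theta = [p for d in D for p in d]

# Omega: the set of all unique words in D.
omega: Set[str] = {w for p in theta for w in p}

print(sorted(omega))
```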
3.1 Latent Dirichlet Allocation

The proposed model uses LDA to reduce the dimensionality of D to a set of manageable topics Z, where V is the number of topics. LDA assumes that each document has multiple latent topics [GXL15], and defines each topic z_j ∈ Z as a multinomial probability distribution over all words in Ω as p(w_i|z_j), in which w_i ∈ Ω and 1 ≤ j ≤ V, such that Σ_{i=1}^{|Ω|} p(w_i|z_j) = 1. LDA also represents a document d as a probabilistic mixture of topics as p(z_j|d). As a result, and based on the number of latent topics, the probability (local weight) of word w_i in document d can be calculated as p(w_i|d) = Σ_{j=1}^{V} p(w_i|z_j) × p(z_j|d). Finally, all hidden variables, p(w_i|z_j) and p(z_j|d), are statistically estimated by the Gibbs sampling algorithm [SG07].
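As an illustration, the local weight p(w_i|d) above can be computed directly from the two LDA distributions; the sketch below uses invented toy distributions rather than values estimated by Gibbs sampling:

```python
# Toy sketch of the LDA local word weight:
#   p(w|d) = sum over topics j of p(w|z_j) * p(z_j|d)
# All probabilities below are invented for illustration.
phi = {  # p(w|z): per-topic word distributions
    "z1": {"topic": 0.5, "model": 0.3, "weight": 0.2},
    "z2": {"weight": 0.6, "term": 0.4},
}
theta_d = {"z1": 0.7, "z2": 0.3}  # p(z|d): topic mixture of one document d

def local_weight(w: str, theta_d: dict, phi: dict) -> float:
    """p(w|d): sum over topics of p(w|z_j) * p(z_j|d)."""
    return sum(p_z * phi[z].get(w, 0.0) for z, p_z in theta_d.items())

# 0.2 * 0.7 + 0.6 * 0.3, which is approximately 0.32
print(local_weight("weight", theta_d, phi))
```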
3.2 Random Set

A random set is a random object whose values are subsets taken from some space [Mol06]. It works as an effective measure of uncertainty in imprecise data for decision analysis [Ngu08]. For example, let Z and Ω be finite sets that represent topics and words respectively. Let Γ be a set-valued mapping from Z (the evidence space) onto Ω, written as Γ: Z → 2^Ω, and let P be a probability function defined on Z; the pair (P, Γ) is then called a random set [KSH12]. Γ can be extended as ξ :: Z → 2^(Ω×[0,1]) (also called an extended set-valued mapping), which satisfies Σ_{(w,p)∈ξ(z)} p = 1 for each z ∈ Z. Let P be a probability function on Z such that Σ_{z∈Z} P(z) = 1. We call (ξ, P) an extended random set.

Figure 1: Our proposed model

4 The Proposed Model

The proposed model (Figure 1) deals with the local weight problem of terms that is assigned by the LDA probability function (described in Section 3.1) by exploring all possible relationships between the different entities that influence the weighting process. The target entities in our model are documents, paragraphs, topics, and terms. The possible relationships between these entities are complex (a set of one-to-many relationships). For example, a document can have many paragraphs and terms; a paragraph can have multiple topics; a topic can have many terms. Inversely, a topic can cover many paragraphs, and a term can appear in many documents and topics.

In this model, we propose three ERSs to describe such complex relationships, where each ERS can be interpreted as a function by which we can determine the importance of the main entity in the relationship. Then, the proposed ERS theory is used to develop a new weighting scheme that accurately weights topical words by generalising the topic's local weight and then combining it with the frequency ratio of words in both documents and topics.

4.1 Extended Random Sets

Let us assume we have a set of topics Z = {z_1, z_2, z_3, ..., z_V} in Θ, and let D = {d_1, d_2, d_3, ..., d_N} be a set of N relevant long documents. Each document d_x consists of M paragraphs, such as d_x = {p_1, p_2, p_3, ..., p_M}. A paragraph p_y consists of a set of L words, for example, p_y = {w_1, w_2, w_3, ..., w_L}. A word w is a keyword or unigram, and the function words(p) returns the set of words that appear in paragraph p. A topic z can be defined as a probability distribution over the set of words Ω, where words(p) ⊆ Ω for every paragraph p ∈ Θ.

For each z_i ∈ Z, let f_i(w, z_i) be a frequency function on Ω, such that Γ(z_i) = {w | w ∈ Ω, f_i(w, z_i) > 0}, while the inverse mapping of Γ is defined as Γ⁻¹: Ω → 2^Z; Γ⁻¹(w) = {z ∈ Z | w ∈ Γ(z)}. Also, for each d_j ∈ D, let f_j(w, d_j) be a frequency function on Ω, such that Λ(d_j) = {w | w ∈ Ω, f_j(w, d_j) > 0}, while the inverse mapping of Λ is defined as Λ⁻¹: Ω → 2^D; Λ⁻¹(w) = {d ∈ D | w ∈ Λ(d)}. These extended set-valued mappings can decide a weighting function on Ω, which satisfies sr :: Ω → [0, +∞), such that

sr(w) = Σ_{d_j ∈ Λ⁻¹(w)} [ (1 / f_j(w, d_j)) · Σ_{z_i ∈ Γ⁻¹(w)} ( P_z(z_i) × f_i(w, z_i) ) ]    (1)

where sr(w) is the combined weight of topical word w at the collection level.

The extended random set ξ_1 is proposed to describe the relationships between paragraphs and topics using the conditional probability function P_xy(z|d_x p_y) as ξ_1: Θ → 2^(Z×[0,1]); ξ_1(d_x p_y) = {(z_1, P_xy(z_1|d_x p_y)), ...}. Similarly, ξ_2 is proposed to describe the relationship between topics and terms using the defined frequency function f_i(w, z_i) as ξ_2: Z → 2^(Ω×[0,+∞)); ξ_2(z_i) = {(w_1, P_i(w_1|z_i)), ...}. Lastly, ξ_3 is proposed to describe the relationship between documents and terms using the defined frequency function f_j(w, d_j) as ξ_3: D → 2^(Ω×[0,+∞)); ξ_3(d_j) = {(w_1, f_j(w_1, d_j)), ...}.

Based on the inverse mappings described above, we have ξ_1⁻¹, ξ_2⁻¹ and ξ_3⁻¹. ξ_1⁻¹ describes the inverse relationships between topics and paragraphs using the probability function P_z(z_i), such that ξ_1⁻¹(z) = {d_x p_y | z ∈ ξ_1(d_x p_y)}, while ξ_2⁻¹, on the other hand, describes the inverse relationships between terms and topics using the f_i(w, z_i) function, such that ξ_2⁻¹(w) = {z | w ∈ ξ_2(z)}. ξ_3⁻¹ describes the inverse relationships between terms and documents using the f_j(w, d_j) function, such that ξ_3⁻¹(w) = {d | w ∈ ξ_3(d)}.

4.2 Generalised Topic Weight

To estimate the generalised topic weight in D, we need to calculate the probability of each topic P_z(z_i) in each paragraph of document d, and similarly for all documents in D, based on ξ_1⁻¹, in which we assume P_Θ(d_x p_y) = 1/N, where N is the total number of paragraphs, as follows:

P_z(z_i) = Σ_{d_x p_y ∈ ξ_1⁻¹(z_i)} ( P_Θ(d_x p_y) × P_xy(z_i|d_x p_y) )
         = (1/N) · Σ_{d_x p_y ∈ ξ_1⁻¹(z_i)} P_xy(z_i|d_x p_y)    (2)

where P_xy(z_i|d_x p_y) is estimated by LDA, d_x p_y refers to paragraph y in document x, and ξ_1⁻¹ is the mapping function defined previously.

4.3 Topical Word Weighting Scheme

To calculate the topical word weight at the collection level, we simply substitute P_z(z_i) in Equation 1 by its value from Equation 2. Equation 3 shows the substitution:

sr(w) = (1/N) · Σ_{d_j ∈ ξ_3⁻¹(w)} [ (1 / f_j(w, d_j)) · Σ_{z_i ∈ ξ_2⁻¹(w)} ( f_i(w, z_i) × Σ_{d_x p_y ∈ ξ_1⁻¹(z_i)} P_xy(z_i|d_x p_y) ) ]    (3)

5 Evaluation

To verify the proposed model, we designed two hypotheses. First, our ERS model can effectively generalise the topic's local weight that is estimated from all documents' paragraphs. The generalisation leads to a more accurate term weighting scheme, especially when it is combined with the term frequency ratio in both documents and topics. Second, our model, overall, is more effective in selecting relevant features than most state-of-the-art term-based, pattern-based, topic-based or even mix-based FS models. To support these two hypotheses, we conducted experiments and evaluated their performance.

5.1 Dataset

The first 50 collections of the standard Reuters Corpus Volume 1 (RCV1) dataset are used in this research because they were assessed by domain experts at NIST [SR03] for TREC² in its filtering track. This number of collections is sufficient and stable for better and more reliable experiments [BV00]. RCV1 is a set of collections of documents where each document is a news story in English published by Reuters.

² http://trec.nist.gov/

5.2 Baseline models

We compared the performance of our model to eight different baseline models. These models are categorised into five groups based on the type of feature they use. The proposed model is trained only on relevant documents and does not consider irrelevant ones. Therefore, for a fair comparison and judgement, we can only select baseline models that either are unsupervised or do not require the use of irrelevant documents. We selected Okapi BM25 [RZ09], which is one of the best term-based ranking algorithms. The phrase-based model n-Grams is selected; it represents the user's information needs as a set of phrases, where n = 3 as it is the best value reported by Gao et al. (2015) [GXL15]. The Pattern Deploying based on Support (PDS) model [ZLW12] is one of the pattern-based models; it can overcome the limitations of pattern frequency and usage. We selected the Latent Dirichlet Allocation (LDA) [BNJ03] as the most widely used topic modelling algorithm. From the same group we also selected the Probabilistic Latent Semantic Analysis (pLSA) [Hof01]; it is similar to the LDA and can deal with the problem of polysemy. Three models were selected from the mix-based category. First, we selected the Pattern-Based Topic Model (PBTM-FP) [GXL15], which incorporates topics and frequent patterns (FP) to obtain a semantically rich and discriminative representation for IF. Secondly, the PBTM-FCP [GXL15], which is similar to the PBTM-FP except that it uses frequent closed patterns (FCP) instead. Lastly, we selected the Topical N-Grams (TNG) model [WMW07], which integrates the topic model with phrases (n-grams) to discover topical phrases that are more discriminative and interpretable.
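As a rough illustration of Equations (2) and (3), the sketch below first averages the LDA paragraph-level topic probabilities into the generalised topic weight P_z(z_i), then substitutes it into the sr(w) weighting scheme. All paragraph distributions and frequency counts are invented toy values, not estimates from real data:

```python
# Toy sketch of Equations (2) and (3). Every number below is invented.
p_xy = {  # P_xy(z | d_x p_y): per-paragraph topic distributions from LDA
    ("d1", "p1"): {"z1": 0.8, "z2": 0.2},
    ("d1", "p2"): {"z1": 0.4, "z2": 0.6},
}
N = len(p_xy)  # total number of paragraphs

# Equation (2): P_z(z_i) = (1/N) * sum of P_xy(z_i | d_x p_y) over the
# paragraphs that cover z_i.
p_z: dict = {}
for dist in p_xy.values():
    for z, p in dist.items():
        p_z[z] = p_z.get(z, 0.0) + p / N

topic_freq = {"z1": {"weight": 3}, "z2": {"weight": 1}}  # f_i(w, z_i)
doc_freq = {"d1": {"weight": 2}}                          # f_j(w, d_j)

# Equation (3): substitute P_z into the sr(w) scheme of Equation (1).
w = "weight"
topic_side = sum(p_z[z] * f[w] for z, f in topic_freq.items() if w in f)
sr_w = sum((1.0 / f[w]) * topic_side for f in doc_freq.values() if w in f)

# P_z(z1) = 0.6 and P_z(z2) = 0.4, so the topic side is
# 0.6*3 + 0.4*1 = 2.2 and sr(w) = (1/2) * 2.2 = 1.1
print(sr_w)
```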
5.3 Evaluation Measures

The effectiveness of our model is measured, based on relevance judgements, by five metrics that are well-established and commonly used in the IR and IF communities. These metrics are the average precision of the top-20 ranked documents (top-20), break-even point (b/p), mean average precision (MAP), F-score (F1) measure, and 11-points interpolated average precision (IAP). For more details about these measures, the reader can refer to Manning et al. (2008) [MRS08]. For an even better analysis of the experimental results, the Wilcoxon signed-rank test (Wilcoxon T-test) [Wil45] was used. The Wilcoxon T-test is a non-parametric statistical hypothesis test used to compare and assess whether the ranked means of two related samples differ or not. It is a better alternative to the Student's t-test, especially when no normal distribution is assumed.

5.4 Experimental Design

For each collection, we train our model on all paragraphs of the relevant documents D in the training part of the collection. We use LDA to extract ten topics because it is the best number for each collection, as reported in [GXL13, GXL14, GXL15]. Then, the proposed model scores documents' terms, ranks them and uses the top-k features as a query to an IF system. The IF system uses unknown documents (from the testing part of the same collection) to decide their relevance to the user's information needs (relevant or irrelevant). However, specifying the value of k is experimental. The same process is also applied separately to all baseline models. If the results of the IF system returned by the five metrics are better than the baseline results, then we can claim that our model is significant and outperforms a baseline model.

The IF testing system uses the following equation to rank the testing documents set:

weight(d) = Σ_{t∈Q} x,  where x = weight(t) if t ∈ d, and x = 0 if t ∉ d

where weight(d) is the weight of document d.

5.5 Experimental Settings

In our experiment, we use the MALLET toolkit [McC02] to implement all LDA-based models, except for the pLSA model, where we used the Lemur toolkit³ instead. All topic-based models require some parameters to be set. For the LDA-based models, we set the number of iterations for the Gibbs sampling to 1000 and the hyper-parameters to β = 0.01 and α = 50/V, as justified in [SG07]. We configured the number of iterations for the pLSA to be 1000 (the default setting). For the experimental parameters of BM25, we set b = 0.75 and k1 = 1.2, as recommended by Manning et al. (2008) [MRS08].

³ https://www.lemurproject.org/

5.6 Experimental Results

Table 1 and Figure 2 show the evaluation results of our model and the baselines. These results are the average over the 50 collections of the RCV1. The results in Table 1 have been categorised based on the type of feature used by the baseline model, and the improvement% represents the percentage change in our model's performance compared to the best result of the baseline models in the category. We consider any improvement that is greater than 5% to be significant.

Table 1 shows that our model outperformed all baseline models for information filtering in all five measures. Regardless of the type of feature used by the baseline model, our model is significantly better on average, with a minimum improvement of 8.0% and a maximum of 39.7%. Moreover, the 11-points result in Figure 2 illustrates the superiority of the proposed model and confirms the significant improvements shown in Table 1.

Table 1: Evaluation results of our model in comparison with the baselines (grouped based on the type of feature used by the model) for all measures, averaged over the first 50 document collections of the RCV1 dataset.

Model         Top-20   b/p      MAP      F1       IAP
our model     0.560    0.471    0.502    0.475    0.526
LDA           0.492    0.414    0.442    0.437    0.468
pLSA          0.423    0.386    0.379    0.392    0.404
improvement%  +13.9%   +13.8%   +13.7%   +8.5%    +12.3%
PDS           0.496    0.430    0.444    0.439    0.464
improvement%  +12.9%   +9.5%    +13.2%   +8.0%    +13.4%
n-Gram        0.401    0.342    0.361    0.386    0.384
improvement%  +39.7%   +37.8%   +39.1%   +22.9%   +37.1%
BM25          0.445    0.407    0.407    0.414    0.428
improvement%  +25.8%   +15.6%   +23.5%   +14.6%   +22.9%
PBTM-FCP      0.489    0.420    0.423    0.422    0.447
PBTM-FP       0.470    0.402    0.427    0.423    0.449
TNG           0.447    0.360    0.372    0.386    0.394
improvement%  +14.5%   +12.1%   +17.7%   +12.2%   +17.1%

The Wilcoxon T-test results (Table 2) present the p-values of the results of our model compared to all baseline models on all performance measures. A model's result is considered significantly different from another model's if the p-value is less than 0.05 [Wil45]. Clearly, the p-values for all metrics are well below 0.05, confirming that our model's performance is significantly different from all baselines. This shows that our model gains substantial improvement compared to the used baseline models.

Table 2: Wilcoxon T-test p-values of the baseline models in comparison with our model's.

Model      Top-20      b/p         MAP         F1          IAP
LDA        0.004165    0.000179    7.00×10⁻⁶   8.96×10⁻⁶   6.71×10⁻⁶
pLSA       1.48×10⁻⁴   1.49×10⁻⁴   6.65×10⁻⁷   5.86×10⁻⁷   1.72×10⁻⁷
PDS        0.008575    0.003034    0.000194    0.000140    4.53×10⁻⁵
n-Gram     7.46×10⁻⁸   1.05×10⁻⁷   1.71×10⁻⁹   1.86×10⁻⁹   1.23×10⁻⁹
BM25       0.000353    0.008264    0.000279    0.000117    5.68×10⁻⁵
TNG        0.010360    0.000607    0.000180    0.000137    3.76×10⁻⁵
PBTM-FP    0.003442    7.19×10⁻⁴   0.000382    0.000235    5.81×10⁻⁵
PBTM-FCP   0.048010    0.033410    0.000306    0.000289    0.000180

Based on the results presented earlier, we are confident in claiming that our extended random sets model can effectively generalise the local topic weight at the document level in the LDA term scoring function and, thus, provide a more globally representative term weight when it is combined with the term frequency in documents and topics. Also, our model is more effective in selecting relevant features to acquire the user's information needs that are represented by a set of long documents.

Figure 2: 11-points result of our model in comparison with baselines, averaged over the first 50 document collections of the RCV1 dataset.

6 Conclusion

This paper presents an innovative and effective topic-based feature ranking model that enhances the semantics of topical words to acquire user needs. The model extends random sets to generalise the LDA topic weight at the document level. Then, a term weighting scheme is developed to accurately rank topical terms based on their frequent appearance in the LDA topic distributions and all relevant documents. The newly calculated weight effectively reflects the relevance of a term to the user's information needs and maintains the same semantic meaning of terms across all relevant documents. The proposed model is tested for IF on the standard RCV1 dataset, TREC topics, five different performance measurement metrics and eight state-of-the-art baseline models. The experimental results show that our model achieved significant performance compared to all other baseline models.

References

[ALA13] Mubarak Albathan, Yuefeng Li, and Abdulmohsen Algarni. Enhanced N-Gram Extraction Using Relevance Feature Discovery, pages 453–465. Springer International Publishing, Cham, 2013.

[ALX14] Mubarak Albathan, Yuefeng Li, and Yue Xu. Using extended random set to find specific patterns. In WI'14, volume 2, pages 30–37. IEEE, 2014.

[AZ12] Charu C Aggarwal and ChengXiang Zhai. A survey of text clustering algorithms. In Mining Text Data, pages 77–128. Springer, 2012.

[Ble12] David M Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.

[BNJ03] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.

[BV00] Chris Buckley and Ellen M Voorhees. Evaluating evaluation measure stability. In SIGIR'00, pages 33–40. ACM, 2000.

[DDF+90] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391, 1990.
[GXL13] Yang Gao, Yue Xu, and Yuefeng Li. Pattern-based topic models for information filtering. In ICDM'13, pages 921–928. IEEE, 2013.

[GXL14] Yang Gao, Yue Xu, and Yuefeng Li. Topical pattern based document modelling and relevance ranking. In WISE'14, pages 186–201. Springer, 2014.

[GXL15] Yang Gao, Yue Xu, and Yuefeng Li. Pattern-based topics for document modelling in information filtering. IEEE TKDE, 27(6):1629–1642, 2015.

[Hof01] Thomas Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1-2):177–196, 2001.

[KSH12] Rudolf Kruse, Erhard Schwecke, and Jochen Heinsohn. Uncertainty and vagueness in knowledge based systems: numerical methods. Springer Science & Business Media, 2012.

[LAA+15] Yuefeng Li, Abdulmohsen Algarni, Mubarak Albathan, Yan Shen, and Moch Arif Bijaksana. Relevance feature discovery for text mining. IEEE TKDE, 27(6):1656–1669, 2015.

[LAZ10] Yuefeng Li, Abdulmohsen Algarni, and Ning Zhong. Mining positive and negative patterns for relevance feature discovery. In KDD'10, pages 753–762. ACM, 2010.

[Li03] Yuefeng Li. Extended random sets for knowledge discovery in information systems. In RSFDGrC'03, pages 524–532. Springer, 2003.

[LTSL09] Man Lan, Chew Lim Tan, Jian Su, and Yue Lu. Supervised and traditional term weighting methods for automatic text categorization. IEEE TPAMI, 31(4):721–735, 2009.

[MC13] K Tamsin Maxwell and W Bruce Croft. Compact query term selection using topically related text. In SIGIR'13, pages 583–592. ACM, 2013.

[McC02] Andrew Kachites McCallum. MALLET: A machine learning for language toolkit. 2002.

[MO10] Craig Macdonald and Iadh Ounis. Global statistics in proximity weighting models. In Web N-gram Workshop, page 30. Citeseer, 2010.

[Mol06] Ilya Molchanov. Theory of random sets. Springer Science & Business Media, 2006.

[MRS08] Christopher D Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.

[Ngu08] Hung T Nguyen. Random sets. Scholarpedia, 3(7):3383, 2008.

[RZ09] Stephen Robertson and Hugo Zaragoza. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc, 2009.

[SG07] Mark Steyvers and Tom Griffiths. Probabilistic topic models. Handbook of Latent Semantic Analysis, 427(7):424–440, 2007.

[SR03] Ian Soboroff and Stephen Robertson. Building a filtering test collection for TREC 2002. In SIGIR'03, pages 243–250. ACM, 2003.

[TG09] Serafettin Tasci and Tunga Gungor. LDA-based keyword selection in text categorization. In ISCIS'09, pages 230–235. IEEE, 2009.

[WC06] Xing Wei and W Bruce Croft. LDA-based document models for ad-hoc retrieval. In SIGIR'06, pages 178–185. ACM, 2006.

[Wil45] Frank Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1(6):80–83, 1945.

[WMW07] Xuerui Wang, Andrew McCallum, and Xing Wei. Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In ICDM'07, pages 697–702. IEEE, 2007.

[ZLW12] Ning Zhong, Yuefeng Li, and Sheng-Tang Wu. Effective pattern discovery for text mining. IEEE TKDE, 24(1):30–44, 2012.

[ZPH08] Zhiwei Zhang, Xuan-Hieu Phan, and Susumu Horiguchi. An efficient feature selection using hidden topic in text categorization. In AINAW'08, pages 1223–1228. IEEE, 2008.