<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Topical Word Semantic for Relevance Feature Selection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>1School of Electrical Engineering and Computer Science</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>2Department of Computer Science</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Umm Al-Qura University</institution>
          ,
          <addr-line>Makkah</addr-line>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Abstract</title>
      <p>Unsupervised topic models, such as Latent
Dirichlet Allocation (LDA), are widely used
as automated feature engineering tools for
textual data. They model word semantics through
latent topics, on the basis that
semantically related words occur in similar
documents. However, the word weights
assigned by these topic models do not represent
the semantic meaning of these words with respect to user
information needs. In this paper, we present
an innovative and effective extended random
sets (ERS) model to enhance the semantics of
topical words. The proposed model is used as
a word weighting scheme for relevance feature
selection (FS). It accurately weights words
based on their appearance in the LDA latent
topics and the relevant documents. The
experimental results, based on 50 collections of
the standard RCV1 dataset and TREC topics
for information filtering, show that the
proposed model significantly outperforms eight
state-of-the-art baseline models on five
standard performance measures.</p>
      <p>Copyright © by the paper’s authors. Copying permitted for
private and academic purposes.</p>
      <p>In: Proceedings of IJCAI Workshop on Semantic Machine Learning
(SML 2017), Aug 19-25 2017, Melbourne, Australia, published at
http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>LDA [BNJ03] is currently the most common
probabilistic topic model compared to similar
models, such as probabilistic Latent Semantic
Analysis (pLSA) [Hof01], with a wide range of
applications [Ble12]. LDA statistically discovers hidden
topics from documents as features to be used for different
tasks in information retrieval (IR) [WC06, WMW07],
information filtering (IF) [GXL15] and many other
text mining and machine learning applications. LDA
represents documents by a set of topics, and each topic
is a set of semantically related terms<sup>1</sup>. Thus, it is
capable of clustering related words in a document
collection, which can reduce the impact of common
problems like polysemy, synonymy and information
overload [AZ12].</p>
      <p>The core and critical part of any text FS method
is the weighting function. It assigns a numerical value
(usually a real number) to each feature, which specifies
how informative the feature is to the user’s information
needs [ALA13]. In the context of probabilistic topic
modelling in general, and LDA specifically,
a term weight is calculated locally at the document level
from two components: the term’s local
document-topics distributions and the global term-topics
assignment. Therefore, in a set of similar documents, a
specific term might receive a different weight in each
document even though this term is semantically
identical across all these documents. Such an approach does
not accurately reflect the semantic meaning and
usefulness of this term to the user’s information
needs as a whole. It badly influences the performance of LDA
for FS, as it is uncertain and difficult to know which
weight is more representative and should be assigned
to the intended term. Would it be the average weight?
The highest? The lowest? The aggregated? Several
experiments in various studies confirm that the
local-global weighting approach of LDA is ineffective for
relevant FS [GXL15].</p>
      <p><sup>1</sup> In this paper, terms, words, keywords or unigrams are used
interchangeably.</p>
      <p>Given a document set that describes user
information needs, global statistics, such as document
frequency (df), reveal the discriminatory power of
terms [LTSL09]. However, in IR, selecting terms based
on global weighting schemes did not show better
retrieval performance [MO10], because global statistics
cannot describe the local importance of terms [MC13].
From the LDA perspective, it remains unclear
how to use LDA’s local-global term
weighting function in a global context, due to the
complex relationships between terms and the many entities
that represent the entire collection. A term, for
example, might appear in multiple documents and LDA
topics, and each topic may also cover many documents
or paragraphs that contain the same term. Therefore,
the hard question this research tries to answer is: how
can the local topic weight (at the document level) be generalised
and combined with global topical statistics, such as the
term frequency in both topics and relevant documents,
to obtain a more discriminative and semantically
representative global term weighting scheme?</p>
      <p>The aim of this research is to develop an effective
topic-based FS model for relevance discovery. The
model uses a hierarchical framework based on ERS
theory to assign a more representative weight to terms
based on their appearance in LDA topics and all
relevant documents. This paper therefore makes two major contributions
to the fields of text FS
and IF: (a) a new theoretical model based on multiple
ERSs [Mol06] to represent and interpret the complex
relationships between long documents, their paragraphs,
LDA topics and all terms in the collection, where a
function describes each relationship; (b) a new and
effective term weighting formula that assigns a more
discriminative and accurate weight to topical terms,
representing their relevance to the user’s information needs.
The formula generalises LDA’s local topic weight to a
global one using the proposed ERS theory and then
combines it with the frequency ratio of words in both
documents and topics, which answers the research question posed
above. To test the effectiveness of our model,
we conducted extensive experiments on the RCV1 dataset
and the assessors’ relevance judgements of the TREC
filtering track. The results show that our model
significantly outperforms all the baseline FS models used for
IF, regardless of the type of text features they use (terms,
phrases, patterns, topics or even a
combination of them).</p>
    </sec>
    <sec id="sec-3">
      <title>Related Works</title>
      <p>In the literature, there is a significant amount of work
that extends and improves LDA to suit different needs,
including text FS [ZPH08, TG09]. However, our model
is intended for IF and, to the best of our knowledge, it
is the first attempt to extend random sets [Mol06] to
functionally describe and interpret complex
relationships that involve topical terms and other entities in a
document collection in order to enhance the semantics of
topical words for relevance FS. Relevance is a fundamental
concept in both IR and IF. IR is mainly concerned with a
document’s relevance to a query on a specific subject,
whereas IF is concerned with a document’s relevance to user
information needs [LAZ10]. In relevance discovery,
FS is a method that selects a subset of features that
are relevant to the user’s needs, thus removing those
that are irrelevant, redundant and noisy. Existing
methods adopt different types of text features such as
terms [LTSL09], phrases (n-grams) [ALA13], patterns
(a pattern is a set of associated terms) [LAA+15],
topics [DDF+90, Hof01, BNJ03] or a combination of them
for better performance [WMW07, LAZ10, GXL15].</p>
      <p>The most efficient FS methods for relevance are
those built around a weighting function,
which is the core and critical part of the selection
algorithm [LAA+15]. The use of the LDA word weighting
function for relevance is still limited and does not show
encouraging results [GXL15]; the same holds for similar
topic-based models such as pLSA [Hof01]. For
better performance, Gao et al. (2015) [GXL15] integrate
pattern mining techniques into topic models to
discover discriminative features. Such work is expensive
and susceptible to the feature-loss problem, and
might also be affected by the uncertainty of the
probabilistic topic model. ERS has proven effective
in describing complex relations between different
entities and interpreting them as a function (weighting
function) [Li03]. Thus, ERS-based models can be used
to weight closed sequential patterns more accurately
and thus facilitate the discovery of specific ones, as
in [ALX14]. However, selecting the most useful
patterns is challenging due to the large number of
patterns generated from relevant documents using various
minimum supports (min_sup), and may also lead to
feature loss.</p>
    </sec>
    <sec id="sec-4">
      <title>Background Overview</title>
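      <p>For a given corpus C, the set of relevant long documents
<inline-formula><tex-math>D \subseteq C</tex-math></inline-formula> represents the user’s information needs, which might
cover multiple subjects. The proposed model uses D
for training, where each document <inline-formula><tex-math>d_x \in D</tex-math></inline-formula> has a set of
paragraphs PS and each paragraph has a set of terms
T. <inline-formula><tex-math>\Theta</tex-math></inline-formula> is the set of all paragraphs in D, and
<inline-formula><tex-math>PS \subseteq \Theta</tex-math></inline-formula>. The
set of terms <inline-formula><tex-math>\Omega</tex-math></inline-formula> is the set of all unique words in D.
The proposed model uses LDA to reduce the
dimensionality of D to a manageable set of topics Z, where
V is the number of topics. LDA assumes that each
document has multiple latent topics [GXL15], and
defines each topic <inline-formula><tex-math>z_j \in Z</tex-math></inline-formula> as a multinomial probability
distribution over all words in <inline-formula><tex-math>\Omega</tex-math></inline-formula>, written
<inline-formula><tex-math>p(w_i \mid z_j)</tex-math></inline-formula>, in which
<inline-formula><tex-math>w_i \in \Omega</tex-math></inline-formula> and
<inline-formula><tex-math>1 \le j \le V</tex-math></inline-formula>, such that
<inline-formula><tex-math>\sum_{i}^{|\Omega|} p(w_i \mid z_j) = 1</tex-math></inline-formula>. LDA
also represents a document d as a probabilistic
mixture of topics, <inline-formula><tex-math>p(z_j \mid d)</tex-math></inline-formula>. As a result, and based on
the number of latent topics, the probability (local
weight) of word <inline-formula><tex-math>w_i</tex-math></inline-formula> in document d can be calculated as
<disp-formula><tex-math>p(w_i \mid d) = \sum_{j=1}^{V} p(w_i \mid z_j) \times p(z_j \mid d).</tex-math></disp-formula>
Finally, all hidden
variables, <inline-formula><tex-math>p(w_i \mid z_j)</tex-math></inline-formula> and
<inline-formula><tex-math>p(z_j \mid d)</tex-math></inline-formula>, are statistically
estimated by the Gibbs sampling algorithm [SG07].</p>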
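      <p>As an illustration of the local weight p(w|d) described above, the following minimal Python sketch
sums p(w|z) × p(z|d) over the topics of a toy model. The function name, the two-topic model and all
probabilities are made-up assumptions for illustration; they are not taken from the paper’s experiments.</p>
      <preformat><![CDATA[
```python
def local_word_weight(word, p_word_given_topic, p_topic_given_doc):
    """Return p(w|d) = sum_j p(w|z_j) * p(z_j|d) for a single document."""
    return sum(
        topic_words.get(word, 0.0) * p_topic
        for topic_words, p_topic in zip(p_word_given_topic, p_topic_given_doc)
    )


# Hypothetical model with V = 2 topics over a three-word vocabulary.
p_word_given_topic = [
    {"bank": 0.5, "loan": 0.4, "river": 0.1},  # p(w|z_1)
    {"bank": 0.3, "loan": 0.0, "river": 0.7},  # p(w|z_2)
]

# Hypothetical topic mixtures p(z|d) of two documents.
doc_a = [0.9, 0.1]
doc_b = [0.2, 0.8]

weight_a = local_word_weight("bank", p_word_given_topic, doc_a)
weight_b = local_word_weight("bank", p_word_given_topic, doc_b)
print(weight_a, weight_b)
```
]]></preformat>
      <p>In this toy setting the same word receives a different weight in each document, which is the
local-weight behaviour the proposed model sets out to generalise.</p>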
      <sec id="sec-4-1">
        <title>Random Set</title>
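        <p>A random set is a random object whose
values are subsets taken from some
space [Mol06]. It works as an effective measure
of uncertainty in imprecise data for decision
analysis [Ngu08]. For example, let Z and <inline-formula><tex-math>\Omega</tex-math></inline-formula> be finite sets
that represent topics and words respectively. <inline-formula><tex-math>\Gamma</tex-math></inline-formula> is a
set-valued mapping from Z (the evidence space) onto
<inline-formula><tex-math>\Omega</tex-math></inline-formula> that can be written as
<inline-formula><tex-math>\Gamma : Z \to 2^{\Omega}</tex-math></inline-formula>, and P is a
probability function defined on Z; the pair <inline-formula><tex-math>(P, \Gamma)</tex-math></inline-formula>
is then called a random set [KSH12]. <inline-formula><tex-math>\Gamma</tex-math></inline-formula> can be extended
as <inline-formula><tex-math>\xi :: Z \to 2^{\Omega \times [0,1]}</tex-math></inline-formula> (also called an extended
set-valued mapping), which satisfies
<inline-formula><tex-math>\sum_{(w,p) \in \xi(z)} p = 1</tex-math></inline-formula> for
each <inline-formula><tex-math>z \in Z</tex-math></inline-formula>. Let P be a probability function on Z, such
that <inline-formula><tex-math>\sum_{z \in Z} P(z) = 1</tex-math></inline-formula>. We call
<inline-formula><tex-math>(\xi, P)</tex-math></inline-formula> an extended
random set.</p>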
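        <p>The definition above can be sketched in Python as a dictionary from topics to (word, probability)
pairs together with a probability function on topics. The helper names and the toy numbers are assumptions
made for illustration. The weight in <monospace>induced_word_weights</monospace>, which sums
P(z) × p over all topics, is one simple way such a structure can induce a weighting on words; it is not claimed
to be the paper’s own formula.</p>
        <preformat><![CDATA[
```python
import math


def is_extended_random_set(xi, P, tol=1e-9):
    """Check the two normalisation conditions of an extended random set (xi, P)."""
    same_topics = set(xi) == set(P)
    topic_mass_ok = math.isclose(sum(P.values()), 1.0, abs_tol=tol)
    rows_ok = all(
        math.isclose(sum(pairs.values()), 1.0, abs_tol=tol) for pairs in xi.values()
    )
    return same_topics and topic_mass_ok and rows_ok


def induced_word_weights(xi, P):
    """Illustrative weighting: weight(w) = sum over z of P(z) * p for (w, p) in xi(z)."""
    weights = {}
    for z, pairs in xi.items():
        for w, p in pairs.items():
            weights[w] = weights.get(w, 0.0) + P[z] * p
    return weights


# Hypothetical extended set-valued mapping xi: topic -> {word: probability}.
xi = {
    "z1": {"bank": 0.6, "loan": 0.4},
    "z2": {"bank": 0.25, "river": 0.75},
}
# Hypothetical probability function P on the topics.
P = {"z1": 0.5, "z2": 0.5}

weights = induced_word_weights(xi, P)
print(is_extended_random_set(xi, P), weights)
```
]]></preformat>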
      </sec>
    </sec>
    <sec id="sec-5">
      <title>The Proposed Model</title>
      <p>The proposed model (Figure 1) addresses the problem of the local
term weights assigned by the LDA
probability function (described in Section 3.1) by
exploring all possible relationships between the different
entities that influence the weighting process. The
entities targeted in our model are documents, paragraphs,
topics, and terms. The possible relationships between
these entities are complex (a set of one-to-many
relationships). For example, a document can have many
paragraphs and terms; a paragraph can have multiple
topics; a topic can have many terms. Inversely, a topic
can cover many paragraphs, and a term can appear in
many documents and topics.</p>
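      <p>In this model, we propose three ERSs to describe
such complex relationships, where each ERS can be
interpreted as a function by which we can determine
the importance of the main entity in the relationship.
Then, the proposed ERS theory is used to develop
a new weighting scheme that accurately weights topical
words by generalising the topic’s local weight and
then combining it with the frequency ratio of words in
both documents and topics.</p>
      <p>Let us assume we have a set of
topics <inline-formula><tex-math>Z = \{z_1, z_2, z_3, \ldots, z_V\}</tex-math></inline-formula> in
<inline-formula><tex-math>\Theta</tex-math></inline-formula>, and let
<inline-formula><tex-math>D = \{d_1, d_2, d_3, \ldots, d_N\}</tex-math></inline-formula> be a set of N relevant
long documents. Each document <inline-formula><tex-math>d_x</tex-math></inline-formula> consists of
M paragraphs, such that <inline-formula><tex-math>d_x = \{p_1, p_2, p_3, \ldots, p_M\}</tex-math></inline-formula>. A
paragraph <inline-formula><tex-math>p_y</tex-math></inline-formula> consists of a set of L words, for example,
<inline-formula><tex-math>p_y = \{w_1, w_2, w_3, \ldots, w_L\}</tex-math></inline-formula>. A word w is a keyword or
unigram, and the function words(p) returns the set
of words that appear in paragraph p. A topic z can be
defined as a probability distribution over the set of
words <inline-formula><tex-math>\Omega</tex-math></inline-formula>, where
<inline-formula><tex-math>words(p) \subseteq \Omega</tex-math></inline-formula> for every paragraph
<inline-formula><tex-math>p \in \Theta</tex-math></inline-formula>.</p>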
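      <p>For each <inline-formula><tex-math>z_i \in Z</tex-math></inline-formula>, let
<inline-formula><tex-math>f_i(w, z_i)</tex-math></inline-formula> be a frequency
function on <inline-formula><tex-math>\Omega</tex-math></inline-formula>, such that
<inline-formula><tex-math>\Gamma(z_i) = \{w \mid w \in \Omega, f_i(w, z_i) &gt; 0\}</tex-math></inline-formula>,
while the inverse mapping of <inline-formula><tex-math>\Gamma</tex-math></inline-formula> is defined as
<inline-formula><tex-math>\Gamma^{-1} : \Omega \to 2^{Z}; \ \Gamma^{-1}(w) = \{z \in Z \mid w \in \Gamma(z)\}</tex-math></inline-formula>. Also, for
each <inline-formula><tex-math>d_j \in D</tex-math></inline-formula>, let
<inline-formula><tex-math>f_j(w, d_j)</tex-math></inline-formula> be a frequency function on
<inline-formula><tex-math>\Omega</tex-math></inline-formula>, such that
<inline-formula><tex-math>\Gamma(d_j) = \{w \mid w \in \Omega, f_j(w, d_j) &gt; 0\}</tex-math></inline-formula>, while
the inverse mapping of <inline-formula><tex-math>\Gamma</tex-math></inline-formula> is defined as
<inline-formula><tex-math>\Gamma^{-1} : \Omega \to 2^{D}; \ \Gamma^{-1}(w) = \{d \in D \mid w \in \Gamma(d)\}</tex-math></inline-formula>. These extended
set-valued mappings can decide a weighting function on
<inline-formula><tex-math>\Omega</tex-math></inline-formula>, which satisfies
<inline-formula><tex-math>sr :: \Omega \to [0, +\infty)</tex-math></inline-formula> such that […]</p>
      <p>[…] using the conditional probability function
<inline-formula><tex-math>P_{xy}(z \mid d_x p_y)</tex-math></inline-formula> as
<inline-formula><tex-math>\Gamma_1 : \Theta \to 2^{Z \times [0,1]}; \ \Gamma_1(d_x p_y) = \{(z_1, P_{xy}(z_1 \mid d_x p_y)), \ldots\}</tex-math></inline-formula>.</p>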
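      <p>Similarly, <inline-formula><tex-math>\Gamma_2</tex-math></inline-formula> is proposed to describe
the relationship between topics and terms
using the defined frequency function <inline-formula><tex-math>f_i(w, z_i)</tex-math></inline-formula> as
<inline-formula><tex-math>\Gamma_2 : Z \to 2^{\Omega \times [0,+\infty)}; \ \Gamma_2(z_i) = \{(w_1, P_i(w_1 \mid z_i)), \ldots\}</tex-math></inline-formula>.</p>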
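      <p>Lastly, <inline-formula><tex-math>\Gamma_3</tex-math></inline-formula> is proposed to describe the
relationship between documents and terms
using the defined frequency function <inline-formula><tex-math>f_j(w, d_j)</tex-math></inline-formula> as
<inline-formula><tex-math>\Gamma_3 : D \to 2^{\Omega \times [0,+\infty)}; \ \Gamma_3(d_j) = \{(w_1, f_j(w_1, d_j)), \ldots\}</tex-math></inline-formula>.</p>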
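      <p>Based on the inverse mappings described above,
we have <inline-formula><tex-math>\Gamma_1^{-1}</tex-math></inline-formula>,
<inline-formula><tex-math>\Gamma_2^{-1}</tex-math></inline-formula> and
<inline-formula><tex-math>\Gamma_3^{-1}</tex-math></inline-formula>.
<inline-formula><tex-math>\Gamma_1^{-1}</tex-math></inline-formula> describes the
inverse relationships between topics and paragraphs
using the probability function <inline-formula><tex-math>P_z(z_i)</tex-math></inline-formula> such that
<inline-formula><tex-math>\Gamma_1^{-1}(z) = \{d_x p_y \mid z \in \Gamma_1(d_x p_y)\}</tex-math></inline-formula>, while
<inline-formula><tex-math>\Gamma_2^{-1}</tex-math></inline-formula>, on the other
hand, describes the inverse relationships between
terms and topics using the <inline-formula><tex-math>f_i(w, z_i)</tex-math></inline-formula> function such that
<inline-formula><tex-math>\Gamma_2^{-1}(w) = \{z \mid w \in \Gamma_2(z)\}</tex-math></inline-formula>.
<inline-formula><tex-math>\Gamma_3^{-1}</tex-math></inline-formula> describes the inverse
relationships between terms and documents using the
<inline-formula><tex-math>f_j(w, d_j)</tex-math></inline-formula> function such that
<inline-formula><tex-math>\Gamma_3^{-1}(w) = \{d \mid w \in \Gamma_3(d)\}</tex-math></inline-formula>.
To estimate the generalised topic weight in D, we need
to calculate the probability of each topic <inline-formula><tex-math>P_z(z_i)</tex-math></inline-formula> in
each paragraph of document d, and similarly for all
documents in D, based on <inline-formula><tex-math>\Gamma_1^{-1}</tex-math></inline-formula>, in which we assume
<inline-formula><tex-math>P_{\Theta}(d_x p_y) = \frac{1}{N}</tex-math></inline-formula>, where N is the total number of
paragraphs, as follows:</p>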
      <p><disp-formula><tex-math>P_z(z_i) = \sum \cdots</tex-math></disp-formula></p>
      <p>To verify the proposed model, we designed two
hypotheses. First, our ERS model can effectively
generalise the topic’s local weight that is estimated from
the paragraphs of all documents. The generalisation leads
to a more accurate term weighting scheme, especially
when it is combined with the term frequency ratio
in both documents and topics. Second, our model,
overall, is more effective in selecting relevant
features than most state-of-the-art term-based,
pattern-based, topic-based or even mix-based FS models. To
support these two hypotheses, we conducted
experiments and evaluated their performance.</p>
      <sec id="sec-5-1">
        <title>Dataset</title>
        <p>The first 50 collections of the standard Reuters Corpus
Volume 1 (RCV1) dataset are used in this research because
they were assessed by domain experts at NIST [SR03]
for TREC<sup>2</sup> in its filtering track. This number of
collections is sufficient and stable for reliable
experiments [BV00]. RCV1 is a collection of documents
where each document is a news story in English
published by Reuters.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Baseline models</title>
        <p>We compared the performance of our model to eight
different baseline models. These models are
categorised into five groups based on the type of feature
they use. The proposed model is trained only on
relevant documents and does not consider irrelevant ones.
Therefore, for fair comparison and judgement, we can
only select baseline models that are either unsupervised
or do not require the use of irrelevant documents.</p>
        <p>We selected Okapi BM25 [RZ09], which is one of the
best term-based ranking algorithms. The phrase-based
model n-Grams is also selected. It represents the user’s
information needs as a set of phrases, where n = 3 as it is the
best value reported by Gao et al. (2015) [GXL15]. The
Pattern Deploying based on Support (PDS) [ZLW12] is
one of the pattern-based models. It can overcome the
limitations of pattern frequency and usage. We
selected Latent Dirichlet Allocation (LDA) [BNJ03]
as the most widely used topic modelling algorithm.
From the same group we also selected
Probabilistic Latent Semantic Analysis (pLSA) [Hof01]; it is
similar to LDA and can deal with the problem
of polysemy. Three models were selected from the
mix-based category. First, we selected the
Pattern-Based Topic Model (PBTM-FP) [GXL15], which
incorporates topics and frequent patterns (FP) to obtain
a semantically rich and discriminative representation for
IF. Secondly, we selected the PBTM-FCP [GXL15], which is
similar to the PBTM-FP except that it uses frequent closed
patterns (FCP) instead. Lastly, we selected
Topical N-Grams (TNG) [WMW07], which integrates the
topic model with phrases (n-grams) to discover
topical phrases that are more discriminative and
interpretable.</p>
        <p><sup>2</sup> http://trec.nist.gov/</p>
        <p>
The effectiveness of our model is measured based on
relevance judgements by five metrics that are
well-established and commonly used in the IR and IF
communities. These metrics are the average precision
of the top-20 ranked documents (top-20), break-even
point (b/p), mean average precision (MAP), F-score
(F1) measure, and 11-point interpolated average
precision (IAP). For more details about these measures, the
reader can refer to Manning et al. (
          <xref ref-type="bibr" rid="ref28">2008</xref>
          ) [MRS08]. For
a more rigorous analysis of the experimental results, the
Wilcoxon signed-rank test (Wilcoxon T-test) [Wil45]
was used. The Wilcoxon T-test is a non-parametric
statistical hypothesis test used to
assess whether the mean ranks of two related samples differ.
It is a better alternative to Student’s t-test,
especially when no normal distribution is assumed.
        </p>
        <p>For each collection, we train our model on all
paragraphs of the relevant documents D in the training part
of the collection. We use LDA to extract ten topics
because this is the best number for each collection, as
reported in [GXL13, GXL14, GXL15]. Then, the
proposed model scores the documents’ terms, ranks them
and uses the top-k features as a query to an IF
system. The IF system uses unknown documents (from
the testing part of the same collection) to decide their
relevance to the user’s information needs (relevant or
irrelevant). The value of k is determined
experimentally. The same process is also applied separately
to all baseline models. If the results of the IF
system on the five metrics are better than the
baseline results, then we can claim that our model
significantly outperforms that baseline model.</p>
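        <p>Two of the ranking-based measures named above, precision of the top-k ranked documents and average
precision, can be sketched in Python as follows. This is a minimal sketch under stated assumptions:
the input is a ranked list of binary relevance judgements, and average precision is normalised by the number
of relevant documents found in that list (which equals the usual definition only when every relevant
document appears in the ranking). The function names and the toy ranking are hypothetical.</p>
        <preformat><![CDATA[
```python
def precision_at_k(ranked_relevance, k=20):
    """Fraction of the top-k ranked documents that are relevant (1) rather than not (0)."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of the precision values measured at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical ranking of five test documents: 1 = relevant, 0 = irrelevant.
ranked = [1, 0, 1, 1, 0]
print(precision_at_k(ranked, k=5), average_precision(ranked))
```
]]></preformat>
        <p>Averaging <monospace>average_precision</monospace> over all collections would give a MAP-style figure.</p>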
        <p>The IF testing system uses the following equation
to rank the testing documents set:</p>
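        <p><disp-formula><label>(3)</label><tex-math>weight(d) = \sum_{t \in Q} x, \qquad x = \begin{cases} weight(t), &amp; t \in d \\ 0, &amp; t \notin d \end{cases}</tex-math></disp-formula>
where weight(d) is the weight of document d and Q is the query formed from the top-k selected terms.</p>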
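        <p>The ranking rule in Equation (3) can be sketched in Python as below: a document’s weight is the sum
of the weights of the query terms it contains, and documents are sorted by that weight. The function names,
the query weights and the three toy documents are assumptions made for illustration only.</p>
        <preformat><![CDATA[
```python
def document_weight(doc_terms, query_weights):
    """Sum weight(t) over the query terms t that occur in the document."""
    present = set(doc_terms)
    return sum(w for term, w in query_weights.items() if term in present)


def rank_documents(docs, query_weights):
    """Return document ids sorted by decreasing document weight."""
    scores = {doc_id: document_weight(terms, query_weights) for doc_id, terms in docs.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical query Q: top-k terms with their weights.
query_weights = {"bank": 0.5, "loan": 0.3, "rate": 0.2}

# Hypothetical test documents given as lists of terms.
docs = {
    "d1": ["bank", "rate", "rise"],
    "d2": ["river", "bank"],
    "d3": ["loan", "bank", "rate"],
}

ranking = rank_documents(docs, query_weights)
print(ranking)
```
]]></preformat>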
      </sec>
      <sec id="sec-5-3">
        <title>Experimental Settings</title>
        <p>
          In our experiment, we use the MALLET
toolkit [McC02] to implement all LDA-based models
except for the pLSA model, for which we used the Lemur
toolkit<sup>3</sup> instead. All topic-based models require some
parameters to be set. For the LDA-based models, we
set the number of iterations for the Gibbs sampling
to 1000 and the hyper-parameters to <inline-formula><tex-math>\beta = 0.01</tex-math></inline-formula>
and <inline-formula><tex-math>\alpha = 50/V</tex-math></inline-formula>, as justified in [SG07]. We
configured the number of iterations for the pLSA
to be 1000 (default setting). For the experimental
parameters of the BM25, we set b = 0.75 and k1 = 1.2
as recommended by Manning et al. (
          <xref ref-type="bibr" rid="ref28">2008</xref>
          ) [MRS08].
        </p>
        <p><sup>3</sup> https://www.lemurproject.org/</p>
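        <p>To show where the BM25 parameters k1 = 1.2 and b = 0.75 enter the score, here is a minimal Python
sketch of a common textbook form of BM25. The IDF variant used (with 1 added inside the logarithm so it stays
positive) is an assumption, since the text does not say which variant the baseline uses; the function name,
the toy documents and the collection statistics are likewise hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Score one document against a query with a textbook BM25 formula."""
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        # Assumed IDF variant; other variants omit the "+ 1.0".
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        # b controls document-length normalisation, k1 controls term-frequency saturation.
        length_norm = 1.0 - b + b * doc_len / avg_doc_len
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score


# Hypothetical collection statistics and two documents of equal length.
doc_freq = {"bank": 2}
doc_once = ["bank", "loan", "rate", "rise"]
doc_twice = ["bank", "bank", "loan", "rate"]

s1 = bm25_score(["bank"], doc_once, doc_freq, num_docs=10, avg_doc_len=4.0)
s2 = bm25_score(["bank"], doc_twice, doc_freq, num_docs=10, avg_doc_len=4.0)
print(s1, s2)
```
]]></preformat>
        <p>With these settings a second occurrence of the term raises the score, but by less than a factor of
two, which is the saturation effect governed by k1.</p>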
      </sec>
      <sec id="sec-5-4">
        <title>Experimental Results</title>
        <p>Table 1 and Figure 2 show the evaluation results of our
model and the baselines. These results are the average
over the 50 collections of the RCV1. The results in Table
1 are grouped by the type of feature
used by the baseline model, and improvement%
represents the percentage change in our model’s
performance compared to the best result of the baseline
models in each category (marked in bold if there is more than one
baseline model in the category). We consider any
improvement greater than 5% to be significant.</p>
        <p>Table 1 shows that our model outperformed all
baseline models for information filtering on all five
measures. Regardless of the type of feature used by the
baseline model, our model is significantly better on
average, with a minimum improvement of 8.0% and a maximum
of 39.7%. Moreover, the 11-point result in Figure 2
illustrates the superiority of the proposed model and
confirms the significant improvements shown in
Table 1.</p>
        <p>The Wilcoxon T-test results (Table 2) present the
p-values of the results of our model compared to all
baseline models on all performance measures. A model’s
result is considered significantly different from another
model’s if the p-value is less than 0.05 [Wil45]. Clearly,
the p-value for all metrics is well below 0.05,
confirming that our model’s performance is significantly
different from all baselines. This shows that our model
gains a substantial improvement over the
baseline models used.</p>
        <p>Based on the results presented earlier, we are
confident in claiming that our extended random sets model
can effectively generalise the local topic weight at
the document level in the LDA term scoring function
and, thus, provide a more globally representative term
weight when it is combined with the term frequency in
documents and topics. Also, our model is more effective
in selecting relevant features to acquire the user’s
information needs that are represented by a set of long documents.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This paper presents an innovative and effective
topic-based feature ranking model to enhance the semantics
of topical words to acquire user needs. The model
extends random sets to generalise the LDA topic weight
at the document level. Then, a term weighting scheme
is developed to accurately rank topical terms based
on their frequent appearance in the LDA topic
distributions and all relevant documents. The newly
calculated weight effectively reflects the relevance of a
term to the user’s information needs and maintains the
same semantic meaning of terms across all relevant
documents. The proposed model is tested for IF on
the standard RCV1 dataset with TREC topics, five
different performance measures and eight
state-of-the-art baseline models. The experimental results
show that our model achieved significantly better performance
than all the baseline models.</p>
      <sec id="sec-6-1">
        <title>References</title>
        <p>[ALA13] Mubarak Albathan, Yuefeng Li, and Abdulmohsen Algarni. Enhanced N-Gram
Extraction Using Relevance Feature Discovery, pages 453–465. Springer International
Publishing, Cham, 2013.</p>
        <p>[ALX14] Mubarak Albathan, Yuefeng Li, and Yue Xu. Using extended random set to find
specific patterns. In WI’14, volume 2, pages 30–37. IEEE, 2014.</p>
        <p>[AZ12] Charu C Aggarwal and ChengXiang Zhai. A survey of text clustering algorithms.
In Mining text data, pages 77–128. Springer, 2012.</p>
        <p>[Ble12] David M Blei. Probabilistic topic models.
Communications of the ACM, 55(4):77–84, 2012.</p>
        <p>[BNJ03] David M Blei, Andrew Y Ng, and Michael I Jordan. Latent Dirichlet allocation.
Journal of Machine Learning Research, 3:993–1022, 2003.</p>
        <p>[BV00] Chris Buckley and Ellen M Voorhees. Evaluating evaluation measure stability.
In SIGIR’00, pages 33–40. ACM, 2000.</p>
        <p>[DDF+90] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer,
and Richard Harshman. Indexing by latent semantic analysis. Journal of the American
Society for Information Science, 41(6):391, 1990.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>[GXL15] Yang</surname>
            <given-names>Gao</given-names>
          </string-name>
          , Yue Xu, and
          <string-name>
            <given-names>Yuefeng</given-names>
            <surname>Li</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Pattern-based topic models for information filtering</article-title>
          .
          <source>In ICDM'13</source>
          , pages
          <fpage>921</fpage>
          -
          <lpage>928</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Yang</surname>
            <given-names>Gao</given-names>
          </string-name>
          , Yue Xu, and
          <string-name>
            <given-names>Yuefeng</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Topical pattern based document modelling and relevance ranking</article-title>
          .
          <source>In WISE'14</source>
          , pages
          <fpage>186</fpage>
          -
          <lpage>201</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>Pattern-based topics for document modelling in information filtering</article-title>
          .
          <source>IEEE TKDE</source>
          ,
          <volume>27</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1629</fpage>
          -
          <lpage>1642</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>Machine learning</source>
          ,
          <volume>42</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>177</fpage>
          -
          <lpage>196</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Rudolf</given-names>
            <surname>Kruse</surname>
          </string-name>
          , Erhard Schwecke, and
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Heinsohn</surname>
          </string-name>
          .
          <article-title>Uncertainty and vagueness in knowledge based systems: numerical methods</article-title>
          .
          <source>Springer Science &amp; Business Media</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Yuefeng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Abdulmohsen</given-names>
            <surname>Algarni</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ning</given-names>
            <surname>Zhong</surname>
          </string-name>
          .
          <article-title>Mining positive and negative patterns for relevance feature discovery</article-title>
          .
          <source>In KDD'10</source>
          , pages
          <fpage>753</fpage>
          -
          <lpage>762</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Yuefeng</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Extended random sets for knowledge discovery in information systems</article-title>
          .
          <source>In RSFDGrC'03</source>
          , pages
          <fpage>524</fpage>
          -
          <lpage>532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Springer</surname>
          </string-name>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Man</given-names>
            <surname>Lan</surname>
          </string-name>
          , Chew Lim Tan,
          <string-name>
            <given-names>Jian</given-names>
            <surname>Su</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yue</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>Supervised and traditional term weighting methods for automatic text categorization</article-title>
          .
          <source>IEEE TPAMI</source>
          ,
          <volume>31</volume>
          (
          <issue>4</issue>
          ):
          <fpage>721</fpage>
          -
          <lpage>735</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>Compact query term selection using topically related text</article-title>
          .
          <source>In SIGIR'13</source>
          , pages
          <fpage>583</fpage>
          -
          <lpage>592</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Craig</given-names>
            <surname>Macdonald</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Global statistics in proximity weighting models</article-title>
          .
          <source>In Web N-gram Workshop</source>
          , page 30.
          <publisher-name>Citeseer</publisher-name>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [LAA+15]
          <string-name>
            <given-names>Yuefeng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Abdulmohsen</given-names>
            <surname>Algarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mubarak</given-names>
            <surname>Albathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yan</given-names>
            <surname>Shen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Moch Arif</given-names>
            <surname>Bijaksana</surname>
          </string-name>
          .
          <article-title>Relevance feature discovery for text mining</article-title>
          .
          <source>IEEE TKDE</source>
          ,
          <volume>27</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1656</fpage>
          -
          <lpage>1669</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Molchanov</surname>
          </string-name>
          .
          <source>Theory of random sets</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <publisher-name>Springer Science &amp; Business Media</publisher-name>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
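          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Prabhakar</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <source>Introduction to information retrieval</source>
          .
          <publisher-name>Cambridge University Press</publisher-name>
          ,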
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Hung T</given-names>
            <surname>Nguyen</surname>
          </string-name>
          .
          <article-title>Random sets</article-title>
          .
          <source>Scholarpedia</source>
          ,
          <volume>3</volume>
          (
          <issue>7</issue>
          ):
          <fpage>3383</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          .
          <publisher-name>Now Publishers Inc</publisher-name>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Steyvers</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tom</given-names>
            <surname>Griffiths</surname>
          </string-name>
          .
          <article-title>Probabilistic topic models</article-title>
          .
          <source>Handbook of latent semantic analysis</source>
          ,
          <volume>427</volume>
          (
          <issue>7</issue>
          ):
          <fpage>424</fpage>
          -
          <lpage>440</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <article-title>Building a filtering test collection for TREC 2002</article-title>
          .
          <source>In SIGIR'03</source>
          , pages
          <fpage>243</fpage>
          -
          <lpage>250</lpage>
          . ACM,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Serafettin</given-names>
            <surname>Tasci</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tunga</given-names>
            <surname>Gungor</surname>
          </string-name>
          .
          <article-title>LDA-based keyword selection in text categorization</article-title>
          .
          <source>In ISCIS'09</source>
          , pages
          <fpage>230</fpage>
          -
          <lpage>235</lpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Xing</given-names>
            <surname>Wei</surname>
          </string-name>
          and
          <string-name>
            <given-names>W Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>LDA-based document models for ad-hoc retrieval</article-title>
          .
          <source>In SIGIR'06</source>
          , pages
          <fpage>178</fpage>
          -
          <lpage>185</lpage>
          . ACM,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Frank</given-names>
            <surname>Wilcoxon</surname>
          </string-name>
          .
          <article-title>Individual comparisons by ranking methods</article-title>
          .
          <source>Biometrics bulletin</source>
          ,
          <volume>1</volume>
          (
          <issue>6</issue>
          ):
          <fpage>80</fpage>
          -
          <lpage>83</lpage>
          ,
          <year>1945</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [WMW07]
          <string-name>
            <given-names>Xuerui</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xing</given-names>
            <surname>Wei</surname>
          </string-name>
          .
          <article-title>Topical n-grams: Phrase and topic discovery, with an application to information retrieval</article-title>
          .
          <source>In ICDM'07</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>702</lpage>
          . IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [ZLW12]
          <string-name>
            <given-names>Ning</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yuefeng</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sheng-Tang</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Effective pattern discovery for text mining</article-title>
          .
          <source>IEEE TKDE</source>
          ,
          <volume>24</volume>
          (
          <issue>1</issue>
          ):
          <fpage>30</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Zhiwei</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xuan-Hieu</given-names>
            <surname>Phan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Susumu</given-names>
            <surname>Horiguchi</surname>
          </string-name>
          .
          <article-title>An efficient feature selection using hidden topic in text categorization</article-title>
          .
          <source>In AINAW'08</source>
          , pages
          <fpage>1223</fpage>
          -
          <lpage>1228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <publisher-name>IEEE</publisher-name>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>