=Paper=
{{Paper
|id=Vol-1391/73-CR
|storemode=property
|title=Task 2a: Team KU-CS: Query Coherence Analysis for PRF and Genomics Expansion
|pdfUrl=https://ceur-ws.org/Vol-1391/73-CR.pdf
|volume=Vol-1391
|dblpUrl=https://dblp.org/rec/conf/clef/ThesprasithJ15
}}
==Task 2a: Team KU-CS: Query Coherence Analysis for PRF and Genomics Expansion==
Ornuma Thesprasith and Chuleerat Jaruskulchai
Department of Computer Science, Faculty of Science, Kasetsart University, Thailand
ornuma.thesprasith@gmail.com and fscichj@ku.ac.th

Abstract. Laypeople who are not medical experts may formulate short queries using words from their discharge summaries, or long queries that explain their health conditions. These different query styles should be treated with different query expansion mechanisms. This work presents an adaptive query expansion method based on the coherence among query terms. To provide users with more readable documents, document complexity is estimated from word length and used for re-ranking. The baseline retrieves with Lucene v4.6 in its default configuration. The overall retrieval performance of the baseline is better than that of the adaptive query expansion (P@10, MAP, and NDCG). Since the document complexity measure uses only word length, readability performance after re-ranking is poor.

Keywords: Adaptive query expansion · Query coherence analysis · Genomics expansion · Readability

1 Introduction

Health information is now available on the web and easy to access by non-professionals, called laypeople, who interact with the health care system. The 2015 CLEF eHealth lab [1] aims to help laypeople in seeking health information. To foster research and development of health-related search engines, the 2015 CLEF eHealth Task 2 [2] provides a shared collection of health-related web pages and a query set. The task assumes that laypeople formulate queries with several terms to represent a single medical term; for example, "white part of eye turned green" corresponds to the medical term "jaundice" [3]. From this example, we assume that the medical term(s) should be present in the relevant web page(s) but may be absent from the query, so there is a gap between queries and relevant web pages.
Query expansion (QE) is commonly used to bridge this gap. Since there are many expansion approaches, we examine the effectiveness of different query expansion approaches for different query styles. This work examines three local-based and one global-based query expansion mechanisms. The local-based QE selects terms from the documents retrieved for the initial query, while the global-based QE selects terms from the entire collection.

The degree of coherence among query terms is used to estimate the retrieval performance of a query; we call this predictor QPPpair. We assume that a well-performing query should be expanded with terms from a small set of top-ranked documents, or not expanded at all, because these documents are assumed to be relevant. A poorly performing query should be expanded with terms from the entire collection or from an external source.

In addition to retrieving more useful web pages, the results should be easy for laypeople to read. The factors commonly used to evaluate readability [4–6] are sentence length, number of sentences, and the number of syllables or characters per word. Since our work treats the collection as a bag of words, we assume that a long word is a complex word, so the complexity of a document can be estimated from the frequency of long words.

In this paper, we use the following techniques to address CLEF 2015 eHealth Task 2 [2]: query performance prediction, query expansion, and document readability. Related work on each technique is briefly introduced in the next section. Our proposed method is described in Section 3, the experimental setting in Section 4, and the results and discussion in Sections 5 and 6, respectively.

2 Related Works

2.1 Query Performance Prediction

Query performance prediction (QPP) [10] estimates retrieval effectiveness without relevance judgments.
QPP is used not only to estimate query performance but also to determine the query expansion mechanism and to select the most effective expansion source, as reported in [7, 8]. A survey of pre-retrieval QPP predictors [9] organizes them by specificity, ambiguity, term relatedness, and ranking sensitivity. The specificity of a query is estimated from the frequency of the query term(s) in the collection and the number of documents in which each term appears. For example, the Average Inverse Collection Term Frequency (AvICTF) [10] prefers terms that appear infrequently in the collection. A query with a high AvICTF value should be related to the documents that contain these specific terms, and these documents may be the relevant ones. The work in [8] used AvICTF [10] to adapt the query expansion mechanism: either no expansion, or expansion with collection terms that have the highest AvICTF value. Our work is inspired by [8], but we use a different QPP measure. We focus on term-relatedness QPP predictors because we assume that queries from laypeople contain more terms, as seen in the query example, so the relations among query terms may be reflected in query performance. Examples of term-relatedness QPP predictors [9] are the Average Pointwise Mutual Information (AvPMI), the Maximum Pointwise Mutual Information (MaxPMI), and the query coherence score [11].

2.2 Query Expansion Approaches

Query expansion (QE) is a technique in which new terms are added to an original query, under the assumption that the new term(s) enlarge the document range for a second-pass retrieval. Two major factors influence the effectiveness of QE: the term selection method and the reweighting method [12]. Our work focuses on the first factor only, by investigating different sources of expansion terms. A term selection method based on local context analysis [13] is reported in [14].
They constructed concept hierarchies from a text corpus and then provided related concept(s) for users to select in an interactive expansion manner. This idea is also applied in the construction of a semantic network for interactive query expansion [15].

Three steps of concept selection are described in [15]. The first step filters concepts from terms: a concept is a term that appears frequently in the retrieved document set relative to the entire collection. The second step finds the important concepts based on their entropy in the retrieved set; the entropy decreases if a concept appears either very often or very rarely, so important concepts are those that do neither. The third step finds related concepts based on conditional probability: two concepts are related if both of their conditional probabilities exceed a threshold.

2.3 Readability

The readability of health information is a current issue, as seen in the many recent works that examine the readability of web pages, each focusing on a different health condition such as epilepsy [4], breast cancer [5], or stroke [6]. The commonly used readability measures are the Flesch-Kincaid grade level and the Simple Measure of Gobbledygook (SMOG). The factors commonly used in readability formulas are sentence length, number of sentences, and the number of syllables or characters per word. Although these studies focus on different topics, their conclusions agree: health-related web pages have a high readability grade level, making them hard for laypeople to understand. An alternative way to measure readability is a language model [16], which estimates the reading difficulty of web pages as a classification task; the reading level classifier is based on a linear combination of a language model and surface linguistic features of the document.
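To make the standard readability measures above concrete, here is a minimal sketch of the Flesch-Kincaid grade level, which combines exactly the factors listed (sentence length and syllables per word). This is not the authors' own measure, and the vowel-group syllable counter is a crude assumption used only for illustration:

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59.
    Syllables are approximated by counting vowel groups (crude heuristic).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)
```

Higher values correspond to higher U.S. school grade levels; the studies cited above report that health-related pages typically score well above recommended levels.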
3 Methods

3.1 QPP Based on Query Term Coherence

Given the assumption that the queries of the CLEF 2015 task [2] consist of more general words, we aim to measure the relatedness among query terms. If a pair of terms occurs together often in the corpus, the pair is considered related. We observed that one line of a scientific paper contains approximately 7-8 non-stop words, so in this work we fix the window size at 7 words for determining pairs. Our QPP method, called QPPpair, measures how many pairs of query terms exist in the collection (PairExist) relative to all possible pairs in the query (AllPairs). The query coherence score is defined in Equation 1:

QPPpair(Q) = Σ_{p∈Q} PairExist(p) / AllPairs(Q)   (1)

PairExist(p) = 1 if pair p exists in the collection, 0 otherwise   (2)

AllPairs(Q) = Σ_{p∈Q} Pair(p)   (3)

In a preliminary experiment, we examined the QPPpair of the query set against three collections: CLEF eHealth 2014, TREC Genomics 2004, and OHSUMED. The results showed that the Genomics collection is the most consistent with the CLEF collection, so we use the Genomics collection as the external source for query expansion. We use the average QPPpair value over the three collections to determine the query expansion mechanism.

3.2 Query Expansion Mechanisms

There are five query expansion mechanisms: no query expansion, expansion with a small top-ranked PRF set, expansion with a medium top-ranked PRF set from both the local and external collections, expansion with a large top-ranked PRF set from both the local and external collections, and expansion with co-occurrence terms from the global collection.

Pseudo Relevance Feedback Query Expansion (PRF-QE). The retrieved documents contain at least one query term. The top-ranked documents should contain more query terms and are assumed to be indirectly related to the whole query.
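The QPPpair score of Equation 1 can be sketched as follows. This is a minimal illustration: the collection lookup is abstracted behind a hypothetical pair_exists predicate that would check whether two terms co-occur within the 7-word window anywhere in the collection.

```python
from itertools import combinations

def qpp_pair(query_terms, pair_exists):
    """QPPpair (Eq. 1): the fraction of query-term pairs that exist in the
    collection, out of all possible pairs in the query."""
    all_pairs = list(combinations(query_terms, 2))       # AllPairs(Q)
    if not all_pairs:
        return 0.0
    found = sum(1 for p in all_pairs if pair_exists(p))  # Σ PairExist(p)
    return found / len(all_pairs)

# Toy index of co-occurring pairs (hypothetical data)
seen = {frozenset({"white", "eye"}), frozenset({"eye", "green"})}
score = qpp_pair(["white", "eye", "green"], lambda p: frozenset(p) in seen)
# 2 of the 3 possible pairs exist, so score = 2/3
```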
In this PRF-QE method, we select concept terms from the terms within the PRF set, following [15], and finally select the terms with the highest specificity based on collection and document frequency. Given a set PRF = (x_1, x_2, ..., x_n), where x_i is a term in the PRF set, there are two conditions for finding concept terms, Equations 4 and 5. The first condition measures the importance of a term with respect to the PRF set, based on the term's distribution; the second measures its importance with respect to the entire collection:

p_PRF(x_i) = TF_PRF(x_i) / PRFsize > θc1   (4)

p_PRF(x_i) / p_collection(x_i) > θc2   (5)

where TF_PRF(x) is the total frequency of term x in the PRF set and PRFsize is the number of retrieved documents in the PRF set. The distribution p_collection(x) of term x in the collection is measured by the total collection frequency of x relative to the total number of documents in the collection.

To filter important concepts, the entropy of a concept in the retrieved set is derived as:

G(x_i) = −p_PRF(x_i) log(p_PRF(x_i))   (6)

To select the most related concepts, their conditional probabilities are tested against the following conditions:

p_PRF(x_i | x_j) > θsa and p_PRF(x_j | x_i) > θsb   (7)

Now we have a number of concepts derived from Equations 4-7. We finally select a smaller number of concepts for expansion based on collection frequency and document frequency:

ConceptSpecificity(c) = log(CF(c)) / DF(c)   (8)

where CF(c) is the collection frequency of concept c and DF(c) is the number of documents in which concept c appears.

Cross Collection PRF-QE (CLEF-GPRF-QE). We assume that top-ranked documents retrieved from an external collection are also indirectly related to the original query. Some queries are improved by this method, as reported in [8] and in our previous work [17].
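The concept filtering of Equations 4-7 and the specificity score of Equation 8, which both PRF-QE and CLEF-GPRF-QE use (with different thresholds), can be sketched as follows. The default thresholds follow Table 2's GroupTwo column, and the conditional-probability estimate is abstracted behind a hypothetical cond_prob function:

```python
import math

def find_concepts(tf_prf, prf_size, p_collection, cond_prob,
                  th_c1=0.05, th_c2=5.0, th_entropy=0.05,
                  th_sa=0.08, th_sb=0.03):
    """Concept selection per Eqs. 4-7."""
    # Eqs. 4-5: keep terms frequent in the PRF set and specific to it
    concepts = {}
    for term, tf in tf_prf.items():
        p_prf = tf / prf_size
        if p_prf > th_c1 and p_prf / p_collection[term] > th_c2:
            concepts[term] = p_prf
    # Eq. 6: entropy filter keeps concepts that are neither too common nor too rare
    important = {t: p for t, p in concepts.items()
                 if -p * math.log(p) > th_entropy}
    # Eq. 7: pair concepts whose conditional probabilities pass both thresholds
    related = [(a, b) for a in important for b in important
               if a < b and cond_prob(a, b) > th_sa and cond_prob(b, a) > th_sb]
    return important, related

def concept_specificity(cf, df):
    """Eq. 8: log collection frequency over document frequency."""
    return math.log(cf) / df
```

A usage sketch: a term appearing in 10 of 100 PRF documents but rare in the collection passes Eqs. 4-5, while a common word like "the" is rejected by the θc2 ratio test.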
In the current work we again use the TREC Genomics 2004 collection as the external source, and the results of the initial retrieval in this collection are used to expand queries. The method for selecting concepts is based on the PRF-QE method described in the previous section. In addition, the threshold values for the external collection are lower than for the CLEF collection (the target collection): because the query is generated for the target collection, the retrieved set size and the number of related terms in the external collection are likely to be smaller than in the target one.

Global Collection Query Expansion (Global-QE). We assign this method to queries with the lowest QPPpair values. It is based on the co-occurrence assumption: two terms that frequently occur together in the same context are likely to be related. A pair is two terms (t_i, t_j) within a small window; this work fixes the window length at 7 words for the co-occurrence condition. The set of pairs is PairSet = (p_1, p_2, ..., p_k). Each pair p_i has a collection frequency c_i called cotimes, which is used to select the terms most related to a query term. The cotimes set is CoTime = (<p_1, c_1>, <p_2, c_2>, ..., <p_k, c_k>). The steps for finding related terms from the co-occurrence pairs are as follows:

1. Sort the query terms by document frequency in ascending order: DFsorted = (q_1, q_2, ..., q_m).
2. Starting from q_i, look up the corresponding pair(s) in PairSet and add them to the CandidatePair set;
   – if q_i does not exist in PairSet, increment i.
3. For each candidate pair, look up the corresponding cotimes value in the CoTime set.
4. Sort the candidate pairs by cotimes value in descending order;
   – if a pair p_i consists of one query term, as (q_i, q_i), then the cotimes c_i of pair p_i is used to select other term(s),
   – otherwise, select the top 5 terms from the sorted CandidatePair set.
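The four Global-QE steps above can be sketched as follows. Pairs are stored as frozensets mapped to their cotimes counts; the (q_i, q_i) special case of step 4 is omitted for brevity, and all names and counts are illustrative:

```python
def global_qe_terms(query_terms, doc_freq, cotimes, top_n=5):
    """Steps 1-4: take the rarest query term that appears in PairSet and
    return the partners of its highest-cotimes pairs as expansion terms."""
    # Step 1: ascending document frequency, so the most specific term comes first
    for q in sorted(query_terms, key=lambda t: doc_freq.get(t, 0)):
        # Step 2: candidate pairs containing this query term
        candidates = [(pair, c) for pair, c in cotimes.items() if q in pair]
        if not candidates:
            continue  # term not in PairSet: move on to the next query term
        # Steps 3-4: descending sort by cotimes, collect up to top_n partners
        candidates.sort(key=lambda pc: pc[1], reverse=True)
        partners = []
        for pair, _ in candidates:
            partner = next(iter(pair - {q}), q)
            if partner not in partners:
                partners.append(partner)
            if len(partners) == top_n:
                break
        return partners
    return []

# Toy data for query Q61 "fingernail bruises" (hypothetical counts)
pairs = {frozenset({"fingernail", "nail"}): 10,
         frozenset({"fingernail", "finger"}): 8,
         frozenset({"bruises", "injury"}): 20}
terms = global_qe_terms(["fingernail", "bruises"],
                        {"fingernail": 5, "bruises": 50}, pairs)
# "fingernail" is rarer, so its partners ["nail", "finger"] are selected
```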
3.3 Readability Based on Document Complexity

To return the most readable documents to the user, we measure the complexity of a document based on word length, with the assumption that a long word is a complex word. A document containing many complex words may have high complexity and be hard to understand:

DocComplex(d) = TotalComplexWordsInDocument(d) / DocumentLength(d)   (9)

After retrieval, we re-rank the top 200 documents of each result set with the following equation:

ComplexVal(d) = 1 − 2.5 × DocComplex(d)   (10)

4 Experimental Design

We submitted four runs. Run2 examines both the query analysis method and the query expansion method; Run3 and Run4 examine the re-ranking method based on the complexity of retrieved documents.

Run1. The baseline retrieves with the original title query using Lucene 4.6 [18] with the StandardAnalyzer configuration and Lucene's default VSM similarity.
Run2. The adaptive query expansion, using QPPpair to determine the expansion mechanism.
Run3. The re-ranked version of Run1, based on document complexity.
Run4. The re-ranked version of Run2, based on document complexity.

Determining the Query Expansion Mechanism. Queries are analyzed with QPPpair as described in Equation 1. Each query is then assigned to one of five groups, detailed in Table 1. We note that the QPPpair boundaries of each group were set by human estimation.

Table 1. Query Expansion Mechanism based on QPPpair

Group  QPPpair score   Query expansion mechanism
One    0.5 <= x        No expansion
Two    0.2 <= x < 0.5  PRF-QE with top 100 documents
Three  0.1 <= x < 0.2  CLEF-GPRF-QE with top 500 documents
Four   0 < x < 0.1     CLEF-GPRF-QE with top 1000 documents
Five   x = 0           Global-QE

Table 2. Parameter Values for Concept Finding in the Target Collection

Parameter       GroupTwo  GroupThree  GroupFour
TopK            100       500         1000
CollectionSize  914252    914252      914252
θc1             0.05      0.05        0.75
θc2             5.0       5.0         5.0
Entropy         0.05      0.05        0.00
θsa             0.08      0.08        0.08
θsb             0.03      0.03        0.03
MaxSelectTerm   7         9           9

Table 3.
Parameter Values for Concept Finding in the External Collection

Parameter       GroupThree  GroupFour
TopK            500         1000
CollectionSize  3479789     3479789
θc1             0.025       0.025
θc2             2.5         2.5
Entropy         0.05        0.025
θsa             0.025       0.025
θsb             0.01        0.01
MaxSelectTerm   9           9
PartedSize      5           7

Adaptive Query Expansion. After determining the mechanism group of each query, we expand each group with the parameter settings shown in Table 2 and Table 3. We note that the values of these parameters were set by empirical trial.

5 Results

This task uses two sets of relevance judgments. The first is the traditional evaluation measurement of retrieval performance, such as P@10 and NDCG; the second is a readability-biased evaluation. The comparison of P@10 with the other teams is shown in Figures 1, 2, 3, and 4. The MAP, P@10, NDCG cut 5, and NDCG cut 10 of our four runs are shown in Table 4, and the readability performance of all runs in Table 5.

Table 4. The Retrieval Performance of Four Runs

Run  map     p@10    ndcg_cut_5  ndcg_cut_10
1    0.1090  0.2545  0.2354      0.2205
2    0.0930  0.2288  0.2047      0.1980
3    0.0219  0.0364  0.0248      0.0299
4    0.0180  0.0182  0.0169      0.0163

Table 5. The Readability-biased Measure of All Runs

Run  RBP(0.8)  uRBP(0.8)  uRBPgr(0.8)
1    0.2785    0.2312     0.2251
2    0.2562    0.1818     0.1906
3    0.1679    0.1514     0.1425
4    0.0656    0.0600     0.0567

6 Discussion

Query Coherence Analysis. A query with more existing pairs in the corpus is not guaranteed a more relevant result set: queries Q7, Q15, Q37, and Q51 have high QPPpair values, but their P@10 values are not better than the median retrieval performance, because Lucene's default similarity function weights individual terms, not pairs of terms. Nevertheless, using QPPpair to determine the expansion mechanism is still better than expanding all queries with the same mechanism. We believe there are alternative ways to take advantage of these existing pairs, such as reformulating the query to put more weight on the pair, or searching with a phrase option.

Fig. 1. Baseline Run1 compared with all teams

Query Expansion Mechanism. The work [15] provided candidate terms for users to select interactively, but our work automatically selects terms from the same candidate set for expansion. Our expansion terms mix both useful and misleading terms because the threshold values shown in Table 2 and Table 3 were set heuristically: we observed the candidate terms derived from different threshold values for some queries, which is not a systematic way to tune parameters. For expansion based on the PRF set, the candidate terms are sensitive to the tuning parameters; even for queries in the same group, the candidate terms for some queries are likely to be related while others drift.

For the global expansion method, the candidate terms are derived from the co-occurrence terms of the most specific term in the query. This works for some queries; for example, for Q61, "fingernail bruises", the co-occurrence terms of "fingernail" are useful for expansion. On the other hand, for Q59, "heavy and squeaky breath", the co-occurrence terms of "squeaky" are not useful for expansion. This is the weakness of the method.

Fig. 2. Adaptive query expansion Run2 compared with all teams

Document Complexity Method. The document complexity score uses only word length to re-rank the retrieved documents, and this method gets the worst results. Using more features of the document is therefore necessary to estimate complexity.

References

1. Lorraine Goeuriot, Liadh Kelly, Hanna Suominen, Leif Hanlen, Aurélie Névéol, Cyril Grouin, Joao Palotti, Guido Zuccon: Overview of the CLEF eHealth Evaluation Lab 2015.
CLEF 2015 - 6th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS), Springer, September (2015)
2. Palotti, J., Zuccon, G., Goeuriot, L., Kelly, L., Hanbury, A., Jones, G.J.F., Lupu, M., Pecina, P.: CLEF eHealth Evaluation Lab 2015, Task 2: Retrieving Information about Medical Symptoms. In CLEF 2015 Online Working Notes, CEUR-WS (2015)
3. Task 2: User-Centred Health Information Retrieval, https://sites.google.com/site/clefehealth2015/task-2
4. Brigo, Francesco, et al.: Clearly written, easily comprehended? The readability of websites providing information on epilepsy. Epilepsy and Behavior 44, 35–39 (2015)
5. Vargas, Christina R., et al.: Readability of online patient resources for the operative treatment of breast cancer. Surgery 156(2), 311–318 (2014)
6. Sharma, Nikhil, Andreas Tridimas, and Paul R. Fitzsimmons: A readability assessment of online stroke information. Journal of Stroke and Cerebrovascular Diseases 23(6), 1362–1367 (2014)
7. Cronen-Townsend, Steve, Yun Zhou, and W. Bruce Croft: A framework for selective query expansion. In Proceedings of the thirteenth ACM international conference on Information and knowledge management, pp. 236–237, ACM (2004)
8. He, Ben, and Iadh Ounis: Combining fields for query expansion and adaptive query expansion. Information Processing and Management 43(5), 1294–1307 (2007)
9. Hauff, Claudia, Djoerd Hiemstra, and Franciska de Jong: A survey of pre-retrieval query performance predictors. In Proceedings of the 17th ACM conference on Information and knowledge management, pp. 1419–1420, ACM (2008)
10. He, Ben, and Iadh Ounis: Query performance prediction. Information Systems 31(7), 585–594 (2006)
11. Kumaran, Giridhar, and Vitor R. Carvalho: Reducing long queries using query quality predictors. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pp. 564–571, ACM (2009)
12. Bai, Jing, and Jian-Yun Nie: Adapting information retrieval to query contexts. Information Processing and Management 44(6), 1901–1922 (2008)
13. Xu, Jinxi, and W. Bruce Croft: Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems (TOIS) 18(1), 79–112 (2000)
14. Sanderson, Mark, and Bruce Croft: Deriving concept hierarchies from text. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 206–213, ACM (1999)
15. Oh, J., Kim, T., Park, S., Yu, H., and Lee, Y. H.: Efficient semantic network construction with application to PubMed search. Knowledge-Based Systems 39, 185–193 (2013)
16. Si, Luo, and Jamie Callan: A statistical model for scientific readability. In Proceedings of the tenth international conference on Information and knowledge management, pp. 574–576, ACM (2001)
17. Thesprasith, Ornuma, and Chuleerat Jaruskulchai: CSKU GPRF-QE for medical topic web retrieval. In Proceedings of the ShARe/CLEF eHealth Evaluation Lab (2014)
18. Apache Lucene, http://lucene.apache.org

Fig. 3. Re-ranking of baseline Run3 compared with all teams
Fig. 4. Re-ranking of adaptive query expansion Run4 compared with all teams