<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HITS and Misses: Combining BM25 with HITS for Expert Search</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johannes Leveling</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gareth J. F. Jones</string-name>
          <email>gjonesg@computing.dcu.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing and Centre for Next Generation Localisation (CNGL) Dublin City University Dublin 9</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the participation of Dublin City University in the CriES (Cross-Lingual Expert Search) pilot challenge. To realize expert search, we combine traditional information retrieval (IR) using the BM25 model with reranking of results using the HITS algorithm. The experiments were performed on two indexes, one containing all questions and one containing all answers. Two runs were submitted. The first one contains the combination of results from IR on the questions with authority values from HITS; the second contains the reranked results from IR on answers with authority values. To investigate the impact of multilinguality, additional experiments were conducted on the English topic subset and on all topics translated into English with Google Translate. The overall performance is moderate and leaves much room for improvement. However, reranking results with authority values from HITS typically improved results, and in many experiments more than doubled the number of relevant retrieved results and the precision at 10 documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Expert Search</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>BM25</kwd>
        <kwd>HITS Algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        The CriES pilot challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] aims at multilingual expert search and is based
on a subset of the data provided by Yahoo! Research Webscope1. The complete
Yahoo QA dataset comprises 4.5M natural language questions and 35.9M
answers. Questions are associated with one or more answers and the best answer is
marked by users of the web portal. Questions are also annotated with categories
from a hierarchical classification system. The Yahoo QA dataset has been
previously used in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to train a learning to rank approach. The CriES data subset was
extracted with the preprocessing tool provided by the organizers. This subset
contains 780,193 questions, posted by more than 150,000 users.
      </p>
      <p>For the CriES expert search experiments described in this paper, different
approaches to find experts likely to answer a question were investigated: 1.
Finding experts by matching the current question with previously given answers.</p>
    </sec>
    <sec id="sec-2">
      <title>1 http://research.yahoo.com/</title>
      <p>This corresponds to a standard information retrieval approach on answer
documents. 2. Finding experts by matching the current question with questions
which have previously been answered. This approach is typically employed in
FAQ (frequently asked questions) search and corresponds to IR on questions.
3. + 4. Reranking the results of the two former approaches by interpreting HITS
authority values of question and answer documents as the level of expertise.</p>
      <p>The rest of this paper is organized as follows: Section 2 introduces related
work. Section 3 describes the theoretical background and the system setup for
the CriES experiments. Section 4 presents the experimental setup and results,
followed by an analysis and discussion of results in Section 5. The paper
concludes with an outlook on future work in Section 6.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          Expert search on question answer (Q/A) pairs is a relatively new research area
which is related to search in FAQs, social network analysis, and question
answering (see, for example [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]).
        </p>
        <p>
          2.1 FAQ Search
Burke, et al. [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ] introduce FAQ Finder, a system for finding answers to
frequently asked questions. Their experiments are based on a small set of FAQ files
from Usenet newsgroups. A weighted sum of vector similarity between question
and Q/A pairs, term overlap, and WordNet-based lexical similarity between
questions is computed to find the best results.
        </p>
        <p>
          Wu, et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] use a probabilistic mixture model for FAQ finding in the
medical domain. Questions and answers are first categorized and the Q/A pairs are
interpreted as a set of independent aspects. WordNet [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and HowNet [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] are
employed as lexical resources for question classification. Answers are split into paragraphs
and clustered by LSA and k-means. A probabilistic mixture model is used to
interpret questions and answers based on independent aspects. Optimal weights
in the probabilistic mixture model are estimated by expectation maximization.
This approach outperforms FAQ Finder [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] in the medical domain.
        </p>
        <p>
          Jijkoun and de Rijke [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] describe FAQ finding based on a collection of crawled
web pages. Q/A pairs are extracted and questions are answered by retrieving
matching pairs. The approach is based on the vector space model and Lucene,
using a linear combination of retrieval in different fields.
        </p>
        <p>
          Chiu, et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] use a combination of hierarchical agglomerative clustering
(HAC) and rough set theory for FAQ finding. HAC is applied to create a concept
hierarchy. Lower/upper approximation from rough set theory helps to classify
and match user queries. They conclude that rough set theory can significantly
improve classification of user queries.
        </p>
        <p>
          Several retrieval experiments described in this paper are also based on finding
experts who answered similar questions by indexing all questions.
        </p>
        <p>
          2.2
Balog, Azzopardi, and de Rijke [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] propose two models to find experts based
on documents for the TREC enterprise track2. The first approach is to locate
knowledge in experts' documents; the second approach aims at finding
documents on topics and extracting the associated experts. To this end, they analyze the
communication link structure. They find that the second approach consistently
outperforms the first one.
        </p>
        <p>
          MacDonald and Ounis [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] perform experiments on expert finding on the
TREC enterprise data. They find that increasing the precision in the document
retrieval step does not always result in better precision for the expert search.
        </p>
        <p>
          Yang, et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] present the expert finding system EFS, which employs
experts' profiles created from their lists of publications. Category links are
extracted from Wikipedia. Nine different areas of expertise are differentiated.
        </p>
        <p>Similar to extracting experts from retrieved documents, some retrieval
experiments described in this paper rely on retrieving answers to a given question
and extracting their experts (authors).
        </p>
        <p>
          2.3 Link Analysis
The HITS (Hyperlink-Induced Topic Search) algorithm is a link analysis
algorithm for rating web pages [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. PageRank [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] produces a static,
query-independent score for web pages, taking the incoming and outgoing links of
a web page into account. In contrast, HITS produces two values for a web page:
its authority and its hub value. HITS values are computed at query time and
on results retrieved with an initial retrieval, i.e. the computations are performed
only on initially retrieved results, not across all linked web pages. Recent
variants of HITS have been concerned with stability of the algorithm [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and with
modifications of the algorithm to improve precision [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>For our CriES experiments, we selected the HITS algorithm, as its values
are computed at query time and on a smaller document base. Thus, the HITS
algorithm does not require re-indexing the document collection to recompute
scores after modifications or extensions to the algorithm. HITS scores also highly
correlate with in/outdegree of linked nodes, which intuitively correspond to the
level of expertise: the more information a person produces on a given topic, the
higher her/his level of expertise should be.</p>
        <p>The experiments for the CriES pilot challenge can be based on two different
types of data which were provided by the organizers: a collection of Q/A pairs
and a linked graph model extracted from this collection. Furthermore, the CriES
challenge is unique in that it aims at expert finding in a multilingual setting, i.e.
topics are provided in different languages.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2 http://www.ins.cwi.nl/projects/trec-ent/</title>
      <sec id="sec-3-1">
        <title>System Description</title>
        <p>3.1 Topic and Document Processing</p>
        <p>Interpreting individual questions and answers as documents, standard IR
techniques can be applied for expert search. In our work, the Lucene toolkit3 was
utilized to preprocess the topics and documents, and to index and search the
document collection. Standard Lucene modules were employed to tokenize the
questions and answers and to fold upper case characters to lower case. Stopword
lists from Jacques Savoy's web page on multilingual IR resources4 were used to
identify stopwords. Stemming of topics and documents was performed using the
Snowball stemmer for the corresponding language provided in Lucene. For all
retrieval experiments, only the topic fields `title' and `description' were used to
create IR queries for Lucene (TD). The fields `narrative' and `questioner' were
omitted for query formulation. The `answerer' field was used to form document
IDs. Figure 1 shows a sample topic.</p>
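        <p>The preprocessing pipeline described above (tokenization, case folding, stopword removal, stemming, and TD query formulation) can be sketched as follows. This is a minimal Python analogue of the Lucene setup, not the actual system: the stopword list and the suffix stripper are toy stand-ins for Savoy's stopword lists and the Snowball stemmer.</p>

```python
import re

# Toy stopword list; the experiments used Savoy's multilingual stopword lists.
STOPWORDS = {"the", "of", "to", "and", "a", "is", "i", "my", "what"}

def stem(token):
    # Crude suffix stripper standing in for the Snowball stemmer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Tokenize, fold to lower case, remove stopwords, and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]

def build_query(topic):
    # Only the `title' and `description' fields are used (TD queries).
    return analyze(topic["title"]) + analyze(topic["description"])
```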
        <p>The CriES question-answer set was preprocessed by us to generate two types
of documents from the original CriES documents: answer documents (A) and
question documents (Q). The first type of document contains the `answerer'
ID as a document ID and the text of the answer concatenated with the
category of the question. This retrieval approach realizes standard IR by finding
answers based on the replies the users have already generated. The second type
of document contains the `answerer' ID as a document ID and the question text
concatenated with all category labels from the original document. Thus, retrieval
on these documents aims at finding experts by matching the input question with
previous questions the answerer has replied to. In detail, documents for indexing
were created as follows: answer documents were extracted from answers given
(i.e. `bestanswer'); question documents consist of the question text (i.e. `subject',
`content'). Both types of documents were concatenated with the category fields
(i.e. `cat', `maincat', `subcat'). In addition, the link graph consisting of nodes
representing experts and links between questioner and answerer (provided as
part of the CriES challenge) was employed as input for the HITS algorithm.
        </p>
        <p>
          3.2 The Information Retrieval System
Support for the BM25 retrieval model [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ] and for the corresponding BRF (blind relevance feedback)
approach (see Equations 1 and 2) was implemented for Lucene by one of the
authors. The BM25 score for a document d and a query Q is defined as:
        </p>
        <p>
          score(d, Q) = Σ_{t ∈ Q} w^(1) · ((k1 + 1) · tf / (K + tf)) · ((k3 + 1) · qtf / (k3 + qtf))   (1)
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 http://lucene.apache.org/</title>
    </sec>
    <sec id="sec-5">
      <title>4 http://members.unine.ch/jacques.savoy/clef/index.html</title>
      <p>&lt;topic lang="en"&gt;
&lt;identifier&gt;3938625&lt;/identifier&gt;
&lt;title&gt;What is the origin of "foobar"?&lt;/title&gt;
&lt;description&gt;I want to know the meaning of the</p>
      <p>
        word and how to explain to my friends.&lt;/description&gt;
&lt;narrative/&gt;
&lt;category&gt;Programming &amp;amp; Design&lt;/category&gt;
&lt;questioner&gt;u1061966&lt;/questioner&gt;
&lt;answerer&gt;u25724&lt;/answerer&gt;
&lt;/topic&gt;
where Q is the query, containing terms t and w(1) is the RSJ (Robertson /
Sparck-Jones) weight of t in Q [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]:
w(1) =
(n
      </p>
      <p>(d + 0:5)=(D
d + 0:5)=(N
n
d + 0:5)</p>
      <p>
        D + d + 0:5)
(2)
where k1, k3, and b are model parameters. The default parameters for the BM25
model used are b = 0.75, k1 = 1.2, and k3 = 7. N is the number of documents
in the collection and D is the number of documents known or presumed to be
relevant for the current topic. For the experiments described in this paper, D
was set to 0, i.e. no blind relevance feedback was employed, because the number
of experts and the precision of our initial retrieval were presumed to be very low. n is
the document frequency of the term and d is the number of relevant documents
containing the term. tf is the frequency of the term within a document; qtf is
the frequency of the term in the topic. K = k1 · ((1 − b) + b · doclen/avgdoclen), where
doclen and avgdoclen are the document length and the average document length,
respectively. The BM25 retrieval model has been employed for many years in
evaluation campaigns such as TREC [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], but can still be considered as a
state-of-the-art IR approach.
      </p>
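      <p>To make the formulas concrete, the following sketch computes the BM25 score of Equation (1) for a single document; it is an illustration, not the authors' Lucene implementation. With D = d = 0 (no relevance feedback, as in the experiments) the RSJ weight of Equation (2) reduces to log((N − n + 0.5)/(n + 0.5)).</p>

```python
import math

def bm25_score(query_tf, doc_tf, doc_len, avg_doc_len, df, N,
               k1=1.2, k3=7.0, b=0.75):
    """Score one document against a query (Equation 1, with D = d = 0).

    query_tf: term frequencies in the topic (qtf)
    doc_tf:   term frequencies in the document (tf)
    df:       document frequency per term (n); N: collection size
    """
    # Length normalization factor K = k1 * ((1 - b) + b * doclen / avgdoclen)
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    score = 0.0
    for t, qtf in query_tf.items():
        tf = doc_tf.get(t, 0)
        if tf == 0:
            continue
        # RSJ weight with D = d = 0 (Equation 2)
        w1 = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
        score += w1 * ((k1 + 1) * tf / (K + tf)) * ((k3 + 1) * qtf / (k3 + qtf))
    return score
```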
      <p>
        Reranking with HITS
The HITS algorithm is a link analysis algorithm for rating web pages [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
Unlike PageRank [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which produces a static, query-independent score, HITS
produces two values for a web page: its authority and its hub value. In contrast
to PageRank, HITS values are computed at query time and on results retrieved
with an initial retrieval [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The computations are performed only on initially
retrieved results, not across all linked web pages. The authority estimates the
value of the content of a web page (also referred to as item in the rest of the
paper, because the CriES data does not comprise web pages). In terms of expert
search, the authority value indicates the quality of answers given, and indirectly
the experts' level of expertise. The hub value estimates the value of its links
to other pages. Authority and hub values are defined recursively and in terms
of one another. The authority value of an item is calculated as the sum of the scaled
hub values of the items linking to it. The hub value of an item is computed
as the sum of the scaled authority values of the items it links to.
      </p>
      <p>To apply HITS for expert search, the expert graph is viewed as a linked graph
of experts (corresponding to web pages) with directed connections (links) from
questioners to answerers if the answerer provided an answer to a question.</p>
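      <p>Building this expert graph from the collection is straightforward; a minimal sketch (the extraction of (questioner, answerer) pairs from the `questioner' and `answerer' fields is assumed):</p>

```python
def build_expert_graph(qa_pairs):
    """Directed expert graph: questioner -> set of answerers.

    qa_pairs: iterable of (questioner_id, answerer_id) tuples, one per answer;
    an edge from u to v means v answered a question posted by u.
    """
    graph = {}
    for questioner, answerer in qa_pairs:
        graph.setdefault(questioner, set()).add(answerer)
    return graph
```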
      <p>For the experiments described in this paper, the hub and authority values
for an item are calculated with the following algorithmic steps, iterating steps
(2)-(4) k times (see also Figure 2):
(1) Initialize: Set the hub and authority value for each item (node) to 1.
(2) Update authority values: Update the authority value of each item to be equal
to the sum of the hub values of each item that points to it. That is, items
with a high authority value are linked to by items that are recognized as
informational hubs.
(3) Update hub values: Update the hub value of each item to be equal to the
sum of the authority values of each item that it points to. That is, items
with a high hub value link to items that can be considered to be authorities
on the subject.
(4) Normalize values: Normalize the authority and hub values by dividing each
authority value by the sum of the squares of all authority values, and dividing
each hub value by the sum of the squares of all hub values.</p>
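      <p>The four steps above can be written out directly; this is an illustrative sketch rather than the authors' code, and step (4) is implemented literally as described (dividing by the sum of squares, which rescales the values but preserves the ranking):</p>

```python
def hits(graph, k=50):
    """Run k iterations of HITS on a directed graph.

    graph: dict mapping a node to the set of nodes it links to
    (here: questioner -> answerers). Returns (authority, hub) dicts.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes |= set(targets)
    incoming = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            incoming[dst].append(src)
    # (1) Initialize hub and authority values to 1.
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(k):
        # (2) Authority := sum of hub values of the items pointing to the item.
        auth = {n: sum(hub[src] for src in incoming[n]) for n in nodes}
        # (3) Hub := sum of authority values of the items the item points to.
        hub = {n: sum(auth[dst] for dst in graph.get(n, ())) for n in nodes}
        # (4) Normalize by the sum of squares, as described in the text.
        a_norm = sum(v * v for v in auth.values())
        h_norm = sum(v * v for v in hub.values())
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return auth, hub
```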
      <p>Applied to expert search, hubs can be interpreted as persons interested in a
topic, and authorities can be seen as experts on a topic.
      </p>
      <sec id="sec-5-1">
        <title>Experiments</title>
        <p>The dataset consists of Q/A pairs which are maintained and verified by
experts. In contrast to IR, results represent experts, who may be associated with
different levels of expertise; in comparison with FAQ finding, expert search
focuses on looking for the people most capable of providing an answer. In the simplest
case, people have already provided that answer to the same or to similar
questions. A graph model was provided as part of the CriES data, which consists of
a directed graph representation where nodes denote users, incoming links are
questions and outgoing links represent answers.</p>
        <p>The answer documents (A) and question documents (Q) generated from the
CriES data were indexed separately. The following experimental settings were
varied:
- index: retrieval on answer documents (A) or question documents (Q)
- language: no topic translation; topic translation (using Google Translate)5;
English topics only.
- retrieval method: using standard IR (BM25); combining BM25 with HITS
authority values from the top 50/100 results (HITS 50/100).</p>
        <p>The BM25 retrieval model was used with default parameters (b = 0.75,
k1 = 1.2, and k3 = 7), retrieving the top 100 results for each topic. The HITS
algorithm was applied to the top 50 or top 100 results retrieved by standard
retrieval with the BM25 model. The HITS algorithm was run for 50 iterations
(k = 50). The experimental settings were chosen empirically after initial retrieval
experiments on CriES test data. Table 1 shows results for official and additional
expert search experiments on the CriES data. The submitted runs were obtained
by retrieving 100 results via IR and reranking these results with the HITS
authority value. The top ten results for each topic were extracted for submission.</p>
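        <p>The run generation can then be sketched as a rerank of the initial BM25 ranking by HITS authority; the tie-breaking by BM25 score is an assumption for illustration, since the working notes only state that the top 50/100 results were reranked with the authority values:</p>

```python
def rerank_with_hits(bm25_results, authority, top_k=10):
    """Rerank an initial BM25 result list by HITS authority values.

    bm25_results: list of (expert_id, bm25_score), the top 50/100 results.
    authority: expert_id -> authority value from HITS on the expert graph.
    Experts without an authority value get 0.0; ties fall back to the
    BM25 score (an assumption, not stated in the paper).
    """
    reranked = sorted(bm25_results,
                      key=lambda r: (authority.get(r[0], 0.0), r[1]),
                      reverse=True)
    return [expert for expert, _ in reranked[:top_k]]
```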
      </sec>
      <sec id="sec-5-2">
        <title>Discussion and Analysis</title>
        <p>
          As described in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a baseline run resulting from BM25+Z-Score was generated
by the organizers of the pilot challenge. This baseline experiment was based on
different language-specific indexes, using Google Translate for topic translation.
Z-Score normalization was employed to aggregate the final results.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 http://translate.google.com/</title>
      <p>[Table 1: rel ret, MAP, and P@10 for the official and additional runs, under strict and lenient relevance judgments.]</p>
      <p>
        Two different sets of relevance judgments were provided by the organizers, corresponding to strict
evaluation (experts likely able to answer are considered relevant) and lenient
evaluation (experts likely able to answer and experts who may be able to
answer are relevant). A comparison of our experimental results to the provided
baseline reveals that our best experimental results are slightly higher in terms of
P@10 (0.24 vs. 0.19 for strict, 0.42 vs. 0.39 for lenient relevance judgments). The
results do not consistently outperform the BM25 baseline, and show much lower
performance than the best results reported by the organizers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To test whether this
behavior was caused by the missing topic translation, the data was analyzed in
more detail with respect to the languages.
      </p>
      <p>The topic set contains 60 topics. The lenient relevance assessment contains
3602 relevant entries (60.03 relevant items per topic on average), the strict
assessment 1736 entries (28.93 on average). Table 2 shows the
distribution of languages in topics, questions, and answers. As the topics are equally
distributed among four languages (15 topics per language), a more detailed
analysis of the language of questions and answers was performed. This analysis shows
that languages of Q/A pairs are not equally distributed among these languages,
i.e. there is a bias towards English (91.3%).
The question index and answer index both contain 780,133 documents with
the language distribution shown in Table 2.6 While the topics are equally
distributed between the four languages German (DE), English (EN), Spanish (ES),
and French (FR), the majority of question and answer documents are in English.</p>
      <p>As an additional experiment, the experimental results were calculated for the
English topics only (the first 15 topics). However, there seems to be little bias
towards English in relevant items compared to all items, because MAP slightly
decreases from 0.1867 to 0.1800 and the number of relevant items is about a
quarter of the relevant items for all topics (27 vs. 112).</p>
      <p>A comparison of retrieval on question documents and answer documents
shows that results (i.e. rel ret, MAP, and P@10) are slightly higher for IR on
answer documents. A possible explanation is that answers are typically longer
than questions and provide more terms to match. Thus, a lexical mismatch may
be less likely. The reranking with HITS authority typically shows considerable
improvement in the number of relevant and retrieved results (rel ret), mean
average precision (MAP), and P@10, more than doubling P@10 for the lenient
evaluation.</p>
      <p>
        There are several possible explanations of the moderate performance in
comparison to the best CriES runs submitted by other participants (see, for example,
the overview in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]):
      </p>
      <p>
        Multilinguality: All 60 topics (corresponding to queries) are equally
distributed between German, English, Spanish, and French. Assuming that the
questions and answers in the CriES data were similarly distributed, no topic or
document translation was performed for our official experiments, and all
documents in their original language were organized in a single index. However, most
answers in the CriES data are written in English. The baseline experiment in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
was in contrast conducted on different language-specific indexes, combining
results by Z-Score. Additional experiments on the subset of English topics and on
translated topics show that there is in fact no bias towards English documents
in the relevance assessments (see Table 1).
6 The original number of documents reported by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is 780,193, but a small number
of documents do not include a language code or contain invalid XML and have not
been indexed.
      </p>
      <p>Expert model: Experts are represented by a set of individual questions
or answers. An aggregated model, i.e. combining all questions and answers into
a single representation (e.g. document or weighted term vector) has not been
investigated. The main reason for this is that a single category (unrelated to the
current topic) may dominate in all contributions of a single user.</p>
      <p>
        External resources: No additional external resources have been used for
the experiments described in this paper. Standard approaches for FAQ search
typically utilize resources such as WordNet [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to bridge the lexical gap between
questions and answers. However, in a multilingual problem setting, WordNet
may be of limited use, because WordNet synsets contain only English terms.
Multilingual resources such as EuroWordNet [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] seem to be more suitable, but
suffer from limited lexical coverage.
      </p>
      <p>Link analysis: Traditional approaches for link analysis of the link graph
provided in the CriES data have not been used. Instead, the HITS algorithm,
which is typically applied for reranking web pages, has been employed for
reranking results. Interestingly, reranking initial retrieval results with HITS authority
values improved performance considerably in most cases, increasing MAP and
P@10 and yielding more than double the number of relevant results compared to
the corresponding BM25 experiment, even for poor initial results. This result was
unexpected, because the initial precision is very low for the result set retrieved
with the BM25 model. Standard query expansion techniques such as blind
relevance feedback also aim at improving performance by reranking documents
in a second retrieval phase, but build on the assumption that top-ranked
documents in an initial retrieval phase are relevant (i.e. the initial precision is already
high). If initial results have low precision, standard query expansion techniques
will typically add noise instead of useful terms. Reranking results with HITS
authority values improves performance despite low initial precision.
      </p>
      <sec id="sec-6-1">
        <title>Conclusion and Future Work</title>
        <p>The experiments on the CriES data show that traditional IR methods alone (i.e.
the BM25 retrieval model) may not be suitable for this kind of task and social
network or link analysis may be more successful. Reranking results with HITS
authority values seems to improve performance even when the initial precision is
low. The multilingual aspect introduced in the CriES challenge seems artificial
because most contributions in the data are in English. However, experiments on
the English topic subset did not show a bias.</p>
        <p>Future work will include adding knowledge from external resources such as
Wikipedia and expanding the use of categories and other metadata provided in
the CriES data. Also, reranking with HITS authority scores for ad-hoc IR will
be investigated.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Acknowledgments</title>
        <p>This material is based upon work supported by Science Foundation Ireland
under Grant No. 07/CE/I1142.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Sorg</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sizov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Overview of the cross-lingual expert search (CriES) pilot challenge</article-title>
          .
          <source>In: Working Notes of the CLEF 2010 Lab Sessions</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciaramita</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaragoza</surname>
          </string-name>
          , H.:
          <article-title>Learning to rank answers on large online QA collections</article-title>
          .
          <source>In: ACL</source>
          <year>2008</year>
          ,
          <article-title>Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics</article-title>
          , June 15-20,
          <year>2008</year>
          , Columbus, Ohio, USA, The Association for Computer Linguistics (
          <year>2008</year>
          )
          <fpage>719</fpage>
          -
          <lpage>727</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Harabagiu</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maiorano</surname>
            ,
            <given-names>S.J.:</given-names>
          </string-name>
          <article-title>Finding answers in large collections of texts: Paragraph indexing + abductive inference</article-title>
          .
          <source>In: Proceedings of the AAAI Fall Symposium on Question Answering Systems</source>
          . (
          <year>1999</year>
          )
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammond</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulyukin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytinen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomuro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenberg</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Natural language processing in the FAQ Finder system: Results and prospects</article-title>
          .
          <source>In: Proceedings of the 1997 AAAI Spring Symposium on Natural Language Processing for the World Wide Web</source>
          . (
          <year>1997</year>
          )
          <fpage>17</fpage>
          –
          <lpage>26</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Burke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammond</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulyukin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytinen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomuro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Question answering from frequently-asked-question files: Experiences with the FAQ Finder system</article-title>
          .
          <source>Technical Report TR-97-05</source>
          , Dept. of Computer Science, University of Chicago (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeh</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Domain-specific FAQ retrieval using independent aspects</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          )
          <fpage>1</fpage>
          –
          <lpage>17</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , ed.:
          <source>WordNet: An Electronic Lexical Database</source>
          . MIT Press, Cambridge, Massachusetts (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>HowNet and the Computation of Meaning</article-title>
          . World Scientific Publishing, River Edge, NJ, USA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jijkoun</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Retrieving answers from frequently asked questions pages on the web</article-title>
          .
          <source>In: CIKM '05, October 31–November 5, 2005, Bremen, Germany</source>
          . (
          <year>2005</year>
          )
          <fpage>76</fpage>
          –
          <lpage>83</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Chiu</surname>
            ,
            <given-names>D.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.C.</given-names>
          </string-name>
          :
          <article-title>Dynamic FAQ retrieval with rough set theory</article-title>
          .
          <source>IJCSNS International Journal of Computer Science and Network Security</source>
          <volume>7</volume>
          (
          <issue>8</issue>
          ) (
          <year>2007</year>
          )
          <fpage>204</fpage>
          –
          <lpage>211</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Balog</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Rijke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Formal models for expert finding in enterprise corpora</article-title>
          .
          <source>In: SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , New York, NY, USA, ACM (
          <year>2006</year>
          )
          <fpage>43</fpage>
          –
          <lpage>50</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>The influence of the document ranking in expert search</article-title>
          .
          <source>In: CIKM '09: Proceeding of the 18th ACM conference on Information and knowledge management</source>
          , New York, NY, USA, ACM (
          <year>2009</year>
          )
          <fpage>1983</fpage>
          –
          <lpage>1986</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ho</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>EFS: Expert finding system based on Wikipedia link pattern analysis</article-title>
          .
          <source>In: IEEE Xplore</source>
          . (
          <year>2008</year>
          )
          <fpage>631</fpage>
          –
          <lpage>635</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Authoritative sources in a hyperlinked environment</article-title>
          .
          <source>Journal of the ACM</source>
          <volume>46</volume>
          (
          <issue>5</issue>
          ) (
          <year>1999</year>
          )
          <fpage>604</fpage>
          –
          <lpage>632</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a large-scale hypertextual web search engine</article-title>
          .
          <source>In: WWW7: Proceedings of the seventh international conference on World Wide Web 7</source>
          , Amsterdam, The Netherlands, Elsevier Science Publishers B. V.
          (
          <year>1998</year>
          )
          <fpage>107</fpage>
          –
          <lpage>117</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>A.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Stable algorithms for link analysis</article-title>
          .
          <source>In: SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , New York, NY, USA, ACM (
          <year>2001</year>
          )
          <fpage>258</fpage>
          –
          <lpage>266</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Improvement of HITS-based algorithms on web documents</article-title>
          .
          <source>In: WWW '02: Proceedings of the 11th international conference on World Wide Web</source>
          , New York, NY, USA, ACM (
          <year>2002</year>
          )
          <fpage>527</fpage>
          –
          <lpage>535</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hancock-Beaulieu</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gatford</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Okapi at TREC-3</article-title>
          . In Harman, D.K., ed.:
          <source>Overview of the Third Text Retrieval Conference (TREC-3)</source>
          , Gaithersburg, MD, USA, National Institute of Standards and Technology (NIST)
          (
          <year>1995</year>
          )
          <fpage>109</fpage>
          –
          <lpage>126</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beaulieu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Okapi at TREC-7: Automatic ad hoc, filtering, VLC and interactive track</article-title>
          . In Harman, D.K., ed.:
          <source>The Seventh Text REtrieval Conference (TREC-7)</source>
          , Gaithersburg, MD, USA, National Institute of Standards and Technology (NIST)
          (
          <year>1998</year>
          )
          <fpage>253</fpage>
          –
          <lpage>264</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sparck Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Relevance weighting of search terms</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          <volume>27</volume>
          (
          <year>1976</year>
          )
          <fpage>129</fpage>
          –
          <lpage>146</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Introduction to EuroWordNet</article-title>
          . In:
          <source>EuroWordNet: a multilingual database with lexical semantic networks</source>
          . Kluwer Academic Publishers, Norwell, MA, USA (
          <year>1998</year>
          )
          <fpage>1</fpage>
          –
          <lpage>17</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>