<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Language Model Document Priors based on Citation and Co-citation Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haozhen Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaohua Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computing &amp; Informatics, Drexel University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Citation, an integral component of research papers, implies a kind of relevance that is not well captured in current Information Retrieval (IR) research. In this paper, we explore incorporating citation and co-citation analysis results into the IR modeling process. We go beyond the usual uniform document prior assumption of the language modeling framework by deriving document priors from papers' citation counts, citation-induced PageRank, and co-citation clusters. We test multiple ways to estimate these priors and conduct extensive experiments on the iSearch test collection. Our results do not show significant improvements from using these priors over a no-prior baseline, as measured by mainstream retrieval effectiveness metrics. We analyze the possible reasons and suggest further directions for using bibliometric document priors to enhance IR.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recent years have seen growing interest in combining bibliometrics and
information retrieval (IR), the two major specialties of information science [23]. White
proposed a synthesis of the two under Sperber and Wilson's relevance theory,
leading to a novel Pennant visualization for accessing literature [24]. Extensive
research has been carried out on leveraging the inherent regularity and
dynamics of bibliographic entities in scientific information spaces to improve search
strategies and retrieval quality [
        <xref ref-type="bibr" rid="ref13 ref15">15,13</xref>
        ].
      </p>
      <p>
        We join this line of inquiry by studying how to incorporate evidence
derived from citation and co-citation analysis into a formal IR model. Though the
importance of citations in helping researchers access the literature is self-evident,
few studies have incorporated them into formal retrieval
models. Mainstream IR modeling research generally centers on term weighting,
smoothing, matching, etc. Still, it is possible to ingest bibliometric insights into
formal IR models if they are conceptualized as query-independent evidence
or static features [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We adopt the language modeling framework to
investigate whether including citation and co-citation information as document
prior probabilities of relevance improves retrieval effectiveness
(see Section 3.2 for details). We use three kinds of data to estimate document
priors: (1) a paper's citation count, (2) a paper's PageRank induced from citation
relationships, and (3) paper co-citation clusters. We compare these approaches in
terms of general retrieval effectiveness measurements, with extensive experiments
on the iSearch test collection (http://itlab.dbit.dk/~isearch/).
      </p>
      <p>In Section 2, we review related work as the context of our study. Section 3
details our retrieval model, experiment setup, and the document priors we
choose along with their estimation methods. Section 4 reports our experiment results
and discussion. Section 5 concludes the paper with future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>Using citation in information retrieval</title>
        <p>
          Garfield initiated the idea of creating citation indexes for scientific articles [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
Smith reviewed early research on using citation relations in information
retrieval [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Salton found that textual similarity correlated with citation
similarity and proposed using terms from bibliographically related documents to
augment the original document representation [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Larsen studied the "boomerang"
effect, which uses frequently occurring citations in top retrieval results to
query citation indexes for relevant documents [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Yin et al. studied
linearly combining content scores and link scores to improve biomedical
literature retrieval [25]. For the iSearch test collection, Norozi et al. experimented
with a contextualization approach that boosts document scores using their random-walk
neighborhood documents over the in-link and out-link citation network
[
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Document co-citation, a methodology proposed by Small, is mostly
used to reveal the structure of scientific information [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. In this paper, we explore
using document co-citation clusters for document prior estimation.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Language Model Document Priors</title>
        <p>
          Prior information has been shown to be useful in certain Web search tasks, e.g., entry
page finding [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The language model provides an elegant and principled
framework for including document priors. Previous studies have used citation counts
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], document length [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], document quality [27], URL type [
          <xref ref-type="bibr" rid="ref18 ref9">18,9</xref>
          ], and so on as
language model document priors. For the iSearch collection, there are studies that use
document type as a prior [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], as well as documents matched with disambiguated
query terms [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], in which documents get a higher prior probability of relevance if
they match disambiguated query terms. We extend this line of study by
introducing document co-citation analysis for estimating document priors and comparing
it with other methods.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <sec id="sec-3-1">
        <title>Dataset</title>
        <p>
          We use the iSearch collection, created by the iSearch team, as our test collection.
It consists of 18,443 book MAchine-Readable Cataloging (MARC) records (BK), 291,246
article metadata records (PN) and 291,246 PDF full-text articles (PF), plus 3.7 million
extracted internal citation entries among PN and PF. The collection also comes with 66
topics drawn from physics researchers' real information needs, together with
corresponding relevance judgment data [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. A previous study has shown that the iSearch collection is suitable for
informetric analysis [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Of all the PN and PF documents, 259,093 are cited at
least once; we chose this subset for our experiments to reduce citation
sparsity. We index these documents with Indri (http://www.lemurproject.org/indri.php).
Following the best practice
in [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], we used the SMART stopword list and the Krovetz stemming method.
Accordingly, we removed documents not in our index from the relevance judgment
files. We then filtered out topics without any relevant documents in the relevance
judgment data, leaving 57 valid topics out of the original 66 (topics 5,
6, 15, 17, 20, 25, 42, 54, 56 are excluded). We used only the "search terms" field
of the topics as our queries.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Retrieval Model</title>
        <p>
          We use the language model as our IR modeling framework. In particular, we choose
the query-likelihood language model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In this model, the relevance of a
document D to a query Q is modeled as how likely a user would pose such a query
for this document, P(D|Q). Using Bayes' rule, P(D|Q) can be rewritten as
        </p>
        <p>P(D|Q) ∝ P(Q|D)P(D),
which is easier to estimate and implement in IR systems. Much work
has been done on finding effective ways to smooth P(Q|D), but the
document prior P(D) is generally assumed to be uniform; it then does not affect the
ordering of the retrieval results and is therefore ignored [26]. Here we go beyond this
uniformity assumption by focusing on the estimation of P(D) with citation and
co-citation analysis results. We propose three kinds of priors based on citation
counts, citation-induced paper PageRank, and co-citation clusters.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Document Priors and Their Estimation</title>
        <p>Analyzing the paper citation and co-citation networks of the iSearch dataset, we
propose three kinds of document priors: paper citation count, paper PageRank
score induced from citation relationships, and co-citation clusters. We tested two
kinds of prior estimation methods: maximum likelihood estimation (MLE) and
binned estimation. For the MLE approach we also tried a logarithm version.
We explain here the three kinds of document priors and how to calculate them.</p>
        <p>Paper Citation Count Prior. In this case, the document prior P(D) is directly
estimated as the proportion of the number of times a paper is cited (C_i) to the
total number of times all papers are cited:</p>
        <p>P_citedcount-mle(D) = C_i / Σ_{k=1}^{N} C_k, (1)</p>
        <p>and the logarithm version:</p>
        <p>P_citedcount-log-mle(D) = log(C_i) / Σ_{k=1}^{N} log(C_k). (2)</p>
        <p>Paper PageRank Prior. We use the internal citation structure of the iSearch test
collection to calculate the PageRank value of every paper in our index. The
PageRank value of a given paper d is:</p>
        <p>PageRank(d) = λ Σ_{x ∈ D_{→d}} PageRank(x) / |D_{x→}| + (1 − λ) / N, (3)</p>
        <p>
          where D_{→d} denotes the papers citing d, D_{x→} denotes the papers cited by x, N is
the total number of papers in the collection, and λ = 0.85 is the damping factor
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Let PR_i be the PageRank score of paper i; the document PageRank prior
using MLE is then:
        </p>
        <p>P_pagerank-mle(D) = PR_i / Σ_{k=1}^{N} PR_k, (4)</p>
        <p>and the logarithm version:</p>
        <p>P_pagerank-log-mle(D) = log(PR_i) / Σ_{k=1}^{N} log(PR_k). (5)</p>
        <p>
          Paper Co-citation Cluster Prior. In this case, documents get prior probabilities
based on the cluster they belong to. We calculated the document co-citation
counts and compiled all co-citations among the indexed papers, resulting in
a weighted undirected graph with 259,093 vertices and 33,888,861 edges, whose
edge weights are the number of times two papers are cited together. We then
use the graph clustering software Graclus (http://www.cs.utexas.edu/users/dml/Software/graclus.html)
to cluster the document co-citation network. Graclus provides two clustering
objectives, Normalized Cut (NCT), which minimizes the sum of edge weights between
clusters, and Ratio Association (ASC), which maximizes edge density within each
cluster [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We tried both and chose NCT here because with ASC most papers are clustered
into one huge cluster, preventing effective prior estimation.
        </p>
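        <p>To make the estimators concrete, the following sketch computes the citation-count MLE prior and the PageRank MLE prior by power iteration with damping factor 0.85. The four-paper citation graph is made up for illustration; this is not our experimental code, which operates on the full iSearch citation data:</p>

```python
# Citation-count and PageRank MLE priors on a tiny, hypothetical citation graph.
cites = {            # paper -> papers it cites (made-up data)
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

# Citation-count prior: each paper's share of all citations received.
in_count = {p: 0 for p in cites}
for p, outs in cites.items():
    for q in outs:
        in_count[q] += 1
total = sum(in_count.values())
p_cited = {p: c / total for p, c in in_count.items()}

# PageRank by power iteration with damping factor 0.85.
alpha, n = 0.85, len(cites)
pr = {p: 1.0 / n for p in cites}
for _ in range(50):
    new = {}
    for p in cites:
        # mass flowing in from each paper x that cites p, split over x's out-links
        incoming = sum(pr[x] / len(cites[x]) for x in cites if p in cites[x])
        new[p] = (1.0 - alpha) / n + alpha * incoming
    pr = new

# PageRank prior: normalize the scores so they sum to one.
z = sum(pr.values())
p_pagerank = {p: v / z for p, v in pr.items()}
```

        <p>Both priors are normalized to sum to one over the collection, as the MLE definitions require.</p>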
        <p>In the co-citation binned estimation method, the prior probability of a document d
from a given bin is estimated as the fraction of the bin's documents that are judged relevant:</p>
        <p>P_cocited(D) = (# relevant documents in the bin) / (# documents in the bin). (7)</p>
        <p>We used a cross-validation method to estimate P(D) in bins. We first order
the 57 topics randomly and divide them into 5 folds (11, 11, 11, 12, 12). At
each round we use 4 folds to estimate P(D) and the remaining fold to test
with the prior. We rotate for 5 rounds, so that each fold serves as the test set
once, and average the results over all test folds as the final scores.</p>
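        <p>The fold construction and the per-bin relevance-rate estimate can be sketched as follows (the qrels and bin assignments here are hypothetical, and this is not our actual pipeline):</p>

```python
# Split 57 topics into 5 random folds and estimate binned priors on training folds.
import random

random.seed(0)
topics = list(range(57))
random.shuffle(topics)
sizes = [11, 11, 11, 12, 12]
folds, start = [], 0
for s in sizes:
    folds.append(topics[start:start + s])
    start += s

def bin_priors(bins, qrels, train_topics):
    """bins: doc -> bin id; qrels: topic -> set of relevant docs.
    Returns each bin's relevance rate over the training topics (Eq. 7)."""
    relevant = set()
    for t in train_topics:
        relevant.update(qrels.get(t, set()))
    prior = {}
    for b in set(bins.values()):
        members = [d for d, bb in bins.items() if bb == b]
        rel = sum(1 for d in members if d in relevant)
        prior[b] = rel / len(members)
    return prior
```

        <p>In each round the priors are estimated from the four training folds only, so the test-fold topics never influence their own priors.</p>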
        <p>We also applied the binned estimation method to the Citation Count and PageRank
priors. We divide all papers into 10 bins and use the aforementioned five-fold
cross-validation approach to get the final scores. In total, there are 8 runs,
reported in Table 1.</p>
        <p>All estimated P(D) values are converted to logarithms, written out as Indri prior
files, and combined with the index using Indri's makeprior application.
During retrieval, they are applied to queries using
the Indri query syntax #combine(#prior( PRIOR ) query terms).</p>
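        <p>A prior file can be produced along these lines; note that the whitespace-separated docno/log-probability layout shown here is an assumption about the makeprior input format, and the file name and values are made up:</p>

```python
# Dump estimated priors as log-probabilities, one "docno log-prob" pair per line.
import math

priors = {"doc1": 0.6, "doc2": 0.3, "doc3": 0.1}   # hypothetical P(D) values

with open("cocited.prior", "w") as f:
    for docno, p in priors.items():
        f.write(f"{docno} {math.log(p):.6f}\n")
```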
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiment Results and Discussion</title>
      <p>
        With the no-prior baseline setup, we extensively tested Jelinek-Mercer (JM)
smoothing with λ ∈ {0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99},
Dirichlet prior smoothing with μ ∈ {100, 500, 800, 1000, 2000, 3000, 4000, 5000,
8000, 10000}, and two-stage smoothing over the corresponding λ and μ values. We find
that JM smoothing with λ = 0.7 performs best on almost all four metrics we chose.
Therefore, we use it as the retrieval model setting for the reported baseline and the
other runs. For each run, we report four mainstream retrieval effectiveness
measurements: Mean Average Precision (MAP), Precision at 10 (P@10), nDCG [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
BPREF [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
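      <p>Two of these measurements, MAP and P@10, can be sketched directly from a ranked list and a set of judged-relevant documents (toy data; in practice we use standard evaluation tools):</p>

```python
# Average precision for one topic and precision at cutoff k, from a ranked list.
def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i      # precision at each relevant document's rank
    return total / len(relevant) if relevant else 0.0

def precision_at(ranking, relevant, k=10):
    top = ranking[:k]
    return sum(1 for d in top if d in relevant) / k
```

      <p>MAP is the mean of average_precision over all topics; nDCG and BPREF additionally account for graded judgments and incomplete judgments, respectively.</p>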
      <p>Table 1. Retrieval effectiveness of all runs.
Run                 MAP     P@10    nDCG    BPREF
baseline-noprior    0.1152  0.1474  0.3134  0.3079
citedcount-mle      0.0990  0.1351  0.2825  0.2846
citedcount-log-mle  0.1092  0.1439  0.3046  0.3005
citedcount-bin10    0.1139  0.1452  0.3103  0.2943
pagerank-mle        0.1036  0.1386  0.2972  0.2941
pagerank-log-mle    0.1072  0.1421  0.3031  0.2989
pagerank-bin10      0.1137  0.1434  0.3099  0.2969
cocited-bin10       0.1155  0.1397  0.3122  0.3013</p>
      <p>Table 1 shows our results under different setups. We can see that the overall
effectiveness of applying document priors based on citation counts, PageRank and
co-citation clusters is limited. The only marginal improvement over the baseline
is cocited-bin10 on MAP. But we can still see differences across priors:
overall, logarithm-smoothed estimations are better than non-smoothed ones, and binned
estimations perform better than MLE estimations.</p>
      <p>There are several possible reasons for these results. First, our set of relevant
documents is relatively small. The total number of relevant documents in our
subset of the iSearch test collection qrels is 964, of which 863 are distinct
documents. Though that averages to about 17 (964/57) relevant documents per
topic, more than half of the topics (29) have only 7 or fewer documents judged
relevant. This may contribute to the underperformance of binned
estimation of document priors. Second, our current approach is entirely independent
of content features, considering only the citation dimension. A better approach
may be to combine citation features with content features, or to use document
priors in a query-dependent manner. Third, the performance of document priors
may depend on the type of search tasks or queries. We need to do query-by-query
analysis and comparison of document prior performance.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Directions</title>
      <p>In this paper, we explored ways of integrating citation and co-citation
analysis results into the language modeling framework as document priors. We tested
three types of document priors with various ways of estimating them. The overall
experiment results do not suggest significant improvements over the no-prior baseline
run. In the future, we plan to test document priors based on other bibliographic
entities such as authors and journals, and to investigate how to effectively combine
different kinds of bibliometric-based priors to enhance IR.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We thank the iSearch team for sharing the iSearch dataset, and the reviewers
for their helpful comments.</p>
    </sec>
    <sec id="sec-7">
      <title>References (23-27)</title>
      <p>23. H. D. White and K. W. McCain. Visualizing a discipline: An author co-citation
analysis of information science, 1972-1995. Journal of the American Society for
Information Science, 49(4):327-355, 1998.
24. Howard D. White. Combining bibliometrics, information retrieval, and relevance
theory, part 1: First examples of a synthesis. Journal of the American Society for
Information Science and Technology, 58(4):536-559, 2007.
25. Xiaoshi Yin, Jimmy Xiangji Huang, and Zhoujun Li. Mining and modeling linkage
information from citation context for improving biomedical literature retrieval.
Information Processing &amp; Management, 47(1):53-67, January 2011.
26. ChengXiang Zhai and John Lafferty. A study of smoothing methods for language
models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179-214,
April 2004.
27. Yun Zhou and W. Bruce Croft. Document quality models for web ad hoc retrieval.
In CIKM '05: Proceedings of the 14th ACM International Conference on Information
and Knowledge Management, pages 331-332, Bremen, Germany, 2005. ACM.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Roi</given-names>
            <surname>Blanco</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alvaro</given-names>
            <surname>Barreiro</surname>
          </string-name>
          .
          <article-title>Probabilistic document length priors for language models</article-title>
          . In Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, and Ryen W. White, editors,
          <source>Advances in Information Retrieval, number 4956 in Lecture Notes in Computer Science</source>
          , pages
          <volume>394</volume>
          {
          <fpage>405</fpage>
          . Springer Berlin Heidelberg,
          <year>January 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Chris</given-names>
            <surname>Buckley</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ellen M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          .
          <article-title>Retrieval evaluation with incomplete information</article-title>
          .
          <source>In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '
          <volume>04</volume>
          , page
          <volume>25</volume>
          {
          <fpage>32</fpage>
          , New York, NY, USA,
          <year>2004</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Nick</given-names>
            <surname>Craswell</surname>
          </string-name>
          , Stephen Robertson, Hugo Zaragoza, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <article-title>Relevance weighting for query independent evidence</article-title>
          .
          <source>In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '05</source>
          , pages
          <fpage>416</fpage>
          {
          <fpage>423</fpage>
          , New York, NY, USA,
          <year>2005</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>W. Bruce</given-names>
            <surname>Croft</surname>
          </string-name>
          and
          <string-name>
            <given-names>John D.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          .
          <source>Language Modeling for Information Retrieval</source>
          , volume
          <volume>13</volume>
          <source>of The Information Retrieval Series</source>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>I.S.</given-names>
            <surname>Dhillon</surname>
          </string-name>
          , Yuqiang Guan, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Kulis</surname>
          </string-name>
          .
          <article-title>Weighted graph cuts without eigenvectors a multilevel approach</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>29</volume>
          (
          <issue>11</issue>
          ):
          <year>1944</year>
          {
          <year>1957</year>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Eugene</given-names>
            <surname>Garfield</surname>
          </string-name>
          .
          <article-title>Citation indexes for science</article-title>
          .
          <source>Science</source>
          ,
          <volume>122</volume>
          :
          <fpage>108</fpage>
          {
          <fpage>111</fpage>
          ,
          <year>1955</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Tamara</given-names>
            <surname>Heck</surname>
          </string-name>
          and
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Schaer</surname>
          </string-name>
          .
          <article-title>Performing informetric analysis on information retrieval test collections: Preliminary experiments in the physics domain</article-title>
          .
          <source>In 14th International Society of Scientometrics and Informetrics Conference ISSI</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>1392</fpage>
          {
          <fpage>1400</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Kalervo</given-names>
            <surname>Järvelin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jaana</given-names>
            <surname>Kekäläinen</surname>
          </string-name>
          .
          <article-title>Cumulated gain-based evaluation of IR techniques</article-title>
          .
          <source>ACM Trans. Inf</source>
          . Syst.,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <volume>422</volume>
          {
          <fpage>446</fpage>
          ,
          <year>October 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Wessel</given-names>
            <surname>Kraaij</surname>
          </string-name>
          , Thijs Westerveld, and
          <string-name>
            <given-names>Djoerd</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>The importance of prior probabilities for entry page search</article-title>
          .
          <source>In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '02</source>
          , pages
          <fpage>27</fpage>
          {
          <fpage>34</fpage>
          , New York, NY, USA,
          <year>2002</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Birger</given-names>
            <surname>Larsen</surname>
          </string-name>
          .
          <article-title>References and citations in automatic indexing and retrieval systems: experiments with the boomerang effect</article-title>
          .
          <source>PhD thesis</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Christina</given-names>
            <surname>Lioma</surname>
          </string-name>
          , Alok Kothari, and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schuetze</surname>
          </string-name>
          .
          <article-title>Sense discrimination for physics retrieval</article-title>
          .
          <source>In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, SIGIR '11</source>
          , pages
          <fpage>1101</fpage>
          {
          <fpage>1102</fpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Marianne</given-names>
            <surname>Lykke</surname>
          </string-name>
          , Birger Larsen, Haakon Lund, and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          .
          <article-title>Developing a test collection for the evaluation of integrated search</article-title>
          . In Cathal Gurrin, Yulan He, Gabriella Kazai, Udo Kruschwitz, Suzanne Little, Thomas Roelleke, Stefan Ruger, and Keith van Rijsbergen, editors,
          <source>Advances in Information Retrieval, number 5993 in Lecture Notes in Computer Science</source>
          , pages
          <volume>627</volume>
          {
          <fpage>630</fpage>
          . Springer Berlin Heidelberg,
          <year>January 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Mayr</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Mutschke</surname>
          </string-name>
          .
          <article-title>Bibliometric-enhanced retrieval models for big scholarly information systems</article-title>
          .
          <source>In IEEE International Conference on Big Data (IEEE BigData</source>
          <year>2013</year>
          ).
          <source>Workshop on Scholarly Big Data: Challenges and Ideas</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Edgar</given-names>
            <surname>Meij</surname>
          </string-name>
          and Maarten de Rijke.
          <article-title>Using prior information derived from citations in literature search</article-title>
          . In Large Scale Semantic Access to Content (Text, Image, Video, and Sound),
          <source>RIAO '07</source>
          , pages
          <fpage>665</fpage>
          {
          <fpage>670</fpage>
          , Paris, France, France,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Peter</given-names>
            <surname>Mutschke</surname>
          </string-name>
          , Philipp Mayr, Philipp Schaer, and York Sure.
          <article-title>Science models as value-added services for scholarly information systems</article-title>
          . Scientometrics,
          <volume>89</volume>
          (
          <issue>1</issue>
          ):
          <volume>349</volume>
          {
          <fpage>364</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Muhammad Ali Norozi,
          <string-name>
            <given-names>Arjen P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Paavo</given-names>
            <surname>Arvola</surname>
          </string-name>
          .
          <article-title>Contextualization from the bibliographic structure</article-title>
          .
          <source>In Proc. of the ECIR 2012 Workshop on TaskBased and Aggregated Search (TBAS2012), page 9</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          , Sergey Brin, Rajeev Motwani, and
          <string-name>
            <given-names>Terry</given-names>
            <surname>Winograd</surname>
          </string-name>
          .
          <article-title>The PageRank citation ranking: Bringing order to the web</article-title>
          .
          <source>Technical Report 1999-0120</source>
          , Computer Science Department, Stanford University,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Jie</given-names>
            <surname>Peng</surname>
          </string-name>
          , Craig Macdonald, Ben He, and
          <string-name>
            <given-names>Iadh</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Combination of document priors in web information retrieval</article-title>
          . In Large Scale Semantic Access to Content (Text, Image, Video, and Sound),
          <source>RIAO '07</source>
          , pages
          <fpage>596</fpage>
          {
          <fpage>611</fpage>
          , Paris, France, France,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Salton</surname>
          </string-name>
          .
          <article-title>Associative document retrieval techniques using bibliographic information</article-title>
          .
          <source>J. ACM</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <volume>440</volume>
          {
          <fpage>457</fpage>
          ,
          <year>October 1963</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>H.</given-names>
            <surname>Small</surname>
          </string-name>
          and
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Griffith</surname>
          </string-name>
          .
          <article-title>The structure of scientific literatures I: Identifying and graphing specialties</article-title>
          .
          <source>Science studies</source>
          , pages
          <volume>17</volume>
          {
          <fpage>40</fpage>
          ,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Linda C.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <article-title>Citation analysis</article-title>
          .
          <source>Library Trends</source>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <volume>83</volume>
          {
          <fpage>106</fpage>
          ,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Diana Ransgaard</given-names>
            <surname>Sørensen</surname>
          </string-name>
          , Toine Bogers, and
          <string-name>
            <given-names>Birger</given-names>
            <surname>Larsen</surname>
          </string-name>
          .
          <article-title>An exploration of retrieval-enhancing methods for integrated search in a digital library</article-title>
          .
          <source>In TBAS 2012: ECIR Workshop on Task-based and Aggregated Search</source>
          , pages
          <fpage>4</fpage>
          <issue>{8</issue>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>