<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bag of Works Retrieval: TF*IDF Weighting of Co-cited Works</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Howard D. White</string-name>
          <email>whitehd@drexel.edu</email>
        </contrib>
        <aff>College of Computing and Informatics, Drexel University, Philadelphia PA, USA</aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>63</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>Although it is not presently possible in any system, the style of retrieval described here combines familiar components (co-citation linkages of documents and TF*IDF weighting of terms) in a novel way that could be implemented in citation-enhanced digital libraries of the future. Rather than entering keywords, the user enters a string identifying a work, called a seed, to retrieve the strings identifying other works that are co-cited with the seed. Each of the latter is part of a “bag of works,” and it presumably has both a co-citation count with the seed and an overall citation count in the database. These two counts can be plugged into a standard formula for TF*IDF weighting such that all the co-cited items can be ranked for relevance to the seed. The result is analogous to, but different from, traditional “bag of words” retrieval. Certain properties of the ranking are illustrated with the top and bottom items co-cited with a classic paper by Marcia J. Bates, “The design of browsing and berrypicking techniques for the online search interface.” However, the properties apply to bag of works retrievals in general and have implications for users (e.g., humanities scholars, domain analysts) that go beyond any one example.</p>
      </abstract>
      <kwd-group>
        <kwd>Co-citation</kwd>
        <kwd>relevance ranking</kwd>
        <kwd>seed documents</kwd>
        <kwd>models of users</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>…ment is predicted to be. A seed like the string above can retrieve the other strings
co-cited with it, regardless of the natural language they contain or how indexers have
described them.</p>
      <p>IDF: In standard topical retrieval, the IDF factor weights non-stopped words in the
database progressively lower as the number of documents containing them increases,
because words used very frequently are relatively poor discriminators of subject
matter. In bag of works retrieval, IDF functions in the same way but is interpreted
differently. The raw DF scores are the total citation counts for documents in the database.
The higher the DF count, the more well-known and widely used a document is, and
the greater its breadth of implication and general applicability. IDF, which inverts the
DF count, favors works that are narrowly and specifically related to the seed over
widely used works that are more broadly and generally related.</p>
      <p>TF*IDF: The formula here uses base-10 logs, and N is estimated with a rounded
count of records in the database. For any co-cited document string:</p>
      <p>Relevance to the seed = (1 + logTF) * (log(N/DF))</p>
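      <p>The weighting above can be sketched in a few lines of Python. This is only a minimal illustration of the formula as stated, with base-10 logs; the function name and the sample counts (an N of ten million records, a TF of 10, and DFs of 100 and 10,000) are assumptions made for the example and are not taken from the Bates data.</p>
      <preformat>
```python
import math


def bag_of_works_weight(tf: int, df: int, n: int) -> float:
    """Return (1 + log10 TF) * log10(N / DF) for one co-cited work.

    tf: co-citation count of the work with the seed
    df: total citation count of the work in the database
    n:  (rounded) number of records in the database
    """
    # A work cannot be co-cited with the seed more often than it is cited
    # overall, nor cited more often than there are records.
    if not (n >= df >= tf >= 1):
        raise ValueError("expected N >= DF >= TF >= 1")
    return (1 + math.log10(tf)) * math.log10(n / df)


# Hypothetical counts: same co-citation count, different overall citedness.
N = 10_000_000
specific = bag_of_works_weight(tf=10, df=100, n=N)
general = bag_of_works_weight(tf=10, df=10_000, n=N)
print(round(specific, 2), round(general, 2))
```
      </preformat>
      <p>With these assumed counts, the narrowly cited work scores 10.0 and the widely cited work scores 6.0, although both are co-cited with the seed equally often. The IDF factor alone accounts for the difference.</p>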
      <p>Weighted in this way, a bag of works retrieval differs in important respects from
typical retrievals in IR. Its properties include:</p>
      <list list-type="bullet">
        <list-item>
          <p>All retrieved items are relevant to the seed in varying degrees by empirical
co-citation evidence. Such evidence from multiple co-citing authors is stronger than
the usual gold standard for relevance judgments, the verdict of a single assessor.
Thus, any item is of potential interest to a domain-literate user.</p>
        </list-item>
        <list-item>
          <p>Retrieved items may be topically similar to the seed, but need not be.</p>
        </list-item>
        <list-item>
          <p>Since seeds merely imply topical content, their semantic relations with retrieved
items will be more various and less predictable than those obtained by
algorithmic word-matching or query expansion based on it. Yet when spelled out as full
references, all retrievals have the following broadly predictable structure
            <xref ref-type="bibr" rid="ref17 ref18">(White 2010, 2011)</xref>:</p>
          <list list-type="bullet">
            <list-item>
              <p>A substantial segment of top-ranked items will be easy to relate to the seed in
global topic (or sometimes in authorship).</p>
            </list-item>
            <list-item>
              <p>The relevance of items to the seed in global topic will be progressively less easy
to see over the whole retrieval, as evidenced by the decreasing coherence of
content indicators such as terms from titles and abstracts.</p>
            </list-item>
            <list-item>
              <p>A substantial segment of bottom-ranked items (those with the lowest TF*IDF
weights) will be relatively difficult to relate to the seed’s global topic at first
glance because of their generality.</p>
            </list-item>
          </list>
        </list-item>
      </list>
      <p>In citation databases, algorithms take a seed document as input and return the
documents that cite it, a linkage known as direct citation. The documents in this
retrieved set (call it Set A) are by default ranked high to low by their own citation
counts (in Google Scholar, GS) or by recency of publication (in the Web of Science, WoS, and Scopus). However, the
direct-citation relationship does not allow the documents in Set A to be ranked by their
relevance to the seed, because each simply lists the seed once among its references,
and so its score with respect to the seed is always one. All citing documents thus
appear equally relevant to it.</p>
      <p>By contrast, the documents co-cited with the seed can be ranked for relevance to
it, because their co-citation counts vary and can be treated as relevance scores. This
requires the further step of retrieving the co-cited documents as Set B. Suppose the
seed is the 1990 book edited by Christine Borgman, Scholarly Communication and
Bibliometrics, and that it is cited in an article by Olle Persson in Set A. When the
book is paired with each of the nine other items in Persson’s references, each pair has
a co-citation count of one. But over all the papers in Set A, many of these pairs would
be co-cited more than once. For example, at this writing the M. M. Kessler paper that
introduced bibliographic coupling in 1963 has a count of seven with Borgman’s book,
because seven documents in Set A have cited both it and the book in their references.
It is these varying co-citation counts that are plugged into the TF factor of the TF*IDF
formula in bag of works retrieval. In this case, the formula would be used to rank the
relevance of documents in Set B to Scholarly Communication and Bibliometrics.</p>
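      <p>The step from Set A to Set B can be sketched as follows. This is a toy illustration, not a description of any existing system: it assumes that each document in Set A is available as a list of reference strings, and the function name and the placeholder strings (SEED, WORK_1, and so on) are hypothetical.</p>
      <preformat>
```python
from collections import Counter


def cocitation_counts(seed, set_a_reference_lists):
    """Count, for each work, how many Set A documents cite it with the seed."""
    counts = Counter()
    for references in set_a_reference_lists:
        cited = set(references)  # a work counts once per citing document
        if seed in cited:
            counts.update(cited - {seed})
    return counts


# Toy data: each inner list is the reference list of one citing document.
set_a = [
    ["SEED", "WORK_1", "WORK_2"],
    ["SEED", "WORK_1"],
    ["SEED", "WORK_3", "WORK_3"],  # duplicate reference string in one list
]
set_b = cocitation_counts("SEED", set_a)
print(set_b["WORK_1"], set_b["WORK_2"], set_b["WORK_3"])
```
      </preformat>
      <p>Here WORK_1 has a co-citation count of two because two citing documents list it together with the seed, while the other works have a count of one. These counts are what would be supplied as TF.</p>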
      <p>
        Paradigmatic IR researchers have delved into co-citation retrieval rather seldom.
Birger Larsen, who reviewed the matter in his dissertation (2004: 49-50), concluded:
“Although relatively straightforward to carry out online as demonstrated, e.g., by
Chapman and Subramanyam (1981) co-citation search...does not seem to have
received much attention for retrieval. Instead co-citation has been used extensively for
mapping the structure of research fields...” Since he wrote, there has not been a great
deal of change. Insofar as cited references are used in IR, the tendency is to use the
direct citation relationship in query expansion to augment topical retrievals. The
co-citation relationship does make an appearance in proposed systems for recommending
papers to cite
        <xref ref-type="bibr" rid="ref10 ref13 ref3 ref7">(McNee et al. 2002, Strohman et al. 2006, Huang et al. 2012, Beel et al.
2015)</xref>
        , since acts of co-citation leave traces like those exploited in better-known
recommender systems, such as co-purchasing in Amazon or co-renting in Netflix.
      </p>
      <p>With respect to operational systems, CiteSeerX automatically returns a small (and
opaque) selection of the titles co-cited with a seed document, but it is the exception.
In the Web of Science, Scopus, and Google Scholar, no co-citation retrievals of any
kind are possible. For 20 years they could be carried out in the Thomson Reuters
databases on DialogClassic, but that service has been defunct since 2013. Ironically,
Thomson Reuters created what is now the Web of Science in the home of co-citation
analysis (ISI, the Institute for Scientific Information), yet the Basic Search panel in
WoS is designed mainly for retrievals by topic, author, journal, or characteristics of a
single work. The secondary Cited Reference Search panel in WoS is designed to take
authors or single works as input and find the items that have cited them. These
capabilities are indispensable, of course, but valuable possibilities remain.</p>
      <sec id="sec-1-1">
        <title>Example</title>
        <p>Carevic and Schaer (2014) used the iSearch test collection in physics to experiment
with bag of works retrieval as presented in White (2010). In iSearch, documents come
with both cited references and assessors’ relevance ratings on a four-point scale. The
authors were looking for overlaps between the documents pre-scored by assessors as
relevant to a topic and the documents retrieved by TF*IDF-weighted co-citation. This
proved not feasible because the co-citation counts they found in iSearch were small.
But in examples from two search topics, the top-ranked co-cited documents did
cohere with seed documents in their title terms. The present paper further illustrates bag
of works retrieval with more robust co-citation data gathered in 2013 from Thomson
Reuters citation databases on DialogClassic. The intent is not to evaluate the method,
but merely to present some aspects of TF*IDF-weighted co-citation not covered in
Carevic and Schaer (2014) or elsewhere in paradigmatic IR sources.</p>
        <p>Bates (1989) is actually cited in 279 documents in Set A, but the most common
version of the identifying string is cited in 264, and so that count is used here for
simplicity. The others are minor variants cited at most a few times each. Fragmented ID
strings that affect counts are a long-standing problem in citation databases.</p>
        <p>
          Table 2 displays some specimen calculations for high-end and low-end Bates data.
(Over the full dataset, these scores form a lognormal distribution, and the items shown
take the extreme values in the positive and negative tails.) The documents are ranked
by TF*IDF score. Here, the top TF*IDF weights do not much alter the ranking
produced by the raw TF counts, but large changes in rank can occur
          <xref ref-type="bibr" rid="ref17">(see White 2010)</xref>
          .
        </p>
        <p>In bag of works retrieval, relevance varies directly with the TF factor and
inversely with the DF count (that is, directly with the IDF factor). TF*IDF weighting thus elevates works whose co-citation
counts (TF) with the seed are high relative to their overall citation counts (DF). The
cognitive effect is that high-ranked works in the distribution tend to be easy to relate
to the seed because their verbal associations are highly specific to it. This can be seen
at even the most superficial level, as in Table 3, where strings representing the top 12
items are spelled out as titles. (Books are in italicized title case.) The 12 works are all
rather old, but they deal with principles of design that are relatively timeless, and,
taken together, they cohere nicely for someone interested in what Bates’s paper
connotes. A researcher familiar with this area could readily discern a common theme—
something like “psychological and behavioral factors in designing user-oriented
interfaces for online document retrieval.” The titles express the theme with considerable
variety, but that is a recurrent feature of co-citation retrieval, which captures citers’
implicit understanding of connections in ways that keyword matching and expansion
do not. Co-citation ties also cause thematically salient authors to recur. For example,
Table 3 has two more papers by Bates and three by Nicholas J. Belkin.
        </p>
        <table-wrap>
          <label>Table 3</label>
          <table>
            <thead>
              <tr><th>TF*IDF</th><th>Sole or First Author, Date, and Title of Co-cited Work</th></tr>
            </thead>
            <tbody>
              <tr><td/><td>BATES MJ, 1989, The design of browsing and berrypicking techniques for the on-line search interface [seed]</td></tr>
              <tr><td/><td>ELLIS D, 1989, A behavioural approach to information retrieval design</td></tr>
              <tr><td/><td>BATES MJ, 1990, Where should the person stop and the information search interface start?</td></tr>
              <tr><td/><td>BELKIN NJ, 1982, ASK for information retrieval. Part 1.</td></tr>
              <tr><td/><td>KUHLTHAU CC, 1991, Inside the search process: Information seeking from the user's perspective</td></tr>
              <tr><td/><td>BELKIN NJ, 1995, Cases, scripts and information seeking strategies: Design of interactive information retrieval systems</td></tr>
              <tr><td/><td>MARCHIONINI G, 1995, Information Seeking in Electronic Environments</td></tr>
              <tr><td/><td>BELKIN NJ, 1993, BRAQUE: Design of an interface to support user interaction in information retrieval</td></tr>
              <tr><td/><td>COVE JF, 1988, Online text retrieval via browsing</td></tr>
              <tr><td/><td>BATES MJ, 1979, Information search tactics</td></tr>
              <tr><td/><td>INGWERSEN P, 1992, Information Retrieval Interaction</td></tr>
              <tr><td/><td>BELKIN NJ, 1980, Anomalous states of knowledge as a basis for information retrieval</td></tr>
              <tr><td/><td>TAYLOR RS, 1968, Question negotiation and information seeking in libraries</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-3">
      <p>At the same time, the TF*IDF weighting lowers the ranks of works whose overall
citation counts (DF) are high relative to their co-citation (TF) counts with the seed.
These latter works tend to be harder to relate to the seed because their associations
with it are much less specific. The promotion of specific terms and demotion of
nonspecific terms is exactly what Karen Sparck Jones (1972) intended the IDF factor to
do when she invented it—she called it “statistical specificity”—except that she and
virtually everyone since have used IDF weighting on words rather than works. Yet on
word-blind strings denoting works IDF performs no less well.</p>
      <table-wrap>
        <label>Table 4</label>
        <table>
          <thead>
            <tr><th>TF*IDF</th><th>Sole or First Author, Date, and Title of Co-cited Work</th></tr>
          </thead>
          <tbody>
            <tr><td>4.9</td><td>DAVIS FD, 1989, Perceived usefulness, perceived ease of use, and user acceptance of information technology</td></tr>
            <tr><td>4.87</td><td>GLASER BG, 1967, The Discovery of Grounded Theory</td></tr>
            <tr><td>4.87</td><td>SIMON HA, 1955, A behavioral model of rational choice</td></tr>
            <tr><td>4.85</td><td>PUTNAM RD, 1995, Bowling Alone: America's Declining Social Capital</td></tr>
            <tr><td>4.8</td><td>STRAUSS A, 1998, Basics of Qualitative Research</td></tr>
            <tr><td>4.74</td><td>GRANOVETTER MS, 1973, The strength of weak ties</td></tr>
            <tr><td>4.73</td><td>GIDDENS A, 1984, The Constitution of Society: Outline of the Theory of Structuration</td></tr>
            <tr><td>4.67</td><td>GARFINKEL H, 1967, Studies in Ethnomethodology</td></tr>
            <tr><td>4.62</td><td>PATTON MQ, 1990, Qualitative Evaluation and Research Methods</td></tr>
            <tr><td>4.32</td><td>LINCOLN YS, 1985, Naturalistic Inquiry</td></tr>
            <tr><td>4.16</td><td>LAVE J, 1991, Situated Learning: Legitimate Peripheral Participation</td></tr>
            <tr><td>4.02</td><td>KUHN TS, 1970, The Structure of Scientific Revolutions</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <p>Table 4 has the tail end of the 706 items in the Bates distribution. They tend to be
famous theoretical or methodological items, mostly books, that are relevant to many
research specialties. It is here that bag of works retrieval most clearly departs from
what is customary in IR. It is hard to imagine typical assessors of relevance in
TREC-style IR experiments marking any of the works in Table 4 as relevant to the Bates
“berrypicking” paper (assuming they were presented). Yet each has been co-cited
with it at least three times.</p>
      <p>
        Granted, they may be related to the seed only very distantly in their local contexts
of citation. One predictor is how widely they are separated from it in body text.
        <xref ref-type="bibr" rid="ref6">(The
effects of such “citation windows” have been examined by several researchers; see,
e.g., Eto 2013)</xref>
        . But they do co-occur with it in the global context set by the citing
paper and thus bear consideration. If nothing else, they show connections that might
never occur to someone who retrieved only works that are closely and obviously
related to the seed. On that ground, a researcher or teacher examining the intellectual
world of Bates’s paper might find them valuable—perhaps even more so than closely
similar works. Authors of seed papers are themselves candidates for such information.
      </p>
      <p>To illustrate, Marcia Bates read an earlier draft of the present paper. Extracts from
her comments (personal communication, February 2016) include: “I think someone
studying the intellectual development of a field could use your approach to great
effect. I find the end-of-the-list co-cited papers to be a really intriguing set. First, it says
something about what kind of research/philosophical point of view co-exists with my
writing. Also, though there is some overlap in the thinking among the writers, they
represent some significant differences in philosophy that make them possibly distinct
theory streams.” She goes on to speculate why various end-of-the-list works appear,
concluding that it is “not accidental that most of the last items are methodological.”</p>
      <p>TF and IDF weights have been applied to ranked co-citation data before in White
(2007a,b, 2009, 2010, 2014). These papers provide a number of detailed examples
and extensive theoretical background. In White (2014) two historians comment like
Bates on items retrieved by seeds they themselves supplied. They found the retrievals
to be readily intelligible and could see a place for them in humanities scholarship.</p>
      <sec id="sec-5-1">
        <title>Discussion</title>
        <p>It seems an unwritten rule in IR that knowledge of works should not be presumed.
The default assumption is that users will represent their interests through topical terms
because that is what they routinely submit. Using a document as one’s search term
requires domain knowledge of the sort possessed only by certain text-oriented
scientists and scholars. It moreover requires familiarity with the conventions of citation
databases, which even learned researchers may lack. When Larsen (2004) built an
experimental retrieval system that included direct citation linkages, he explicitly
designed it so that users would not need a document to initiate retrieval; instead, seed
documents were generated automatically from an initial subject search.</p>
        <p>Note, then, that topical terms can function just like works in retrieving co-cited
items. For example, one or more topical terms can retrieve Set A as full records from
WoS; from those, software external to WoS can extract Set B. That is how data for
maps of co-cited works or authors are now generated. Yet it may still be the case that:</p>
        <list list-type="bullet">
          <list-item>
            <p>The user can represent an interest through at least one seed document in addition
to topical terms. Many thousands of people possess enough domain expertise to
do this and thus might find uses for bag of works retrievals.</p>
          </list-item>
          <list-item>
            <p>The user can represent an interest only through one or more seed documents.
Suppose, for instance, one wants to explore Bates’s “berrypicking” idea at length;
how can her metaphor be transferred to non-metaphorical contexts? With bag of
works retrieval, the question answers itself, as the titles in Table 3 show.</p>
          </list-item>
          <list-item>
            <p>The user’s interest is the seed document itself. Here, the user is not conducting a
conventional literature search but seeking information on the seed document’s
use by citers over time. This possibility differs strikingly from the model of users
in paradigmatic IR and, once again, bag of works retrieval is pertinent.</p>
          </list-item>
        </list>
        <p>Paradigmatic IR systems are designed for users who know “needs” rather than
documents, and whose needs are met mainly by documents hitherto unknown. This
design accommodates both non-specialists and scientists who read primarily to have
their questions answered and not because of an interest in documents as texts per se.
As Bates (1996) points out, the typical scientist wants to keep up with relevant
research findings but frequently does so through an interpersonal network well before
they are published; the actual literature is regarded as archival, and many
contributions to it may go unread. In marked contrast, the typical humanities scholar’s
research is centered on texts as ends in themselves, to be mastered in all their unique
particulars. Bates’s empirical data show that humanists already know the literature in
their specialties so well that they are surprised if a literature search turns up even a
few new items. However, bag of works retrievals for such persons could reveal
something new: how citers have received and contextualized known works.</p>
        <p>Take, for example, Virginia Woolf’s Mrs. Dalloway as a seed in Arts and
Humanities Citation Index. One might expect that the items top-ranked with it would be
studies of Woolf and of that novel. Not so; down much of the distribution, the
majority of items are writings by Woolf herself. (The same is true of another Woolf novel,
Orlando.) The items pushed to lower ranks by the IDF factor include such
“co-studied” works as Ulysses, The Sound and the Fury, and The Waste Land. Obviously
the relevance of these works to the seed is not topical, but part of the history of
scholarship on it. Bag of works retrieval thus in a small way supports intellectual history.</p>
        <p>In this regard, bag of works retrieval bears on citation-based domain analysis.
Domain analysts can often name one or more documents that initiated a particular line
of research. Given well-chosen “foundational” seeds, Set A and Set B are both
significant portrayals of a domain. Set A may contain one or more of the domain’s research
fronts—clusters of relatively recent documents that define emerging research areas.
Set B, which includes the seed, is the domain’s intellectual base—older documents
that have proved widely useful within a particular paradigm. So bag of works retrieval
can in some cases also be understood as intellectual base retrieval. Because every
document in Set B is ranked for relevance to the seed, thresholds can be set for
extracting the most important documents in the base, as evidenced by their citedness.</p>
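        <p>Ranking Set B and cutting it at a threshold might look like the sketch below. It applies the same TF*IDF formula with base-10 logs; the function name, the cutoff of 5.0, the value of N, and all counts are assumptions chosen for illustration only.</p>
        <preformat>
```python
import math


def rank_set_b(tf_counts, df_counts, n, min_score=0.0):
    """Rank co-cited works by (1 + log10 TF) * log10(N / DF), highest first.

    Works scoring below min_score are dropped from the result.
    """
    ranked = []
    for work, tf in tf_counts.items():
        score = (1 + math.log10(tf)) * math.log10(n / df_counts[work])
        if score >= min_score:
            ranked.append((work, round(score, 2)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical counts for three works co-cited with a seed.
tf_counts = {"NARROW_WORK": 100, "MIDDLE_WORK": 10, "BROAD_WORK": 10}
df_counts = {"NARROW_WORK": 1_000, "MIDDLE_WORK": 1_000, "BROAD_WORK": 100_000}
N = 10_000_000

for work, score in rank_set_b(tf_counts, df_counts, N, min_score=5.0):
    print(work, score)
```
        </preformat>
        <p>With these assumed counts, the widely cited BROAD_WORK falls below the cutoff and is omitted, while the two works more specific to the seed are kept in rank order. Where to set such a threshold is a judgment left to the analyst.</p>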
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bates</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          <article-title>The design of browsing and berrypicking techniques for the online search interface</article-title>
          .
          <source>Online Review</source>
          ,
          <volume>13</volume>
          ,
          <issue>5</issue>
          ,
          <fpage>407</fpage>
          -
          <lpage>424</lpage>
          (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bates</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          <article-title>Document familiarity, relevance, and Bradford's Law: The Getty Online Searching Project Report No. 5</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>32</volume>
          ,
          <issue>6</issue>
          ,
          <fpage>697</fpage>
          -
          <lpage>707</lpage>
          (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Beel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.
          <article-title>Research paper recommender systems: A literature survey</article-title>
          .
          <source>International Journal on Digital Libraries</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          (published online
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carevic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>On the connection between citation-based and topical relevance ranking: Results of a pretest using iSearch</article-title>
          .
          <source>Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval</source>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>44</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanyam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Cocitation search strategy</article-title>
          .
          <source>National Online Meeting: Proceedings</source>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          . Medford, NJ: Learned Information (
          <year>1981</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Eto</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Evaluations of context-based co-citation searching</article-title>
          .
          <source>Scientometrics</source>
          <volume>94</volume>
          ,
          <issue>2</issue>
          ,
          <fpage>651</fpage>
          -
          <lpage>673</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , et al.
          <article-title>Recommending citations: Translating papers into references</article-title>
          .
          <source>Proceedings of the 21st International Conference on Information and Knowledge Management</source>
          , pp.
          <fpage>1910</fpage>
          -
          <lpage>1914</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Larsen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>References and citations in automatic indexing and retrieval systems: Experiments with the boomerang effect</article-title>
          .
          <source>PhD dissertation</source>
          ,
          <source>Royal School of Library and Information Science</source>
          , Copenhagen, Denmark (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Foundations of statistical natural language processing</article-title>
          . MIT Press, Cambridge, Massachusetts (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>McNee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.
          <article-title>On the recommending of citations for research papers</article-title>
          .
          <source>Proceedings of the ACM Conference on Computer Supported Cooperative Work</source>
          , pp.
          <fpage>116</fpage>
          -
          <lpage>125</lpage>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Smucker</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          <article-title>Evaluation of find-similar with simulation and network analysis</article-title>
          .
          <source>PhD dissertation</source>
          , University of Massachusetts Amherst (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sparck Jones</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>A statistical interpretation of term specificity and its application in retrieval</article-title>
          .
          <source>Journal of Documentation</source>
          <volume>28</volume>
          ,
          <issue>1</issue>
          ,
          <fpage>11</fpage>
          -
          <lpage>21</lpage>
          (
          <year>1972</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Strohman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>B.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Recommending citations for academic papers</article-title>
          .
          <source>Technical Report IR466</source>
          , Center for Intelligent Information Retrieval, University of Massachusetts Amherst (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          <article-title>Combining bibliometrics, information retrieval, and relevance theory, Part 1: First examples of a synthesis</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>58</volume>
          ,
          <issue>4</issue>
          ,
          <fpage>536</fpage>
          -
          <lpage>559</lpage>
          (
          <year>2007a</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          <article-title>Combining bibliometrics, information retrieval, and relevance theory, Part 2: Some implications for information science</article-title>
          ,
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>58</volume>
          ,
          <issue>4</issue>
          ,
          <fpage>583</fpage>
          -
          <lpage>605</lpage>
          (
          <year>2007b</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          <article-title>Pennants for Strindberg and Persson</article-title>
          . In:
          <source>Celebrating Scholarly Communication Studies: A Festschrift for Olle Persson at His 60th Birthday. Special volume of the ENewsletter of the International Society for Scientometrics and Informetrics</source>
          , S-
          <volume>5</volume>
          ,
          <fpage>71</fpage>
          -
          <lpage>83</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          <article-title>Some new tests of relevance theory in information science</article-title>
          .
          <source>Scientometrics</source>
          <volume>83</volume>
          ,
          <issue>3</issue>
          ,
          <fpage>653</fpage>
          -
          <lpage>667</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          <article-title>Relevance theory and citations</article-title>
          .
          <source>Journal of Pragmatics</source>
          ,
          <volume>43</volume>
          ,
          <issue>14</issue>
          ,
          <fpage>3345</fpage>
          -
          <lpage>3361</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>White</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          <article-title>Co-cited author retrieval and relevance theory: Examples from the humanities</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>102</volume>
          ,
          <issue>3</issue>
          ,
          <fpage>2275</fpage>
          -
          <lpage>2299</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>