Editorial for the Bibliometric-enhanced Information Retrieval Workshop at ECIR 2014

Philipp Mayr, Philipp Schaer, Andrea Scharnhorst, Peter Mutschke

1 Introduction

This first "Bibliometric-enhanced Information Retrieval" (BIR 2014) workshop (http://www.gesis.org/en/events/conferences/ecirworkshop2014/) aims to engage the IR community in a discussion of possible links to bibliometrics and scholarly communication [6]. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. For example, recent approaches have shown that alternative ranking methods based on citation analysis can enhance IR. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of co-authorship networks, can improve retrieval services for specific communities as well as for large, cross-domain collections. The workshop aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics/scientometrics and to create a common ground for incorporating bibliometric-enhanced services into retrieval at the digital library interface. Our interests include information retrieval, information seeking, science modelling, network analysis, and digital libraries. The goal is to apply insights from bibliometrics, scientometrics, and informetrics to concrete practical problems of information retrieval and browsing.

Retrieval evaluations have shown that simple text-based retrieval methods scale up well but no longer improve. Traditional retrieval has reached a high level in terms of measures like precision and recall, yet scientists and scholars still face challenges present since the early days of digital libraries: mismatches between search terms and indexing terms, overload from result sets that are too large and complex, and the drawbacks of text-based relevance rankings. Bibliometric and scientometric analyses, by contrast, have revealed not only the fundamental laws of Bradford and Lotka, but also network structures and dynamic mechanisms in scientific production. Statistical models of scholarly activities are increasingly used to evaluate specialties, to forecast and discover research trends, and to shape science policy. Their use as tools for navigating scientific information in public digital libraries is a promising but still relatively new development. Some of these techniques are already used in working systems but are not well integrated into larger scholarly IR environments. The availability of new IR test collections that contain citation and bibliographic information, like the iSearch collection (http://itlab.dbit.dk/~isearch/), could provide enough ground to interest the IR community (again) in this kind of bibliographic system. The long-term research goal is to develop and evaluate new approaches based on informetrics and bibliometrics.

The aim of this workshop is to bring together researchers from different domains, such as information retrieval, information seeking, science modelling, bibliometrics, scientometrics, network analysis, and digital libraries, to move toward a deeper understanding of this research challenge. In the following we outline the six papers of the workshop in the order of presentation.
2 Overview of the papers

Ever since bibliographic databases enabled the systematic study of citations, researchers have debated the meaning of citations. Large-scale citation analysis has revealed meaningful traces of knowledge diffusion in scholarly communication. Still, the reason behind any individual reference in a text can vary widely: it can point to a body of work fundamental to the argument of the paper, or to related work that the paper complements, continues, or debates. Linguistic analysis of the context of a citation (its textual neighbourhood) has been used to determine the sentiment of a citation. The paper by Bertin and Atanassova [2] belongs to the studies that try to further unravel the riddle of the meaning of citations. The authors analyse word use in the standard sections of articles (Introduction, Methods, Results, and Discussion) and reveal interesting distributions of verb use across these sections. They propose to use this work in future citation classifiers, which in the long term might also be implemented in citation-based information retrieval.

Nees Jan van Eck and Ludo Waltman [4] consider the problem of scientific literature search and suggest that citation relations between publications can be a helpful instrument in the systematic retrieval of scientific literature. They introduce a new software tool called CitNetExplorer that can be used for citation-based scientific literature retrieval. To demonstrate its use, they employ the tool to identify publications dealing with the topic of "community detection in networks". They argue that their approach can be especially helpful in situations in which one needs a comprehensive overview of the literature on a certain research topic, for instance when preparing a review article.

Muhammad Kamran Abbasi and Ingo Frommholz [1] investigate the benefit of combining polyrepresentation with document clustering. The goal is to support the search process with highly ranked polyrepresentative clusters. The principle of polyrepresentation in IR states, roughly, that a document's likelihood of relevance increases if multiple representations point to it. Given this, the authors argue that from a user's perspective it seems more suitable to present clusters of documents relevant to the same representations instead of plain ranked lists of search results. The proposed approach is therefore to present the user with a ranked list of documents from the "best" cluster first, i.e. the cluster of documents with the largest cognitive overlap of different representations. The authors apply clustering to information-need-based as well as document-based polyrepresentation. An evaluation of the model on the iSearch collection shows some potential of the approach to improve retrieval quality, but also a dependency on the number of relevant documents.

Haozhen Zhao and Xiaohua Hu [7] explore how including citation and co-citation information as document prior probabilities affects retrieval quality. A paper's citation count, its PageRank, and its co-citation cluster are used as document priors. The paper evaluates the approach on the iSearch collection, indicating, however, only a limited effect of such priors on retrieval performance. The authors conclude that using document priors in a more query-dependent manner and combining citation features with content features might lead to a greater effect.
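To make the idea of citation-based document priors concrete, the following minimal sketch (not the authors' implementation; the toy corpus, citation counts, and smoothing weight are invented for illustration) combines a standard query-likelihood score with a prior derived from citation counts, in the spirit of language-model IR:

```python
import math
from collections import Counter

# Toy corpus and hypothetical citation counts (illustrative only).
docs = {
    "d1": "citation analysis for literature retrieval",
    "d2": "retrieval evaluation with test collections",
    "d3": "community detection in citation networks",
}
citations = {"d1": 120, "d2": 3, "d3": 45}

def query_likelihood(query, text, collection, lam=0.5):
    """log P(Q|D) under Jelinek-Mercer smoothing against the collection."""
    doc_tf = Counter(text.split())
    doc_len = sum(doc_tf.values())
    coll_tf = Counter(w for t in collection for w in t.split())
    coll_len = sum(coll_tf.values())
    score = 0.0
    for w in query.split():
        p_doc = doc_tf[w] / doc_len
        p_coll = coll_tf[w] / coll_len
        score += math.log(lam * p_doc + (1 - lam) * p_coll + 1e-12)
    return score

def citation_prior(doc_id):
    """log P(D): prior mass proportional to log(1 + citation count)."""
    total = sum(math.log(1 + c) for c in citations.values())
    return math.log(math.log(1 + citations[doc_id]) / total + 1e-12)

query = "citation retrieval"
ranking = sorted(
    docs,
    key=lambda d: query_likelihood(query, docs[d], docs.values()) + citation_prior(d),
    reverse=True,
)
print(ranking)  # documents ordered by query likelihood plus citation prior
```

PageRank scores or co-citation cluster statistics could be substituted for the citation counts without changing the overall scheme.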
Zeljko Carevic and Philipp Schaer [3] examine the iSearch test collection and the citation information included in it. Unlike iSearch, common IR test collections do not include all the information needed for proper evaluations in the field of citation-based ranking. The main goal of this work is to learn about the connection between citation-based and topical relevance rankings and about the suitability of iSearch for this task. The paper at hand is a pretest for this overall research question and analyses the dataset and its suitability for citation analysis. Furthermore, the authors investigate co-citation-based recommendations derived from topically relevant seed documents.

Kris Jack, Pablo López-García, Maya Hristakeva and Roman Kern [5] present work on increasing the number of citations that support claims in Wikipedia. They analyse the distribution of more than 9 million citations in Wikipedia and find that an explicit marker for a needed citation appears more than 400,000 times. To overcome this situation they propose techniques based on journal productivity (Bradfordizing) and popularity (number of readers in Mendeley) to implement a citation recommendation system. The evaluation is carried out on the Mendeley corpus of 100 million documents with 10 topics. Although the paper is a case study, it clearly shows that a plain keyword-based search engine like Google Scholar is not sufficient for providing citation recommendations for Wikipedia articles, and that altmetrics like readership information can improve retrieval and recommendation performance.
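As an illustration of the Bradfordizing idea, the following minimal sketch (the result list and journal names are invented; this is not the paper's implementation) re-ranks a keyword-based result list by journal productivity, so that documents from journals contributing many hits to the result set move up:

```python
from collections import Counter

# Hypothetical keyword-search result list, in original retrieval order:
# (document id, journal that published it).
results = [
    ("p1", "J. Informetrics"), ("p2", "Scientometrics"),
    ("p3", "Scientometrics"), ("p4", "JASIST"),
    ("p5", "Scientometrics"), ("p6", "J. Informetrics"),
]

def bradfordize(results):
    """Re-rank by journal productivity within the result set."""
    productivity = Counter(journal for _, journal in results)
    # sorted() is stable, so the original rank breaks ties within a journal
    return sorted(results, key=lambda r: -productivity[r[1]])

for doc_id, journal in bradfordize(results):
    print(doc_id, journal)
# Scientometrics papers (3 hits) now precede J. Informetrics (2) and JASIST (1)
```

A popularity-based variant would replace the journal counts with, for example, Mendeley readership figures.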
3 Outlook

After the ISSI workshop "Combining Bibliometrics and Information Retrieval" (http://www.gesis.org/en/events/conferences/issiworkshop2013/), with the BIR workshop we aimed at a dissemination strategy oriented towards the core IR community, which is why we located this workshop at ECIR. The variety of papers we received, and the small subset we could accept, show the different ways of combining bibliometrics and IR and the mutual benefits the two disciplines can offer each other. We hope to bring both disciplines closer together, to start a sequence of explorations, visions, and results documented in scholarly discourse, and to lay new material for a sustainable bridge between bibliometrics and IR.

References

1. Abbasi, M.K., Frommholz, I.: Exploiting Information Needs and Bibliographics for Polyrepresentative Document Clustering. Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, pp. 21–28, Amsterdam, The Netherlands (2014).
2. Bertin, M., Atanassova, I.: A Study of Lexical Distribution in Citation Contexts through the IMRaD Standard. Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, pp. 5–12, Amsterdam, The Netherlands (2014).
3. Carevic, Z., Schaer, P.: On the Connection Between Citation-based and Topical Relevance Ranking: Results of a Pretest using iSearch. Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, pp. 37–44, Amsterdam, The Netherlands (2014).
4. Van Eck, N.J., Waltman, L.: Systematic retrieval of scientific literature based on citation relations: Introducing the CitNetExplorer tool. Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, pp. 13–20, Amsterdam, The Netherlands (2014).
5. Jack, K., López-García, P., Hristakeva, M., Kern, R.: {{citation needed}}: Filling in Wikipedia's Citation Shaped Holes. Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, pp. 45–52, Amsterdam, The Netherlands (2014).
6. Mayr, P. et al.: Bibliometric-enhanced Information Retrieval. In: de Rijke, M. et al. (eds.) 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13–16, 2014, pp. 798–801. Springer International Publishing (2014).
7. Zhao, H., Hu, X.: Language Model Document Priors based on Citation and Co-citation Analysis. Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, pp. 29–36, Amsterdam, The Netherlands (2014).