=Paper=
{{Paper
|id=Vol-2831/paper17
|storemode=property
|title=Importance Assessment in Scholarly Networks
|pdfUrl=https://ceur-ws.org/Vol-2831/paper17.pdf
|volume=Vol-2831
|authors=Saurav Manchanda,George Karypis
|dblpUrl=https://dblp.org/rec/conf/aaai/ManchandaK21
}}
==Importance Assessment in Scholarly Networks==
<pdf width="1500px">https://ceur-ws.org/Vol-2831/paper17.pdf</pdf>
<pre>
                               Importance Assessment in Scholarly Networks
                                           Saurav Manchanda and George Karypis
                                                University of Minnesota, Twin Cities, USA
                                                    {manch043, karypis}@umn.edu


Abstract
We present approaches to estimate content-aware bibliometrics that quantitatively measure the scholarly impact of a publication. Traditional measures of quality-related aspects, such as citation counts and the h-index, do not take into account the content of the publications. This limits their ability to provide rigorous quality-related metrics and can significantly skew the results. Our proposed metric, the Content-Informed Index (CII), uses the content of the paper as a source of distant supervision to weight the edges of a citation network. These content-aware weights quantify the information in a citation, i.e., the extent to which the cited node informs the citing node. The weights convert the original unweighted citation network into a weighted one, which can then be used to derive impact metrics for the various entities involved, such as publications and authors. We evaluate the weights estimated by our approach on three manually annotated datasets, where the annotations quantify the extent of information in the citation. In particular, we evaluate how well the ranking imposed by our approach associates with the ranking imposed by the manual annotations. The proposed approach achieves up to a 103% improvement in performance compared to the second-best performing approach.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction
Scientific, engineering, and technological (SET) innovations have been the drivers behind many of the significant positive advances in our modern economy, society, and life. Various quantitative metrics have been developed and deployed to measure impact-related aspects of these innovations. These metrics play an important role. They are used to influence how resources are allocated, assess the performance of personnel, identify intellectual property (IP)-related takeover targets, value a company’s intangible assets (IP is such an asset), and identify strategic and/or emerging competitors.

Citation networks of peer-reviewed scholarly publications (e.g., journal/conference articles and patents) have been widely used and studied to derive such metrics for the various entities involved (e.g., articles, researchers, institutions, companies, journals, conferences, countries, etc. (Aguinis et al. 2012)). However, most of these traditional metrics, such as citation counts and the h-index, treat all citations and publications equally. They do not take into account the content of the publications or the context in which a prior scholarly work was cited. Another related line of work, such as PageRank (Page et al. 1999) and HITS (Kleinberg 1999), takes node centrality into consideration as a proxy for publication influence, but still operates in a content-agnostic manner. These content-agnostic metrics fail to reliably measure the scholarly impact of an article, as they do not differentiate between the possible reasons a scholarly work is being cited. Being content-agnostic, these metrics can also be easily manipulated by malicious entities. Examples include publication venues indulging in self-citations, which leads to a high impact factor, and groups of scholars citing each others’ work. For example, Journal Citation Reports (JCR)¹ routinely suppresses many journals that indulge in citation stacking. In this practice, reviewers and journal editors pressure authors to cite papers that either they wrote or that are published in “their” journal. Thus, there is a need to establish content-aware metrics that accurately and quantitatively measure innovation-related aspects such as significance, novelty, impact, and market value. Such metrics are essential for ensuring that SET-driven innovations will play an ever more significant role in the future.

¹ http://help.incites.clarivate.com/incitesLiveJCR/JCRGroup/titleSuppressions.html

In this paper, we propose machine-learning-driven approaches that automatically estimate the weights of the edges in a citation network, such that edges with higher weights correspond to higher-impact citations. There has been considerable effort in the past to identify important citations (Valenzuela, Ha, and Etzioni 2015; Jurgens et al. 2018; Cohan et al. 2019). These approaches treat the task as a supervised text-classification problem, and thus require training data with ground-truth annotations. However, generating such labeled data is difficult and time consuming, especially when the meaning of the labels is user-defined. In contrast, our approaches are distant-supervised and require no manual annotation. They leverage the readily available content of the papers as a source of distant supervision. Specifically, we formulate the problem as how well a linear combination of the representations of the cited publications explains the representation of the citing publication. The weights in this linear combination quantify the extent to which each cited publication informs the citing publication. We evaluate the weights estimated by our approach on three manually annotated datasets, where the annotations quantify the extent of information in the citation. In particular, we evaluate how well the ranking imposed by our approach associates with the ranking imposed by the manual annotations. The proposed approach achieves up to a 103% improvement in performance compared to the second-best performing approach.

Our discussion and evaluation focus on identifying informing citations, but our approach is not restricted to this domain. The content-aware weights estimated by the proposed approach convert the original unweighted citation network into a weighted one. This weighted network can then be used to derive impact metrics for the various involved entities, such as publications and authors. For example, instead of the vanilla citation count, the impact of a publication can be quantified by the sum of the weights of the edges outgoing from its node (i.e., the edges to the papers that cite it).

The remainder of the paper is organized as follows. Section 2 reviews the related literature. Section 3 presents the proposed method, followed by the experiments in Section 4. Section 5 discusses the results. Finally, Section 6 concludes the paper.

2 Related Work
The research areas relevant to the work presented in this paper are:
• citation indexing;
• citation recommendation;
• link prediction;
• distant-supervised credit attribution;
• citation-intent classification.
We briefly discuss these areas below.

Citation Indexing
A citation index indexes the links between publications that authors make when they cite other publications. Citation indexes aim to improve the dissemination and retrieval of scientific literature. CiteSeer (Giles, Bollacker, and Lawrence 1998; Li et al. 2006) was the first automated citation indexing system. It works by downloading publications from the Web and converting them to text. It then parses the papers to extract the citations and the context in which they are made in the body of the paper, and stores this information in a database. Other popular citation indices include Google Scholar², Web of Science³ by Clarivate Analytics, Scopus⁴ by Elsevier, and Semantic Scholar⁵. Examples of subject-specific citation indices include:
• INSPIRE-HEP⁶, which covers high energy physics;
• PubMed⁷, which covers life sciences and biomedical topics;
• the Astrophysics Data System⁸, which covers astronomy and physics.

² https://scholar.google.com/
³ http://www.webofknowledge.com/
⁴ https://www.scopus.com/
⁵ https://www.semanticscholar.org/
⁶ https://inspirehep.net/
⁷ https://pubmed.ncbi.nlm.nih.gov/
⁸ http://ads.harvard.edu/

Citation recommendation
Citation recommendation is the task of recommending citations for a given text. It is an essential task, as all claims written by the authors need to be backed up to ensure reliability and truthfulness. Färber and Jatowt (2020) group the approaches developed for citation recommendation into four categories:
• hand-crafted feature based approaches;
• topic-modeling based approaches;
• machine-translation based approaches;
• neural-network based approaches.
Hand-crafted feature based approaches rely on features that are manually engineered by the developers. For example, the text similarity between the citation context and the candidate papers can be used as one of the text-based features. Examples of papers that propose hand-crafted feature based approaches include (Färber and Jatowt 2020; He et al. 2011; LIU, YAN, and YAN 2016; Livne et al. 2014; Rokach et al. 1978). Topic-modeling based approaches represent the candidate papers’ text and the citation contexts by means of abstract topics, thereby exploiting the latent semantic structure of texts. Examples include (He et al. 2010; Kataria, Mitra, and Bhatia 2010). Machine-translation based approaches apply the idea of translating the citation context into the cited document to find the candidate papers worth citing. Examples in this category include (He et al. 2012; Huang et al. 2012). Finally, popular examples of neural-network based models include (Ebesu and Fang 2017; Han et al. 2018; Huang et al. 2015; Kobayashi, Shimbo, and Matsumoto 2018; Tang, Wan, and Zhang 2014; Yin and Li 2017).

[Figure 1: diagram of the citing paper P_1 and the cited papers P_2, P_3, P_4, with blocks labelled "Unit Normalization", "Representation of the historical concepts", and "Minimize the explanation loss".]
Figure 1: Overview of the Content-Informed Index. Paper P_1 cites papers P_2, P_3 and P_4. The weights w_{21}, w_{31}, and w_{41} quantify the extent to which P_2, P_3 and P_4 inform P_1, respectively. The function f is implemented as a Multilayer Perceptron.

Link-prediction
A link is a connection between two nodes in a network. Link prediction is the problem of predicting the existence of a link between two nodes in a network. A good link-prediction model predicts the likelihood of a link between two nodes. It can therefore be used not only to predict new links, but also to curate the graph by filtering out less-likely links that are already present. Thus, link prediction can be a useful tool to find likely citations in a citation network. The citation recommendation task described previously can be thought of as a special case of link prediction. Following the taxonomy described in (Martínez, Berzal, and Cubero 2016), link-prediction approaches can be broadly grouped into three categories: similarity-based approaches, probabilistic and statistical approaches, and algorithmic approaches.

Similarity-based approaches assume that nodes tend to form links with other similar nodes. Two nodes are considered similar if they are connected to similar nodes, or if they are near each other in the network according to a given similarity function. Examples of popular similarity functions include the number of common neighbors (Liben-Nowell and Kleinberg 2007), the Adamic-Adar index (Adamic and Adar 2003), etc.

Probabilistic and statistical approaches assume that the network has a known structure. These approaches estimate the model parameters of the network structure using statistical methods. They then use these parameters to calculate the likelihood of the presence of a link between two nodes. Examples include (Guimerà and Sales-Pardo 2009; Huang 2010; Wang, Satuluri, and Parthasarathy 2007).

Algorithmic approaches directly use link prediction as supervision to build the model. For example, the link-prediction task can be formulated as a binary classification task. The positive instances are the pairs of nodes that are connected in the network, and the negative instances are unconnected node pairs. Examples include (Menon and Elkan 2011; Bliss et al. 2014). Two further families also belong to this category. The first is unsupervised or self-supervised node embedding (such as DeepWalk (Perozzi, Al-Rfou, and Skiena 2014) and node2vec (Grover and Leskovec 2016)) followed by training a binary classifier. The second is Graph Neural Network approaches such as GraphSage (Hamilton, Ying, and Leskovec 2017).
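The link-prediction discussion above names two similarity functions: the number of common neighbors and the Adamic-Adar index. Both can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the cited works. It assumes an undirected graph stored as a dict of adjacency sets, so a directed citation network would first have to be symmetrized. The helper names `to_undirected`, `common_neighbors`, and `adamic_adar` are hypothetical. The Adamic-Adar index weights each shared neighbor z by 1/log(deg(z)), so a sparsely connected shared neighbor counts more than a highly connected hub.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical representation: node -> set of neighboring nodes (undirected).
Graph = Dict[Hashable, Set[Hashable]]


def to_undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency-set graph from (citing, cited) pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def common_neighbors(graph: Graph, u: Hashable, v: Hashable) -> int:
    """Number of nodes adjacent to both u and v."""
    return len(graph.get(u, set()) & graph.get(v, set()))


def adamic_adar(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Sum of 1 / log(degree(z)) over the common neighbors z of u and v."""
    score = 0.0
    for z in graph.get(u, set()) & graph.get(v, set()):
        degree = len(graph[z])
        # log(1) = 0, so skip degree-1 nodes; when u != v, every shared
        # neighbor touches both u and v and therefore has degree >= 2.
        if degree > 1:
            score += 1.0 / math.log(degree)
    return score


if __name__ == "__main__":
    # Toy citation graph: P1 and P4 both cite P2 and P3; P5 cites only P3.
    g = to_undirected([("P1", "P2"), ("P1", "P3"), ("P4", "P2"),
                       ("P4", "P3"), ("P5", "P3")])
    print(common_neighbors(g, "P1", "P4"))       # 2
    print(round(adamic_adar(g, "P1", "P4"), 4))  # 2.3529, i.e. 1/log(2) + 1/log(3)
```

In this toy graph, P1 and P5 share only P3, which has degree 3. Their Adamic-Adar score is therefore 1/log(3), lower than the score of the pair (P1, P4), which shares two cited papers.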

Distant-supervised credit-attribution
Various distant-supervised approaches have been developed for credit attribution, but prior work has primarily focused on text documents. A document may be associated with multiple labels, but not all the labels apply with equal specificity to the individual parts of the document. The credit-attribution problem refers to identifying the specificity of labels to different parts of the document. Various probabilistic and neural-network based approaches have been developed to address this problem, such as:
• Labeled Latent Dirichlet Allocation (LLDA) (Ramage et al. 2009);
• Partially Labeled Dirichlet Allocation (PLDA) (Ramage, Manning, and Dumais 2011);
• Multi-Label Topic Model (MLTM) (Soleimani and Miller 2017);
• Segmentation with Refinement (SEG-REFINE) (Manchanda and Karypis 2018);
• Credit Attribution with Attention (CAWA) (Manchanda and Karypis 2020).

Another line of work uses distant-supervised credit attribution for query understanding in product search. Examples include (i) using the reformulation logs as a source of distant supervision to estimate a weight for each query term, indicating the importance of the term towards expressing the query’s product intent (Manchanda, Sharma, and Karypis 2019a,b); and (ii) annotating individual terms in a query with the corresponding intended product characteristics, using the characteristics of the engaged products as a source of distant supervision (Manchanda, Sharma, and Karypis 2020).

Citation-intent classification
There is a large body of work studying the intent of citations and devising categorization systems. In general, these approaches treat citation-intent classification as a text-classification problem, and require training data with ground-truth annotations. Representative examples include rule-based approaches (Pham and Hoffmann 2003; Garzone and Mercer 2000) as well as machine-learning driven approaches (Valenzuela, Ha, and Etzioni 2015; Jurgens et al. 2018; Cohan et al. 2019). Generating labeled data for these supervised approaches is difficult and time consuming, especially when the meaning of the labels is user-defined. In contrast, our approaches are distant-supervised and require no manual annotation.

3 Content-Informed Index (CII)
In the absence of labels that define impact, we assume that the extent to which a cited paper informs the citing paper is an indication of the citation’s impact. Specifically, we assume that each paper P_i can be represented as a set of concepts C_i. Further, we assume that each paper P_i is built on top of a set of historical concepts H_i, and that its novelty N_i is the new set of concepts it proposes. The contribution of a cited paper P_j towards the citing paper P_i is the set of concepts C_{ji} = C_j ∩ H_i. In other words, the set of concepts C_i is given by:

    $$ C_i = N_i \cup H_i = N_i \cup \Big[ \bigcup_{P_i \text{ cites } P_j} C_{ji} \Big] $$

The task at hand is to quantify the extent to which C_{ji} contributes towards H_i. To achieve this, we look into the following directions:
• How do we supervise the exercise? We minimize the novelty of paper P_i by trying to explain the concepts in paper P_i (denoted by C_i) using the historical concepts, i.e., the concepts of the papers it cites (C_j). We call the loss associated with this minimization the explanation loss. This gives rise to the following optimization problem:

    $$ \text{minimize} \sum_i N_i = \text{minimize} \sum_i \left( C_i - H_i \right) $$

  To proceed in this direction, we need to answer two questions: (i) how do we represent the set of concepts associated with the paper P_i, and (ii) how do we represent the set of historical concepts H_i? As we show next, we use the textual content of the papers to estimate the representations of C_i and H_i. Thus, we formulate our problem as a distant-supervised problem, in which the content of the papers acts as the source of distant supervision.
• How do we represent the set of concepts associated with a paper? For simplicity, we represent the set of concepts associated with a paper (C_i) as a pretrained vector representation (embedding) of its abstract. Examples of such representations include Word2Vec (Mikolov et al. 2013), GloVe (Pennington, Socher, and Manning 2014), BERT (Devlin et al. 2018), and ELMo (Peters et al. 2018). In this paper, we use the representations pretrained on scientific documents provided by ScispaCy (Neumann et al. 2019). The representation of C_i is denoted by r(C_i).
• How do we represent the set of historical concepts H_i? The set of historical concepts H_i is a union of the concepts borrowed from the cited papers (C_j). We therefore simply represent it as a weighted linear combination of the representations of the concepts of the cited papers, i.e.,

    $$
    \begin{aligned}
    & r(H_i) = \sum_{P_i \text{ cites } P_j} \tilde{w}_{ji}\, r(C_j) \\
    \text{subject to} \quad & \sum_{P_i \text{ cites } P_j} \tilde{w}_{ji}^2 = 1, \\
    & \tilde{w}_{ji} \ge 0 \quad \forall (i, j).
    \end{aligned}
    $$

  We impose the constrained norm condition (\sum_{P_i cites P_j} w̃_{ji}^2 = 1) to make the representation r(H_i) agnostic to the number of cited papers, because a paper can cite multiple papers to reference the same borrowed concepts.
  The weights w̃_{ji} can be thought of as a normalized similarity measure between the concepts of the cited paper and the citation context. Thus, to estimate w̃_{ji}, we first estimate the unnormalized weights, denoted by w_{ji}, and then normalize them to have unit norm. The unnormalized weight w_{ji} is precisely the extent to which C_j contributes towards H_i (and hence C_i), i.e., the weight that we wish to estimate in this paper. We estimate w_{ji} with a multilayer perceptron that takes as input the representations of the cited paper and the citation context. We use the representation of the corresponding concepts as the representation of the cited paper (r(C_j)). Similar to r(C_j), we use the ScispaCy vector representation of the citation context as the representation of the context, and denote it by r(j → i).

The above discussion leads to the following formulation:

    $$
    \begin{aligned}
    \min_{f} \quad & \sum_i \Big\| r(C_i) - \sum_{P_i \text{ cites } P_j} \tilde{w}_{ji}\, r(C_j) \Big\|_2 \\
    \text{subject to} \quad & \tilde{w}_{ji} = \frac{w_{ji}}{\sqrt{\sum_{P_i \text{ cites } P_j} w_{ji}^2}} \quad \forall (i, j), \\
    & w_{ji} = f\big(r(C_j),\, r(j \to i)\big) \quad \forall (i, j), \\
    & w_{ji} \ge 0 \quad \forall (i, j), \\
    & w_{ji} \le b \quad \forall (i, j).
    \end{aligned}
    \tag{1}
    $$

The max-bound constraint (w_{ji} ≤ b) is introduced to limit the projection space of the weights w_{ji}. Without this constraint, for a given citing paper P_i, suppose a set of weights w_{ji} minimizes Equation (1). Then so does any positive scalar multiple of those weights, since the normalization maps them to the same w̃_{ji}. This can lead to the estimated weights being incomparable across different citing papers. A max bound on the estimated weights helps avoid this scenario. To satisfy the constraints, the function f(·) can be implemented as an L2-regularized multilayer perceptron with a single output node and a non-negative mapping at the output node. Note that we do not explicitly set the max bound b; instead, it is implicitly set by the L2 regularization of the weights of the function f. The L2 regularization parameter is treated as a hyperparameter. Figure 1 shows an overview of the Content-Informed Index (CII).

4 Experimental methodology
Evaluation methodology and metrics
We need to evaluate how well the weights estimated by our proposed approach quantify the extent to which a cited paper informs the citing paper. To this end, we leverage several manually annotated datasets (described later in this section), where the annotations quantify the extent of information in the citation. The task thus becomes one of ordinal association: we need to evaluate how well the ranking imposed by our proposed method associates with the ranking imposed by the manual annotations. As a measure of rank correlation, we use the non-parametric Somers’ Delta (Somers 1962), denoted by Δ. Values of Δ range from −1 (100% negative association, or perfect inversion) to +1 (100% positive association, or perfect agreement). A value of zero indicates the absence of association. Formally, Δ is defined in terms of a dependent variable (i.e., the weights predicted by our model) and an independent variable (i.e., the manually annotated ground truth). It is the difference between the number of concordant and discordant pairs, divided by the number of pairs whose values of the independent variable are unequal.

Relation of Δ to other metrics: Consider the case where the independent variable has only two distinct classes (a binary variable). Then Δ is equivalent to the area under the receiver operating characteristic curve (AUC ROC) statistic, up to the linear rescaling Δ = 2·AUC − 1 (Newson 2002). Thus, Δ can also be viewed as a generalization of AUC ROC to ordinal classification with multiple classes. Further, the dependent variable (the weights estimated by our proposed approach) is real valued, so ties in the dependent variable are very unlikely. Thus, for our case, Δ is equivalent to Goodman and Kruskal’s Gamma (Goodman and Kruskal 1959, 1963, 1972, 1979). It is also a scaled variant of Kendall’s τ coefficient (Kendall 1938). Gamma and τ are other popular measures of ordinal association.

Baselines
We choose representative baselines from diverse categories, as discussed below:
Link-prediction approaches: The citation weights that we estimate in this paper can also be viewed from a link-prediction perspective. Under this view, every citation (link) in the citation graph is assigned a score that portrays the likelihood of the existence of that link. Thus, noisy citations, i.e., edges that do not make sense with respect to the underlying link-prediction model, get smaller weights. We compare against two link-prediction methods. One is based on a classic network-embedding approach, and the other belongs to the recent Graph Neural Network (GNN) based approaches.
• DeepWalk (Perozzi, Al-Rfou, and Skiena 2014): DeepWalk is a popular method to learn node embeddings. It borrows ideas from language modeling and combines them with network concepts. Its main proposition is that linked nodes tend to be similar, and so they should have similar embeddings. Once we have the node embeddings output by DeepWalk, we train a binary classifier. The positive instances are the pairs of nodes that are connected in the network, and the negative instances are unconnected node pairs (generated using negative sampling). We provide results using two different classifiers: Logistic Regression (denoted by DeepWalk+LR) and Multilayer Perceptron (denoted by DeepWalk+MLP). Note that DeepWalk is a transductive model and only considers the network topology, i.e., DeepWalk does not use the content of the papers to estimate the model.
• GraphSage (Hamilton, Ying, and Leskovec 2017): GraphSAGE is a Graph Convolutional Network (GCN) based framework for inductive representation learning on large graphs. GraphSage is trained with the link-prediction loss, so we do not use a second step (as in DeepWalk) to train a separate classifier. Note that GraphSage is an inductive model. It therefore also considers the content of the papers, in addition to the topology of the network, to estimate the model.

Text-similarity based baselines: We can think of the function f as a similarity measure between the cited paper and the citation context. Thus, we consider the following similarity measures as our baselines. We use the same pretrained representations that we use as input to CII. As the similarity measure, we use cosine similarity, which is popular for text data.
• Similarity-Abstract-Context: Similarity between the cited abstract and the citation context.
• Similarity-Context-Abstract: Similarity between the citing abstract and the citation context.

[...] citing paper. This assumption has also been used as a feature in prior supervised approaches (Valenzuela, Ha, and Etzioni 2015). The absolute frequency of referencing a cited paper may provide a good signal regarding the information borrowed from it, when compared with the other papers cited by the same citing paper. However, citation behavior differs between papers, so the absolute frequency may not be comparable across different citing papers. Thus, we also provide results after normalizing the absolute frequency of the citation references for each citing paper, using mean, max, and min normalization. Specifically, consider a citation and the corresponding citing paper. The information weight for the citation is the number of references of that citation divided by the mean, max, or min, respectively, of the number of references of all the citations in that citing paper.

Datasets
The Semantic Scholar Open Research Corpus (S2ORC): The S2ORC (Lo et al. 2020) dataset is a citation graph of 81.1 million academic publications and 380.5 million citation edges. We only consider the publications for which the full text is available and whose abstract contains at least 50 words. This leaves us with a total of 5,653,297 papers and 30,533,111 edges (citations).

ACL-2015: The ACL-2015 (Valenzuela, Ha, and Etzioni 2015) dataset contains 465 citations gathered from the ACL anthology⁹. Each citation is represented as a (cited paper, citing paper) tuple, with an ordinal label ranging from 0 to 3 in increasing order of importance. The citations were annotated by one expert. Another expert then annotated a subset of the dataset to verify the inter-annotator agreement. We only use the citations for which we have inter-annotator agreement and that are present in the S2ORC dataset described before. The selected dataset contains 300 citations among 316 unique publications. The number of unique citing publications is 283, and the number of unique cited publications is 38.

ACL-ARC: ACL-ARC (Jurgens et al. 2018) is a dataset of citation intents based on a sample of papers from the ACL Anthology Reference Corpus (Bird et al. 2008). It includes 1,941 citation instances from 186 papers, annotated by domain experts. The dataset provides ACL IDs for the papers in the ACL corpus, but does not provide an identifier to the papers outside the ACL corpus, mak-
 • Similarity-Abstract-Abstract: Similarity between the             ing it difficult to map many citations to the S2ORC cor-
   cited abstract and citing abstract.                              pus. However, it provided the titles of those papers, and
To calculate each of the above similarity measures, we use          we used these titles to map these papers to the papers in
the same pretrained representations as we used as an input to       the S2ORC dataset, if we found matching titles. The an-
CII, and cosine similarity as the similarity measure, which is      notations in ACL-ARC are provided at individual citation-
a popular similarity measure for text data. The baselines be-       context level, leading to multiple annotations for some of the
longing to this category can also be thought of as similarity-      (cited paper, citing paper) pair. If this is the case, we chose
based link prediction approaches.                                   the highest-informing annotation for such (cited paper, cit-
   In addition, we also consider another simple baseline, re-       ing paper) pairs. The selected dataset contains 460 citations
ferred to as Reference Frequency, where we assume that              among 547 unique publications. The total number of unique
more frequently the cited paper is referenced in the citing pa-
                                                                       9
per, the higher the chances of the cited paper informing the               https://www.aclweb.org/anthology/
                  Table 1: Results on the Somers’ ∆ metric.

Model                              ACL-2015          ACL-ARC           SciCite
Content-Informed Index (CII)       0.428 ± 0.013     0.308 ± 0.010     0.296 ± 0.006
Ref. Frequency (Absolute)          0.325 ± 0.000     0.308 ± 0.000     0.144 ± 0.000
Ref. Frequency (Mean-normalized)   0.351 ± 0.000     0.300 ± 0.000     0.120 ± 0.000
Ref. Frequency (Min-normalized)    0.321 ± 0.000     0.298 ± 0.000     0.145 ± 0.000
Ref. Frequency (Max-normalized)    0.270 ± 0.000     0.172 ± 0.000     0.035 ± 0.000
Similarity-Abstract-Abstract      −0.041 ± 0.000     0.091 ± 0.000    −0.003 ± 0.000
Similarity-Abstract-Context       −0.147 ± 0.000     0.090 ± 0.000    −0.125 ± 0.000
Similarity-Context-Abstract        0.013 ± 0.000    −0.062 ± 0.000    −0.202 ± 0.000
DeepWalk+LR                       −0.071 ± 0.016     0.190 ± 0.006    −0.037 ± 0.018
DeepWalk+MLP                      −0.026 ± 0.011     0.205 ± 0.024    −0.047 ± 0.015
GraphSage                          0.023 ± 0.045     0.132 ± 0.024     0.049 ± 0.019


citing publications is 145 and the total number of unique cited publications
is 413.

SciCite: SciCite (Cohan et al. 2019) is a dataset of citation intents based on
a sample of papers from the Semantic Scholar corpus¹⁰, consisting of papers in
the general computer science and medicine domains. Citation intent was
labeled using crowdsourcing: the annotators were asked to identify the intent
of a citation and were directed to select among three citation intent
options: Method, Result/Comparison, and Background. This resulted in a total
of 9,159 crowdsourced instances. We use the citations that are present in the
S2ORC dataset described before. As with ACL-ARC, the annotations are provided
at the level of individual citation contexts, leading to multiple annotations
for some (cited paper, citing paper) pairs. In such cases, we chose the
highest-informing annotation for the pair. The selected dataset contains 352
citations among 704 unique publications. There is no repeated citing or cited
publication in this dataset; thus, the number of unique citing publications
and the number of unique cited publications are both 352.

Parameter selection
We treat one of the evaluation datasets (ACL-ARC) as the validation set and
chose the hyperparameters of our approaches and baselines based on their
performance on this dataset. For DeepWalk, we use the implementation provided
here¹¹ with the default parameters, except for the dimensionality of the
estimated representations, which is set to 200 (for the sake of fairness, as
we used 200-dimensional text representations for CII). For the models that
require learning, i.e., the logistic regression and MLP parts of DeepWalk,
GraphSage, and CII, we used the ADAM (Kingma and Ba 2015) optimizer with an
initial learning rate of 0.0001, and further use a step learning-rate
scheduler that decays the learning rate exponentially by a factor of 0.2
every epoch. We use L2 regularization of 0.0001. The function f in CII was
implemented as a multilayer perceptron with three hidden layers of 256, 64,
and 8 neurons, respectively. We use the same network architecture for the MLP
that we train on top of the DeepWalk representations. We train the logistic
regression and MLP parts of DeepWalk, GraphSage, and CII for a maximum of 50
epochs, and stop early if the validation performance does not improve for 5
epochs. For GraphSage, we use the implementation provided by DGL¹². We used a
mini-batch size of 1024 for training the models.

5 Results and discussion
Quantitative analysis
Table 1 shows the performance of the various approaches in terms of Somers'
Delta (∆) on each of the datasets ACL-2015, ACL-ARC, and SciCite. For ACL-2015
and SciCite, the proposed approach CII outperforms the competing approaches,
while for the ACL-ARC dataset, CII performs on par with the best performing
approach. The improvement of CII over the second best performing approach is
22% and 103% on the ACL-2015 and SciCite datasets, respectively.
   Interestingly, the simplest baseline, Reference Frequency, and its
normalized forms are the second best performing approaches. While Reference
Frequency performs on par with CII on the ACL-ARC dataset, it does not perform
as well on the other two datasets. This can be attributed to the fact that the
number of unique citing papers in the ACL-ARC dataset is relatively small.
Thus, many citations in ACL-ARC are shared by the same citing paper, which is
not the case with the other two datasets. Thus, as mentioned in Section 4, the
absolute frequency of referencing a cited paper may provide a good signal
regarding the information borrowed from the cited paper when comparing it with
other papers cited by the same citing paper. Further, the normalized forms of
Reference Frequency lead to at most marginal gains on the ACL-2015 and SciCite
datasets, and max normalization lowers ∆ on all three datasets. Thus, simple
normalizations (such as the mean, max, and min normalization used in this
paper) are not sufficient to address the differences in citation behavior
between different papers.

  10 https://www.semanticscholar.org/
  11 https://github.com/xgfs/deepwalk-c
  12 https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage
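The Reference Frequency baseline and its mean-, max-, and min-normalized variants compared in Table 1 divide each citation's reference count by a statistic of its citing paper. The following Python sketch illustrates that computation. It is not the authors' code. It assumes each citing paper is given as a hypothetical dictionary mapping cited-paper IDs to the number of times that paper is referenced in the full text, and the function name is also ours.

```python
from statistics import mean


def reference_frequency_weights(ref_counts, mode="absolute"):
    """Weight every citation of ONE citing paper by its reference count.

    ref_counts: hypothetical {cited_paper_id: in-text reference count},
    assumed non-empty.
    mode: "absolute" keeps the raw count; "mean", "max", and "min" divide it
    by that statistic over all citations of the same citing paper.
    """
    counts = list(ref_counts.values())
    normalizers = {
        "absolute": 1.0,
        "mean": mean(counts),
        "max": max(counts),
        "min": min(counts),
    }
    if mode not in normalizers:
        raise ValueError(f"unknown mode: {mode!r}")
    denom = normalizers[mode]
    return {cited: n / denom for cited, n in ref_counts.items()}


# A citing paper that references paper A four times, B twice, and C once.
counts = {"A": 4, "B": 2, "C": 1}
for mode in ("absolute", "mean", "max", "min"):
    print(mode, reference_frequency_weights(counts, mode))
```

Each normalization divides all citations of a citing paper by the same constant. It therefore never changes how that paper's citations rank against one another. It only shifts comparisons across citing papers. This is one way to see why such normalizations cannot fully account for differences in citation behavior between papers.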
   Furthermore, we observe that simple similarity-based approaches, such as the
cosine similarity between pairs of various entities (each combination of cited
abstract, citing abstract, and citation context), perform close to random
scoring (∆ values close to zero) or even worse (negative ∆). This indicates
that simple similarity measures such as cosine similarity are not sufficient
to capture the information that a cited paper lends to the citing paper, thus
showing the need for more expressive approaches such as CII.
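The link between the text-similarity baselines and the ∆ values above can be shown in miniature. The sketch below scores citations with the Similarity-Abstract-Context baseline (cosine similarity between a cited-abstract embedding and a citation-context embedding). It then evaluates those scores against ordinal labels with Somers' ∆.

This is a minimal illustration under stated assumptions:
- The embeddings and labels are made-up toy values.
- The labels are treated as the conditioning variable, so only pairs with different labels are counted. This is one common convention for this asymmetric measure, not necessarily the paper's exact one.

Under this definition, ∆ = 1 when scores order every such pair like the labels, ∆ = −1 for the reverse, and random scores give ∆ near 0 in expectation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # 0.0 for all-zero vectors (a convention)


def somers_delta(labels, scores):
    """Somers' Delta of predicted scores with respect to ordinal labels.

    Over all pairs whose labels differ, count concordant pairs (scores ordered
    like the labels) minus discordant pairs, divided by the number of such
    pairs. Pairs with tied scores count as neither concordant nor discordant.
    """
    concordant = discordant = untied = 0
    for (y1, s1), (y2, s2) in combinations(zip(labels, scores), 2):
        if y1 == y2:
            continue  # pairs tied on the label do not enter the measure
        untied += 1
        sign = (y1 - y2) * (s1 - s2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if untied == 0:
        raise ValueError("all labels are tied; Somers' Delta is undefined")
    return (concordant - discordant) / untied


# Toy citations: (cited-abstract embedding, citation-context embedding,
# ordinal importance label with 0 = lowest). All values are made up.
citations = [
    ([1.0, 0.0], [0.9, 0.1], 3),
    ([1.0, 0.0], [0.5, 0.5], 2),
    ([0.0, 1.0], [1.0, 0.2], 0),
]
labels = [label for _, _, label in citations]
# Similarity-Abstract-Context: cosine(cited abstract, citation context).
scores = [cosine(abstract, context) for abstract, context, _ in citations]
print(somers_delta(labels, scores))
```

The pairwise loop is O(n²). For larger evaluation sets, SciPy's `scipy.stats.somersd` offers a library implementation. Check its documentation for which argument is treated as the independent variable.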
   In addition, the learning-based link-prediction approaches (DeepWalk+LR,
DeepWalk+MLP, and GraphSage) perform considerably worse than the simple
Reference Frequency baseline. While they perform close to random scoring on
the ACL-2015 and SciCite datasets, their performance on the ACL-ARC dataset is
better than the random baseline.

Figure 2: Word cloud (frequently occurring words) of the citation contexts of
the citations with the highest predicted importance weights.
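The DeepWalk+LR baseline discussed above trains a binary link classifier on pairs of node embeddings. Connected pairs are positives and negatively sampled pairs are negatives. The pure-Python sketch below shows only that second stage; the embedding step (random walks plus skip-gram) is not shown, and toy vectors stand in for DeepWalk output.

The following details are our assumptions, because the text does not specify them:
- a node pair is represented by the element-wise (Hadamard) product of its two embeddings;
- negatives are sampled uniformly;
- the model is fit with plain SGD rather than the ADAM and L2 setup described in Parameter selection;
- all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pair_feature(emb, a, b):
    """Hadamard (element-wise) product of two node embeddings (an assumption)."""
    return [x * y for x, y in zip(emb[a], emb[b])]


def sample_negatives(nodes, edges, k, rng):
    """Uniformly draw k unconnected node pairs (negative sampling).

    Assumes the graph has unconnected pairs; a complete graph would loop forever.
    """
    linked = set(edges) | {(b, a) for a, b in edges}
    negatives = []
    while len(negatives) < k:
        a, b = rng.sample(nodes, 2)
        if (a, b) not in linked:
            negatives.append((a, b))
    return negatives


def train_link_classifier(emb, edges, epochs=200, lr=0.1, seed=0):
    """Logistic regression on pair features, fit with plain SGD on the log-loss."""
    rng = random.Random(seed)
    negatives = sample_negatives(sorted(emb), edges, len(edges), rng)
    data = [(pair_feature(emb, a, b), 1.0) for a, b in edges]
    data += [(pair_feature(emb, a, b), 0.0) for a, b in negatives]
    w, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
            g = p - y  # derivative of the log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            bias -= lr * g
    return w, bias


def link_score(emb, w, bias, a, b):
    """Predicted link probability, read as the weight of citation (a, b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, pair_feature(emb, a, b))) + bias)


# Toy 2-D embeddings standing in for DeepWalk output (made-up values).
emb = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0], "p4": [0.1, 0.9]}
citations = [("p1", "p2"), ("p3", "p4")]
w, bias = train_link_classifier(emb, citations)
print(link_score(emb, w, bias, "p1", "p2"), link_score(emb, w, bias, "p1", "p3"))
```

The classifier only sees the graph structure through the embeddings. Nothing in this stage looks at what the citing paper says about the cited one.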

Qualitative analysis
In order to understand the patterns that the proposed approach CII learns, we
look into the data instances with the highest and lowest predicted weights.
As the function f takes as input both the abstract of the cited paper and the
citation context, the learned patterns can be a complex function of both.
Thus, for simplicity, we limit the discussion in this section to the
linguistic patterns in the citation context and how these patterns associate
with the weights predicted for them.

Figure 3: Word cloud (frequently occurring words) of the citation contexts of
the citations with the lowest predicted importance weights.
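The word clouds in Figures 2 and 3 come from word frequencies in the citation contexts of the highest- and lowest-weighted citations. The text uses 10,000 contexts at each extreme. A minimal sketch of that ranking-and-counting step follows; rendering the counts as a cloud is not shown. The regex tokenizer, the tiny stopword list, the function name, and the example contexts and weights are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "for", "from", "in", "of", "the", "this", "to", "we"}


def frequent_context_words(contexts, weights, k=10_000, lowest=False, n_words=50):
    """Word counts over the k citation contexts with the highest predicted
    weights, or over the k lowest-weighted contexts when lowest=True."""
    ranked = sorted(
        zip(weights, contexts), key=lambda pair: pair[0], reverse=not lowest
    )
    counts = Counter()
    for _, context in ranked[:k]:
        tokens = re.findall(r"[a-z]+", context.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n_words)


contexts = [
    "We used the parser of [12] as shown in Fig 3",
    "Using the toolkit from [4] for all experiments",
    "Many methods may address this, however",
    "However, such models may fail on long documents",
]
weights = [0.91, 0.87, 0.12, 0.20]  # hypothetical predicted citation weights
print(frequent_context_words(contexts, weights, k=2))
print(frequent_context_words(contexts, weights, k=2, lowest=True))
```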
   In this direction, we select the 10,000 citation contexts corresponding to
the citations with the highest predicted weights and plot the word cloud for
these contexts. We repeat the same exercise for the citation contexts with the
lowest predicted weights. Figures 2 and 3 show the word clouds for the
highest-weighted and lowest-weighted citations, respectively. These figures
show clear discriminatory patterns between the highest-weighted and
lowest-weighted citations that relate well to the information carried by a
citation. For example, words such as ‘used’ and ‘using’ are very frequent in
the citation contexts of the highest-weighted citations. This is expected, as
such verbs provide a strong signal that the cited work was indeed employed by
the citing paper, and hence that the cited paper informed the citing work.
Another interesting pattern in the highest-weighted citations is the presence
of words like ‘fig’, ‘figure’, and ‘table’. Such words are usually present
when the authors present or describe important concepts, such as methods and
results. As such, citations in these important sections indicate that the
cited work is used or extended in the citing paper, which signals importance.
   On the other hand, the word cloud for the lowest-weighted citations
(Figure 3) is dominated by weasel words such as ‘may’, ‘many’, ‘however’, etc.
Words such as ‘many’ commonly occur in the related work section of a paper,
where the paper presents examples of other related works to emphasize the
problem that the citing paper is solving. Words like ‘may’, ‘however’, and
‘but’ are commonly used to describe some limitation of the cited work. Such
citations are expected to be incidental, carrying less information than other
citations.

6 Conclusion
In this paper, we presented approaches to estimate content-aware bibliometrics
that accurately and quantitatively measure the scholarly impact of a
publication. Our distant-supervised approaches use the content of the
publications to weight the edges of a citation network, where the weights
quantify the extent to which the cited publication informs the citing
publication. Experiments on three manually annotated datasets show the
advantage of the proposed method over the competing approaches. Our work
makes a step towards developing content-aware bibliometrics, and we envision
that the proposed method will serve as a motivation to develop other rigorous
quality-related metrics.

References
Adamic, L. A.; and Adar, E. 2003. Friends and neighbors on the web. Social
  networks 25(3): 211–230.
Aguinis, H.; Suárez-González, I.; Lannelongue, G.; and Joo, H. 2012. Scholarly
  impact revisited. Academy of Management Perspectives 26(2): 105–132.
Bird, S.; Dale, R.; Dorr, B. J.; Gibson, B.; Joseph, M. T.; Kan, M.-Y.; Lee,
  D.; Powley, B.; Radev, D. R.; and Tan, Y. F. 2008. The ACL Anthology
  Reference Corpus: A reference dataset for bibliographic research in
  computational linguistics.
Bliss, C. A.; Frank, M. R.; Danforth, C. M.; and Dodds, P. S. 2014. An
  evolutionary algorithm approach to link prediction
  in dynamic social networks. Journal of Computational Science 5(5): 750–764.
Cohan, A.; Ammar, W.; van Zuylen, M.; and Cady, F. 2019. Structural Scaffolds
  for Citation Intent Classification in Scientific Publications. In
  Proceedings of the 2019 Conference of the North American Chapter of the
  Association for Computational Linguistics: Human Language Technologies,
  Volume 1 (Long and Short Papers), 3586–3596.
Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K. 2018. BERT: Pre-training
  of deep bidirectional transformers for language understanding. arXiv
  preprint arXiv:1810.04805.
Ebesu, T.; and Fang, Y. 2017. Neural citation network for context-aware
  citation recommendation. In Proceedings of the 40th international ACM SIGIR
  conference on research and development in information retrieval, 1093–1096.
Färber, M.; and Jatowt, A. 2020. Citation Recommendation: Approaches and
  Datasets. arXiv preprint arXiv:2002.06961.
Garzone, M.; and Mercer, R. E. 2000. Towards an automated citation classifier.
  In Conference of the Canadian Society for Computational Studies of
  Intelligence, 337–346. Springer.
Giles, C. L.; Bollacker, K. D.; and Lawrence, S. 1998. CiteSeer: An automatic
  citation indexing system. In Proceedings of the third ACM conference on
  Digital libraries, 89–98.
Goodman, L. A.; and Kruskal, W. H. 1959. Measures of association for cross
  classifications. II: Further discussion and references. Journal of the
  American Statistical Association 54(285): 123–163.
Goodman, L. A.; and Kruskal, W. H. 1963. Measures of association for cross
  classifications III: Approximate sampling theory. Journal of the American
  Statistical Association 58(302): 310–364.
Goodman, L. A.; and Kruskal, W. H. 1972. Measures of association for cross
  classifications, IV: Simplification of asymptotic variances. Journal of the
  American Statistical Association 67(338): 415–421.
Goodman, L. A.; and Kruskal, W. H. 1979. Measures of association for cross
  classifications. In Measures of association for cross classifications,
  2–34. Springer.
Grover, A.; and Leskovec, J. 2016. node2vec: Scalable feature learning for
  networks. In Proceedings of the 22nd ACM SIGKDD international conference on
  Knowledge discovery and data mining, 855–864.
Guimerà, R.; and Sales-Pardo, M. 2009. Missing and spurious interactions and
  the reconstruction of complex networks. Proceedings of the National Academy
  of Sciences 106(52): 22073–22078.
Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation
  learning on large graphs. In Advances in neural information processing
  systems, 1024–1034.
Han, J.; Song, Y.; Zhao, W. X.; Shi, S.; and Zhang, H. 2018. hyperdoc2vec:
  Distributed representations of hypertext documents. arXiv preprint
  arXiv:1805.03793.
He, J.; Nie, J.-Y.; Lu, Y.; and Zhao, W. X. 2012. Position-aligned translation
  model for citation recommendation. In International symposium on string
  processing and information retrieval, 251–263. Springer.
He, Q.; Kifer, D.; Pei, J.; Mitra, P.; and Giles, C. L. 2011. Citation
  recommendation without author supervision. In Proceedings of the fourth ACM
  international conference on Web search and data mining, 755–764.
He, Q.; Pei, J.; Kifer, D.; Mitra, P.; and Giles, L. 2010. Context-aware
  citation recommendation. In Proceedings of the 19th international
  conference on World wide web, 421–430.
Huang, W.; Kataria, S.; Caragea, C.; Mitra, P.; Giles, C. L.; and Rokach, L.
  2012. Recommending citations: translating papers into references. In
  Proceedings of the 21st ACM international conference on Information and
  knowledge management, 1910–1914.
Huang, W.; Wu, Z.; Liang, C.; Mitra, P.; and Giles, C. L. 2015. A neural
  probabilistic model for context based citation recommendation. In
  Twenty-ninth AAAI conference on artificial intelligence.
Huang, Z. 2010. Link prediction based on graph topology: The predictive value
  of generalized clustering coefficient. Available at SSRN 1634014.
Jurgens, D.; Kumar, S.; Hoover, R.; McFarland, D.; and Jurafsky, D. 2018.
  Measuring the evolution of a scientific field through citation frames.
  Transactions of the Association for Computational Linguistics 6: 391–406.
Kataria, S.; Mitra, P.; and Bhatia, S. 2010. Utilizing Context in Generative
  Bayesian Models for Linked Corpus. In AAAI, volume 10, 1. Citeseer.
Kendall, M. G. 1938. A new measure of rank correlation. Biometrika 30(1/2):
  81–93.
Kingma, D. P.; and Ba, J. 2015. Adam: A Method for Stochastic Optimization.
  URL http://arxiv.org/abs/1412.6980.
Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment.
  Journal of the ACM (JACM) 46(5): 604–632.
Kobayashi, Y.; Shimbo, M.; and Matsumoto, Y. 2018. Citation recommendation
  using distributed representation of discourse facets in scientific
  articles. In Proceedings of the 18th ACM/IEEE on joint conference on
  digital libraries, 243–251.
Li, H.; Councill, I.; Lee, W.-C.; and Giles, C. L. 2006. CiteSeerx: an
  architecture and web service design for an academic document search engine.
  In Proceedings of the 15th international conference on World Wide Web,
  883–884.
Liben-Nowell, D.; and Kleinberg, J. 2007. The link-prediction problem for
  social networks. Journal of the American society for information science
  and technology 58(7): 1019–1031.
Liu, Y.; Yan, R.; and Yan, H. 2016. Personalized Citation Recommendation Based
  on User's Preference and Language Model. Journal of Chinese Information
  Processing (2): 18.
Livne, A.; Gokuladas, V.; Teevan, J.; Dumais, S. T.; and Adar, E. 2014.
  CiteSight: supporting contextual citation recommendation using differential
  search. In Proceedings of the 37th international ACM SIGIR conference on
  Research & development in information retrieval, 807–816.
Lo, K.; Wang, L. L.; Neumann, M.; Kinney, R.; and Weld, D. 2020. S2ORC: The
  Semantic Scholar Open Research Corpus. In Proceedings of the 58th Annual
  Meeting of the Association for Computational Linguistics, 4969–4983.
  Online: Association for Computational Linguistics.
  doi:10.18653/v1/2020.acl-main.447. URL
  https://www.aclweb.org/anthology/2020.acl-main.447.
Manchanda, S.; and Karypis, G. 2018. Text segmentation on multilabel
  documents: A distant-supervised approach. In 2018 IEEE International
  Conference on Data Mining (ICDM), 1170–1175. IEEE.
Manchanda, S.; and Karypis, G. 2020. CAWA: An Attention-Network for Credit
  Attribution. In AAAI, 8472–8479.
Manchanda, S.; Sharma, M.; and Karypis, G. 2019a. Intent term selection and
  refinement in e-commerce queries. arXiv preprint arXiv:1908.08564.
Manchanda, S.; Sharma, M.; and Karypis, G. 2019b. Intent term weighting in
  e-commerce queries. In Proceedings of the 28th ACM International Conference
  on Information and Knowledge Management, 2345–2348.
Manchanda, S.; Sharma, M.; and Karypis, G. 2020. Distant-Supervised
  Slot-Filling for E-Commerce Queries. arXiv preprint arXiv:2012.08134.
Martínez, V.; Berzal, F.; and Cubero, J.-C. 2016. A survey of link prediction
  in complex networks. ACM computing surveys (CSUR) 49(4): 1–33.
Menon, A. K.; and Elkan, C. 2011. Link prediction via matrix factorization. In
  Joint european conference on machine learning and knowledge discovery in
  databases, 437–452. Springer.
Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of
  word representations in vector space. arXiv preprint arXiv:1301.3781.
Neumann, M.; King, D.; Beltagy, I.; and Ammar, W. 2019. ScispaCy: Fast and
  Robust Models for Biomedical Natural Language Processing. In Proceedings of
  the 18th BioNLP Workshop and Shared Task, 319–327. Florence, Italy:
  Association for Computational Linguistics. doi:10.18653/v1/W19-5034. URL
  https://www.aclweb.org/anthology/W19-5034.
Newson, R. 2002. Parameters behind “nonparametric” statistics: Kendall’s tau,
  Somers’ D and median differences. The Stata Journal 2(1): 45–64.
Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1999. The PageRank citation
  ranking: Bringing order to the web. Technical report, Stanford InfoLab.
Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for
  word representation. In Proceedings of the 2014 conference on empirical
  methods in natural language processing (EMNLP), 1532–1543.
Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of
  social representations. In Proceedings of the 20th ACM SIGKDD international
  conference on Knowledge discovery and data mining, 701–710.
Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and
  Zettlemoyer, L. 2018. Deep contextualized word representations. arXiv
  preprint arXiv:1802.05365.
Pham, S. B.; and Hoffmann, A. 2003. A new approach for scientific citation
  classification using cue phrases. In Australasian Joint Conference on
  Artificial Intelligence, 759–771. Springer.
Ramage, D.; Hall, D.; Nallapati, R.; and Manning, C. D. 2009. Labeled LDA: A
  supervised topic model for credit attribution in multi-labeled corpora. In
  Proceedings of the 2009 Conference on Empirical Methods in Natural Language
  Processing: Volume 1-Volume 1, 248–256. Association for Computational
  Linguistics.
Ramage, D.; Manning, C. D.; and Dumais, S. 2011. Partially labeled topic
  models for interpretable text mining. In Proceedings of the 17th ACM SIGKDD
  international conference on Knowledge discovery and data mining, 457–465.
  ACM.
Rokach, L.; Mitra, P.; Kataria, S.; Huang, W.; and Giles, L. 1978. A
  supervised learning method for context-aware citation recommendation in a
  large corpus. INVITED SPEAKER: Analyzing the Performance of Top-K Retrieval
  Algorithms 1978.
Soleimani, H.; and Miller, D. J. 2017. Semisupervised, multilabel,
  multi-instance learning for structured data. Neural computation 29(4):
  1053–1102.
Somers, R. H. 1962. A new asymmetric measure of association for ordinal
  variables. American sociological review 799–811.
Tang, X.; Wan, X.; and Zhang, X. 2014. Cross-language context-aware citation
  recommendation in scientific articles. In Proceedings of the 37th
  international ACM SIGIR conference on Research & development in information
  retrieval, 817–826.
Valenzuela, M.; Ha, V.; and Etzioni, O. 2015. Identifying meaningful
  citations. In Workshops at the twenty-ninth AAAI conference on artificial
  intelligence.
Wang, C.; Satuluri, V.; and Parthasarathy, S. 2007. Local probabilistic models
  for link prediction. In Seventh IEEE international conference on data mining
  (ICDM 2007), 322–331. IEEE.
Yin, J.; and Li, X. 2017. Personalized citation recommendation via
  convolutional neural networks. In Asia-Pacific web (APWeb) and web-age
  information management (WAIM) joint conference on web and big data,
  285–293. Springer.

</pre>