<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Intellectual Influence Associations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tejas Shah</string-name>
          <email>tejas.shah@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vikram Pudi</string-name>
          <email>vikram@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Institute of Information Technology - Hyderabad</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>100</fpage>
      <lpage>111</lpage>
      <abstract>
        <p>Within the social system of science, citation practices characterize social functions like the conferral of recognition upon the work of others as well as the acknowledgement of one's intellectual debt. However, the structure of intellectual influence is misrepresented when only the immediate citations and their cardinality are taken into consideration. Thus, in order to better understand the associative dissemination of influence and approximately construe the anatomy of this structure, complex interactions in the convoluted network of authors and papers need to be probed. Our study aims at understanding these heterogeneous complex interactions. For the bibliographic dataset of authors and publications, we define proxy scores that attempt to determine the associative influence of the cited author over the citing author. In order to harness structural connectivity of the network, we generate author vector representations using these influence scores. Furthermore, with a view to assess the competence of our proposed scores, we evaluate these representations and provide an empirical study of the results obtained with our algorithm against the baseline and also present a qualitative analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>Citation Network</kwd>
        <kwd>Influence</kwd>
        <kwd>Representation Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION AND RELATED WORK</title>
      <p>
        The contribution of an author in the form of publications holds an intrinsic
value responsible for the effective dissemination of knowledge. This knowledge-based
relay, extended by and for the scientific community, results in the
establishment of conceptual relationships in the form of citations. Citations and
references operate within a jointly cognitive and moral framework designed to
provide the historical lineage of knowledge and to repay intellectual debts
through their open acknowledgement [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Thus, citation analysis has the
potential to provide valuable insights into the social system of science
across a wide range of topics.
      </p>
      <p>
        Effective digitization of bibliographic data has led to a proliferation of
quantitative bibliometric performance indicators for
journals, papers, authors and institutions based on citation counts and graph-based
ranking algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Studies dealing with influence have examined
various aspects such as topic-level influence strength, influence propagation and its
indirect global effect [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]; analysis of influence evolution between communities [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; and
time-dependent estimation of influence for evaluating pairwise community
influences [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Besides being extensively explored in the literature, the concept of
influence has also surfaced in scholarly search engines such as Semantic Scholar.
Many of these approaches focus on influence analysis based on individual entities
and their overall impact on the network structures. However, influence
relationships and associations do not emerge from global influence alone.
These influence associations can be understood as the degree of influence
between a pair of nodes within the network. Our work aims at studying these
pairwise influence associations formed between authors within the scholarly
network. The main contributions of the paper are as follows:
        <list list-type="bullet">
          <list-item>
            <p>We propose an algorithm that simulates the influence between the citing and
cited authors and suggest influence association scores.</p>
          </list-item>
          <list-item>
            <p>Considering the issues in quantifying and tracing influences that arise in
scholarly communication, and to harness the structural connectivity of the network,
we profile authors and their interactions within the bibliographic network
using representation learning and the proposed influence scores. These
representations thus form a generic result of the proposed influence model, and
their effectiveness in the context of the problem statement is discussed.</p>
          </list-item>
          <list-item>
            <p>To assess the predictive capacity of the aforementioned scores and the
resulting vector representations, experimental results on
classification tasks are discussed, along with a comparative study against those
obtained by measures such as citation counts.</p>
          </list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-3">
      <title>PROBLEM STATEMENT</title>
      <sec id="sec-3-1">
        <title>Problem Formulation</title>
        <p>Consider a publication p<sub>i</sub> written by m co-authors a<sub>i1</sub>, a<sub>i2</sub>, …, a<sub>im</sub>, which cites
a publication p<sub>j</sub> written by n co-authors a<sub>j1</sub>, a<sub>j2</sub>, …, a<sub>jn</sub>. Thus, a publication
citation network can be defined as a directed graph G<sub>P</sub> = (V<sub>P</sub>, E<sub>P</sub>) constructed
from the list of references at the end of each publication, where V<sub>P</sub> represents
vertices (publications) and E<sub>P</sub> represents the set of all directed edges between the
nodes denoting citations between publications. Consequently, an author citation
network G<sub>A</sub> = (V<sub>A</sub>, E<sub>A</sub>) is defined by projecting this publication citation
network along the corresponding author(s) for each publication node in G<sub>P</sub>. For
instance, the citation link p<sub>i</sub> → p<sub>j</sub> is projected between their respective authors,
thus creating m × n directed links from each of the m co-authors of the
citing publication to each of the n co-authors of the cited publication.
Accordingly, V<sub>A</sub> represents nodes (authors) and E<sub>A</sub> represents the set of all such directed
links between the authors. Let the directed pairwise author citation link between
the citing author a<sub>i</sub> and the cited author a<sub>j</sub> be denoted as a<sub>i</sub> → a<sub>j</sub> (∀ i = 1, …, m
and ∀ j = 1, …, n). In the discussions that follow, we define and aim to quantify
the associative intellectual influence measure represented by I(a<sub>i</sub>, a<sub>j</sub>) as the
degree to which author a<sub>j</sub> influences author a<sub>i</sub> when a citation is made from a<sub>i</sub> to
a<sub>j</sub>.</p>
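        <p>The projection from the publication citation network to the author citation network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dictionary layout and the toy author names are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from itertools import product


def project_to_author_network(paper_authors, paper_citations):
    """Project paper-level citations onto author pairs (hypothetical helper).

    paper_authors: dict mapping a paper id to its list of author ids.
    paper_citations: iterable of (citing paper, cited paper) pairs.
    Returns a Counter mapping (citing author, cited author) to a link count.
    """
    author_links = Counter()
    for citing, cited in paper_citations:
        # m citing co-authors times n cited co-authors gives m * n links
        for a_i, a_j in product(paper_authors[citing], paper_authors[cited]):
            author_links[(a_i, a_j)] += 1
    return author_links


# Toy data: p1 has m = 2 authors and cites p2, which has n = 3 authors.
paper_authors = {"p1": ["alice", "bob"], "p2": ["carol", "dave", "erin"]}
links = project_to_author_network(paper_authors, [("p1", "p2")])
print(len(links))  # number of distinct directed author links
```
]]></preformat>
        <p>With one citing paper of two authors and one cited paper of three authors, the sketch yields the 2 × 3 directed author links described above.</p>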
      </sec>
      <sec id="sec-3-2">
        <title>Issues and Caveats in Quantifying Influence</title>
        <p>
          Citation networks are complex networks in which causal structure exists along
the interactions between the nodes. However, considering only the citation
counts and the primary degree of interactions (immediate neighbours) within the
citation network has its shortcomings as an indicator for tracing intellectual
influences [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Without a subjective survey of authors in conjunction with their
publications, it remains unknown what fraction of the cited work
was indeed influential, directly or indirectly, and whether some references
had no influence of any kind yet were cited due to other motivations. Such
complex interplay of multiple citer motivations has been empirically studied
and reported in previous studies as well [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ][
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Possible classes of errors in tracing scholarly influences include:</p>
        <list list-type="bullet">
          <list-item>
            <p>The inclusion (or exclusion) of a reference in a bibliography does not
completely indicate whether or not that reference was directly or indirectly
influential for the proposition of the publication.</p>
          </list-item>
          <list-item>
            <p>Citing bias in favor of elite scientists or highly cited papers, i.e. over-citation
described as the “Matthew effect” [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].</p>
          </list-item>
          <list-item>
            <p>Under-citation of fundamental scientific work, possibly due to the
obliteration (of sources) by incorporation (OBI) in the established
knowledge [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].</p>
          </list-item>
          <list-item>
            <p>When a relevant piece of scientific work is known through an intermediate
publication, the intermediate publication serves as an intermediate influence.
However, it may remain uncited.</p>
          </list-item>
          <list-item>
            <p>The nature of citation types, as they can be further categorized into organic
or perfunctory, evolutionary or juxtapositional, confirmative or negational,
etc. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].</p>
          </list-item>
        </list>
        <p>
          These and other such classes of errors exist within the citation data
due to under-inclusion and over-inclusion of references [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ][
          <xref ref-type="bibr" rid="ref10">10</xref>
          ][
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. This makes
the task of tracing influences more complex.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>ASSOCIATIVE INFLUENCE MEASURES</title>
      <p>We propose two associative influence scores for effectively capturing the intellectual
influence of the cited author over the citing author. The underlying principle
is that citations towards an influential publication
of the cited author, and citations from the finest works of the citing author, are
indeed significant. Further, the net influential impact over a publication can
be fairly attributed as an aggregation of influences by all the cited publications.
Considering the citing author's temporal scholarly activity, associative influences are
instantiated for each citing publication wherein references are made to other
publications. Thus, for a particular author pair (a<sub>i</sub>, a<sub>j</sub>), we consider each
instance wherein a<sub>i</sub> cites a<sub>j</sub>, i.e. all publications authored by a<sub>i</sub> wherein
a citation has been made to a publication authored by a<sub>j</sub>. This instantiated
influence forms a component of the integral associative influence between the
author pair. Since the associative influence is a directed mapping, the influence
scores carry the same notion of directedness. Thus, for a citation relationship
a<sub>i</sub> → a<sub>j</sub>, the proposed influence association score I(a<sub>i</sub>, a<sub>j</sub>) represents the degree
of influence the cited author a<sub>j</sub> has over the citing author a<sub>i</sub>.</p>
      <sec id="sec-4-1">
        <title>Ranking Publications</title>
        <p>
          In order to capture the collective nature of scholarly influence, publication-level
ranking is adopted (as opposed to researcher-level ranking). This mimics the
spread of intellectual influence among researchers via their publications. Several
studies [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ][
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] extensively investigate this key issue of scientific credit
diffusion by dissecting the credit diffusion mechanism underlying both researcher-level
and paper-level graph-ranking methods. Their findings emphasize that
scientific credit is fundamentally derived from citation information between papers
rather than from the derived researcher network. Our model for influence
dissemination within the heterogeneous network of authors and papers thus avoids the
inaccurate allocation of scientific credit among researchers that potentially arises
in graph-ranking methods.
        </p>
        <p>
          PageRank [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] takes into account the number and quality of links while
measuring the importance of entities within a network. Using PageRank over GP ,
the importance of publications is thus measured considering the number of
citations and reputation of the papers [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. For the publication level ranking, we
have:
        </p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>PR(p_i) = \frac{1 - d}{N} + d \sum_{j \to i} \frac{PR(p_j)}{T_{out}(p_j)}</tex-math>
        </disp-formula>
        <p>where j → i implies that paper p<sub>j</sub> refers to paper p<sub>i</sub>, PR(p<sub>j</sub>) denotes the PageRank
of paper p<sub>j</sub>, T<sub>out</sub>(p<sub>j</sub>) denotes the number of outbound links from paper p<sub>j</sub>,
N is the number of publications in G<sub>P</sub> and d is the damping factor.
Based on empirical studies, Chen et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] showed that citation paths between scientific papers
are usually short, about two links on average. This is in contrast
to the six hyperlinks assumed for the web in the individual surfer illustration
of the original study [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Accordingly, we set the damping factor to
0.5 for the purpose of our study as well.</p>
        <p>The basis for the score I<sub>S</sub> is that citations from the significant works of the
citing author (a<sub>i</sub>) are indeed relevant, considering the intellectual and cognitive
influences. The citing author's significant works can be regarded as those
scholarly works which have a relatively high PageRank among the other works of the
same author. The instantiated associative influence components resulting from the
citing author's citation activity sum up to form the net score. Thus, we calculate
I<sub>S</sub>(a<sub>i</sub>, a<sub>j</sub>) as follows:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>I_S(a_i, a_j) = \frac{\sum_{k} \frac{PR(p_{ik})}{T_{out}(p_{ik})}}{\overline{PR}(p_{a_i})} \qquad \forall\, p_{ik} \in \{ p_{ik} \to p_{a_j} \}</tex-math>
        </disp-formula>
        <p>where p<sub>ik</sub> denotes the kth publication of the citing author a<sub>i</sub> wherein a
citation has been made to a publication authored by the cited author a<sub>j</sub>, PR(p<sub>ik</sub>)
represents the PageRank of the citing publication p<sub>ik</sub>, T<sub>out</sub>(p<sub>ik</sub>) denotes the
number of outbound links (references) from publication p<sub>ik</sub> and <inline-formula><tex-math>\overline{PR}(p_{a_i})</tex-math></inline-formula> is the
normalization factor, which is the average PageRank value of all the publications
authored by a<sub>i</sub>. The normalization factor accounts for the variance in the ranks
of citing publications. So, the higher the normalized weight of each such component
(i.e. the higher the normalized PageRank of the citing publication) and the higher the
cardinality of such components exchanged between a<sub>i</sub> and a<sub>j</sub>, the higher is the associative
influence.</p>
        <p>The underlying notion for the score I<sub>D</sub> is to calculate a measure of the
extent to which the citing author tends to cite a set of authors, weighted by
their influential scholarly contributions, in his/her publications. Extending the
influence instantiation notion to I<sub>D</sub>, the association components of influence
comprise the PageRank of the cited publication and the inbound links of
that cited publication (distributing the impact of a publication among the inbound
citation references). Thus, we calculate I<sub>D</sub>(a<sub>i</sub>, a<sub>j</sub>) as follows:</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>I_D(a_i, a_j) = \frac{\sum_{k} \frac{PR(p_{jk})}{T_{in}(p_{jk})}}{\overline{PR}(p_{a_j})} \qquad \forall\, p_{jk} \in \{ p_{a_i} \to p_{jk} \}</tex-math>
        </disp-formula>
        <p>Here, p<sub>jk</sub> denotes the kth publication of the cited author a<sub>j</sub> to which a citation
has been made from a publication authored by the citing author a<sub>i</sub>, PR(p<sub>jk</sub>)
represents the PageRank of the cited publication p<sub>jk</sub>, T<sub>in</sub>(p<sub>jk</sub>) denotes the
number of inbound links to publication p<sub>jk</sub> and <inline-formula><tex-math>\overline{PR}(p_{a_j})</tex-math></inline-formula> is the normalization factor,
which is the average PageRank value of the publications authored by a<sub>j</sub>. The
disparity in the ranks of cited publications is accounted for by this normalization
factor. So, the higher the normalized weight of each such association component and
the higher the cardinality of such components exchanged between a<sub>i</sub> and a<sub>j</sub>, the higher
is the associative influence.</p>
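        <p>To make Equations (1) to (3) concrete, the following sketch computes publication-level PageRank with a damping factor of 0.5 and then the two scores for one author pair. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the fixed iteration count and the toy data are hypothetical, dangling papers are not given special treatment, and each citing (or cited) publication is counted once per author pair.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def pagerank(nodes, edges, d=0.5, iterations=50):
    """Iterate Equation (1); edges are (citing paper, cited paper) pairs."""
    n = len(nodes)
    out_deg = defaultdict(int)
    cited_by = defaultdict(list)
    for src, dst in edges:
        out_deg[src] += 1
        cited_by[dst].append(src)
    pr = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Each new value is computed from the previous iteration's ranks.
        pr = {p: (1 - d) / n + d * sum(pr[q] / out_deg[q] for q in cited_by[p])
              for p in nodes}
    return pr


def influence_scores(a_i, a_j, paper_authors, edges, pr):
    """Return (I_S, I_D) for citing author a_i and cited author a_j."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1

    def mean_pr(author):
        # Normalization factor: average PageRank of the author's papers.
        ranks = [pr[p] for p, names in paper_authors.items() if author in names]
        return sum(ranks) / len(ranks)

    pairs = [(s, t) for s, t in edges
             if a_i in paper_authors[s] and a_j in paper_authors[t]]
    citing = {s for s, _ in pairs}   # the p_ik of Equation (2)
    cited = {t for _, t in pairs}    # the p_jk of Equation (3)
    i_s = sum(pr[p] / out_deg[p] for p in citing) / mean_pr(a_i)
    i_d = sum(pr[p] / in_deg[p] for p in cited) / mean_pr(a_j)
    return i_s, i_d


# Toy network: two papers by "alice" both cite one paper by "bob".
paper_authors = {"p1": ["alice"], "p2": ["alice"], "p3": ["bob"]}
edges = [("p1", "p3"), ("p2", "p3")]
pr = pagerank(list(paper_authors), edges)
i_s, i_d = influence_scores("alice", "bob", paper_authors, edges, pr)
print(round(i_s, 6), round(i_d, 6))
```
]]></preformat>
        <p>In this toy network both citing papers have the same PageRank, so each contributes one normalized unit to I<sub>S</sub>, while I<sub>D</sub> divides the cited paper's rank among its two inbound citations.</p>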
      </sec>
      <sec id="sec-4-2">
        <title>Qualitative Analysis</title>
        <p>
          Author and publication ranks of prominent scientists and young researchers span
a wide spectrum. For example, a highly cited publication of a prominent
scientist might be influential, but in a modest way, to each individual researcher
in the scientific community at large. Also, the finest works of a young researcher
might have comparatively fewer inbound citations; however, the
substantial influence of the publications cited within such works should not
be overlooked. Considering the different degrees of author and publication ranks,
it can be seen from Equation (2) that significant works of young as well as even
mediocre researchers are instrumental in highlighting the
cited author's influence. Thus, irrespective of the cited author's prominence,
his/her influence over the citing author is conspicuous and visibly pronounced.
Also, from Equation (3) it is evident that even scientists not belonging to the
higher order of ranks receive their due scholarly attribution for their
respective notable scientific studies. Such notions of relative author impact and
allocation of due credit persist across the spectrum for most of the researchers
belonging to different degrees of author ranks. Normalization of the associative
influence measures as illustrated in Sections 3.2 and 3.3 accounts for such variances
and takes into consideration the effect of accumulated advantage (i.e. the
Matthew effect [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]).
        </p>
        <p>Associative pairwise influences formed through scholarly contributions over
time semantically imply that there is a certain form of influence of the cited
author over the citing author, irrespective of the citation types. In this paper, we
focus on the existence and extent of conceptual relationships formed between
authors. However, the precise nature of influence may be hard to quantify without
the factual ontological citation representations. To alleviate issues concerning
under-inclusion of references (as discussed in Section 2.2), capture the
network neighborhood and harness the structural connectivity, we profile authors
and their interactions using representation learning and the proposed influence
scores. This effectively maps the latent features within the citation network into
a vector space.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>AUTHOR REPRESENTATION LEARNING</title>
      <p>
        Recent work in language modeling and representation learning such as Word2Vec [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
focuses on the application of probabilistic neural networks which map words into
vector spaces. The author vector representations are learned with a similar intuition,
as discussed in the following sections.
      </p>
      <sec id="sec-5-1">
        <title>Random Walk on Weighted Directed Network</title>
        <p>
          Using the proposed influence measures, we model weighted random walks over the
author citation network. These walks can be treated as sentences in the
context of language modeling. Analogous to recent work [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], this represents
a network as a “document”. The motivation behind converting a graph into a
series of text documents is that word frequency in a document corpus and the visited
node frequency during a random walk on a connected graph both follow a
power law distribution [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>We sample random walks over the weighted directed author citation network.
Consider the author citation network (G<sub>A</sub>) with n author nodes (a<sub>1</sub>, a<sub>2</sub>, …,
a<sub>n</sub>); w<sub>ij</sub> represents the weight of the edge connecting nodes a<sub>i</sub> and a<sub>j</sub>, where
the edge weight is the influence score I(a<sub>i</sub>, a<sub>j</sub>) (as derived from Equation 2 or
Equation 3) for the author pair, and therefore we have w<sub>ij</sub> = I(a<sub>i</sub>, a<sub>j</sub>). Since
the influence scores are non-negative, we have w<sub>ij</sub> ≥ 0. For the study of tracing
influence associations, we prune self-citation loops and thus w<sub>ii</sub> = 0. Now, for
a given source node a<sub>i</sub>, the transition probability that the author node a<sub>j</sub> is
chosen from the direct successors of a<sub>i</sub> is proportional to the influence measure
I(a<sub>i</sub>, a<sub>j</sub>). This is computed as:</p>
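        <disp-formula>
          <label>(4)</label>
          <tex-math>p_{a_i, a_j} = \frac{I(a_i, a_j)}{\sum_{k} I(a_i, a_k)}</tex-math>
        </disp-formula>
        <p>where <inline-formula><tex-math>p_{a_i, a_j}</tex-math></inline-formula> represents the transition probability. For each source author
node a<sub>i</sub>, we simulate a weighted directed random walk <inline-formula><tex-math>W_{a_i}</tex-math></inline-formula>. This sampling is
a stochastic process consisting of author nodes <inline-formula><tex-math>w^{1}_{a_i}, w^{2}_{a_i}, \ldots, w^{n}_{a_i}</tex-math></inline-formula> as random
variables such that <inline-formula><tex-math>w^{j}_{a_i}</tex-math></inline-formula> is a vertex chosen with transition probability <inline-formula><tex-math>p_{a_i, a_j}</tex-math></inline-formula> from
the direct successors of a<sub>i</sub>. In our experiments we set the length of these walks
to be fixed. For each source vertex, the random walk generator samples author
nodes based on the respective transition probabilities until a maximum length l (=
40) is reached. For the purpose of our study, we generate 15 such weighted random
walks for each author.</p>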
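        <p>The sampling procedure can be sketched with the standard library alone. The sketch below is a hedged illustration, not the authors' code: the nested-dictionary layout of the weights, the function names and the early stop at authors without outgoing edges are assumptions made for the example. The walk length of 40 and the 15 walks per author follow the values stated above.</p>
        <preformat><![CDATA[
```python
import random


def weighted_walk(start, weights, length=40, rng=random):
    """Sample one walk; weights[a_i] maps each successor a_j to I(a_i, a_j)."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        successors = weights.get(current, {})
        # Self-citation loops are pruned and zero-weight edges are ignored.
        candidates = [(a, w) for a, w in successors.items()
                      if a != current and w > 0]
        if not candidates:
            break  # assumption: stop early when no successor exists
        nodes, node_weights = zip(*candidates)
        # random.choices normalizes the weights, as in Equation (4).
        walk.append(rng.choices(nodes, weights=node_weights, k=1)[0])
    return walk


def build_corpus(weights, walks_per_author=15, length=40, seed=0):
    """Generate the walk 'sentences' for every author in the network."""
    rng = random.Random(seed)
    authors = sorted(set(weights) | {a for succ in weights.values() for a in succ})
    return [weighted_walk(a, weights, length, rng)
            for a in authors for _ in range(walks_per_author)]


# Toy influence scores between three hypothetical authors.
weights = {"a": {"b": 2.0, "c": 1.0}, "b": {"a": 1.0}, "c": {}}
corpus = build_corpus(weights)
print(len(corpus), corpus[0][:5])
```
]]></preformat>
        <p>Each walk in the resulting corpus plays the role of a sentence for the representation learning step that follows.</p>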
      </sec>
      <sec id="sec-5-2">
        <title>Representation Learning Framework</title>
        <p>
          Modelling social structures and relationships within networks can be aligned
with the optimization techniques used to model natural languages [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ][
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. With
the ordered sequence of nodes constructed using weighted random walks, we
learn representations using the Skip-gram model. This represents authors and the
citation relationship shared between a pair of authors in an unsupervised manner.
Based on the distributional hypothesis, these representations are latent features that
capture neighbourhood as well as structural influences in the citation network
in a continuous low-dimensional vector space. In effect, these representations
encapsulate more information and relationships between authors than using just
the immediate citations.
        </p>
        <p>
          Skip-gram is a language model that maximizes the conditional co-occurrence
probability of words occurring within a predefined window [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Thereby, we
have f : V<sub>A</sub> → ℝ<sup>d</sup> as the mapping function from nodes to feature representations
that we aim to learn. Here d specifies the number of dimensions of the feature
representation and f represents a matrix of |V<sub>A</sub>| × d parameters. Now, we
try to optimize the likelihood function as formulated in Equation 5:
        </p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>\max_{f} \sum_{j = i - w,\; j \neq i}^{i + w} \log \Pr(a_j \mid f(a_i))</tex-math>
        </disp-formula>
        <p>where w is the size of the window, a<sub>i</sub> ∈ V<sub>A</sub> and Pr(a<sub>j</sub> | f(a<sub>i</sub>)) is defined by
the softmax function:</p>
        <disp-formula>
          <label>(6)</label>
          <tex-math>\Pr(a_j \mid f(a_i)) = \frac{\exp(f(a_j) \cdot f(a_i))}{\sum_{a_k \in V_A} \exp(f(a_k) \cdot f(a_i))}</tex-math>
        </disp-formula>
        <p>
          Skip-gram assumes that inside the context, all nodes are independent of each
other and are equally important. However, as seen from Equations 5 and 6, the
update step per node is proportional to |V<sub>A</sub>|. This is computationally expensive
for large networks (such as the author citation network in consideration). We
approximate the optimization function using negative sampling [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
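        <p>The cost argument can be illustrated with a small pure-Python sketch. The first function evaluates Equation (6) directly, touching every author in the denominator. The second evaluates a negative-sampling style objective for one observed pair and a few sampled negatives. The function names, the toy vectors and the use of a single vector per author (rather than separate input and output vectors) are assumptions made for illustration only.</p>
        <preformat><![CDATA[
```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax_prob(f, a_i, a_j):
    """Pr(a_j | f(a_i)) from Equation (6); f maps each author to a vector."""
    # The denominator visits every author, hence the cost linear in |V_A|.
    scores = {a_k: math.exp(dot(vec, f[a_i])) for a_k, vec in f.items()}
    return scores[a_j] / sum(scores.values())


def negative_sampling_objective(f, a_i, a_j, negatives):
    """Log-sigmoid objective for one observed pair and sampled negatives."""
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x))

    positive = log_sigmoid(dot(f[a_j], f[a_i]))
    negative = sum(log_sigmoid(-dot(f[a_n], f[a_i])) for a_n in negatives)
    return positive + negative  # cost depends only on len(negatives)


# Toy two-dimensional vectors for three hypothetical authors.
f = {"alice": [0.2, 0.1], "bob": [0.4, -0.3], "carol": [-0.1, 0.5]}
probs = {a_j: softmax_prob(f, "alice", a_j) for a_j in f}
objective = negative_sampling_objective(f, "alice", "bob", ["carol"])
print(probs, objective)
```
]]></preformat>
        <p>The softmax probabilities sum to one over all authors, whereas the sampled objective only evaluates the observed pair and the chosen negatives.</p>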
        <p>Using the resulting author vector representations, we validate the
effectiveness of our proposed scores as discussed in the following sections.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>PERFORMANCE MODEL</title>
      <p>
        Limitations in the assessment of intellectual and cognitive influences prevail due
to its subjective nature. However, despite the shortcomings of citation data,
studies [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] assert that citations can be used as approximate proxy indicators of
influence for aggregates of authors and papers. Given the measurable
factors and practical limitations of the study, it can be fairly argued that if a
publication proves to be relatively influential in the scientific work of an author,
then he/she is very likely to have a higher relative citation ratio towards the
cited influential authors. So, we bucket these relative citation ratios
for author pairs and use them as labels for classifying the extent of influence. Representations
obtained using influence scores can be evaluated against the baseline by a
comparative study of citation prediction and its extent between author pairs. In our
study, the purpose is to evaluate whether our proposed influence measures
capture meaningful relationships. To do so, it suffices to test their relative capacity
in citation prediction; absolute predictive accuracy is not the criterion
being assessed.
      </p>
      <sec id="sec-6-1">
        <title>Validating Influence Associations</title>
        <p>
          To capture the semantic relatedness between the citing and cited author's
influential relationship, and in order to predict a weighted citation link between a pair
of authors a<sub>1</sub> and a<sub>2</sub>, we generate an edge representation e(a<sub>1</sub>, a<sub>2</sub>). This is done
by defining a binary operator over the corresponding author feature vectors
f(a<sub>1</sub>) and f(a<sub>2</sub>). Similar strategies have been successfully used in earlier studies
for link prediction tasks [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. We define the ith component of the edge
representation e<sub>i</sub>(a<sub>1</sub>, a<sub>2</sub>) as the concatenation of author feature vector components, denoted by
f<sub>i</sub>(a<sub>1</sub>) ⊔ f<sub>i</sub>(a<sub>2</sub>). e(a<sub>1</sub>, a<sub>2</sub>) lies in ℝ<sup>2d</sup>, as the author feature vectors in ℝ<sup>d</sup>
are concatenated.
        </p>
        <p>These edge representations are further used in training and evaluation
for predicting the influence between a pair of authors. However, predicting the
presence of a link between the author pair in the test set is necessary but
insufficient to assess the predictive capacity for the degree of influence. To evaluate
the extent of influence more rigorously, we extend this binary link
prediction evaluation to multi-class classification. Here, the edge representations
are classified against the labels Nil Influence (NI), Slightly Influenced (SI) and
Highly Influenced (HI), depending upon the relative citation ratio between the
pairs of authors in the training and testing sets respectively. Thus, for a pair
of authors a<sub>i</sub> and a<sub>j</sub>, e(a<sub>i</sub>, a<sub>j</sub>) is classified based on the relative citation ratio
cr<sub>(a<sub>i</sub>, a<sub>j</sub>)</sub>, which is computed as:</p>
        <disp-formula>
          <label>(7)</label>
          <tex-math>cr_{(a_i, a_j)} = \frac{c(a_i, a_j)}{\sum_{k} c(a_i, a_k)}</tex-math>
        </disp-formula>
        <p>where c(a<sub>i</sub>, a<sub>j</sub>) represents the number of citations from the citing author a<sub>i</sub> to
the cited author a<sub>j</sub> in the author citation network during that specific temporal
segment. The calculated citation ratios are then mapped to the aforementioned class
labels. If cr<sub>(a<sub>i</sub>, a<sub>j</sub>)</sub> is positive and does not exceed a threshold, then e(a<sub>i</sub>, a<sub>j</sub>) is classified as SI. When cr<sub>(a<sub>i</sub>, a<sub>j</sub>)</sub> exceeds the threshold,
e(a<sub>i</sub>, a<sub>j</sub>) is classified as HI. Based on repeated experiments to maximize
discrimination among citation ratios, the threshold is set to 0.036. Lastly, cr<sub>(a<sub>i</sub>, a<sub>j</sub>)</sub> = 0 implies
that there is no influence of the cited author a<sub>j</sub> over the citing author a<sub>i</sub>, thus
classifying e(a<sub>i</sub>, a<sub>j</sub>) as NI.</p>
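        <p>The labelling rule and the edge representation can be sketched as follows. This is a minimal illustration with hypothetical function names and toy citation counts. Only the threshold value 0.036, Equation (7) and the concatenation operator come from the text.</p>
        <preformat><![CDATA[
```python
def citation_ratio(counts, a_i, a_j):
    """Equation (7): counts[a_i] maps each cited author to a citation count."""
    cited = counts.get(a_i, {})
    total = sum(cited.values())
    return cited.get(a_j, 0) / total if total else 0.0


def influence_label(cr, threshold=0.036):
    """Map a relative citation ratio to NI, SI or HI."""
    if cr == 0:
        return "NI"
    return "SI" if cr <= threshold else "HI"


def edge_representation(f, a_1, a_2):
    """Concatenate the two author vectors into one edge vector."""
    return list(f[a_1]) + list(f[a_2])


# Toy data: alice cites bob once and carol 99 times in the segment.
counts = {"alice": {"bob": 1, "carol": 99}}
f = {"alice": [0.2, 0.1], "bob": [0.4, -0.3], "carol": [-0.1, 0.5]}
labels = {a_j: influence_label(citation_ratio(counts, "alice", a_j))
          for a_j in ("bob", "carol", "dave")}
edge = edge_representation(f, "alice", "bob")
print(labels, edge)
```
]]></preformat>
        <p>Here the ratio towards bob is 0.01, which falls below the threshold, while the ratio towards carol is 0.99 and an author who is never cited receives the NI label.</p>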
        <p>Baseline: To compare the performance of our model and the influence
scores, we use citation counts as the baseline for our evaluation. Author profiling
and generation of vectors for this baseline is achieved using weighted random
walks over the author citation graph. Here, the weights are the number of citations from
the citing author to the cited author (as opposed to the influence measures
discussed in Section 4.1). Edge representations for this baseline are then computed using
the obtained author vector representations.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>EXPERIMENTAL DESIGN</title>
      <sec id="sec-7-1">
        <title>Dataset Description</title>
        <p>
          The DBLP dataset used consists of papers published from 1960 to
2014, wherein the citation data is enriched using bibliographic metadata from
ArnetMiner [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The dataset contains information such as each paper's title, its
authors and their affiliations, citation list, publication year, etc. For our
experiments, the dataset is pre-processed to prune incomplete
records as follows: 1) Publications with incomplete metadata (absence of year,
authors, etc.) are removed. 2) Internal citations are defined as references to
publications within the snapshot of the dataset being considered. References other
than these internal citations are removed.
        </p>
        <p>The final dataset includes 1277594 papers and 1003387 authors. The total
count of internal references for the publication citation network is 7962820, whereas
the edge list for the author citation network sums up to 39713499.</p>
        <p>In order to evaluate and assess the degree of influence as quantified by the
suggested influence measures, we consider the work of individual authors over a
temporal scale. We divide the dataset into 3 segments as follows:</p>
        <p>Profiling: This segment of the dataset represents the activity of researchers
and scientists within the bibliographic network up to 2006. Interactions between
researchers by means of collaborations, citations and conceptual exchanges in
the form of publications are well represented over such a wide
temporal span. This partition of the dataset supports author profiling
by learning author vector representations using weighted random walks.</p>
        <p>Training: This segment of the dataset is used for learning influence
associations by generating edge representations using the author vector representations
captured in the Profiling segment. The exact test set author pairs are excluded
from this segment to avoid over-fitting of the classifier. Author pairs between whom
citations occurred between 2006 and 2010 are considered for this
segment.</p>
        <p>Testing: The proposed influence scores and the baseline (citation count) are
validated using the bibliographic interactions in this segment. An author pair is
valid for testing as long as we have individual vector representations for both
authors. Citation exchanges occurring since 2011 are considered for this segment.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Evaluation and Results</title>
        <p>Using the Training set, edge representations are constructed for each pair of
authors between whom citations are exchanged during this segment. Consequently,
relative citation ratios between these author pairs are calculated using Equation
(7). The edge representations are then classi ed with the class labels namely,
(N I, SI and HI) using RandomForest. For authors spanning across wide
spectrum of ranks and extent of in uences, this enables us to capture what kind of
authors cite what kind of authors. Since we aim at comparing the relative
predictive capacity of these representations, we focus less on exact classi er settings
and report results achieved by each representation using the same parameters.</p>
        <p>The precision, recall and F-scores for the baseline and the influence
measures are shown in Tables 1, 2 and 3. From these results, we can see
that even the representations obtained using the baseline citation count
perform well enough for the multi-label citation prediction task. However, the
influence measures IS and ID are the better performers almost
throughout for each of the aforementioned classes, judging by the reported
precision-recall values. For class NI, all three
measures perform almost equally. This can be attributed to the better accuracy of the classifier
for non-existent edges between author pairs, irrespective of the weights under
consideration. It can also be theorized that certain influence associations may move
from one class to another over time. For example, the citations from a
citing author under high influence of the cited author might narrow
down (and vice versa) over a period of time. This can happen for various
reasons, such as a shift in research trends or the cultivation of interests in newer
fields. Accordingly, we also observe recall values on the lower side for class HI for
all three measures.</p>
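        <p>For reference, the per-class precision, recall and F-score discussed above can be computed one-vs-rest as in the following sketch. The label names follow the text; the function name and the toy label sequences are assumptions and do not reproduce the reported results.</p>
        <preformat>
```python
def per_class_scores(y_true, y_pred, labels=("NI", "SI", "HI")):
    """Return {label: (precision, recall, f_score)}, computed one-vs-rest."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_score)
    return scores


# Toy labels, purely illustrative.
y_true = ["NI", "NI", "SI", "HI", "HI", "SI"]
y_pred = ["NI", "NI", "SI", "SI", "HI", "HI"]
scores = per_class_scores(y_true, y_pred)
```
        </preformat>
        <p>A low recall for HI, as observed in the text, corresponds to a large number of true HI pairs being predicted as another class (the false negatives in this computation).</p>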
      </sec>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>
        In this paper, we present a model to trace intellectual influences by harnessing the
structural connectivity within an academic network. Further, we build author
profiles by mapping the latent features into a vector space using the proposed
influence scores. We also evaluate the effectiveness of the captured author
relationships and the resultant author representations through experiments on
classification tasks, such as citation prediction and the extent of citation. We
observe that results obtained using the suggested influence scores are
better than those obtained using immediate citation counts. A future direction would be to
incorporate types of citations (as mentioned in Section 2.2) into the current
model, possibly using an ontological representation of citations [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. This might help
us gain insights into the nature of influences. It would also be
interesting to analyze the effects of research trends on influence associations.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Brin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The anatomy of a large-scale hypertextual web search engine</article-title>
          .
          <source>Computer networks and ISDN systems 30(1-7)</source>
          ,
          <volume>107</volume>
          -
          <fpage>117</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          :
          <article-title>Evidence of complex citer motivations</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>37</volume>
          (
          <issue>1</issue>
          ),
          <volume>34</volume>
          -
          <fpage>36</fpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maslov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Redner</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Finding scientific gems with Google's PageRank algorithm</article-title>
          .
          <source>Journal of Informetrics</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <volume>8</volume>
          -
          <fpage>15</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chikhaoui</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiazzaro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sotir</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Detecting communities of authority and analyzing their influence in dynamic social networks</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 8</source>
          (
          <issue>6</issue>
          ),
          <volume>82</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Grover</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          .
          <source>In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <volume>855</volume>
          -
          <fpage>864</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhuge</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Graph-based algorithms for ranking researchers: not all swans are white!</article-title>
          <source>Scientometrics</source>
          <volume>96</volume>
          (
          <issue>3</issue>
          ),
          <volume>743</volume>
          -
          <fpage>759</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>The norms of citation behavior: Prolegomena to the footnote</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <volume>179</volume>
          -
          <fpage>184</fpage>
          (
          <year>1965</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning influence from heterogeneous social networks</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>25</volume>
          (
          <issue>3</issue>
          ),
          <volume>511</volume>
          -
          <fpage>544</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Bringing pagerank to the citation analysis</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>44</volume>
          (
          <issue>2</issue>
          ),
          <volume>800</volume>
          -
          <fpage>810</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>MacRoberts</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>MacRoberts</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          :
          <article-title>Problems of citation analysis: A critical review</article-title>
          .
          <source>Journal of the American Society for information Science</source>
          <volume>40</volume>
          (
          <issue>5</issue>
          ),
          <volume>342</volume>
          -
          <fpage>349</fpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Merton</surname>
            ,
            <given-names>R.K.</given-names>
          </string-name>
          :
          <article-title>The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property</article-title>
          .
          <source>Isis 79(4)</source>
          ,
          <volume>606</volume>
          -
          <fpage>623</fpage>
          (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>3111</volume>
          -
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Moravcsik</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murugesan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Some results on the function and quality of citations</article-title>
          .
          <source>Social studies of science 5(1)</source>
          ,
          <volume>86</volume>
          -
          <fpage>92</fpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Nicolaisen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Citation analysis</article-title>
          .
          <source>Annual review of information science and technology 41(1)</source>
          ,
          <volume>609</volume>
          -
          <fpage>641</fpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Perozzi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Rfou</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skiena</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>DeepWalk: Online learning of social representations</article-title>
          .
          <source>In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <volume>701</volume>
          -
          <fpage>710</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pinski</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics</article-title>
          .
          <source>Information processing &amp; management 12(5)</source>
          ,
          <volume>297</volume>
          -
          <fpage>312</fpage>
          (
          <year>1976</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Rakoczy</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouzeghoub</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gancarski</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wegrzyn-Wolska</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Influence in time-dependent citation networks</article-title>
          .
          <source>In: 2018 12th International Conference on Research Challenges in Information Science (RCIS)</source>
          . pp.
          <volume>1</volume>
          -
          <fpage>11</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>CiTO, the Citation Typing Ontology</article-title>
          .
          <source>In: Journal of biomedical semantics</source>
          . vol.
          <volume>1</volume>
          , p.
          <fpage>S6</fpage>
          .
          <source>BioMed Central</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Arnetminer: extraction and mining of academic social networks</article-title>
          .
          <source>In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <volume>990</volume>
          -
          <fpage>998</fpage>
          .
          <string-name>
            <surname>ACM</surname>
          </string-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
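          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , et al.:
          <article-title>Scientific credit diffusion: Researcher level or paper level?</article-title>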
          <source>Scientometrics</source>
          <volume>109</volume>
          (
          <issue>2</issue>
          ),
          <volume>827</volume>
          -
          <fpage>837</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Y.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lü</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Quantifying the influence of scientists and their publications: distinguishing between prestige and popularity</article-title>
          .
          <source>New Journal of Physics</source>
          <volume>14</volume>
          (
          <issue>3</issue>
          ),
          <volume>033033</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Zuckerman</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Citation analysis and the complex problem of intellectual influence</article-title>
          .
          <source>Scientometrics</source>
          <volume>12</volume>
          (
          <issue>5-6</issue>
          ),
          <volume>329</volume>
          -
          <fpage>338</fpage>
          (
          <year>1987</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>