<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mining Intellectual Influence Associations</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Tejas</forename><surname>Shah</surname></persName>
							<email>tejas.shah@research.iiit.ac.in</email>
							<affiliation key="aff0">
								<orgName type="institution">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Vikram</forename><surname>Pudi</surname></persName>
							<email>vikram@iiit.ac.in</email>
							<affiliation key="aff0">
								<orgName type="institution">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Mining Intellectual Influence Associations</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">53EDFAF30D5FBB8AADF582A2FDA74842</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Citation Network</term>
					<term>Influence</term>
					<term>Representation Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Within the social system of science, citation practices serve social functions such as conferring recognition on the work of others and acknowledging one's intellectual debt. However, the structure of intellectual influence is misrepresented when only immediate citations and their counts are taken into consideration. To better understand how influence disseminates through associations, and to approximate the anatomy of this structure, the complex interactions in the intricate network of authors and papers need to be probed. Our study aims to understand these heterogeneous interactions. For a bibliographic dataset of authors and publications, we define proxy scores that estimate the associative influence of a cited author over a citing author. To harness the structural connectivity of the network, we generate author vector representations using these influence scores. Finally, to assess the effectiveness of the proposed scores, we evaluate these representations, compare the results of our algorithm empirically against a baseline, and present a qualitative analysis.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION AND RELATED WORK</head><p>The contribution of an author in the form of publications holds intrinsic value for the effective dissemination of knowledge. This knowledge relay, extended by and for the scientific community, results in conceptual relationships established in the form of citations. Citations and references operate within a jointly cognitive and moral framework designed to provide the historical lineage of knowledge and to repay intellectual debts through open acknowledgement <ref type="bibr" target="#b14">[15]</ref> <ref type="bibr" target="#b6">[7]</ref>. Citation analysis therefore has the potential to provide valuable insights into the social system of science across a wide range of topics.</p><p>The digitization of bibliographic data has led to a proliferation of quantitative bibliometric performance indicators for journals, papers, authors and institutions, based on citation counts and graph-based ranking algorithms <ref type="bibr" target="#b5">[6]</ref> <ref type="bibr" target="#b16">[17]</ref> <ref type="bibr" target="#b21">[22]</ref>. Studies of influence have examined aspects such as topic-level influence strength, influence propagation and its indirect global effect <ref type="bibr" target="#b7">[8]</ref>; the evolution of influence between communities <ref type="bibr" target="#b3">[4]</ref>; and time-dependent estimation of influence for evaluating pairwise community influences <ref type="bibr" target="#b17">[18]</ref>. Besides being extensively explored in the literature, the concept of influence has also surfaced in scholarly search engines such as Semantic Scholar. Many of these approaches analyze influence at the level of individual entities and their overall impact on the network structure.
However, influence relationships and associations do not emerge from global influence alone. These influence associations can be understood as the degree of influence between a pair of nodes within the network. Our work studies these pairwise influence associations between authors within the scholarly network. The main contributions of the paper are as follows:</p><p>- We propose an algorithm that simulates the influence between citing and cited authors and yields influence association scores.</p><p>- To address the issues in quantifying and tracing influence that arise in scholarly communication, and to harness the structural connectivity of the network, we profile authors and their interactions within the bibliographic network using representation learning and the proposed influence scores. These representations are a generic outcome of the proposed influence model, and we discuss their effectiveness for the problem at hand.</p><p>- To assess the predictive capacity of these scores and of the resulting vector representations, we report experimental results on classification tasks, together with a comparison against representations derived from citation counts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">PROBLEM STATEMENT</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Problem Formulation</head><p>Consider a publication $p_i$ written by $m$ co-authors $a_{i1}, a_{i2}, \dots, a_{im}$, which cites a publication $p_j$ written by $n$ co-authors $a_{j1}, a_{j2}, \dots, a_{jn}$. A publication citation network can then be defined as a directed graph $G_P = (V_P, E_P)$, constructed from the reference list of each publication, where $V_P$ is the set of vertices (publications) and $E_P$ is the set of directed edges denoting citations between publications. An author citation network $G_A = (V_A, E_A)$ is then obtained by projecting this publication citation network onto the author(s) of each publication node in $G_P$. For instance, the citation link $p_i \to p_j$ is projected onto the authors of the two papers, creating $m \cdot n$ directed links from each of the $m$ co-authors of the citing publication to each of the $n$ co-authors of the cited publication. Accordingly, $V_A$ is the set of nodes (authors) and $E_A$ is the set of all such directed links between authors. We denote the directed link between a citing author $a_{ik}$ and a cited author $a_{jl}$ by $a_{ik} \to a_{jl}$ (for all $k = 1, \dots, m$ and $l = 1, \dots, n$); for brevity, a generic citing author and cited author are written $a_i$ and $a_j$, with link $a_i \to a_j$. In the discussion that follows, we define and aim to quantify the associative intellectual influence $I(a_i, a_j)$: the degree to which author $a_j$ influences author $a_i$ when $a_i$ cites $a_j$.</p></div>
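<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of the projection from $G_P$ to $G_A$ described above follows. The function name project_author_network and the input format (an iterable of (citing paper, cited paper) pairs plus a dictionary of co-author lists) are illustrative assumptions. Repeated links are counted, so the multiplicity of $(a_i, a_j)$ equals the number of citations from $a_i$ to $a_j$; self-links are kept here and left for later pruning.</p>
```python
from collections import Counter


def project_author_network(pub_citations, pub_authors):
    """Project a publication citation network G_P onto authors (G_A).

    pub_citations: iterable of (citing_paper, cited_paper) edges of G_P
    pub_authors:   dict mapping each paper to its list of co-authors
    Returns a Counter mapping directed author links (a_i, a_j) to multiplicity.
    """
    links = Counter()
    for p_i, p_j in pub_citations:
        # Each citation p_i -> p_j yields m * n directed author links.
        for a_i in pub_authors[p_i]:
            for a_j in pub_authors[p_j]:
                links[(a_i, a_j)] += 1
    return links
```
<p>For a citing paper with two authors and a cited paper with three, the result contains the expected $2 \cdot 3 = 6$ author links.</p></div>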
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Issues and Caveats in Quantifying Influence</head><p>Citation networks are complex networks in which causal structure exists along the interactions between nodes. However, citation counts and first-order interactions (immediate neighbours) in the citation network have shortcomings as indicators for tracing intellectual influence <ref type="bibr" target="#b22">[23]</ref>. Without surveying authors about their publications, it remains unknown what fraction of the cited work was actually influential, directly or indirectly, and whether some references had no influence of any kind but were cited for other reasons. Such complex interplay of multiple citer motivations has been empirically studied and reported in previous work <ref type="bibr">[2][10]</ref>.</p><p>Possible classes of errors in tracing scholarly influences include:</p><p>- The inclusion (or exclusion) of a reference in a bibliography does not fully indicate whether that reference was directly or indirectly influential for the publication.</p><p>- Citing bias in favor of elite scientists or highly cited papers, i.e., over-citation, described as the "Matthew effect" <ref type="bibr" target="#b10">[11]</ref>.</p><p>- Under-citation of fundamental scientific work, which may arise from the obliteration (of sources) by incorporation (OBI) into established knowledge <ref type="bibr" target="#b9">[10]</ref>.</p><p>- When a relevant piece of scientific work is known through an intermediate publication, that intermediate publication serves as an intermediate influence; however, it may remain uncited.</p>
<p>- The nature of citations, which can be further categorized as organic or perfunctory, evolutionary or juxtapositional, confirmative or negational, etc. <ref type="bibr" target="#b13">[14]</ref>.</p><p>These and other classes of errors exist in citation data due to under-inclusion and over-inclusion of references <ref type="bibr" target="#b1">[2]</ref>[10] <ref type="bibr" target="#b22">[23]</ref>, which makes the task of tracing influence more complex.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">ASSOCIATIVE INFLUENCE MEASURES</head><p>We propose two associative influence scores that capture the intellectual influence of the cited author over the citing author. The underlying principle is that citations to influential publications of the cited author, and citations from the finest works of the citing author, are particularly significant. Further, the net influence on a publication can reasonably be attributed to the aggregated influence of all the publications it cites. Following the citing author's scholarly activity over time, an associative influence component is instantiated for each citing publication that references other publications. Thus, for a particular author pair $(a_i, a_j)$, we consider every instance in which $a_i$ cites $a_j$, i.e., every publication authored by $a_i$ that cites a publication authored by $a_j$. Each such instance forms a component of the overall associative influence between the author pair. Since associative influence is directed, the influence scores are directed as well: in general, $I(a_i, a_j) \neq I(a_j, a_i)$. Thus, for a citation relationship $a_i \to a_j$, the proposed influence association score $I(a_i, a_j)$ represents the degree of influence the cited author $a_j$ has over the citing author $a_i$.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Ranking Publications</head><p>To capture the collective nature of scholarly influence, we rank publications rather than researchers. This mimics the spread of intellectual influence among researchers via their publications. Several studies <ref type="bibr" target="#b5">[6]</ref>[21] investigate this key issue of scientific credit diffusion extensively by dissecting the credit diffusion mechanism underlying both researcher-level and paper-level graph-ranking methods. Their findings emphasize that scientific credit fundamentally derives from citation information between papers rather than from the derived researcher network. By modelling influence dissemination within the heterogeneous network of authors and papers, our model thus avoids the inaccurate allocation of scientific credit among researchers that can arise in researcher-level graph ranking.</p><p>PageRank <ref type="bibr" target="#b0">[1]</ref> accounts for both the number and the quality of links when measuring the importance of entities within a network. Applying PageRank to $G_P$ thus measures the importance of publications in terms of both the number of citations and the reputation of the citing papers <ref type="bibr" target="#b8">[9]</ref>. For publication-level ranking, we have:</p><formula xml:id="formula_0">PR(p_i) = \frac{1 - \alpha}{N} + \alpha \sum_{j \to i} \frac{PR(p_j)}{T_{out}(p_j)}<label>(1)</label></formula><p>where $j \to i$ denotes that paper $p_j$ cites paper $p_i$, $PR(p_j)$ is the PageRank of paper $p_j$, $T_{out}(p_j)$ is the number of outbound links (references) from $p_j$, $N = |V_P|$ is the number of publications, and $\alpha$ is the damping factor. Through empirical studies, Chen et al. <ref type="bibr" target="#b2">[3]</ref> showed that citation paths in scientific papers are short, about two links on average, as opposed to about six hyperlinks for the random surfer on the web described in the original study <ref type="bibr" target="#b0">[1]</ref>.
Accordingly, like Chen et al., we set the damping factor $\alpha$ to 0.5.</p></div>
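<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following Python sketch runs power iteration on Eq. (1) with the damping factor $\alpha = 0.5$ used in the text. The dictionary input format, the iteration cap and the convergence tolerance are assumptions. Eq. (1) is implemented literally: a paper with no references passes on no rank, so the scores need not sum to one; many PageRank implementations redistribute this dangling mass, which the text does not specify.</p>
```python
def pagerank(refs: dict[str, list[str]], alpha: float = 0.5,
             max_iter: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power iteration for Eq. (1) on a publication citation network G_P.

    refs maps each paper p_j to the list of papers it cites (its references).
    """
    nodes = set(refs)
    for cited in refs.values():
        nodes.update(cited)
    n = len(nodes)
    # T_out(p_j): number of outbound links (references) of each paper.
    t_out = {p: len(refs.get(p, [])) for p in nodes}
    pr = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Teleport term (1 - alpha) / N for every paper.
        new = {p: (1.0 - alpha) / n for p in nodes}
        # Each citing paper p_j passes alpha * PR(p_j) / T_out(p_j) to each paper it cites.
        for p_j, cited in refs.items():
            for p_i in cited:
                new[p_i] += alpha * pr[p_j] / t_out[p_j]
        change = sum(abs(new[p] - pr[p]) for p in nodes)
        pr = new
        if tol > change:
            break
    return pr
```
<p>For two papers that both cite a third paper C (and nothing else), the iteration converges to $PR(C) = 1/6 + 0.5 \cdot (1/6 + 1/6) = 1/3$ and to $1/6$ for each citing paper.</p></div>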
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Source-based Influence</head><formula xml:id="formula_1">I_S(a_i, a_j)</formula><p>This score rests on the premise that citations made in the significant works of the citing author $a_i$ are the most relevant to intellectual and cognitive influence. The citing author's significant works are those with a relatively high PageRank among the author's own publications. The associative influence components instantiated by the citing author's citation activity sum to form the net score. Thus, we calculate $I_S(a_i, a_j)$ as follows:</p><formula xml:id="formula_2">I_S(a_i, a_j) = \sum_{k} \frac{PR(p_{ik})}{T_{out}(p_{ik}) \cdot \overline{PR}(p_{a_i})} \quad \forall\, p_{ik} : p_{ik} \to p_{a_j}<label>(2)</label></formula><p>where $p_{ik}$ denotes the $k$-th publication of the citing author $a_i$ that cites a publication authored by the cited author $a_j$ (denoted $p_{a_j}$), $PR(p_{ik})$ is the PageRank of the citing publication $p_{ik}$, $T_{out}(p_{ik})$ is the number of outbound links (references) from $p_{ik}$, and $\overline{PR}(p_{a_i})$ is a normalization factor equal to the average PageRank of all publications authored by $a_i$. Dividing by $T_{out}(p_{ik})$ spreads the weight of a citing publication evenly over its references, while the normalization factor accounts for the variance in the ranks of citing publications across authors. Thus, the higher the normalized weight of each component (i.e., the higher the normalized PageRank of the citing publication) and the larger the number of such components between $a_i$ and $a_j$, the higher the associative influence.</p></div>
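<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Python sketch below computes Eq. (2) for all author pairs, assuming dictionaries of references, authors and PageRank scores (for example, from Eq. (1)). Two interpretation choices are assumptions rather than statements from the text: each citing publication $p_{ik}$ contributes one component per cited author it reaches, regardless of how many of that author's papers it cites, and self-citation pairs are dropped here, anticipating the pruning in Section 4.1.</p>
```python
from collections import defaultdict
from statistics import mean


def source_influence(refs, authors, pr):
    """Sketch of Eq. (2): I_S(a_i, a_j) for every citing/cited author pair.

    refs:    paper -> list of papers it cites (T_out is the list length)
    authors: paper -> list of its authors
    pr:      paper -> PageRank score from Eq. (1)
    """
    # Normalization factor: average PageRank of each author's publications.
    ranks_by_author = defaultdict(list)
    for paper, names in authors.items():
        for a in names:
            ranks_by_author[a].append(pr[paper])
    avg_pr = {a: mean(r) for a, r in ranks_by_author.items()}

    scores = defaultdict(float)
    for p_ik, cited in refs.items():
        if not cited:
            continue
        # Authors a_j reached by at least one reference of p_ik.
        cited_authors = {a_j for q in cited for a_j in authors.get(q, [])}
        for a_i in authors.get(p_ik, []):
            for a_j in cited_authors:
                if a_j == a_i:
                    continue  # self-citation loops are pruned (Section 4.1)
                # One component per citing publication, as in Eq. (2).
                scores[(a_i, a_j)] += pr[p_ik] / (len(cited) * avg_pr[a_i])
    return dict(scores)
```
<p>For instance, a paper by X with PageRank 0.2 (X's only paper) that cites two papers by Y contributes $0.2 / (2 \cdot 0.2) = 0.5$ to $I_S(X, Y)$.</p></div>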
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Destination-based Influence</head><formula xml:id="formula_3">I_D(a_i, a_j)</formula><p>This score measures the extent to which the citing author cites a set of authors in his/her publications, weighted by the influence of the cited authors' scholarly contributions. Extending the influence instantiation notion to $I_D$, each association component consists of the PageRank of the cited publication divided by the number of its inbound links, which distributes the impact of a publication among the papers that cite it. Thus, we calculate $I_D(a_i, a_j)$ as follows:</p><formula xml:id="formula_4">I_D(a_i, a_j) = \sum_{k} \frac{PR(p_{jk})}{T_{in}(p_{jk}) \cdot \overline{PR}(p_{a_j})} \quad \forall\, p_{jk} : p_{a_i} \to p_{jk}<label>(3)</label></formula><p>Here, $p_{jk}$ denotes the $k$-th publication of the cited author $a_j$ that is cited by a publication authored by the citing author $a_i$ (denoted $p_{a_i}$), $PR(p_{jk})$ is the PageRank of the cited publication $p_{jk}$, $T_{in}(p_{jk})$ is the number of inbound links to $p_{jk}$, and $\overline{PR}(p_{a_j})$ is a normalization factor equal to the average PageRank of the publications authored by $a_j$. This normalization factor accounts for the disparity in the ranks of cited publications. Thus, the higher the normalized weight of each association component and the larger the number of such components between $a_i$ and $a_j$, the higher the associative influence.</p></div>
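<div xmlns="http://www.tei-c.org/ns/1.0"><p>Analogously, the Python sketch below computes Eq. (3) with the same assumed input dictionaries. Here $T_{in}$ is counted over all references in the supplied data, and each distinct cited publication $p_{jk}$ contributes once per citing author, which is one reading of the summation over $k$; self-citation pairs are dropped as before.</p>
```python
from collections import defaultdict
from statistics import mean


def destination_influence(refs, authors, pr):
    """Sketch of Eq. (3): I_D(a_i, a_j) for every citing/cited author pair.

    refs:    paper -> list of papers it cites
    authors: paper -> list of its authors
    pr:      paper -> PageRank score from Eq. (1)
    """
    # Normalization factor: average PageRank of each author's publications.
    ranks_by_author = defaultdict(list)
    for paper, names in authors.items():
        for a in names:
            ranks_by_author[a].append(pr[paper])
    avg_pr = {a: mean(r) for a, r in ranks_by_author.items()}

    # T_in(p_jk): number of inbound citations each paper receives.
    t_in = defaultdict(int)
    for cited in refs.values():
        for q in cited:
            t_in[q] += 1

    # Distinct publications p_jk of a_j that a_i cites at least once.
    cited_papers = defaultdict(set)
    for p, cited in refs.items():
        for a_i in authors.get(p, []):
            for q in cited:
                for a_j in authors.get(q, []):
                    if a_j != a_i:  # prune self-citation loops
                        cited_papers[(a_i, a_j)].add(q)

    return {
        pair: sum(pr[q] / (t_in[q] * avg_pr[pair[1]]) for q in papers)
        for pair, papers in cited_papers.items()
    }
```
<p>With Y's papers having PageRank 0.4 (cited twice) and 0.2 (cited once), so that $\overline{PR}(p_Y) = 0.3$, an author citing both receives $I_D = 0.4/(2 \cdot 0.3) + 0.2/(1 \cdot 0.3) = 4/3$.</p></div>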
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Qualitative Analysis</head><p>The author and publication ranks of prominent scientists and young researchers span a wide spectrum. For example, a highly cited publication of a prominent scientist may be influential, but only modestly so for each individual researcher in the wider scientific community. Conversely, the best works of a young researcher may receive comparatively few inbound citations, yet the substantial influence of the publications they cite should not be overlooked. Considering these different degrees of author and publication rank, Equation <ref type="formula" target="#formula_2">(2)</ref> shows that the significant works of young and even less prominent researchers contribute to highlighting the cited author's influence, because each citing publication's PageRank is normalized by the average PageRank of its own author. Thus, irrespective of the cited author's prominence, his/her influence over the citing author is clearly reflected in the score. Likewise, Equation <ref type="formula" target="#formula_4">(3)</ref> ensures that scientists outside the highest ranks also receive due scholarly credit for their notable studies. Such notions of relative author impact and due credit allocation hold across the spectrum of author ranks for most researchers. The normalization of the associative influence measures described in Sections 3.2 and 3.3 accounts for such variances while also taking into consideration the effect of accumulated advantage (i.e., the Matthew effect <ref type="bibr" target="#b10">[11]</ref>).</p><p>Pairwise associative influences that arise from scholarly contributions over time imply some form of influence of the cited author over the citing author, irrespective of citation type. In this paper, we focus on the existence and extent of the conceptual relationships formed between authors.
However, the precise nature of the influence may be hard to quantify without ontological representations of citations. To alleviate issues concerning under-inclusion of references (Section 2.2), capture the network neighborhood and harness structural connectivity, we profile authors and their interactions using representation learning and the proposed influence scores. This maps the latent features of the citation network into a vector space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">AUTHOR REPRESENTATION LEARNING</head><p>Recent work in language modeling and representation learning, such as Word2Vec <ref type="bibr" target="#b11">[12]</ref>, applies probabilistic neural networks to map words into vector spaces. We learn author vector representations with a similar intuition, as described in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Random Walk on Weighted Directed Network</head><p>Using the proposed influence measures, we model weighted random walks over the author citation network. These walks can be treated as sentences in the context of language modeling. Analogous to recent work <ref type="bibr" target="#b15">[16]</ref>, this treats the network as a "document". The motivation for converting a graph into a series of text documents is that word frequencies in a document corpus and node visit frequencies during random walks on a connected graph both follow a power-law distribution <ref type="bibr" target="#b15">[16]</ref>.</p><p>We sample random walks over the weighted directed author citation network. Consider the author citation network $G_A$ with $n$ author nodes $a_1, a_2, \dots, a_n$, and let $w_{ij}$ be the weight of the edge from $a_i$ to $a_j$. The edge weight is the influence score of the author pair, $w_{ij} = I(a_i, a_j)$, as derived from Equation <ref type="formula" target="#formula_2">(2)</ref> or Equation <ref type="formula" target="#formula_4">(3)</ref>. Since the influence scores are non-negative, $w_{ij} \geq 0$. For tracing influence associations, we prune self-citation loops, so $w_{ii} = 0$. For a given source node $a_i$, the probability of choosing author node $a_j$ among the direct successors of $a_i$ is proportional to the influence measure $I(a_i, a_j)$:</p><formula xml:id="formula_5">p_{a_i,a_j} = \frac{I(a_i, a_j)}{\sum_{k} I(a_i, a_k)}<label>(4)</label></formula><p>where $p_{a_i,a_j}$ is the transition probability. For each source author node $a_i$, we simulate a weighted directed random walk $W_{a_i}$. This sampling is a stochastic process whose random variables are the author nodes $w^1_{a_i}, w^2_{a_i}, \dots, w^l_{a_i}$, where $w^1_{a_i} = a_i$ and each subsequent node is chosen from the direct successors of the current node with the transition probability of Equation <ref type="formula" target="#formula_5">(4)</ref>.
The walk length is fixed in our experiments: for each source vertex, the random walk generator samples author nodes according to the transition probabilities until a maximum length $l = 40$ is reached. We generate $\gamma = 15$ such weighted random walks for each author.</p></div>
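<div xmlns="http://www.tei-c.org/ns/1.0"><p>A possible Python sketch of the walk generator follows, with $l = 40$ and $\gamma = 15$ as in the text. It takes the edge weights $w_{ij} = I(a_i, a_j)$ as a dictionary keyed by author pairs (an assumed format) and samples each step from the current node's successors in proportion to the weights via random.choices, which normalizes the weights exactly as in Eq. (4). As an assumption not covered by the text, a walk stops early when it reaches a node without outgoing edges. The fixed seed is only for reproducibility, and the citation-count baseline can reuse the same generator with counts as weights.</p>
```python
import random


def weighted_walks(weights, walk_length=40, walks_per_node=15, seed=0):
    """Sample weighted directed random walks over the author citation network.

    weights: dict mapping (a_i, a_j) -> I(a_i, a_j), the edge weight w_ij.
    Returns a list of walks, each a list of author ids starting at its source.
    """
    rng = random.Random(seed)
    # Direct successors of each node with positive weight; self-loops pruned.
    succ = {}
    for (a_i, a_j), w in weights.items():
        if w > 0 and a_i != a_j:
            succ.setdefault(a_i, []).append((a_j, w))
    nodes = sorted({a for edge in weights for a in edge})
    walks = []
    for _ in range(walks_per_node):  # gamma walks per author
        for start in nodes:
            walk = [start]
            while walk_length > len(walk):
                options = succ.get(walk[-1])
                if not options:
                    break  # dead end: no outgoing influence edges
                targets, ws = zip(*options)
                # random.choices normalizes ws, giving the probabilities of Eq. (4).
                walk.append(rng.choices(targets, weights=ws)[0])
            walks.append(walk)
    return walks
```
</div>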
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Representation Learning Framework</head><p>Modelling social structures and relationships within networks can be aligned with the optimization techniques used to model natural language <ref type="bibr" target="#b15">[16]</ref> <ref type="bibr" target="#b4">[5]</ref>. From the ordered sequences of nodes produced by the weighted random walks, we learn representations using the Skip-gram model. This represents authors, and the citation relationships between pairs of authors, in an unsupervised manner. Based on the distributional hypothesis, these representations are latent features in a continuous low-dimensional vector space that capture both neighbourhood and structural influences in the citation network. In effect, they encapsulate more information about the relationships between authors than the immediate citations alone.</p><p>Skip-gram is a language model that maximizes the conditional co-occurrence probability of words within a predefined window <ref type="bibr" target="#b11">[12]</ref>. Let $f: V_A \to \mathbb{R}^d$ be the mapping function from nodes to feature representations that we aim to learn, where $d$ is the number of dimensions of the representation; $f$ is thus a matrix of $|V_A| \times d$ parameters. We optimize the likelihood function formulated in Equation <ref type="formula" target="#formula_7">(5)</ref>:</p><formula xml:id="formula_7">\max_{f} \sum_{j=i-w,\, j \neq i}^{i+w} \log \Pr\big(a_j \mid f(a_i)\big)<label>(5)</label></formula><p>where $w$ is the window size, $a_i \in V_A$, and $\Pr(a_j \mid f(a_i))$ is defined by the softmax function:</p><formula xml:id="formula_9">\Pr\big(a_j \mid f(a_i)\big) = \frac{\exp\big(f(a_j) \cdot f(a_i)\big)}{\sum_{k \in V_A} \exp\big(f(a_k) \cdot f(a_i)\big)}<label>(6)</label></formula><p>Skip-gram assumes that, within a context window, all nodes are independent of each other and equally important.
However, as Equations (5) and (6) show, the normalization term of the softmax sums over all nodes, so the cost of each update is proportional to $|V_A|$. This is computationally expensive for large networks such as the author citation network considered here, so we approximate the objective using negative sampling <ref type="bibr" target="#b12">[13]</ref>.</p><p>Using the resulting author vector representations, we validate the effectiveness of the proposed scores, as discussed in the following sections.</p></div>
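<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sketch below is a small NumPy implementation of Skip-gram with negative sampling over the generated walks. For each (centre, context) pair within the window, it pulls the context vector towards the centre vector and pushes a few noise nodes away from it, where the noise nodes are drawn from visit counts raised to the 3/4 power as in word2vec. This replaces the full softmax of Eq. (6), whose cost grows with $|V_A|$. The dimension, window, number of negatives, epochs and learning rate are illustrative assumptions, since the text does not report them; in practice a library such as gensim's Word2Vec with sg=1 and negative sampling enabled would typically be used instead.</p>
```python
import numpy as np


def skipgram_negative_sampling(walks, dim=16, window=5, negatives=5,
                               epochs=3, lr=0.025, seed=0):
    """Learn author vectors f(a) from random walks (Skip-gram + negative sampling)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({a for walk in walks for a in walk})
    index = {a: i for i, a in enumerate(vocab)}
    n = len(vocab)
    emb = (rng.random((n, dim)) - 0.5) / dim  # input vectors: rows of f
    ctx = np.zeros((n, dim))                   # output (context) vectors

    # Noise distribution: visit counts raised to the 3/4 power, as in word2vec.
    counts = np.zeros(n)
    for walk in walks:
        for a in walk:
            counts[index[a]] += 1
    noise = counts ** 0.75
    noise /= noise.sum()

    for _ in range(epochs):
        for walk in walks:
            ids = [index[a] for a in walk]
            for pos, center in enumerate(ids):
                start, stop = max(0, pos - window), min(len(ids), pos + window + 1)
                for cpos in range(start, stop):
                    if cpos == pos:
                        continue
                    # One positive context node followed by sampled noise nodes.
                    targets = np.concatenate(
                        ([ids[cpos]], rng.choice(n, size=negatives, p=noise)))
                    labels = np.zeros(negatives + 1)
                    labels[0] = 1.0
                    v = emb[center].copy()
                    u = ctx[targets]
                    # Gradient of the log-sigmoid objective, scaled by lr.
                    g = lr * (labels - 1.0 / (1.0 + np.exp(-(u @ v))))
                    emb[center] += g @ u
                    np.add.at(ctx, targets, np.outer(g, v))
    return {a: emb[index[a]].copy() for a in vocab}
```
</div>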
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">PERFORMANCE MODEL</head><p>Assessing intellectual and cognitive influence is inherently limited by its subjective nature. However, despite the shortcomings of citation data, studies <ref type="bibr" target="#b22">[23]</ref> assert that citations can be used as approximate proxy indicators of influence for aggregates of authors and papers. Given the measurable factors and the practical limitations of the study, it can reasonably be argued that if a publication proves relatively influential for an author's scientific work, that author is likely to have a higher relative citation ratio towards the influential cited authors. We therefore bucket these relative citation ratios for author pairs and use them as labels for classifying the extent of influence. Representations obtained from the influence scores can then be evaluated against the baseline by comparing how well they predict citations, and their extent, between author pairs. The purpose of our study is to evaluate whether the proposed influence measures capture meaningful relationships; for this, it suffices to test their relative capacity for citation prediction, and absolute predictive accuracy is not the criterion being assessed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Validating Influence Associations</head><p>To capture the semantic relatedness underlying the influence relationship between a citing and a cited author, and to predict a weighted citation link between a pair of authors $a_1$ and $a_2$, we generate an edge representation $e(a_1, a_2)$ by applying a binary operator to the corresponding author feature vectors $f(a_1)$ and $f(a_2)$. Similar strategies have been used successfully in earlier studies for link prediction tasks <ref type="bibr" target="#b4">[5]</ref>. We define the edge representation as the concatenation $e(a_1, a_2) = f(a_1) \oplus f(a_2)$; since each author feature vector lies in $\mathbb{R}^d$, $e(a_1, a_2) \in \mathbb{R}^{2d}$.</p><p>These edge representations are then used in training and evaluation for predicting the influence between a pair of authors. However, predicting the</p></div>
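<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the concatenation operator in NumPy follows; the dictionary f of author vectors and the helper names edge_representation and edge_matrix are assumptions. Concatenation is order-sensitive, so $e(a_1, a_2)$ and $e(a_2, a_1)$ differ, which preserves the direction of the citing-to-cited relationship.</p>
```python
import numpy as np


def edge_representation(f, a1, a2):
    """e(a1, a2): f(a1) concatenated with f(a2), a vector in R^(2d)."""
    return np.concatenate([np.asarray(f[a1]), np.asarray(f[a2])])


def edge_matrix(f, pairs):
    """Stack edge representations of (citing, cited) pairs into a feature matrix."""
    return np.vstack([edge_representation(f, a1, a2) for a1, a2 in pairs])
```
</div>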
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Dataset Segmentation</head><p>To evaluate the degree of influence quantified by the proposed influence measures, we consider the work of individual authors over time and divide the dataset into three segments:</p><p>Profiling: This segment represents the activity of researchers and scientists within the bibliographic network up to 2006. Over such a wide time span, researchers interact substantially through collaborations, citations and conceptual exchanges in the form of publications. This segment is used for author profiling, i.e., learning author vector representations from weighted random walks.</p><p>Training: This segment is used to learn influence associations from edge representations built on the author vectors obtained in the Profiling segment. It contains the author pairs with citations exchanged between 2006 and 2010; author pairs that also appear in the test set are excluded to avoid overfitting the classifier.</p><p>Testing: The proposed influence scores and the baseline (citation count) are validated on the bibliographic interactions in this segment, which covers citations exchanged from 2011 onwards. An author pair is valid for testing as long as vector representations exist for both authors.</p></div>
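<div xmlns="http://www.tei-c.org/ns/1.0"><p>The temporal split can be sketched in Python as follows. The Citation record and the segment function are a hypothetical format. The text places Profiling "up to 2006" and Training "between 2006 and 2010", so the year 2006 is ambiguous; this sketch assigns it to Profiling by assumption. It also treats an author as having a vector if he/she appears in a Profiling citation, which is one reading of the validity rule for test pairs.</p>
```python
from typing import NamedTuple


class Citation(NamedTuple):
    citing: str  # citing author a_i
    cited: str   # cited author a_j
    year: int    # year of the citing publication


def segment(citations, profile_end=2006, train_end=2010):
    """Split author-level citations into Profiling, Training and Testing segments."""
    profiling, training, testing = [], [], []
    for c in citations:
        if profile_end >= c.year:
            profiling.append(c)
        elif train_end >= c.year:
            training.append(c)
        else:
            testing.append(c)
    # Authors that receive a vector from the Profiling walks (assumed rule).
    profiled = {a for c in profiling for a in (c.citing, c.cited)}
    # A test pair is valid only if both authors have vector representations.
    testing = [c for c in testing if c.citing in profiled and c.cited in profiled]
    # Exclude test-set author pairs from Training to avoid overfitting.
    test_pairs = {(c.citing, c.cited) for c in testing}
    training = [c for c in training if (c.citing, c.cited) not in test_pairs]
    return profiling, training, testing
```
</div>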
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Evaluation and Results</head><p>Using the Training segment, edge representations are constructed for each pair of authors who exchange citations during this segment, and the relative citation ratios between these author pairs are calculated using Equation <ref type="formula">(7)</ref>. The edge representations are then classified into the labels NI, SI and HI using a random forest classifier. Since authors span a wide spectrum of ranks and degrees of influence, this lets us capture which kinds of authors cite which kinds of authors. Because we aim to compare the relative predictive capacity of the representations, we do not tune the classifier extensively and report the results of every representation using the same parameters.</p><p>Precision, recall and F-scores for the baseline and the influence measures are shown in Tables <ref type="table" target="#tab_0">1</ref>, <ref type="table" target="#tab_1">2</ref> and <ref type="table" target="#tab_2">3</ref>. These results show that even the representations obtained with the baseline citation counts perform reasonably well on the multi-class citation prediction task. However, the influence measures $I_S$ and $I_D$ match or exceed the baseline in precision and recall for every class, with the clearest gains for classes SI and HI. For class NI, all three measures perform almost equally. This may be because the classifier identifies non-existent edges between author pairs accurately, irrespective of the edge weights used. It is also plausible that some influence associations move from one class to another over time. For example, a citing author who was highly influenced by a cited author may cite that author less over time (and vice versa).
This can happen for various reasons, such as shifts in research trends or the development of interests in newer fields, and may explain why recall for class HI is comparatively low for all three measures.</p></div>
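<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-class metrics reported in Tables 1 to 3 can be computed as in the Python sketch below, where the F-score is the harmonic mean of precision and recall. With scikit-learn, RandomForestClassifier together with precision_recall_fscore_support(average=None) would give the same per-class quantities; the classifier settings used in the study are not reported. As a check, the HI row of Table 1 gives $2 \cdot 0.71 \cdot 0.49 / (0.71 + 0.49) \approx 0.58$, matching the reported F-score.</p>
```python
def per_class_scores(y_true, y_pred, labels=("NI", "SI", "HI")):
    """Per-class (precision, recall, F-score), as reported in Tables 1 to 3."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-score: harmonic mean of precision and recall.
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        scores[label] = (precision, recall, f_score)
    return scores
```
</div>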
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSION AND FUTURE WORK</head><p>In this paper, we present a model that traces intellectual influences by harnessing the structural connectivity within an academic network. Using the proposed influence scores, we further profile authors by mapping latent features of the network into a vector space. We evaluate the effectiveness of the captured author relationships and the resulting author representations through classification experiments on citation prediction and the extent of citation. The proposed influence scores perform better than immediate citation counts. A future direction would be to incorporate citation types (Section 2.2) into the current model, possibly using an ontological representation of citations <ref type="bibr" target="#b18">[19]</ref>, which might provide insight into the nature of influences. It would also be interesting to analyze the effects of research trends on influence associations.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1:</head><label>1</label><figDesc>Figures (1a), (1b) represent the publication in-degree and out-degree distributions for the publication citation network, and figures (1c), (1d) the author in-degree and out-degree distributions for the author citation network, respectively.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Citation Count</figDesc><table><row role="label"><cell>Label</cell><cell>Precision</cell><cell>Recall</cell><cell>F-Score</cell></row><row><cell>NI</cell><cell>0.93</cell><cell>0.89</cell><cell>0.91</cell></row><row><cell>SI</cell><cell>0.72</cell><cell>0.81</cell><cell>0.76</cell></row><row><cell>HI</cell><cell>0.71</cell><cell>0.49</cell><cell>0.58</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Influence (IS)</figDesc><table><row role="label"><cell>Label</cell><cell>Precision</cell><cell>Recall</cell><cell>F-Score</cell></row><row><cell>NI</cell><cell>0.93</cell><cell>0.90</cell><cell>0.91</cell></row><row><cell>SI</cell><cell>0.75</cell><cell>0.83</cell><cell>0.79</cell></row><row><cell>HI</cell><cell>0.74</cell><cell>0.54</cell><cell>0.62</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Influence (ID)</figDesc><table><row role="label"><cell>Label</cell><cell>Precision</cell><cell>Recall</cell><cell>F-Score</cell></row><row><cell>NI</cell><cell>0.94</cell><cell>0.89</cell><cell>0.91</cell></row><row><cell>SI</cell><cell>0.75</cell><cell>0.85</cell><cell>0.80</cell></row><row><cell>HI</cell><cell>0.76</cell><cell>0.55</cell><cell>0.64</cell></row></table></figure>
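<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a consistency check on Tables 1 to 3, each F-score should equal the harmonic mean of the corresponding precision P and recall R, F = 2PR / (P + R), assuming the reported F-score is the balanced F1. The sketch below recomputes the F-scores from the transcribed values; the function names are hypothetical. All nine recomputed values agree with the reported ones to two decimal places, although small discrepancies could in general arise because precision and recall are themselves rounded.</p><p>
```python
# (precision, recall, reported F-score) transcribed from Tables 1 to 3.
TABLES = {
    "Citation Count": {"NI": (0.93, 0.89, 0.91), "SI": (0.72, 0.81, 0.76), "HI": (0.71, 0.49, 0.58)},
    "Influence (IS)": {"NI": (0.93, 0.90, 0.91), "SI": (0.75, 0.83, 0.79), "HI": (0.74, 0.54, 0.62)},
    "Influence (ID)": {"NI": (0.94, 0.89, 0.91), "SI": (0.75, 0.85, 0.80), "HI": (0.76, 0.55, 0.64)},
}


def f_score(precision, recall):
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mismatches(tables):
    """Return (table, label, recomputed, reported) for rows that differ at 2 d.p."""
    bad = []
    for name, rows in tables.items():
        for label, (p, r, reported) in rows.items():
            recomputed = round(f_score(p, r), 2)
            if abs(recomputed - reported) > 1e-9:
                bad.append((name, label, recomputed, reported))
    return bad


if __name__ == "__main__":
    for name, rows in TABLES.items():
        for label, (p, r, reported) in rows.items():
            print(f"{name:15s} {label}: recomputed F = {f_score(p, r):.2f}, reported {reported:.2f}")
    print("mismatches:", mismatches(TABLES))  # mismatches: []
```
</p></div>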
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>presence of a link between the author pair in the test set is necessary but insufficient for assessing the predictive capacity of the degree of influence. To evaluate the extent of influence more rigorously, we extend this binary link-prediction evaluation to multi-label classification. Here, each edge representation is assigned one of the labels Nil Influence (NI), Slightly Influenced (SI) and Highly Influenced (HI) according to the relative citation ratio between the pair of authors, computed separately for the training and testing sets. Thus, for a pair of authors a_i and a_j, e(a_i, a_j) is classified based on the relative citation ratio cr(a_i, a_j), which is computed as:</p><p>where c(a_i, a_j) is the number of citations from the citing author a_i to the cited author a_j in the author citation network during the given temporal segment. The citation ratios are then mapped to the class labels as follows. If cr(a_i, a_j) = 0, the cited author a_j has no influence on the citing author a_i, and e(a_i, a_j) is classified as NI. If cr(a_i, a_j) ∈ (0, δ], e(a_i, a_j) is classified as SI. If cr(a_i, a_j) &gt; δ, e(a_i, a_j) is classified as HI. Based on repeated experiments to maximize discrimination among citation ratios, δ is set to 0.036.</p><p>Baseline: To compare the performance of our model and the influence scores, we use citation counts as the baseline for our evaluation. Author profiles and vectors for this baseline are generated using weighted random walks over the author citation graph, where the weight of each edge is the number of citations from the citing author to the cited author (instead of the influence measures discussed in Section 4.1). Edge representations for the baseline are then computed from the resulting author vectors.</p></div>
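<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mapping from relative citation ratio to class label can be written as a small function. This minimal sketch takes the ratio cr(a_i, a_j) as an input, because the equation defining it is not reproduced in this text, and applies the stated thresholds with δ = 0.036; the name influence_label is hypothetical.</p><p>
```python
DELTA = 0.036  # threshold on the relative citation ratio, as stated in the text


def influence_label(cr, delta=DELTA):
    """Map a relative citation ratio cr(a_i, a_j) to a class label.

    cr == 0 gives NI (nil influence), cr in (0, delta] gives SI (slightly
    influenced) and cr above delta gives HI (highly influenced).
    """
    if cr == 0:
        return "NI"
    if cr > delta:
        return "HI"
    if cr > 0:
        return "SI"  # here cr lies in (0, delta], so delta itself counts as SI
    raise ValueError(f"relative citation ratio must be non-negative, got {cr}")


if __name__ == "__main__":
    for cr in (0.0, 0.01, 0.036, 0.2):
        print(cr, influence_label(cr))
```
</p><p>Note that the boundary value cr = δ falls into SI, matching the half-open interval (0, δ] used above.</p></div>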
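<div xmlns="http://www.tei-c.org/ns/1.0"><p>The weighted random walks used for the citation-count baseline can be sketched as follows, assuming the author citation graph is stored as a dictionary that maps each citing author to {cited author: citation count}. At each step the next author is drawn with probability proportional to the citation count. The function name, walk length and number of walks per author are illustrative assumptions rather than the settings used here. In DeepWalk <ref type="bibr" target="#b15">[16]</ref> and node2vec <ref type="bibr" target="#b4">[5]</ref> style pipelines, such walks are then treated as sentences for a skip-gram model <ref type="bibr" target="#b12">[13]</ref> to learn node vectors; that step is omitted from this sketch.</p><p>
```python
import random


def weighted_random_walks(graph, walk_length, walks_per_node, seed=None):
    """Generate truncated random walks over a weighted, directed author graph.

    graph maps each citing author to {cited_author: weight}; for the citation-count
    baseline the weight is the number of citations. Each step moves to a cited
    author chosen with probability proportional to that weight, and a walk stops
    early at an author with no outgoing citations.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = list(graph)
        rng.shuffle(starts)  # vary the start order between passes
        for start in starts:
            walk = [start]
            while walk_length > len(walk):
                neighbours = graph.get(walk[-1], {})
                if not neighbours:
                    break
                cited = list(neighbours)
                weights = [neighbours[a] for a in cited]
                walk.append(rng.choices(cited, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Toy author citation graph: author -> {cited author: number of citations}.
    graph = {"A": {"B": 3, "C": 1}, "B": {"C": 2, "A": 1}, "C": {"A": 1}}
    for walk in weighted_random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
        print(" -> ".join(walk))
```
</p><p>Swapping the citation counts for other positive edge weights changes only the transition probabilities; the walk procedure itself stays the same.</p></div>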
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">EXPERIMENTAL DESIGN</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Dataset Description</head><p>The DBLP dataset used consists of papers published between 1960 and 2014, with citation data enriched using bibliographic metadata from ArnetMiner <ref type="bibr" target="#b19">[20]</ref>. The dataset contains information such as each paper's title, its authors and their affiliations, its citation list and its publication year. For our experiments, the dataset is pre-processed to prune incomplete records: 1) publications with incomplete metadata (e.g., a missing year or missing authors) are removed; 2) only internal citations, defined as references to publications within the snapshot of the dataset being considered, are retained, and all other references are removed.</p><p>The final dataset includes 1,277,594 papers and 1,003,387 authors. The publication citation network contains 7,962,820 internal references, whereas the edge list of the author citation network contains 39,713,499 edges.</p></div>			</div>
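<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two pruning steps described in the dataset description can be sketched as below, assuming each paper record is a dictionary with the hypothetical keys id, year, authors and references. The second function shows one common way of deriving an author citation network from paper citations, in which every author of a citing paper is taken to cite every author of the cited paper; this expansion is an assumption made for illustration, since the construction is not described in this section. Under such an expansion a single paper-level reference yields one author-level citation per citing-cited author pair, so an author edge list can be considerably larger than the paper-level reference count.</p><p>
```python
from collections import Counter


def prune_records(records):
    """Apply the two pruning steps to a list of paper records (dicts)."""
    # Step 1: drop records with incomplete metadata (missing year or authors).
    complete = [r for r in records if r.get("year") is not None and r.get("authors")]
    kept_ids = {r["id"] for r in complete}
    # Step 2: keep only internal citations, i.e. references to retained papers.
    return [
        {**r, "references": [c for c in r.get("references", []) if c in kept_ids]}
        for r in complete
    ]


def author_citation_edges(records):
    """Hypothetical expansion: every author of a citing paper cites every author
    of the cited paper. Returns a Counter of (citing_author, cited_author) pairs."""
    authors_of = {r["id"]: r["authors"] for r in records}
    edges = Counter()
    for r in records:
        for cited in r["references"]:
            for a_i in r["authors"]:
                for a_j in authors_of[cited]:
                    edges[(a_i, a_j)] += 1
    return edges


if __name__ == "__main__":
    records = [
        {"id": "p1", "year": 2010, "authors": ["A"], "references": ["p2", "p9"]},
        {"id": "p2", "year": 2008, "authors": ["B", "C"], "references": []},
        {"id": "p3", "year": None, "authors": ["D"], "references": ["p1"]},
    ]
    pruned = prune_records(records)
    print([r["id"] for r in pruned])  # ['p1', 'p2']: p3 lacks a year
    print(author_citation_edges(pruned))
```
</p><p>In the toy data, p3 is removed for lacking a year and the external reference p9 is dropped from p1, leaving one paper-level citation that expands into two author-level citations (A to B and A to C).</p></div>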
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The anatomy of a large-scale hypertextual web search engine</title>
		<author>
			<persName><forename type="first">S</forename><surname>Brin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Page</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer networks and ISDN systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1-7</biblScope>
			<biblScope unit="page" from="107" to="117" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Evidence of complex citer motivations</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Brooks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="34" to="36" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Finding scientific gems with Google's PageRank algorithm</title>
		<author>
			<persName><forename type="first">P</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Maslov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Redner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Informetrics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="8" to="15" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
	<note>BIR 2019 Workshop on Bibliometric-enhanced Information Retrieval</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Detecting communities of authority and analyzing their influence in dynamic social networks</title>
		<author>
			<persName><forename type="first">B</forename><surname>Chikhaoui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chiazzaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sotir</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology (TIST)</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page">82</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">node2vec: Scalable feature learning for networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Grover</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="855" to="864" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Graph-based algorithms for ranking researchers: not all swans are white!</title>
		<author>
			<persName><forename type="first">X</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhuge</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientometrics</title>
		<imprint>
			<biblScope unit="volume">96</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="743" to="759" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The norms of citation behavior: Prolegomena to the footnote</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kaplan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the Association for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="179" to="184" />
			<date type="published" when="1965">1965</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning influence from heterogeneous social networks</title>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Mining and Knowledge Discovery</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="511" to="544" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Bringing PageRank to the citation analysis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="800" to="810" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Problems of citation analysis: A critical review</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Macroberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>Macroberts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="342" to="349" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Merton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Isis</title>
		<imprint>
			<biblScope unit="volume">79</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="606" to="623" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Some results on the function and quality of citations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Moravcsik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Murugesan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Social studies of science</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="86" to="92" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Citation analysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Nicolaisen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annual review of information science and technology</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="609" to="641" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">DeepWalk: Online learning of social representations</title>
		<author>
			<persName><forename type="first">B</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Skiena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 20th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="701" to="710" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics</title>
		<author>
			<persName><forename type="first">G</forename><surname>Pinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Narin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information processing &amp; management</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="297" to="312" />
			<date type="published" when="1976">1976</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Influence in time-dependent citation networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Rakoczy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bouzeghoub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Gancarski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wegrzyn-Wolska</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">12th International Conference on Research Challenges in Information Science (RCIS)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">CiTO, the Citation Typing Ontology</title>
		<author>
			<persName><forename type="first">D</forename><surname>Shotton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of biomedical semantics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">S6</biblScope>
			<date type="published" when="2010">2010</date>
			<publisher>BioMed Central</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">ArnetMiner: extraction and mining of academic social networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Su</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<meeting>the 14th ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="990" to="998" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Scientific credit diffusion: Researcher level or paper level?</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientometrics</title>
		<imprint>
			<biblScope unit="volume">109</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="827" to="837" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Quantifying the influence of scientists and their publications: distinguishing between prestige and popularity</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lü</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Journal of Physics</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page">33033</biblScope>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Citation analysis and the complex problem of intellectual influence</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zuckerman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientometrics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">5-6</biblScope>
			<biblScope unit="page" from="329" to="338" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
