=Paper=
{{Paper
|id=Vol-3745/paper20
|storemode=property
|title=Understanding Citation Mobility in the Knowledge Space
|pdfUrl=https://ceur-ws.org/Vol-3745/paper20.pdf
|volume=Vol-3745
|authors=Shuang Zhang,Feifan Liu,Haoxiang Xia
|dblpUrl=https://dblp.org/rec/conf/eeke/ZhangLX24
}}
==Understanding Citation Mobility in the Knowledge Space==
Shuang Zhang<sup>1</sup>, Feifan Liu<sup>1,2</sup> and Haoxiang Xia<sup>1,2,∗</sup>

<sup>1</sup> Institute of Systems Engineering, Dalian University of Technology, No. 2 Linggong Road, Dalian, 116024, P.R. China

<sup>2</sup> Institute for Advanced Intelligence, Dalian University of Technology, No. 2 Linggong Road, Dalian, 116024, P.R. China

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online.

∗ Corresponding author. hxxia@dlut.edu.cn (H. Xia); shuang94@mail.dlut.edu.cn (S. Zhang); liufeifan@dlut.edu.cn (F. Liu)

© Copyright 2024 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

===Abstract===
Despite persistent efforts to reveal the temporal patterns of citation dynamics, little is known about their spatial patterns in knowledge space, because citation diffusion in a virtual high-dimensional space is hard to quantify. Here, drawing on millions of papers in the Physics field, we treat individual papers’ citation sequences as a mobility process and track the trajectories with embedding methods that learn semantic proximity. We first quantify the spatial scale of citation mobility and find Gaussian-distributed citation scope and exponentially-distributed citing embedding distance, indicating the constrained mobility of citations. Simulations with the Gravity model and Radiation model further confirm that epistemic distance and popularity are key push-and-pull factors, respectively, in citation mobility. We then find that, compared with high-cited papers, disruptive papers are more likely to receive distant recognition. As science evolves, papers nowadays show narrower citation mobility than those in earlier decades. These findings provide insights into diversified knowledge diffusion and scientific innovation efficiency.

Keywords: citation dynamics, spatial patterns, knowledge diffusion, disruptive innovation

===1. Introduction===
Citations encapsulate the dynamics of ideas in terms of circulation, unfolding both in temporal and spatial dimensions in the abstract knowledge space[1]. Extensive research has delved into citation patterns at levels from the paper[2], author[3], discipline[4], to nation[5]. For individual papers, despite the diversity of citation profiles[6], researchers attempt to quantify[2], model[7], and predict[8] citation dynamics. Key drivers of citation dynamics, including preferential attachment, aging, and fitness[2], have been identified. Universal patterns, such as scaling laws in citation distributions[9], the first-mover effect[10], citation probability decreasing with papers’ age[11], and “jump-decay” patterns[12], have been quantitatively revealed. Moreover, “sleeping beauties”, papers with atypical citation dynamics, have been explored in terms of identification and awakening mechanisms[13]. However, despite the fruitful efforts on the temporal aspects of citation dynamics, our understanding of the spatial dimension remains limited.

At the collective level, citations signify collective attention. Despite the explosion of papers and citation inflation[14], citations are increasingly concentrated on elite scientists[15] and top papers[16], leaving new publications less likely to be recognized[17]. Growing citation inequality indicates a narrowing and decaying scientific attention, exacerbating the stratification of the scientific system and entrenching science in existing norms[12,18]. This narrowing of attention warrants detailed investigation through the lens of a holistic knowledge landscape.

On the citation dynamics of individual papers in the knowledge space, similar studies focus on mapping the structure and evolution of disciplines with citation flows[19], associations between interdisciplinary citations and novelty[20], and measuring the breadth and depth of impact by examining textual proximity between citing papers[21]. However, existing studies remain inadequate for quantifying the knowledge aspect of citation trajectories due to their abstract nature and high dimensionality.

The major obstacles to large-scale quantitative investigations of individual papers’ citation dynamics in knowledge space are the inability to track trajectories and the lack of an appropriate quantitative metric for this dynamical process. It is unclear how papers diffuse impact and ideas in the knowledge space over their lifecycles.

Here, we regard the sequential citation process of papers as mobility on a quantifiable epistemic landscape and use machine-learning techniques to trace the trajectories. In this manner, we introduce the theoretical and methodological framework of geospatial human mobility to characterize citation mobility. Three key research questions are quantitatively analyzed. First, we explore the spatial scale characteristics and collective-level mechanisms of citation mobility. Second, we probe whether different types of novel papers exhibit diversified spatial patterns. Third, we examine the evolutionary patterns of citation mobility over decades.

===2. Data and methods===
====2.1. Data====
This study focuses on the discipline of Physics. The dataset used is SciSciNet[22], a large-scale scientific dataset built on MAG[23], covering over 134 million scientific publications up to the year 2021.

Using the “fields of study” classification, we extract 3,263,546 papers labeled "Physics". Then we select focal papers satisfying: (i) number of citations no less than 10, to ensure sufficient trajectory points for quantification; (ii) citation history spanning at least 10 years, to ensure sufficient timespans to capture spatiotemporal patterns; (iii) receiving at least one citation every five years, to exclude noisy data. Finally, we obtain 214,867 focal papers.

====2.2. Construction of citation trajectories on the epistemic landscape====
Papers receive citations spanning different epistemic distances. We develop a framework, which combines representation learning algorithms and manifold learning algorithms, for the construction of a quantifiable disciplinary knowledge landscape based on semantic association. Unlike citation networks, which merely represent the topological connections of elements, this landscape provides a continuous distance scale, allowing the citation trajectories of individual papers to be tracked and quantified.

Here, we employ the Doc2Vec algorithm[24], which captures the semantics of content, and the popular UMAP algorithm[25], which preserves the global and local topology in dimension reduction. Most of the architectures and hyperparameters we used were set to their default values throughout the model training process.

Figure 1 illustrates the proposed framework for constructing the knowledge landscape. After building the corpus from titles and abstracts, we train the Doc2Vec model to obtain semantic vectors of papers. The UMAP algorithm is subsequently applied to project the semantic vectors into a two-dimensional space based on their cosine distance. Finally, we obtain the coordinates of each paper and the epistemic landscape. The citation trajectories of individual papers are then traced by mapping their citation sequences onto this landscape.

Figure 1: Illustration for constructing the epistemic landscape and citation trajectories based on the semantic proximity embedded in the textual content of papers

====2.3. Radius of gyration and jump lengths====
Two indicators are applied to characterize the spatial scale of citation mobility[26,27]. The radius of gyration (<math>r_g</math>) refers to the typical distance of an individual trajectory from its center of mass. The jump length (<math>\Delta r</math>) measures the epistemic distance between a citing-cited pair.
In the context of citation mobility, <math>r_g</math> is applied to measure the degree to which a paper's citations are concentrated or dispersed. <math>\Delta r</math> quantifies the research proximity of the focal paper to its citing papers.

<math>r_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{r}_i - \mathbf{r}_{cm}\right)^2}, \qquad \mathbf{r}_{cm} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i \qquad (1)</math>

<math>\Delta r = \left|\mathbf{r}_i - \mathbf{r}_0\right| \qquad (2)</math>

In formulas (1-2), <math>\mathbf{r}_0</math> is the coordinates of the focal paper; <math>\mathbf{r}_i</math> and <math>\mathbf{r}_{i-1}</math> are the coordinates of its ''i''th and (''i''-1)th citing paper; <math>\mathbf{r}_{cm}</math> is the centroid of the ''N'' citing papers.

====2.4. Gravity model and Radiation model====
The distance-based Gravity model and the opportunity-based Radiation model are introduced to characterize aggregated citation flows on the epistemic landscape. These two classical population-level models depict distinct flow generation mechanisms and could reveal key drivers of citation flows in terms of research popularity, knowledge distance, and opportunities.

In citation scenarios, Gravity models assume that flows between two locations are proportional to research hotness and decay with knowledge distance[28]. Radiation models assume that the movement probability of citations is proportional to destination opportunities and inversely proportional to intervening opportunities[29].

<math>T_{ij} \propto m_i m_j f(r_{ij}) \qquad (3)</math>

<math>T_{ij} = O_i \frac{m_i m_j}{(m_i + s_{ij})(m_i + m_j + s_{ij})} \qquad (4)</math>

where <math>T_{ij}</math> is the citation flow from tile ''i'' of the citing paper to tile ''j'' of the focal paper; <math>m_i</math> and <math>m_j</math> are the paper densities in tiles ''i'' and ''j''; <math>f(r_{ij})</math> is the distance function, modeled with a power-law form; <math>O_i</math> represents the flows from tile ''i''; <math>s_{ij}</math> is the number of intervening opportunities (paper density) between tiles ''i'' and ''j''. Model performance is assessed with four metrics: R<sup>2</sup>, RMSE, and Spearman and Pearson correlations.

===3. Results===
====3.1. The spatial characteristics of trajectories of citation mobility====
We start by visualizing individual papers’ citation trajectories on the epistemic landscape. In Fig. 2c, paper points are clustered and semantically distributed, depicting the knowledge structure. After mapping the citation dynamics (Fig. 2a) of papers onto the epistemic landscape, we find that citations are not homogeneous, as they span different knowledge distances (Fig. 2b). However, the visualization in Fig. 2d intuitively shows that a paper's trajectory is locally distributed.

Figure 2: Visualization of individual papers’ citation mobility on the epistemic landscape

We quantify spatiotemporal characteristics with two indicators. The citing epistemic distance <math>\Delta r</math> is better approximated by an exponential function than by a power law (Fig. 3a). This indicates that papers are likely to receive many short-distance citations and a few longer-distance ones. The radius of gyration <math>r_g</math> approximates a lognormal distribution, suggesting the narrower impact of most papers and the broader impact of a few papers (Fig. 3b). These findings indicate that both citing distance and overall impact scope follow a typical-scale variation in citation mobility, in contrast to the fat-tailed spatial scale displayed by human mobility in the biological world[26,27,30].

Furthermore, we note that the more citations papers receive, the wider their impact scope (Fig. 3d). However, the exponentially distributed citing distance and lognormal-distributed citation concentration are independent of the number of citations (Fig. 3c&d). In a word, we observe constrained mobility of citations in the knowledge space.

Figure 3: Empirical distribution of spatiotemporal characteristics of citation trajectories

====3.2. The Gravity and Radiation modeling in citation mobility====
To further delineate the observed narrow movements, we use the classic Gravity model and Radiation model to fit the aggregated flow of citation mobility. After discretizing the Physics epistemic landscape into a spatial tessellation, we aggregate individual trajectories into origin-destination citation flows. Most citation flows are intra-flows; only inter-flows between two different grid cells are used for parameter fitting and flow generation.

Figure 4: Actual and simulated citation flows generated by Gravity and Radiation models

Fig. 4 shows the simulated results. The Gravity model outperforms the Radiation model, especially for long-distance flows. This suggests that epistemic distance and popularity are key factors in citation behavior, whereas the research gap, representing the potential research intersection area, is not significant in attracting citations.

====3.3. Comparisons of high-cited, sleeping beauties, and disruptive papers====
The further question is how citation mobility differs across papers with various types of novelty. We focus on three attributes of papers: popularity, delayed recognition, and disruptiveness, and measure them with the number of citations, sleeping beauty coefficient[13], and disruption index[31], respectively. The top 10% of papers by each metric are identified as highly cited, sleeping beauties, and disruptive papers (Fig. 5a).

Figure 5: The <math>r_g</math> and <math>\Delta r</math> of citation trajectories of high-cited papers, sleeping beauties, and disruptive papers. ****: p ≤ .0001, ***: p ≤ .001, ns: p ≥ .05

We observe that these three representative types of novel papers, which have a low degree of overlap (Fig. 5a), have above-average impact scopes, with disruptive papers standing out in particular (Fig. 5b). The finding that sleeping beauties have a broader impact is in line with their interdisciplinary nature[13]. We further examine the citing distance in the first year post-publication. The consistent patterns observed in Fig. 5c reinforce our previous findings. They suggest that, compared with the influential high-cited papers, sleeping beauties and high-disruptive papers promptly attract attention from more distant knowledge communities once published.

====3.4. Evolution of citation mobility====
Finally, we group focal papers into different decades according to their publication year to investigate how citation mobility evolved over decades.

The first finding is that papers nowadays show more restricted mobility than those in the early years, as shown in Fig. 6a. To rule out the possibility that this result is due to semantic differences between papers from different decades, we analyze the citing distance of citing pairs with a one-year gap. In Fig. 6b, the observed decreasing trend of citing distance over publication years indicates the narrowing of literature use. These two results suggest a possible short-sightedness in scientists’ information foraging nowadays.

Figure 6: Spatial characteristics of citation trajectories in different decades

===4. Conclusion and discussion===
An empirically detailed investigation of the spatial pattern of papers’ citation mobility in knowledge space is indispensable for understanding knowledge diffusion. In this study, we trace and quantify individual papers’ citation sequences on the epistemic landscape based on semantic proximity.

We primarily examine two spatial scale characteristics and observe overall conserved citation mobility independent of citation counts, which is distinct from the fat-tail characteristics displayed in human mobility. By applying the Gravity model, epistemic distance and popularity are identified as two key drivers. Next, compared with high-cited papers, disruptive papers and sleeping beauties present wider citation mobility scopes. Finally, current papers have narrower mobility than earlier papers, reflecting more myopic information foraging in current scientific practice.

Several research extensions can be pursued. With a whole picture of science, citation mobility within and across disciplines could be explored, yielding more comprehensive insights. The framework could also be applied to patents, open-source software, and online searching behavior.

===Acknowledgements===
This work is supported by the National Natural Science Foundation of China (Grant No. 71871042 and 72371052).

===References===
[1] S. Fortunato, C.T. Bergstrom, K. Boerner, J.A. Evans, D. Helbing, S. Milojevic, A.M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi, A. Vespignani, L. Waltman, D. Wang, A. Barabasi, Science of science, Science 359 (2018). doi:10.1126/science.aao0185.

[2] D. Wang, C. Song, A.L.A. Barabási, Quantifying long-term scientific impact, Science 342 (6154) (2013) 127-133. doi:10.1126/science.1237825.

[3] R. Sinatra, D. Wang, P. Deville, C. Song, A.L. Barabasi, Quantifying the evolution of individual scientific impact, Science 354 (6312) (2016). doi:10.1126/science.aaf5239.

[4] R.K. Pan, S. Sinha, K. Kaski, J. Saramäki, The evolution of interdisciplinarity in physics research, Sci. Rep. 2 (1) (2012). doi:10.1038/srep00551.

[5] R.K. Pan, K. Kaski, S. Fortunato, World citation and collaboration networks: uncovering the role of geography in science, Sci. Rep. 2 (1) (2012). doi:10.1038/srep00902.

[6] A. Avramescu, Actuality and obsolescence of scientific literature, Journal of the American Society for Information Science 30 (5) (1979) 296-303. doi:10.1002/asi.4630300509.

[7] Y.H. Eom, S. Fortunato, Characterizing and modeling citation dynamics, Plos One 6 (9) (2011) e24926. doi:10.1371/journal.pone.0024926.

[8] A. Abrishami, S. Aliakbary, Predicting citation counts based on deep neural network learning techniques, J. Informetr. 13 (2) (2019) 485-499. doi:10.1016/j.joi.2019.02.011.

[9] M. Golosovsky, S. Solomon, Runaway events dominate the heavy tail of citation distributions, The European Physical Journal Special Topics 205 (1) (2012) 303-311. doi:10.1140/epjst/e2012-01576-4.

[10] M.E.J. Newman, The first-mover advantage in scientific publication, Epl 86 (6) (2009). doi:10.1209/0295-5075/86/68001.

[11] M. Golosovsky, S. Solomon, Stochastic dynamical model of a growing citation network based on a self-exciting point process, Phys. Rev. Lett. 109 (2012) 98701. doi:10.1103/PhysRevLett.109.098701.

[12] P.D.B. Parolo, R.K. Pan, R. Ghosh, B.A. Huberman, K. Kaski, S. Fortunato, Attention decay in science, J. Informetr. 9 (4) (2015) 734-745. doi:10.1016/j.joi.2015.07.006.

[13] Q. Ke, E. Ferrara, F. Radicchi, A. Flammini, Defining and identifying sleeping beauties in science, Proc. Natl. Acad. Sci. U. S. A. 112 (24) (2015) 7426-7431. doi:10.1073/pnas.1424329112.

[14] A.M. Petersen, R.K. Pan, F. Pammolli, S. Fortunato, Methods to account for citation inflation in research evaluation, Res. Policy 48 (7) (2019) 1855-1865. doi:10.1016/j.respol.2019.04.009.

[15] M.W. Nielsen, J.P. Andersen, Global citation inequality is on the rise, Proceedings of the National Academy of Sciences 118 (7) (2021). doi:10.1073/pnas.2012208118.

[16] A. Varga, The narrowing of literature use and the restricted mobility of papers in the sciences, Proceedings of the National Academy of Sciences 119 (17) (2022). doi:10.1073/pnas.2117488119.

[17] J.S.G. Chu, J.A. Evans, Slowed canonical progress in large fields of science, Proceedings of the National Academy of Sciences 118 (41) (2021). doi:10.1073/pnas.2021636118.

[18] R.K. Pan, A.M. Petersen, F. Pammolli, S. Fortunato, The memory of science: inflation, myopia, and the knowledge network, J. Informetr. 12 (3) (2018) 656-678. doi:10.1016/j.joi.2018.06.005.

[19] R. Sinatra, P. Deville, M. Szell, D. Wang, A. Barabási, A century of physics, Nat. Phys. 11 (10) (2015) 791-796. doi:10.1038/nphys3494.

[20] Y. Bu, L. Waltman, Y. Huang, A multidimensional framework for characterizing the citation impact of scientific publications, Quant. Sci. Stud. 2 (1) (2021) 155-183. doi:10.1162/qss_a_00109.

[21] V. Larivière, S. Haustein, K. Börner, Long-distance interdisciplinarity leads to higher scientific impact, Plos One 10 (3) (2015) e122565. doi:10.1371/journal.pone.0122565.

[22] Z. Lin, Y. Yin, L. Liu, D. Wang, Sciscinet: a large-scale open data lake for the science of science research, Sci. Data 10 (1) (2023). doi:10.1038/s41597-023-02198-9.

[23] Z. Shen, H. Ma, K. Wang, A web-scale system for scientific knowledge exploration, Melbourne, Australia, 2018, pp. 87-92.

[24] Q. Le, T. Mikolov, Distributed representations of sentences and documents, Proceedings of Machine Learning Research, Beijing, China, 2014, pp. 1188-1196.

[25] L. Mcinnes, J. Healy, N. Saul, L. Großberger, Umap: uniform manifold approximation and projection, Journal of Open Source Software 3 (29) (2018) 861. doi:10.21105/joss.00861.

[26] M.C. González, C.A. Hidalgo, A.L. Barabási, Understanding individual human mobility patterns, Nature 453 (2008) 779-782. doi:10.1038/nature.

[27] C. Song, T. Koren, P. Wang, A. Barabási, Modelling the scaling properties of human mobility, Nat. Phys. 6 (10) (2010) 818-823. doi:10.1038/nphys1760.

[28] M. Lenormand, A. Bassolas, J.J. Ramasco, Systematic comparison of trip distribution laws and models, J. Transp. Geogr. 51 (2016) 158-169. doi:10.1016/j.jtrangeo.2015.12.008.

[29] F. Simini, M.C. González, A. Maritan, A. Barabási, A universal model for mobility and migration patterns, Nature 484 (7392) (2012) 96-100. doi:10.1038/nature10856.

[30] D.W. Sims, E.J. Southall, N.E. Humphries, G.C. Hays, C.J.A. Bradshaw, J.W. Pitchford, A. James, M.Z. Ahmed, A.S. Brierley, M.A. Hindell, D. Morritt, M.K. Musyl, D. Righton, E.L.C. Shepard, V.J. Wearmouth, R.P. Wilson, M.J. Witt, J.D. Metcalfe, Scaling laws of marine predator search behaviour, Nature 451 (7182) (2008) 1098-1102. doi:10.1038/nature06518.

[31] L. Wu, D. Wang, J.A. Evans, Large teams develop and small teams disrupt science and technology, Nature (2019). doi:10.1038/s41586-019-0941-9.