=Paper= {{Paper |id=Vol-3745/paper20 |storemode=property |title=Understanding Citation Mobility in the Knowledge Space |pdfUrl=https://ceur-ws.org/Vol-3745/paper20.pdf |volume=Vol-3745 |authors=Shuang Zhang,Feifan Liu,Haoxiang Xia |dblpUrl=https://dblp.org/rec/conf/eeke/ZhangLX24 }} ==Understanding Citation Mobility in the Knowledge Space== https://ceur-ws.org/Vol-3745/paper20.pdf
                                Understanding Citation Mobility in the Knowledge Space⋆
                                Shuang Zhang1, Feifan Liu1,2 and Haoxiang Xia1,2,∗

                                1 Institute of Systems Engineering, Dalian University of Technology, No. 2 Linggong Road, Dalian, 116024, P.R. China
                                2 Institute for Advanced Intelligence, Dalian University of Technology, No. 2 Linggong Road, Dalian, 116024, P.R. China


                                                    Abstract
                                                    Despite persistent efforts to reveal the temporal patterns of citation dynamics, little is known
                                                    about its spatial patterns in knowledge space, owing to the unquantifiability of citation diffusion
                                                    in the virtual high-dimensional space. Here, drawing on millions of papers in the Physics field, we
                                                    consider individual papers’ citation sequences as a mobility process and track trajectories with
                                                    embedding methods learning the semantic proximity. We first quantify the spatial scale of
                                                    citation mobility and find Gaussian-distributed citation scope and exponentially-distributed
                                                    citing embedding distance, indicating the constrained mobility of citations. Simulations with the
                                                    Gravity model and Radiation model further confirm that epistemic distance and popularity are
                                                    key push-and-pull factors, respectively, in citation mobility. It is then found that compared with
                                                    high-cited papers, disruptive papers are more likely to receive distant recognition. As science
                                                    evolves, papers nowadays make narrower citation mobility than those in earlier decades. These
                                                    findings provide insights into understanding the diversified knowledge diffusion and scientific
                                                    innovation efficiency.

                                                    Keywords
                                                    citation dynamics, spatial patterns, knowledge diffusion, disruptive innovation



Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from
Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24,
2024, Changchun, China and Online
∗ Corresponding author.
hxxia@dlut.edu.cn (H. Xia); shuang94@mail.dlut.edu.cn (S. Zhang);
liufeifan@dlut.edu.cn (F. Liu)
© Copyright 2024 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
Citations encapsulate the dynamics of idea circulation, unfolding in both
temporal and spatial dimensions of the abstract knowledge space[1]. Extensive
research has delved into citation patterns at levels ranging from the paper[2],
author[3], and discipline[4] to the nation[5]. For individual papers, despite
the diversity of citation profiles[6], researchers have attempted to
quantify[2], model[7], and predict[8] citation dynamics. Key drivers of
citation dynamics, including preferential attachment, aging, and fitness[2],
have been identified. Universal patterns, such as scaling laws in citation
distributions[9], the first-mover effect[10], citation probability decreasing
with papers’ age[11], and “jump-decay” patterns[12], have been quantitatively
revealed. Moreover, “sleeping beauties”, papers with atypical citation
dynamics, have been explored in terms of identification and awakening
mechanisms[13]. However, despite the fruitful efforts on the temporal aspects
of citation dynamics, our understanding of the spatial dimension remains
limited.
    At the collective level, citations signify collective attention. Despite
the explosion of papers and citation inflation[14], citations are found to be
increasingly concentrated on elite scientists[15] and top papers[16], leaving
new publications less likely to be recognized[17]. Growing citation inequality
indicates a narrowing and decaying scientific attention, exacerbating the
stratification of the scientific system and leaving science trapped in
existing norms[12,18]. This narrowing attention phenomenon warrants detailed
investigation through the lens of a holistic knowledge landscape.
    Papers receive citations spanning different epistemic distances. Regarding
the citation dynamics of individual papers in the knowledge space, related
studies focus on mapping the structure and evolution of disciplines with
citation flows[19], on associations between interdisciplinary citations and
novelty[20], and on measuring the breadth and depth of impact by examining
textual proximity between citing papers[21]. However, existing studies remain
inadequate for quantifying the knowledge aspect of citation trajectories due
to their abstract nature and high dimensionality.
    Major obstacles in large-scale quantitative investigations of individual
papers’ citation dynamics in knowledge space are the inability to track
trajectories and the lack of an appropriate quantitative metric for this
dynamical process. It is unclear how papers diffuse impact and ideas in the
knowledge space over their lifecycles.
    Here, we regard the sequential citation process of papers as mobility on a
quantifiable epistemic landscape and use machine-learning techniques to trace
the trajectories. In this manner, we introduce the theoretical and
methodological framework of geospatial human mobility to characterize citation
mobility. Three key research questions are quantitatively analyzed. First, we
explore the spatial scale characteristics and collective-level mechanisms of
citation mobility. Second, we probe whether different types of novel papers
exhibit diversified spatial patterns. Third, we examine the evolutionary
patterns of citation mobility over decades.

2. Data and methods
2.1. Data
This study focuses on the discipline of Physics. The dataset used is
SciSciNet[22], a large-scale scientific dataset built on MAG[23], covering
over 134 million scientific publications up to the year 2021.
    Using the “fields of study” classification, we extract 3,263,546 papers
labeled "Physics". Then we select focal papers satisfying: (i) number of
citations no less than 10, to ensure sufficient trajectory points for
quantification; (ii) citation history spanning at least 10 years, to ensure
sufficient timespans to capture spatiotemporal patterns; (iii) receiving at
least one citation every five years, to exclude noisy data. Finally, we obtain
214,867 focal papers.

2.2. Construction of citation trajectories on the epistemic landscape
We develop a framework, which combines representation learning algorithms and
manifold learning algorithms, for the construction of a quantifiable
disciplinary knowledge landscape based on semantic associations. Unlike
citation networks, which merely represent the topological connections of
elements, this landscape provides a continuous distance scale, allowing for
the tracking and quantifying of citation trajectories of individual papers.
    Here, we employ the Doc2Vec algorithm[24], which captures the semantics of
content, and the popular UMAP algorithm[25], which preserves the global and
local topology in dimension reduction. The majority of architectures and
hyperparameters we utilized were set to their default values throughout the
model training process.
    Figure 1 illustrates the proposed framework for constructing the knowledge
landscape. After building the corpus with the title and abstract, we train the
Doc2Vec model to obtain semantic vectors of papers. The UMAP algorithm is
subsequently applied to project the semantic vectors into a two-dimensional
space based on their cosine distance. Finally, we obtain the coordinates of
each paper and the epistemic landscape. Thus, the citation trajectories of
individual papers are traced by mapping their citation sequences onto this
landscape.

Figure 1: Illustration for constructing the epistemic landscape and citation
trajectories based on the semantic proximity embedded in the textual content
of papers

2.3. Radius of gyration and jump lengths
Two indicators are applied to characterize the spatial scale of citation
mobility[26,27]. The radius of gyration (rg) refers to the typical distance of
an individual trajectory from its center of mass. The jump length (∆r)
measures the epistemic distance between a citing-cited pair.




    In the context of citation mobility, rg is applied to measure the degree
to which a paper’s citations are concentrated or dispersed. ∆r quantifies the
research proximity of the focal paper to its citing papers.

$$ r_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{r}_i - \mathbf{r}_{cm}\right)^2}, \qquad \mathbf{r}_{cm} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{r}_i \tag{1} $$

$$ \Delta r = \left|\mathbf{r}_i - \mathbf{r}_0\right| \tag{2} $$

    In formulas (1-2), r0 is the coordinates of the focal paper; ri and ri-1
are the coordinates of its ith and (i-1)th citing papers; rcm is the centroid
of the N citing papers.

2.4. Gravity model and Radiation model
The distance-based Gravity model and the opportunity-based Radiation model are
introduced to characterize aggregated citation flows on the epistemic
landscape. These two classical population-level models depict distinct flow
generation mechanisms and could reveal key drivers of citation flows in terms
of research popularity, knowledge distance, and opportunities.
    In citation scenarios, Gravity models assume that flows between two
locations are proportional to research hotness and decay with knowledge
distance[28]. Radiation models assume that the movement probability of
citations is proportional to destination opportunities and inversely
proportional to intervening opportunities[29].

$$ T_{ij} \propto m_i m_j f(r_{ij}) \tag{3} $$

$$ T_{ij} = O_i \frac{m_i m_j}{(m_i + s_{ij})(m_i + m_j + s_{ij})} \tag{4} $$

where Tij is the citation flow from tile i of the citing paper to tile j of
the focal paper. mi and mj are the paper densities in tiles i and j; f(rij) is
the distance function, modeled with a power-law form. Oi represents the flows
from tile i; sij is the number of intervening opportunities (paper density)
between tiles i and j. Model performance is assessed with four metrics: R²,
RMSE, and the Spearman and Pearson correlations.

3. Results
3.1. The spatial characteristics of trajectories of citation mobility
We start by visualizing individual papers’ citation trajectories on the
epistemic landscape. In Fig. 2c, paper points are clustered and semantically
distributed, depicting the knowledge structure. After mapping the citation
dynamics (Fig. 2a) of papers onto the epistemic landscape, we find citations
are not homogeneous, as they span different knowledge distances (Fig. 2b).
However, the visualization in Fig. 2d intuitively shows that a paper’s
trajectory is locally distributed.

Figure 2: Visualization of individual papers’ citation mobility on the
epistemic landscape

    We quantify spatiotemporal characteristics with two indicators. The citing
epistemic distance ∆r is better approximated by an exponential function than
by a power law (Fig. 3a). This indicates that papers are likely to receive
many short-distance citations and a few longer-distance ones. Then, the radius
of gyration rg approximates a lognormal distribution, suggesting the narrower
impact of most papers and the broader impact of a few papers (Fig. 3b). These
findings indicate that both citing distance and overall impact scope follow a
typical scale of variation in citation mobility, in contrast to the fat-tailed
spatial scale displayed by human mobility in the biological world[26,27,30].
    Furthermore, we note that the more citations papers receive, the wider
their impact scope (Fig. 3d). However, the exponentially distributed citing
distance and lognormal-distributed citation concentration are independent of
the number of citations (Fig. 3c&d). In short, we observe constrained mobility
of citations in the knowledge space.
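
As a minimal sketch of how the two indicators of Section 2.3 (formulas 1-2) could be computed for one focal paper, the snippet below assumes that papers are represented by 2-D landscape coordinates and that epistemic distance is the Euclidean distance between them. The function names and the toy coordinates are hypothetical and are not taken from the original study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: 2-D coordinates on the epistemic landscape


def radius_of_gyration(citing: Sequence[Point]) -> float:
    """Formula (1): root-mean-square distance of citing papers from their centroid."""
    n = len(citing)
    if n == 0:
        raise ValueError("at least one citing paper is required")
    cx = sum(x for x, _ in citing) / n  # centroid r_cm, x component
    cy = sum(y for _, y in citing) / n  # centroid r_cm, y component
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in citing) / n)


def jump_lengths(focal: Point, citing: Sequence[Point]) -> List[float]:
    """Formula (2): distance between the focal paper r_0 and each citing paper r_i."""
    return [math.dist(focal, p) for p in citing]


# Toy example with hypothetical coordinates.
focal = (0.0, 0.0)
citing = [(3.0, 4.0), (-3.0, -4.0)]
rg = radius_of_gyration(citing)
dr = jump_lengths(focal, citing)
```

In this toy case the two citing papers sit symmetrically around the focal paper, so the centroid coincides with the focal paper and rg equals the common jump length.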




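
The two flow models of Section 2.4 (formulas 3-4) can be illustrated with a small sketch. It is not the authors' implementation: the exponent `beta` and constant `k` are placeholder values that would normally be fitted, and the intervening opportunities s_ij are computed under the assumption that they are the total paper density of all tiles no farther from tile i than tile j is, excluding i and j themselves. Tile names, coordinates, and densities are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def gravity_flow(m_i: float, m_j: float, r_ij: float,
                 beta: float = 2.0, k: float = 1.0) -> float:
    """Formula (3) with a power-law distance function f(r) = r**(-beta).
    beta and k are hypothetical; in practice they are fitted to observed flows."""
    return k * m_i * m_j * r_ij ** (-beta)


def intervening_mass(i: str, j: str, centers: Dict[str, Point],
                     mass: Dict[str, float]) -> float:
    """s_ij under the assumed definition: paper density of tiles within
    distance r_ij of tile i, excluding tiles i and j."""
    r_ij = math.dist(centers[i], centers[j])
    return sum(mass[t] for t in centers
               if t not in (i, j) and math.dist(centers[i], centers[t]) <= r_ij)


def radiation_flow(o_i: float, m_i: float, m_j: float, s_ij: float) -> float:
    """Formula (4): flow from tile i to tile j given total outflow O_i."""
    return o_i * m_i * m_j / ((m_i + s_ij) * (m_i + m_j + s_ij))


# Toy tessellation with three tiles on a line (hypothetical values).
centers = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
mass = {"A": 10.0, "B": 5.0, "C": 20.0}

r_ac = math.dist(centers["A"], centers["C"])
s_ac = intervening_mass("A", "C", centers, mass)  # only tile B intervenes
t_gravity = gravity_flow(mass["A"], mass["C"], r_ac)
t_radiation = radiation_flow(100.0, mass["A"], mass["C"], s_ac)
```

The contrast between the two functions mirrors the text: the gravity flow depends on the distance itself, whereas the radiation flow depends only on how much paper density lies between the two tiles.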
Figure 3: Empirical distribution of spatiotemporal characteristics of citation
trajectories

3.2. The Gravity and Radiation modeling in citation mobility
To further delineate the observed narrow movements, we use the classic Gravity
model and Radiation model to fit the aggregated flow of citation mobility.
After discretizing the Physics epistemic landscape into a spatial
tessellation, we aggregate individual trajectories into origin-destination
citation flows. Most citation flows are intra-flows, and only inter-flows
between two different grid cells are used for parameter fitting and flow
generation.

Figure 4: Actual and simulated citation flows generated by Gravity and
Radiation models

    Fig. 4 shows the simulated results. It can be seen that the Gravity model
outperforms the Radiation model, especially for long-distance flows. This
suggests that epistemic distance and popularity are key factors in citation
behavior, whereas the research gap, representing potential areas of research
intersection, is not significant in attracting citations.

3.3. Comparisons of high-cited, sleeping beauties, and disruptive papers
The further question is how citation mobility differs across papers with
various types of novelty. We focus on three attributes of papers: popularity,
delayed recognition, and disruptiveness, and measure them with the number of
citations, sleeping beauty coefficient[13], and disruption index[31],
respectively. The top 10% of papers by each metric are identified as highly
cited, sleeping beauties, and disruptive papers (Fig. 5a).

Figure 5: The rg and ∆r of citation trajectories of high-cited papers,
sleeping beauties, and disruptive papers. ****: p ≤ 0.0001, ***: p ≤ 0.001,
ns: p ≥ 0.05

    We observe that these three representative types of novel papers, which
have a low degree of overlap (Fig. 5a), have above-average impact scopes, with
disruptive papers standing out in particular (Fig. 5b). The finding that
sleeping beauties have a broader impact is in line with their
interdisciplinary nature[13].
    We further examine the citing distance in the first year post-publication.
The consistent patterns observed in Fig. 5c reinforce our previous findings.
They suggest that, compared with the influential high-cited papers, sleeping
beauties and high-disruptive papers promptly attract attention from more
distant knowledge communities once published.

3.4. Evolution of citation mobility
Finally, we group focal papers into different decades according to their
publication year to investigate how citation mobility evolved over decades.
    The first finding is that papers nowadays show more restricted mobility
than those in the early years, as shown in Fig. 6a. To rule out the
possibility that this result is due to semantic differences between papers
from different decades, we analyze the citing distance of citing pairs with a
one-year gap. In Fig. 6b, the observed decreasing trend of citing distance
over publication years indicates the narrowing of literature use. These two
results suggest a possible shorter-sightedness in scientists’ information
foraging nowadays.

Figure 6: Spatial characteristics of citation trajectories in different
decades

4. Conclusion and discussion
An empirically detailed investigation of the spatial pattern of papers’
citation mobility in knowledge space is indispensable for understanding
knowledge diffusion. In this study, we trace and quantify individual papers’
citation sequences on the epistemic landscape based on semantic proximity.
    We primarily examine two spatial scale characteristics and observe overall
constrained citation mobility independent of citation counts, which is
distinct from the fat-tail characteristics displayed in human mobility. By
applying the Gravity model, epistemic distance and popularity are identified
as two key drivers. Next, compared with high-cited papers, disruptive papers
and sleeping beauties present wider citation mobility scopes. Finally, current
papers have narrower mobility than earlier papers, reflecting more myopic
information foraging in current scientific practice.
    Several research extensions can be performed. With a whole picture of
science, citation mobility within and across disciplines could be explored to
gain more comprehensive insights. The framework could also be applied to
patents, open-source software, and online searching behavior.

Acknowledgements
This work is supported by the National Natural Science Foundation of China
(Grant No. 71871042 and 72371052).

References
[1] S. Fortunato, C.T. Bergstrom, K. Börner, J.A. Evans, D. Helbing,
     S. Milojevic, A.M. Petersen, F. Radicchi, R. Sinatra, B. Uzzi,
     A. Vespignani, L. Waltman, D. Wang, A.L. Barabási, Science of science,
     Science 359 (2018). doi:10.1126/science.aao0185.
[2] D. Wang, C. Song, A.L. Barabási, Quantifying long-term scientific impact,
     Science 342 (6154) (2013) 127-133. doi:10.1126/science.1237825.
[3] R. Sinatra, D. Wang, P. Deville, C. Song, A.L. Barabási, Quantifying the
     evolution of individual scientific impact, Science 354 (6312) (2016).
     doi:10.1126/science.aaf5239.
[4] R.K. Pan, S. Sinha, K. Kaski, J. Saramäki, The evolution of
     interdisciplinarity in physics research, Sci. Rep. 2 (1) (2012).
     doi:10.1038/srep00551.
[5] R.K. Pan, K. Kaski, S. Fortunato, World citation and collaboration
     networks: uncovering the role of geography in science, Sci. Rep. 2 (1)
     (2012). doi:10.1038/srep00902.
[6] A. Avramescu, Actuality and obsolescence of scientific literature, Journal
     of the American Society for Information Science 30 (5) (1979) 296-303.
     doi:10.1002/asi.4630300509.
[7] Y.H. Eom, S. Fortunato, Characterizing and modeling citation dynamics,
     PLoS ONE 6 (9) (2011) e24926. doi:10.1371/journal.pone.0024926.
[8] A. Abrishami, S. Aliakbary, Predicting citation counts based on deep
     neural network learning techniques, J. Informetr. 13 (2) (2019) 485-499.
     doi:10.1016/j.joi.2019.02.011.
[9] M. Golosovsky, S. Solomon, Runaway events dominate the heavy tail of
     citation distributions, The European Physical Journal Special Topics 205
     (1) (2012) 303-311. doi:10.1140/epjst/e2012-01576-4.
[10] M.E.J. Newman, The first-mover advantage in scientific publication, EPL
     86 (6) (2009). doi:10.1209/0295-5075/86/68001.
[11] M. Golosovsky, S. Solomon, Stochastic dynamical model of a growing
     citation network based on a self-exciting point process, Phys. Rev. Lett.
     109 (2012) 98701. doi:10.1103/PhysRevLett.109.098701.
[12] P.D.B. Parolo, R.K. Pan, R. Ghosh, B.A. Huberman, K. Kaski, S. Fortunato,
     Attention decay in science, J. Informetr. 9 (4) (2015) 734-745.
     doi:10.1016/j.joi.2015.07.006.
[13] Q. Ke, E. Ferrara, F. Radicchi, A. Flammini, Defining and identifying
     sleeping beauties in science, Proc. Natl. Acad. Sci. U. S. A. 112 (24)
     (2015) 7426-7431. doi:10.1073/pnas.1424329112.
[14] A.M. Petersen, R.K. Pan, F. Pammolli, S. Fortunato, Methods to account
     for citation inflation in research evaluation, Res. Policy 48 (7) (2019)
     1855-1865. doi:10.1016/j.respol.2019.04.009.
[15] M.W. Nielsen, J.P. Andersen, Global citation inequality is on the rise,
     Proceedings of the National Academy of Sciences 118 (7) (2021).
     doi:10.1073/pnas.2012208118.
[16] A. Varga, The narrowing of literature use and the restricted mobility of
     papers in the sciences,




     Proceedings of the National Academy of Sciences 119 (17) (2022).
     doi:10.1073/pnas.2117488119.
[17] J.S.G. Chu, J.A. Evans, Slowed canonical progress in large fields of
     science, Proceedings of the National Academy of Sciences 118 (41) (2021).
     doi:10.1073/pnas.2021636118.
[18] R.K. Pan, A.M. Petersen, F. Pammolli, S. Fortunato, The memory of
     science: inflation, myopia, and the knowledge network, J. Informetr. 12
     (3) (2018) 656-678. doi:10.1016/j.joi.2018.06.005.
[19] R. Sinatra, P. Deville, M. Szell, D. Wang, A.L. Barabási, A century of
     physics, Nat. Phys. 11 (10) (2015) 791-796. doi:10.1038/nphys3494.
[20] Y. Bu, L. Waltman, Y. Huang, A multidimensional framework for
     characterizing the citation impact of scientific publications, Quant.
     Sci. Stud. 2 (1) (2021) 155-183. doi:10.1162/qss_a_00109.
[21] V. Larivière, S. Haustein, K. Börner, Long-distance interdisciplinarity
     leads to higher scientific impact, PLoS ONE 10 (3) (2015) e122565.
     doi:10.1371/journal.pone.0122565.
[22] Z. Lin, Y. Yin, L. Liu, D. Wang, SciSciNet: a large-scale open data lake
     for the science of science research, Sci. Data 10 (1) (2023).
     doi:10.1038/s41597-023-02198-9.
[23] Z. Shen, H. Ma, K. Wang, A web-scale system for scientific knowledge
     exploration, Melbourne, Australia, 2018, pp. 87-92.
[24] Q. Le, T. Mikolov, Distributed representations of sentences and
     documents, Proceedings of Machine Learning Research, Beijing, China,
     2014, pp. 1188-1196.
[25] L. McInnes, J. Healy, N. Saul, L. Großberger, UMAP: uniform manifold
     approximation and projection, Journal of Open Source Software 3 (29)
     (2018) 861. doi:10.21105/joss.00861.
[26] M.C. González, C.A. Hidalgo, A.L. Barabási, Understanding individual
     human mobility patterns, Nature 453 (2008) 779-782. doi:10.1038/nature.
[27] C. Song, T. Koren, P. Wang, A.L. Barabási, Modelling the scaling
     properties of human mobility, Nat. Phys. 6 (10) (2010) 818-823.
     doi:10.1038/nphys1760.
[28] M. Lenormand, A. Bassolas, J.J. Ramasco, Systematic comparison of trip
     distribution laws and models, J. Transp. Geogr. 51 (2016) 158-169.
     doi:10.1016/j.jtrangeo.2015.12.008.
[29] F. Simini, M.C. González, A. Maritan, A.L. Barabási, A universal model
     for mobility and migration patterns, Nature 484 (7392) (2012) 96-100.
     doi:10.1038/nature10856.
[30] D.W. Sims, E.J. Southall, N.E. Humphries, G.C. Hays, C.J.A. Bradshaw,
     J.W. Pitchford, A. James, M.Z. Ahmed, A.S. Brierley, M.A. Hindell,
     D. Morritt, M.K. Musyl, D. Righton, E.L.C. Shepard, V.J. Wearmouth,
     R.P. Wilson, M.J. Witt, J.D. Metcalfe, Scaling laws of marine predator
     search behaviour, Nature 451 (7182) (2008) 1098-1102.
     doi:10.1038/nature06518.
[31] L. Wu, D. Wang, J.A. Evans, Large teams develop and small teams disrupt
     science and technology, Nature (2019). doi:10.1038/s41586-019-0941-9.



