Open-mentorship team is beneficial to disruptive ideas⋆
                                Bili Zheng1, Wenjing Li1 and Jianhua Hou1,∗

                                1 Sun Yat-sen University, Guangzhou 510006, Guangdong, China


                                                   Abstract
                                                   How collaboration benefits disruption is widely discussed in academia, but less attention is paid
                                                   to mentorship in the collaboration of an article. This study focuses on the association between
                                                   close/open-mentorship measured by whether coauthors in publications belong to the same
                                                   academic genealogy and the disruption of publications measured by the Disruption Index (DI).
                                                   We selected 361,189 publications in Neuroscience from the SciSciNet database and then
                                                   constructed regression models and estimated the relationship between the variables. Moreover,
                                                   we used Propensity Score Matching and causal forest to estimate the causal relationship between
                                                   them. The findings show that articles with open-mentorship collaboration are more disruptive
                                                   than those with close-mentorship collaboration. The findings provide implications for team
                                                   formation and team management in practice.

                                                   Keywords
                                                   academic genealogy, disruption index, mentorship type, team science


1. Introduction

   In the past decades, scientific papers have become less disruptive [1]. Some studies
attribute this drastic change to the scientific enterprise, team size, and collaboration
distance [2, 3, 4]. Inspired by a series of studies on collaboration and disruption [3, 4],
we are interested in whether a close-mentorship or open-mentorship team will fuse more
disruptive ideas. The research question is based on the following definition: in a
close-mentorship team, all the members belong to the same genealogy, while in an
open-mentorship team, the members belong to more than one genealogy.
   To address the question, we first define the term mentorship. Mentorship can occur
formally through doctoral and postdoctoral advisor-advisee relationships or informally
through collaborations. Some genealogy databases, like the Academic Family Tree, encompass
both advisor-advisee relationships and a broad range of other relationships, which means the
mentee may be the “learner” in mentoring relationships regardless of age or position [5].
For simplicity, we here refer to mentorship as the advisor-advisee relationship, like most
genealogical studies [6, 7].

2. Data and method

2.1. Data collection

   We derive mentorship from the dataset released by Ke et al. (2022) [8], which enriches
the Academic Family Tree by adding publication records from Microsoft Academic Graph (MAG).
Then, we obtain the DI of each paper from SciSciNet, which provides over 134 million
scientific publications and frequently used indexes (such as DI, Z-score, and sleeping
beauty coefficient) [9]. We obtained 505,926 papers with DI, 82,814 authors, and 5,855
academic genealogies.
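
The close/open distinction defined in the Introduction can be illustrated with a minimal sketch. It assumes each author has already been mapped to a single genealogy identifier; the function name `mentorship_type`, the mapping `genealogy_of`, and the identifiers are hypothetical and are not taken from the dataset in [8].

```python
from typing import Dict, Iterable


def mentorship_type(authors: Iterable[str], genealogy_of: Dict[str, str]) -> str:
    """Return 'close' if all authors share one genealogy, otherwise 'open'.

    Assumption: every author appears in `genealogy_of`. A missing author
    raises KeyError, mirroring the exclusion of records with missing values.
    """
    genealogies = {genealogy_of[a] for a in authors}
    if not genealogies:
        raise ValueError("a team needs at least one author")
    return "close" if len(genealogies) == 1 else "open"


# Hypothetical author -> genealogy mapping (illustrative identifiers only).
genealogy_of = {"a1": "g1", "a2": "g1", "a3": "g2"}

same_tree = mentorship_type(["a1", "a2"], genealogy_of)   # both in g1
two_trees = mentorship_type(["a1", "a3"], genealogy_of)   # g1 and g2

# Treatment coding as in Table 1: 1 for close-mentorship, 0 for open-mentorship.
treatment = 1 if same_tree == "close" else 0
print(same_tree, two_trees, treatment)
```

Under this coding, a team counts as open-mentorship as soon as a single coauthor comes from a different genealogy.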


                                Joint Workshop of the 5th Extraction and Evaluation of
                                Knowledge Entities from Scientific Documents and the 4th AI +
                                Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun,
                                China and Online
                                EMAIL: zhengbli@mail2.sysu.edu.cn (Bili Zheng);
                                liwj336@mail2.sysu.edu.cn (Wenjing Li);
                                houjh5@mail.sysu.edu.cn (Jianhua Hou)
                                ∗ Corresponding author.

                                             © 2023 Copyright for this paper by its authors. Use permitted under
                                             Creative Commons License Attribution 4.0 International (CC BY 4.0).


                                CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

After excluding missing values, there were 361,189 papers.

2.2. Causal inference

   We adopt Propensity Score Matching (PSM) to validate causality between mentorship type
and DI. The main effect of this study is the effect of the close-mentorship team on DI
(treatment effect). This effect can be influenced by confounding factors. From the
literature review, team-related, personal-related, and article-related factors can be
considered as confounding factors of DI. Table 1 shows the variables we included in this
study. The variable “outcome” (DI) is the explained variable. Admittedly, a large number of
factors may influence DI, and it is hard to include them all. As in previous studies, we
selected those factors for which: (1) prior work has investigated the factor as possibly
influencing DI; (2) existing studies had verified the relationship with DI; (3) the data
for calculating the factor were available in records from SciSciNet [10].

Table 1
Variable description

 No   Variable    Variable type   Annotation
  1   treatment   binary          1 if it is a close-mentorship team; 0 if it is an open-mentorship team
  2   outcome     continuous      Disruption index (DI)
  3   PY          discrete        Publication year
  4   CI          discrete        Total citation counts
  5   A10         continuous      10th percentile Z-score of a paper, as defined in Uzzi et al. (2013)
  6   TS          discrete        Team size of an article
  7   RC          binary          1 if it is remote collaboration; 0 if it is not
  8   AA          continuous      The average age of authors in a team
  9   AP          continuous      The average productivity of authors in a team
 10   AC          continuous      Average citation counts of authors in a team

   To check the robustness of the PSM, we use causal forest (CF), a state-of-the-art causal
inference method [11]. Compared with PSM, it mitigates the curse of dimensionality and
provides a more accurate estimate of the treatment effect.
   In the causal forest, considering the analysis of heterogeneous causal effects, our
estimation objective is the Conditional Average Treatment Effect (CATE). The CATE for a
given observation i is defined as:

      \tau(x) = \mathbb{E}\left[ Y_i^{W=1} - Y_i^{W=0} \mid X_i = x \right]      (Eq. 1)

where i = 1, 2, …, n indexes the papers in our sample and W_i ∈ {0, 1} indicates whether
the team of paper i is close-mentorship. We observe the outcome of interest Y_i^{W=1} if
the paper is assigned to the treatment condition (i.e., if the team of the paper is
close-mentorship); otherwise we observe Y_i^{W=0}. X_i denotes a vector of the paper's
other characteristics.

3. Results

3.1. OLS estimates

   From the data we observed, the number of papers with open-mentorship teams dramatically
increased until 2011. However, the number of papers with close-mentorship teams is 0 after
1980. The number of papers with open-mentorship teams far exceeds that with
close-mentorship teams (Figure 1). We tested the between-group difference between the two
groups with the Mann-Whitney test in Python. The result shows that there is a significant
difference between the two groups (p<0.001).
   We first answer the question by OLS regression. When only the independent variable (the
mentorship relationship) was included in the model, the regression coefficient of the
variable is negative and significant at the 0.01 level, providing initial evidence that
close mentorship has a negative effect on disruption. The magnitude of this coefficient
changed significantly when we added the control variables one by one. In the final model,
we controlled for all the confounding variables and fixed effects, and this model
specification had the largest adjusted R2, suggesting that the explanatory power of the
model was enhanced by the controls. The results in the final model show that a
close-mentorship team has a significantly negative effect on disruption.
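
Why the treatment coefficient can change when control variables are added can be seen in a small sketch. It uses synthetic, noise-free data and a hand-written least-squares solver so that it runs with the standard library only; the function `ols`, the variable names, and all numbers are illustrative assumptions and do not reproduce the models or data of this study.

```python
from typing import List


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting, then Gauss-Jordan elimination of the column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic example: t = 1 for close-mentorship, x is one hypothetical control.
t = [0, 0, 0, 1, 1, 1]
x = [2, 3, 4, 1, 2, 3]
# Outcome built as 0.10 - 0.05*t + 0.02*x, so the true treatment effect is -0.05.
y = [0.10 - 0.05 * ti + 0.02 * xi for ti, xi in zip(t, x)]

naive = ols([[1.0, ti] for ti in t], y)                       # treatment only
full = ols([[1.0, ti, xi] for ti, xi in zip(t, x)], y)        # treatment + control

print(round(naive[1], 4))  # about -0.07: biased because x differs across groups
print(round(full[1], 4))   # about -0.05: the coefficient used to build y
```

In this toy setting the control is correlated with the treatment, so omitting it shifts the estimated treatment coefficient. That is one reason a coefficient can move as controls enter a model one by one.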


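
The matching step of the PSM procedure from Section 2.2, whose results are reported in Section 3.2, can be sketched as follows. The sketch assumes that propensity scores have already been estimated (for example, from the covariates in Table 1). It uses greedy 1:1 nearest-neighbour matching with a caliper, which is one common variant and not necessarily the one used in this study. The function name, the caliper value, and the data are hypothetical.

```python
from typing import List, Optional, Tuple

Unit = Tuple[float, float]  # (propensity score, outcome such as DI)


def match_and_estimate(treated: List[Unit], control: List[Unit],
                       caliper: float = 0.05) -> Optional[float]:
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns the mean treated-minus-control outcome difference over the
    matched pairs, or None if no pair falls within the caliper.
    """
    available = list(control)
    diffs = []
    for ps_t, y_t in treated:
        if not available:
            break
        # Closest remaining control unit by propensity score.
        j = min(range(len(available)), key=lambda k: abs(available[k][0] - ps_t))
        ps_c, y_c = available[j]
        if abs(ps_c - ps_t) <= caliper:
            diffs.append(y_t - y_c)
            available.pop(j)  # each control unit is used at most once
    return sum(diffs) / len(diffs) if diffs else None


# Illustrative units only: (propensity score, DI).
treated = [(0.30, 0.010), (0.60, -0.020)]
control = [(0.31, 0.015), (0.58, -0.010), (0.90, 0.050)]

att = match_and_estimate(treated, control)
print(att)  # mean DI difference over the two matched pairs
```

Here the control unit with score 0.90 has no treated counterpart within the caliper and is left unmatched, so only comparable papers contribute to the estimated difference.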
   Figure 1. The distribution of mentorship type. (a) The annual distribution of papers
with different mentorship types. (b) The distribution of the paper's DI with different
mentorship types.

3.2. Mentorship type and DI

   We test the relationship between mentorship and DI through PSM. We categorized papers
with a close-mentorship team as the treatment group (29,556 samples) and papers with an
open-mentorship team as the control group (331,632 samples). Figure 2(a) shows that the
propensity score distributions of the two groups of samples are significantly different
before matching, while the propensity scores of the two groups converge after matching.
Moreover, after matching the two groups of samples, the distributions of PY, CI, A10, TS,
RC, AA, AP, and AC are the same, which indicates that the matching is effective. Through
the hypothesis test commonly used in A/B experiments, we found that there is a significant
difference in DI between the two groups (p<0.05): the average DI of the close-mentorship
team is 0.002917 lower than that of the open-mentorship team, which means that the DI of
the close-mentorship team is 36.34% lower than the DI of the open-mentorship team
(Figure 2(b)).
   To check the robustness of the results, we used causal forest (CF), a state-of-the-art
method. For each paper, we obtain an individualized treatment effect with its 95%
confidence interval. The CATEs of the close-mentorship team have a mean of -0.0004. In
other words, the close-mentorship team decreases DI by 0.0004 on average. However, when we
take citation counts as the dependent variable, we found that the CATEs of the
close-mentorship team have a mean of 8.503, which means that papers with close-mentorship
teams may have more citation counts.

   Figure 2. The propensity score distribution. (a) The propensity score before matching.
(b) The distribution of DI between close-mentorship group and open-mentorship group.

4. Discussion and conclusion

   This study investigated whether the close-mentorship team fuses more disruptive ideas
than the open-mentorship team. We used academic genealogy to quantify whether an article
was close-mentorship or open-mentorship and used the Disruption Index to quantify the
disruptiveness of ideas. We investigated the
relationship between the variables by analyzing papers in Neuroscience and constructing
regression models. Moreover, we used PSM and causal forest to test whether there is a
causal relationship between mentorship type and DI. The results indicate that articles
with a close-mentorship team are less disruptive than those with an open-mentorship team.
However, articles with a close-mentorship team are more cited than those with an
open-mentorship team.

References
[1]   Park, M., Leahey, E., & Funk, R. J. (2023). Papers and patents are becoming less
      disruptive over time. Nature, 613(7942), 138-144. doi:10.1038/s41586-022-05543-x
[2]   Kozlov, M. (2023). 'Disruptive' science has declined - and no one knows why. Nature,
      613(7943), 225. doi:10.1038/d41586-022-04577-5
[3]   Lin, Y., Frey, C. B., & Wu, L. (2023). Remote collaboration fuses fewer breakthrough
      ideas. Nature, 623(7989), 987-991. doi:10.1038/s41586-023-06767-1
[4]   Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt
      science and technology. Nature, 566(7744), 378-382. doi:10.1038/s41586-019-0941-9
[5]   Ke, Q., Liang, L., Ding, Y., David, S. V., & Acuna, D. E. (2022). A dataset of
      mentorship in bioscience with semantic and demographic estimations. Scientific Data,
      9(1), 467. doi:10.1038/s41597-022-01578-x
[6]   Corsini, A., Pezzoni, M., & Visentin, F. (2022). What makes a productive Ph.D.
      student? Research Policy, 51(10). doi:10.1016/j.respol.2022.104561
[7]   Rosenfeld, A., & Maksimov, O. (2022). Factors that impact (positively and negatively)
      the advisor-advisee relationship. Should Young Computer Scientists Stop Collaborating
      with Their Doctoral Advisors? Communications of the ACM, 65(10), 66-72.
      doi:10.1145/3529089
[8]   Ke, Q., Liang, L., Ding, Y., David, S. V., & Acuna, D. E. (2022). A dataset of
      mentorship in bioscience with semantic and demographic estimations. Scientific Data,
      9(1), 467. doi:10.1038/s41597-022-01578-x
[9]   Lin, Z., Yin, Y., Liu, L., & Wang, D. (2023). SciSciNet: A large-scale open data lake
      for the science of science research. Scientific Data, 10(1).
      doi:10.1038/s41597-023-02198-9
[10]  Bornmann, L., Haunschild, R., & Mutz, R. (2020). Should citations be field-normalized
      in evaluative bibliometrics? An empirical analysis based on propensity score matching.
      Journal of Informetrics, 14(4), 20. doi:10.1016/j.joi.2020.101098
[11]  Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment
      Effects using Random Forests. Journal of the American Statistical Association,
      113(523), 1228-1242. doi:10.1080/01621459.2017.1319839

