Open-mentorship team is beneficial to disruptive ideas

Bili Zheng, Wenjing Li and Jianhua Hou (corresponding author)
Sun Yat-sen University, Guangzhou 510006, Guangdong, China

Abstract
How collaboration benefits disruption is widely discussed in academia, but less attention has been paid to mentorship among the coauthors of an article. This study focuses on the association between close/open-mentorship, measured by whether the coauthors of a publication belong to the same academic genealogy, and the disruption of publications, measured by the Disruption Index (DI). We selected 361,189 publications in Neuroscience from the SciSciNet database, constructed regression models, and estimated the relationship between the variables. Moreover, we used Propensity Score Matching and causal forest to estimate the causal relationship between them. The findings show that articles with open-mentorship collaboration are more disruptive than those with close-mentorship collaboration. The findings provide implications for team formation and team management in practice.

Keywords
academic genealogy, disruption index, mentorship type, team science

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
EMAIL: zhengbli@mail2.sysu.edu.cn (Bili Zheng); liwj336@mail2.sysu.edu.cn (Wenjing Li); houjh5@mail.sysu.edu.cn (Jianhua Hou)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

In the past decades, scientific papers have become less disruptive [1]. Some studies attribute this drastic change to the scientific enterprise, team size, and collaboration distance [2, 3, 4]. Inspired by a series of studies on collaboration and disruption [3, 4], we are interested in whether a close-mentorship or an open-mentorship team fuses more disruptive ideas. The research question is based on the following assumption: a close-mentorship team means that all the members of a team belong to the same genealogy, while an open-mentorship team means that the members belong to more than one genealogy.

To address the question, we first define the term mentorship. Mentorship can occur formally through doctoral and postdoctoral advisor-advisee relationships or informally through collaborations. Some genealogy databases, such as The Academic Family Tree, encompass both advisor-advisee relationships and a broad range of other relationships, which means the mentee may be the "learner" in a mentoring relationship regardless of age or position [5]. For simplicity, we here use mentorship to refer to the advisor-advisee relationship, like most genealogical studies [6, 7].

2. Data and method

2.1. Data collection

We derive mentorship from the dataset released by Ke et al. (2022) [8], which enriches the Academic Family Tree by adding publication records from Microsoft Academic Graph (MAG). Then, we obtain the DI of each paper from SciSciNet, which provides over 134 million scientific publications and frequently used indexes (such as DI, Z-score, and sleeping beauty coefficient) [9]. We obtained 505,926 papers with DI, 82,814 authors, and 5,855 academic genealogies. After excluding missing values, there were 361,189 papers.

2.2. Causal inference

Table 1
Variable description

| No | Variable | Variable type | Annotation |
|----|-----------|---------------|------------|
| 1 | treatment | binary | 1 if it is a close-mentorship team; 0 if it is an open-mentorship team |
| 2 | outcome | continuous | Disruption index (DI) |
| 3 | PY | discrete | Publication year |
| 4 | CI | discrete | Total citation counts |
| 5 | A10 | continuous | 10th percentile Z-score of the paper defined in Uzzi et al. (2013) |
| 6 | TS | discrete | Team size of an article |
| 7 | RC | binary | 1 if it is remote collaboration; 0 if it is not |
| 8 | AA | continuous | The average age of authors in a team |
| 9 | AP | continuous | The average productivity of authors in a team |
| 10 | AC | continuous | Average citation counts of authors in a team |

We adopt Propensity Score Matching (PSM) to validate causality between mentorship type and DI. The main effect of this study is the effect of the close-mentorship team on DI (the treatment effect). This effect can be influenced by confounding factors. According to the literature, team-related, personal-related, and article-related factors can be considered confounding factors of DI. Table 1 shows the variables we included in this study. The variable "outcome" (DI) is the explained variable. Admittedly, a large number of factors may influence DI, and it is hard to include all of them. As in previous studies, we selected those factors for which: (1) prior work has investigated the factor as possibly influencing DI; (2) existing studies have verified the relationship with DI; (3) the data for calculating the factor were available in records from SciSciNet [10].

To check the robustness of the PSM, we use causal forest (CF), a state-of-the-art causal inference method [11]. Compared with PSM, it mitigates the curse of dimensionality and can provide a more accurate estimate of the treatment effect.

In the causal forest, considering the analysis of heterogeneous causal effects, our estimation objective is the Conditional Average Treatment Effect (CATE). The CATE for a given observation $i$ is defined as:

$$\tau(x) = \mathbb{E}\left[\, Y_i^{W=1} - Y_i^{W=0} \mid X_i = x \,\right] \qquad \text{(Eq. 1)}$$

where $i = 1, 2, \ldots, n$ indexes the papers in our sample and $W_i \in \{0, 1\}$ indicates whether the team of paper $i$ is close-mentorship. We observe the outcome of interest $Y_i^{W=1}$ if the paper is assigned to the treatment condition (i.e., if the team of the paper is close-mentorship); otherwise we observe $Y_i^{W=0}$. $X_i$ denotes a vector of the paper's other characteristics.

3. Results

3.1. OLS estimates

In the data we observed, the number of papers with open-mentorship teams increased dramatically until 2011. However, the number of papers with close-mentorship teams is 0 after 1980. The number of papers with open-mentorship teams far exceeds that with close-mentorship teams (Figure 1). We tested the difference between the two groups with a Mann-Whitney test in Python. The result shows that there is a significant difference between the two groups (p<0.001).

We first answer the question by OLS regression. When only the independent variable (mentorship type) was included in the model, its regression coefficient was negative and significant at the 0.01 level, providing initial evidence that close mentorship has a negative effect on disruption. The magnitude of this coefficient changed significantly when we added the control variables one by one. In the final model, we controlled for all the confounding variables and fixed effects, and this specification had the largest adjusted R2, suggesting that the controls enhanced the explanatory power of the model. The results of the final model show that a close-mentorship team has a significantly negative effect on disruption.

Figure 1. The distribution of mentorship type. (a) The annual distribution of papers with different mentorship types. (b) The distribution of the paper's DI with different mentorship types.

3.2. Mentorship type and DI

We test the relationship between mentorship and DI through PSM. We categorized papers with a close-mentorship team as the treatment group (29,556 samples) and papers with an open-mentorship team as the control group (331,632 samples). Figure 2(a) shows that the propensity score distributions of the two groups are significantly different before matching, while the propensity scores of the two groups converge after matching. Moreover, after matching, the distributions of PY, CI, A10, TS, RC, AA, AP, and AC are balanced across the two groups, which indicates that the matching is effective. Through the hypothesis test commonly used in A/B experiments, we found a significant difference in DI between the two groups (p<0.05): the average DI of the close-mentorship team is 0.002917 lower than that of the open-mentorship team, which means that the DI of the close-mentorship team is 36.34% lower than the DI of the open-mentorship team (Figure 2(b)).

To check the robustness of the results, we used causal forest (CF). For each paper, we obtain an individualized treatment effect together with its estimated 95% confidence interval. The CATEs of the close-mentorship team have a mean of -0.0004. In other words, the close-mentorship team decreases DI by 0.0004 on average. However, when we take citation counts as the dependent variable, the CATEs of the close-mentorship team have a mean of 8.503, which means that papers with close-mentorship teams may receive more citations.

Figure 2. The propensity score distribution. (a) The propensity score before matching. (b) The distribution of DI between close-mentorship group and open-mentorship group.

4. Discussion and conclusion

This study investigated whether the close-mentorship team fuses more disruptive ideas than the open-mentorship team. We used academic genealogy to determine whether an article was close-mentorship or open-mentorship and used the Disruption Index to quantify the disruptiveness of its ideas. We investigated the relationship between the variables by analyzing papers in Neuroscience and constructing regression models. Moreover, we used PSM and causal forest to test whether there is a causal relationship between mentorship type and DI. The results indicate that articles with a close-mentorship team are less disruptive than those with an open-mentorship team. However, articles with a close-mentorship team are more cited than those with an open-mentorship team.

References

[1] Park, M., Leahey, E., & Funk, R. J. (2023). Papers and patents are becoming less disruptive over time. Nature, 613(7942), 138-144. doi:10.1038/s41586-022-05543-x
[2] Kozlov, M. (2023). 'Disruptive' science has declined - and no one knows why. Nature, 613(7943), 225. doi:10.1038/d41586-022-04577-5
[3] Lin, Y., Frey, C. B., & Wu, L. (2023). Remote collaboration fuses fewer breakthrough ideas. Nature, 623(7989), 987-991. doi:10.1038/s41586-023-06767-1
[4] Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566(7744), 378-382. doi:10.1038/s41586-019-0941-9
[5] Ke, Q., Liang, L., Ding, Y., David, S. V., & Acuna, D. E. (2022). A dataset of mentorship in bioscience with semantic and demographic estimations. Scientific Data, 9(1), 467. doi:10.1038/s41597-022-01578-x
[6] Corsini, A., Pezzoni, M., & Visentin, F. (2022). What makes a productive Ph.D. student? Research Policy, 51(10). doi:10.1016/j.respol.2022.104561
[7] Rosenfeld, A., & Maksimov, O. (2022). Factors that impact (positively and negatively) the advisor-advisee relationship: Should Young Computer Scientists Stop Collaborating with Their Doctoral Advisors? Communications of the ACM, 65(10), 66-72. doi:10.1145/3529089
[8] Ke, Q., Liang, L., Ding, Y., David, S. V., & Acuna, D. E. (2022). A dataset of mentorship in bioscience with semantic and demographic estimations. Scientific Data, 9(1), 467. doi:10.1038/s41597-022-01578-x
[9] Lin, Z., Yin, Y., Liu, L., & Wang, D. (2023). SciSciNet: A large-scale open data lake for the science of science research. Scientific Data, 10(1). doi:10.1038/s41597-023-02198-9
[10] Bornmann, L., Haunschild, R., & Mutz, R. (2020). Should citations be field-normalized in evaluative bibliometrics? An empirical analysis based on propensity score matching. Journal of Informetrics, 14(4), 20. doi:10.1016/j.joi.2020.101098
[11] Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. Journal of the American Statistical Association, 113(523), 1228-1242. doi:10.1080/01621459.2017.1319839
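
The matching step described in Section 3.2 can be illustrated with a minimal sketch. It is not the authors' implementation: the paper does not state which library or matching rule was used, so the sketch assumes a logistic propensity model fitted by gradient descent and 1-nearest-neighbour matching with replacement. The data are synthetic, with a single hypothetical covariate standing in for the controls in Table 1, and the effect sizes (-0.003 and 0.01) are illustrative choices rather than results from the study. All function names are hypothetical.

```python
import math
import random


def fit_logistic(X, w, lr=0.5, epochs=300):
    """Fit P(W=1 | X) by batch gradient descent; returns [bias, coef_1, ...]."""
    n, d = len(X), len(X[0])
    beta = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, wi in zip(X, w):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - wi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Estimated probability of treatment (close-mentorship) for one paper."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def att_by_matching(X, w, y):
    """Average treatment effect on the treated via 1-NN propensity matching."""
    beta = fit_logistic(X, w)
    scores = [propensity(beta, xi) for xi in X]
    treated = [i for i, wi in enumerate(w) if wi == 1]
    control = [i for i, wi in enumerate(w) if wi == 0]
    diffs = []
    for i in treated:
        # Nearest control paper by propensity score (controls may be reused).
        j = min(control, key=lambda c: abs(scores[c] - scores[i]))
        diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs)


# Synthetic data: the covariate drives both treatment assignment and the
# outcome, so a naive group comparison is confounded.
rng = random.Random(0)
X, w, y = [], [], []
for _ in range(400):
    x = rng.uniform(-1.0, 1.0)
    is_close = 1 if rng.random() < 1.0 / (1.0 + math.exp(-1.5 * x)) else 0
    X.append([x])
    w.append(is_close)
    y.append(-0.003 * is_close + 0.01 * x + rng.gauss(0.0, 0.0005))

y_treated = [yi for yi, wi in zip(y, w) if wi == 1]
y_control = [yi for yi, wi in zip(y, w) if wi == 0]
naive = sum(y_treated) / len(y_treated) - sum(y_control) / len(y_control)
att = att_by_matching(X, w, y)

print(f"naive difference in means: {naive:+.5f}")
print(f"matched ATT estimate:      {att:+.5f}")
```

In this toy setting the naive difference in means is pulled away from the simulated effect by the confounder, while the matched estimate stays close to it. That is the motivation for matching on the covariates in Table 1 before comparing DI across the two groups.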