Understanding Partnership in Scientific Collaborations: A Preliminary Study from the Paper-level Perspective

Chao Lu1,*, Mengting Li1, and Chenyu Zhou1
1 Hohai University, 8 West Focheng Road, Nanjing, China, 211000

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
* Corresponding author. Email: luchao91@hhu.edu.cn
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

Abstract

Scientific collaboration is increasingly common in scientific knowledge production and has recently been widely investigated through quantitative and qualitative approaches. However, most quantitative methods based purely on co-author information fail to observe the internal interactions between collaborators as contributors. In this study, we investigated how collaborators in teams work together to perform their research by examining how two collaborators work together as partners, which traditional collaboration networks usually overlook. By collecting author information from Scopus and author contribution statements from PLoS, we took the biology subject as an example and examined more than 120,000 research articles. We found that division of labor is quite common in scientific collaboration; that partnership as a form of division of labor is widely observed in our dataset; and that the diversity in contributing tasks between partners is generally mild. This study will shed light on the mechanism of scientific collaboration via division of labor, which co-authorship studies widely overlook. It may help us create research teams with higher levels of engagement and communication.

Keywords
Scientific Collaboration, Author Contribution Statement, Natural Language Processing.

1 Introduction

Scientific collaboration is increasingly common in scientific knowledge production and has recently been widely investigated through quantitative and qualitative approaches [1, 2]. However, more remains to be investigated, especially as more data on the interactions between collaborators within each team, i.e., author contribution statements [1, 3–5], are disclosed, while most quantitative methods based purely on co-author information fail to dig deeper into the internal interaction between collaborators as contributors [6, 7]. Contributorship, unlike authorship, pays particular attention to the actual contributions made by each scientific collaborator. Studies suggest that contributorship provides new perspectives for understanding scientific collaborations, such as division of labor and team role differentiation [3, 8]. Recently, co-contributorship [1] as a type of partnership in scientific collaborations drew our research interest. Given that research teams consist not merely of individual building blocks but of living collaborators, we want to investigate how this partnership exists in scientific collaboration and how this close relationship influences scientific performance.

Thus, in this preliminary study, we collected author information from Scopus and author contribution statements from PLoS. Taking the biology subject as an example, we examined more than 120,000 research articles to study partnership in scientific collaboration from three perspectives: ratio, strength, and diversity. This study and the study to come will help shed light on the mechanism of scientific collaboration via division of labor, which studies based on co-authorship have widely overlooked. It might help us create research teams with higher levels of engagement and communication.

2 Data and Methods

2.1 Data

To examine the phenomenon of partnership in scientific collaborations, we collected 126,894 articles in the Biology domain, together with their author contribution statements, from PLoS (Public Library of Science) from 2006 to 2020. The yearly distribution of the collected articles is shown in Fig. 1. The plot suggests that the distribution generally followed an increasing trend except in 2015 and 2016. We double-checked the data and found that the PLoS journals did not label subject information for their papers in these two years, so the biology papers from these years could not be identified in the whole collection and are not included.

Fig. 1. Yearly distribution of biology publications from PLoS journals collected in this study

2.2 Methods

Following previous studies [1, 9], we processed the author contribution statements and linked the author names to their tasks in each paper using Python scripts. We disambiguated author names using the Scopus API. In total, we collected 574,979 pieces of disambiguated author information and 2,831,375 author-task pairs.

Given that PLoS did not adopt the CRediT taxonomy (http://credi.niso.org/) until 2016, we manually labeled around 99.5% of author-defined tasks according to the taxonomy and its detailed definition of each role. Each author-defined task may be assigned more than one standard contribution role (see Table 1). Contributions that cannot be standardized using the taxonomy are labeled "Other". The remaining author-defined tasks, around 0.5 percent, were not manually labeled and are automatically labeled "UNKNOWN". Because this part of the data is quite small, its potential effect on the whole study can be ignored. The multi-labeling tactic expands our data, resulting in 5,154 more author-task pairs.

Table 1. Annotation sample for author-defined contribution standardization using CRediT

Author-defined Task                                          Contribution Role
-----------------------------------------------------------  -------------------------------
Participated in critical discussion of the draft's initial   Writing – review & editing
findings and revision of the manuscript
Statistically analyzed the data                              Formal analysis
Contributed to the design and development of the project     Conceptualization, Methodology

We then constructed paper-level co-authorship networks (CAN for short) and co-contributorship networks (CCN for short) for each paper, as proposed in [9]. We analyze partnership from three perspectives: partnership ratio (PR), partnership strength (PS), and partnership diversity (PD). The three measurements are defined as follows:

\[ PR = \frac{\text{number of CCN edges}}{\text{maximum number of edges}} \tag{1} \]

\[ PS = \frac{\text{total weight of CCN edges}}{\text{total weight of CAN edges}} \tag{2} \]

\[ PD = \frac{\text{number of unique contributor roles}}{\text{number of all contributor roles}} \tag{3} \]

3 Preliminary Findings

Fig. 2 plots the partnership ratio in our dataset. It suggests that some level of partnership generally exists in each team, which results in some degree of division of labor in scientific collaboration. Specifically, in more than 60,000 teams, all collaborators are engaged in at least one collaborative task.

Fig. 2. The distribution of partnership ratio in our study

Fig. 3 shows the partnership strength distribution, which indicates how closely collaborators in a team are connected when doing research, measured by the number of tasks on which two collaborators worked together in a study. The figure shows that the edge weights of CCNs average 2.09, meaning that two collaborators jointly work on about two tasks in each study. It also suggests that some collaborators in teams might be more involved in collaboration than others, indicating the existence of partnership in scientific collaboration. Given that the average edge weight of CCNs is twice that measured in CANs, and that CCNs are naturally sparser than CANs [9], the figure implies that partnership plays a role in scientific collaboration.

Fig. 3. The distribution of partnership strength in our study

Fig. 4 shows the partnership diversity in scientific collaboration, which reflects how diverse the shared tasks are when two collaborators work as partners. It shows that partners perform 2.35 different contributor roles on average. Given that there are theoretically 14 different contributor roles on which two partners can work, the diversity of the partnership remains quite mild.

Fig. 4. The distribution of partnership diversity in our study

4 Conclusion and Future Work

In this preliminary study, we investigated how collaborators in teams work together to perform their research by examining how two collaborators work together as partners, which traditional collaboration networks usually overlook. By collecting author information from Scopus and author contribution statements from PLoS, we took the biology subject as an example and examined more than 120,000 research articles. We found that division of labor is quite common in scientific collaboration; that partnership as a form of division of labor is widely observed in our dataset; and that the diversity in contributing tasks between partners is generally mild. This study will shed light on the mechanism of scientific collaboration via division of labor, which co-authorship studies widely overlook. It may help us create research teams with higher levels of engagement and communication.

Acknowledgements

This article is an outcome of the youth project "Study of Scientific Collaborators’ Scientific Effectiveness from The Perspective of Division of Labor" (ID: 72004054) supported by the National Natural Science Foundation of China and the project "The Causal Effect of Team Diversity on Team Performance" supported by the Fundamental Research Funds for the Central Universities (ID: B220201058).

References

[1] Lu C, Zhang C, Xiao C, Ding Y (2022) Contributorship in Scientific Collaborations: The Perspective of Contribution-based Byline Orders. Information Processing & Management 59:.
[2] García-Sánchez P, Díaz-Díaz NL, De Saá-Pérez P (2019) Social capital and knowledge sharing in academic research teams. International Review of Administrative Sciences 85:191–207.
[3] Haeussler C, Sauermann H (2020) Division of labor in collaborative knowledge production: The role of team size and interdisciplinarity. Research Policy 49:103987.
[4] Larivière V, Desrochers N, Macaluso B, et al (2016) Contributorship and division of labor in knowledge production. Soc Stud Sci 46:417–435.
[5] Allen L, Scott J, Brand A, et al (2014) Publishing: Credit where credit is due. Nature News 508:312.
[6] Devine EB, Beney J, Bero LA (2005) Equity, accountability, transparency: Implementation of the contributorship concept in a multi-site study. Am J Pharm Educ 69:61. https://doi.org/10/dqjngb
[7] Rennie D, Yank V, Emanuel L (1997) When Authorship Fails: A Proposal to Make Contributors Accountable. JAMA 278:579–585.
[8] Xu F, Wu L, Evans J (2022) Flat teams drive scientific innovation. Proc Natl Acad Sci USA 119:e2200927119.
[9] Lu C, Zhang Y, Ahn Y-Y, et al (2020) Co-contributorship Network and Division of Labor in Individual Scientific Collaborations. Journal of the Association for Information Science and Technology 71:1162–1178.
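
As an illustration of the three measures defined in Section 2.2 (Eqs. 1 to 3), the following minimal Python sketch computes them for a single paper. It is not the authors' implementation, and several details are assumptions made only for this example: the function name `partnership_measures` and its input format are hypothetical; a CCN edge is assumed to link two authors who share at least one contributor role, weighted by the number of shared roles; every CAN edge is assumed to have weight 1; and in Eq. 3 the "unique" roles are taken to be the distinct roles shared by at least one pair of partners, while "all" roles are the distinct roles reported in the paper.

```python
from itertools import combinations


def partnership_measures(author_roles):
    """Return (PR, PS, PD) for one paper.

    author_roles maps each author name to the set of contributor roles
    that author performed (hypothetical input format for this sketch).
    """
    authors = sorted(author_roles)
    pairs = list(combinations(authors, 2))
    if not pairs:
        raise ValueError("a paper needs at least two authors")

    # Assumed CCN: two authors are linked when they share at least one role;
    # the edge weight is the number of roles they share.
    ccn = {}
    for a, b in pairs:
        shared = author_roles[a] & author_roles[b]
        if shared:
            ccn[(a, b)] = shared

    # Eq. 1: CCN edges over the maximum possible number of edges.
    ratio = len(ccn) / len(pairs)

    # Eq. 2: total CCN weight over total CAN weight.
    # Assumption: the CAN is complete and each of its edges has weight 1.
    can_weight = len(pairs)
    strength = sum(len(roles) for roles in ccn.values()) / can_weight

    # Eq. 3 (assumed reading): distinct roles shared by some partner pair
    # over the distinct roles reported in the paper.
    shared_roles = set().union(*ccn.values()) if ccn else set()
    all_roles = set().union(*author_roles.values())
    diversity = len(shared_roles) / len(all_roles) if all_roles else 0.0

    return ratio, strength, diversity


# Toy paper with three authors; only A and B share a role.
example_paper = {
    "A": {"Conceptualization", "Writing"},
    "B": {"Writing", "Formal analysis"},
    "C": {"Investigation"},
}
pr, ps, pdiv = partnership_measures(example_paper)
print(pr, ps, pdiv)
```

In this toy paper only one of the three possible author pairs forms a partnership, so the ratio and strength are both one third, and one of the four reported roles is shared. Other readings of Eqs. 2 and 3 (for example, weighted CAN edges, or the 14 CRediT roles as the denominator) would change the numbers but not the structure of the computation.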