=Paper=
{{Paper
|id=Vol-2658/paper6
|storemode=property
|title=Exploring the Relation between Biomedical Entities and Government Funding
|pdfUrl=https://ceur-ws.org/Vol-2658/paper6.pdf
|volume=Vol-2658
|authors=Fang Tan,Siting Yang,Xiaoyan Wu,Jian Xu
|dblpUrl=https://dblp.org/rec/conf/jcdl/TanYWX20
}}
==Exploring the Relation between Biomedical Entities and Government Funding==
EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents

* Fang Tan, School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China, cathytf@163.com
* Siting Yang, School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China, 524228058@qq.com
* Xiaoyan Wu, School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China, wxy1954174163@163.com
* Jian Xu, School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China, issxj@mail.sysu.edu.cn

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===ABSTRACT===
To study the effect of government funding on the promotion of scientific research in medicine, and to help the government manage research funds more rationally, this study proposes a framework for analyzing the relationship between entities in the field of medicine and funds. The framework consists of four parts: (1) acquisition of biomedical abstracts and NIH funding information, together with biomedical entity extraction; (2) development trend analysis of biomedical entities; (3) analysis of the most funded entities; and (4) analysis of the relationship between entity research popularity and government funding. The preliminary results are as follows: genetic research is in a period of rapid development, while species research is in a "flat period"; disease research receives continuous attention from the NIH; and the stimulating effect of government funding on research popularity is decreasing, which is affected by various factors.

===CCS CONCEPTS===
• Applied computing → Bioinformatics • Applied computing → Computing in government • Information systems → Information retrieval

===KEYWORDS===
Biomedical entities, Government funding, Entitymetrics, Evolutionary trend

===1 INTRODUCTION===
By 2019, the total number of publications in PubMed, the database of biomedical papers, had reached 29 million [1], and statistically, nearly one third of US patents come directly from federally funded programs [2], meaning that the federal government plays an important role in the development of scientific research. Entitymetrics was originally proposed by Ding et al. [3]. Current research on entities in medicine mainly covers the identification and classification of named entities [4] and the extraction of entity relationships [5], while research on government funding is limited to quantifying the effects of government funds in terms of institutions, patents, employment resolution capacity, etc. [6, 7]. Meanwhile, most research on scientific research and funding is limited to exploring the relationship between some indicators of research achievements (e.g. quantity and citation) and funding, and lacks a detailed study of the impact of funding at the entity level.

Therefore, this paper combines the PubMed medical database with the funding information published by the National Institutes of Health (NIH) to compare the actual research focus and the funding focus in the biomedical field from 1988 to 2017. First, the trajectory of the field is mapped from an entity research perspective to understand macro trends; second, the most funded entities are counted, and the focuses and tendencies of government funding on biomedical entities are summarized; finally, the specific relationship between biomedical research funding and research popularity is further analyzed, which provides a reference for the government's choice of funding recipients and funding levels.

===2 METHODOLOGY===
This paper proposes a framework for analyzing the relationship between biomedical entities and funds, as shown in Figure 1.

Figure 1. Our Research Framework
In Figure 1, the analysis framework can be divided into four main modules: data acquisition and entity extraction; development trend analysis of biomedical entities; analysis of the most funded entities; and analysis of the relationship between entity research popularity and government funding.

1. Data acquisition and entity extraction. Biomedical data from 1988 to 2017 are obtained from PubMed, funding information and the research papers produced by funded projects are obtained from the NIH funding database, and biomedical entities are extracted with BERN [8, 9, 10]. BERN (Biomedical named entity recognition and multi-type normalization) is a Web-based biomedical text mining tool. Entity extraction involves two steps: named entity recognition and entity normalization. In total, 489,433 biomedical entities are obtained between 1988 and 2017, along with 2,082,652 research projects with about $1,0261.3 billion [10].

2. Development trend analysis of biomedical entities. Biomedical entities are categorized into Species, Disease, Gene/Protein, and Drug/Chemical for evolutionary analysis. Table 1 shows the number of entities of the four types.

{| class="wikitable"
|+ Table 1. The number of entities of four types
! Entity Type !! Number
|-
| Species || 84,203
|-
| Disease || 36,704
|-
| Gene/Protein || 25,489
|-
| Drug/Chemical || 134,574
|}

3. Analysis of the most funded entities. Combined with the biomedical entity data, the entities mentioned in the articles produced by NIH projects are extracted to count the amount of funding for each entity. We define the funding for an entity as the sum of the funding for all articles in which the entity appears.

4. Analysis of the relationship between entity research popularity and government funding. We define the research popularity of an entity as the number of papers in which the entity occurs. The annual counts for the four types of entities are therefore computed according to the publication year of the papers in which each entity appears. The years 1988, 1998, 2008, and 2017 are selected, and scatter plots are created with the entity's research popularity on the vertical axis and the entity's annual funding amount, calculated in step 3, on the horizontal axis.

===3 PRELIMINARY RESULTS===
====3.1 Development trend analysis of biomedical entity====
Based on the change in the number of research entities of each type, the development trend of the biomedical field over the past three decades is analyzed. The number of entities studied in each year is the number of biomedical entities of each type mentioned in all papers published in that year. Figure 2 shows the number of research entities of each type over time. In terms of development trend, the number of gene/protein entities is rising the fastest and is in a stage of rapid development. Research on species entities is in a flat stage, and these entities are less numerous.

Figure 2. Mention Trends of Biomedical Entities

====3.2 Analysis of the Most Funded Entities====
Table 2 shows the top twenty biomedical entities in terms of total NIH funding dollars. Mice, HIV, Human immunodeficiency disease and Tumor have each received more than $100 billion and are the four entities receiving the most funding. In terms of entity types, disease entities are the most numerous, occupying nine places, and gene/protein entities are the fewest, with only two. This indicates that the study of disease is an area of research that the NIH has always valued and continues to focus on.

{| class="wikitable"
|+ Table 2. Entities with the highest total funding (top 20)
! ID !! Entity ID !! Entity Name !! Entity Type !! Funds (billion $)
|-
| 1 || 1009505 || Mice || species || 183.87
|-
| 2 || 1272105 || HIV || species || 165.38
|-
| 3 || 106985801 || Human immunodeficiency disease || disease || 127.30
|-
| 4 || 256225101 || Tumor || disease || 101.09
|-
| 5 || 255268301 || Cancer || disease || 94.07
|-
| 6 || 1009005 || Mouse || species || 93.22
|-
| 7 || 1011605 || Rat || species || 66.93
|-
| 8 || 4168403 || Alcohol || drug/chemical || 62.13
|-
| 9 || 323759402 || Insulin || gene/protein || 51.24
|-
| 10 || 1167605 || HIV-1 || species || 48.70
|-
| 11 || 258006601 || DM || disease || 42.71
|-
| 12 || 325454802 || CD4+ || gene/protein || 40.97
|-
| 13 || 291977503 || Glucose || drug/chemical || 39.99
|-
| 14 || 107480901 || Breast and epithelial-myoepithelial carcinomas || disease || 34.50
|-
| 15 || 287734103 || Ca2+ || drug/chemical || 28.60
|-
| 16 || 107550501 || AD || disease || 25.93
|-
| 17 || 261400701 || Obesity || disease || 23.81
|-
| 18 || 267406001 || Depression || disease || 23.80
|-
| 19 || 106971701 || Bronchial asthma || disease || 23.65
|-
| 20 || 325464002 || p32 || gene/protein || 22.07
|}

====3.3 Analysis of the relationship between research popularity of entity and government funding====
For the biomedical entities in the four fields (species, disease, gene/protein and drug/chemical), the years 1988, 1998, 2008 and 2017 are selected for scatter plotting, and the relationship between an entity's research popularity and government funding is analyzed visually, to identify the driving effect of funding on research in each field from the entity's perspective.

Figure 3. Scatterplot of species entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 3, in 1988 a small increase in funding is accompanied by a significant increase in research popularity. In the three later years, the linear fit reveals that, with the passage of time and the increase of the funding amount, the stimulating effect of funding on the popularity of species research slows down.

Figure 4. Scatterplot of disease entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 4, the linear coefficient obtained by fitting the linear trend of disease entities in the four years is slightly larger than that obtained for species entities. As with species entities, in 1988 a small increase in funding is accompanied by a significant increase in research popularity. As the years go by, the increase in funding amount becomes greater than the increase in research popularity and the slope of the fitted line gradually decreases, which means that the stimulating effect of the funding amount on research popularity is gradually slowing down. In 2017, there are more entities with high funding and low research popularity, which could be related to the emergence of new types of entities.

Figure 5. Scatterplot of gene/protein entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 5, for gene/protein entities the initial trend in 1988 is similar to that of the first two types (species and disease entities), while later, especially in 2017, it is clear that the upper limit of research popularity has declined over time.

Figure 6. Scatterplot of drug/chemical entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 6, the pattern of drug/chemical entities is similar to that of gene/protein entities. The upper limit of research popularity has declined over time, which indicates that, as the years go by, the amount of funding no longer plays a significant role in the research popularity of drug/chemical entities.

It follows from the above analysis that the factors influencing changes in entity research popularity are multi-faceted and complex, rather than a simple linear effect of research funding, and that the complexity increases over the years.

===4 CONCLUSION AND FUTURE WORK===
====4.1 Conclusion====
As far as we know, studies that link entities to government funding and explore trends from an entity perspective are scarce. This study puts forward a preliminary research idea, applying entitymetrics to the biomedical field from the perspective of scientific research funds, and carries out a preliminary exploration of research trends and knowledge discovery. The conclusions are as follows: a) genetic research is in a period of rapid development, while species research is in a "flat period"; b) disease research receives continuous attention from the NIH; c) the stimulating effect of government funding on research popularity is decreasing, which is affected by various factors. These findings provide the basis for a follow-up study.

====4.2 Future work====
Inspired by the initial results, our future work will focus on a more in-depth exploration of the relationship between government funding and entity development. In this study, we summarized the trends in four categories of entities in the biomedical field and counted the entities that received the highest funding. However, is there any commonality among these entities? Is entity-related research with certain characteristics always more likely to be funded by the government? In addition, the current results show that the incentive effect of increased government funding on research in various fields is decreasing, while the impact of other factors, such as the continuity of government funding, on research popularity has not been explored. Therefore, further research will be conducted on the characteristics of the funded entities and the rules of government funding.

===ACKNOWLEDGMENTS===
We acknowledge the editors and the anonymous reviewers for insightful suggestions on this work.

===REFERENCES===
[1] Nicolas Fiorini, Kathi Canese, Grisha Starchenko, et al., 2018. Best match: new relevance search for PubMed. PLOS Biology 16, 8 (Aug, 2018), e2005343. DOI: https://doi.org/10.1371/journal.pbio.2005343.

[2] Guanghui Xia, Junlian Li, Baokun Xing, et al., 2019. Study of Named Entity Recognition in Medical Treatment Based on Literatures of Chinese Case Reports. Journal of Medical Intelligence 40, 6 (May, 2019), 54-59.

[3] Ying Ding, Min Song, Jia Han, et al., 2013. Entitymetrics: Measuring the impact of entities. PloS one 8, 8 (Aug, 2013), 1-14. DOI: https://doi.org/10.1371/journal.pone.0071416.

[4] Yuan Xu, Yanqiu Ge, Qiang Wang, et al., 2018. Medical Name Entity Recognition and Application in Chinese Admission Record of Stroke Patients Based on CRF and RUTA rule. Journal of Sun Yat-sen University (Medical Sciences) 39, 3 (May, 2018), 455-462.

[5] Xiuyan Wang, Lei Cui, 2013. Extract Semantic Relations Between Biomedical Entities Applied Hybrid Method. New Technology of Library and Information Service 29, 3 (Mar, 2013), 77-82.

[6] Yongjian Xu, Jizhong Zhou, 2008. A Study of the Relationship between Government R&D Funding and Business Technology Innovation Activities. China Soft Science, 11 (Nov, 2008), 141-148.

[7] Paul A. David, Bronwyn H. Hall, Andrew A. Toole, 2000. Is Public R&D a Complement or a Substitute for Private R&D? A Review of the Econometric Evidence. Research Policy 29, 4-5 (Apr, 2000), 497-529.

[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, et al., 2018. Bert: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805. Retrieved from https://arxiv.org/abs/1810.04805.

[9] Donghyeon Kim, Jinhyuk Lee, Chan Ho So, et al., 2019. A neural named entity recognition and multi-type normalization tool for biomedical text mining. IEEE Access 7 (Jan, 2019), 73729–73740. DOI: https://doi.org/10.1109/ACCESS.2019.2920708.

[10] Jian Xu, Sunkyu Kim, Min Song, et al., 2020. Building a PubMed knowledge graph. Scientific Data 7, 1 (Jun, 2020), 1-15. DOI: https://doi.org/10.1038/s41597-020-0543-2.
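
As a minimal sketch of the definitions given in the methodology (funding for an entity as the sum of the funding of all articles in which it appears, and research popularity as the number of papers in which it occurs) and of the line fitting discussed in Section 3.3, the Python below aggregates both quantities per entity and year and fits an ordinary least-squares line. The record layout (keys `year`, `funding`, `entities`), the function names, and the attribution of a single funding amount to each article are assumptions made for illustration; the paper does not describe its implementation, and the sample records are invented.

```python
from collections import defaultdict


def entity_funding_and_popularity(articles):
    """Aggregate funding and popularity per (entity, year).

    `articles` is an iterable of dicts with the hypothetical keys
    'year' (int), 'funding' (float) and 'entities' (list of names).
    """
    funding = defaultdict(float)   # sum of article funding per (entity, year)
    popularity = defaultdict(int)  # number of papers per (entity, year)
    for article in articles:
        # Use a set so an entity mentioned twice in one paper counts once.
        for entity in set(article["entities"]):
            key = (entity, article["year"])
            funding[key] += article["funding"]
            popularity[key] += 1
    return dict(funding), dict(popularity)


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented toy records, only to show the call pattern.
toy_articles = [
    {"year": 1988, "funding": 2.0, "entities": ["Mice", "HIV"]},
    {"year": 1988, "funding": 3.0, "entities": ["Mice", "Mice"]},
    {"year": 1998, "funding": 5.0, "entities": ["HIV"]},
]
toy_funding, toy_popularity = entity_funding_and_popularity(toy_articles)
```

For one entity type and one year, the funding values would be passed to `fit_line` as `xs` and the popularity values as `ys`, which matches the axes of the scatter plots; a smaller slope in a later year then corresponds to the weaker stimulating effect described in the text.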