=Paper= {{Paper |id=Vol-2658/paper6 |storemode=property |title=Exploring the Relation between Biomedical Entities and Government Funding |pdfUrl=https://ceur-ws.org/Vol-2658/paper6.pdf |volume=Vol-2658 |authors=Fang Tan,Siting Yang,Xiaoyan Wu,Jian Xu |dblpUrl=https://dblp.org/rec/conf/jcdl/TanYWX20 }} ==Exploring the Relation between Biomedical Entities and Government Funding== https://ceur-ws.org/Vol-2658/paper6.pdf
                   EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents




Exploring the Relation between Biomedical Entities and Government Funding

Fang Tan (cathytf@163.com), Siting Yang (524228058@qq.com), Xiaoyan Wu (wxy1954174163@163.com), Jian Xu (issxj@mail.sysu.edu.cn)
School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China


ABSTRACT
To analyze the effect of government funding on the promotion of scientific research in medicine, and to help the government manage research funds more rationally, this study proposes a framework for analyzing the relationship between biomedical entities and funding. The framework consists of four parts: biomedical abstract acquisition, NIH funding information acquisition, and biomedical entity extraction; development trend analysis of biomedical entities; analysis of the most-funded entities; and analysis of the relationship between entity research popularity and government funding. The preliminary results are as follows: the field of genetic research is in a period of rapid development, while the field of species research is in a “flat period”; disease research continuously attracts NIH attention; and the stimulating effect of government funding on research popularity is decreasing and is affected by various factors.

CCS CONCEPTS
• Applied computing → Bioinformatics • Applied computing → Computing in government • Information systems → Information retrieval

KEYWORDS
Biomedical entities, Government funding, Entitymetrics, Evolutionary trend

1 INTRODUCTION
By 2019, PubMed, the database of biomedical papers, contained 29 million records [1], and statistically, nearly one third of US patents come directly from federally funded programs [2], which shows that the federal government plays an important role in the development of scientific research. Entitymetrics was originally proposed by Ding et al. [3]. Current research on entities in medicine mainly covers the identification and classification of named entities [4] and the extraction of entity relationships [5], while research on government funding is limited to quantifying the effects of government funds on institutions, patents, employment, etc. [6, 7]. Meanwhile, most studies of scientific research and funding only explore the relationship between funding and certain indicators of research output (e.g. quantity and citations), and lack a detailed study of the impact of funding at the entity level. Therefore, this paper combines the PubMed database with funding information published by the National Institutes of Health (NIH) to compare the actual research focus and the funding focus in the biomedical field from 1988 to 2017. First, the trajectory of the field is mapped from the perspective of research entities to understand macro trends; second, the most-funded entities are counted, and the focuses and tendencies of government funding on biomedical entities are summarized; finally, the specific relationship between biomedical research funding and research popularity is analyzed, which provides a reference for the government's choice of funding recipients and funding levels.

2 METHODOLOGY
This paper proposes a framework for analyzing the relationship between biomedical entities and funding, as shown in Figure 1.

Figure 1. Our Research Framework



      Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

As shown in Figure 1, the analysis framework can be divided into four main modules: data acquisition and entity extraction; development trend analysis of biomedical entities; analysis of the most-funded entities; and analysis of the relationship between entity research popularity and government funding.
1. Data acquisition and entity extraction. Biomedical data from 1988 to 2017 are obtained from PubMed; funding information and the research papers produced by funded projects are obtained from the NIH funding database; and biomedical entities are extracted with BERN [8, 9, 10]. BERN (biomedical named entity recognition and multi-type normalization) is a Web-based biomedical text mining tool. Entity extraction involves two steps: named entity recognition and entity normalization. In total, 489,433 biomedical entities are obtained for the period from 1988 to 2017, together with 2,082,652 research projects with total funding of about $1,0261.3 billion [10].
2. Development trend analysis of biomedical entities. Biomedical entities are categorized into Species, Disease, Gene/Protein, and Drug/Chemical for evolutionary analysis. Table 1 shows the number of entities of each of the four types.

Table 1. The number of entities of four types
Entity Type      Number
Species          84,203
Disease          36,704
Gene/Protein     25,489
Drug/Chemical    134,574

3. Analysis of the most-funded entities. Using the biomedical entity data, the entities mentioned in the output articles of NIH projects are extracted in order to count the amount of funding for each entity. We define the funding for an entity as the sum of the funding for all articles in which the entity appears.
4. Analysis of the relationship between entity research popularity and government funding. We define the research popularity of an entity as the number of papers in which the entity occurs. The annual counts for the four types of entities are therefore computed according to the publication year of the papers in which each entity appears. For the years 1988, 1998, 2008, and 2017, scatter plots are created with the entity's research popularity on the vertical axis and the entity's annual funding amount, calculated as in step 3, on the horizontal axis.

3 PRELIMINARY RESULTS

3.1 Development trend analysis of biomedical entities
Based on changes in the number of research entities of each type, the development trend of biomedical fields over the past three decades is analyzed. The number of entities studied in a given year is the number of distinct biomedical entities of each type mentioned in all papers published in that year. Figure 2 shows the number of research entities of each type over time. In terms of development trend, the number of gene/protein entities is rising the fastest and is in a stage of rapid development, whereas research on species entities is in a flat stage and involves fewer entities.

Figure 2. Mention Trends of Biomedical Entities

3.2 Analysis of the Most Funded Entities
Table 2 shows the top twenty biomedical entities in terms of total NIH funding dollars. Mice, HIV, Human immunodeficiency disease, and Tumor have each received more than $100 billion and are the four most-funded entities. By entity type, disease entities are the most numerous, occupying nine of the twenty places, while gene/protein and drug/chemical entities are the least numerous, with three places each. This indicates that disease is a research area that the NIH has always valued and continues to focus on.

Table 2. Entities with the highest total funding (top 20)
ID  Entity ID  Entity Name                                     Entity Type    Funds (billion USD)
1   1009505    Mice                                            species        183.87
2   1272105    HIV                                             species        165.38
3   106985801  Human immunodeficiency disease                  disease        127.30
4   256225101  Tumor                                           disease        101.09
5   255268301  Cancer                                          disease        94.07
6   1009005    Mouse                                           species        93.22
7   1011605    Rat                                             species        66.93
8   4168403    Alcohol                                         drug/chemical  62.13
9   323759402  Insulin                                         gene/protein   51.24
10  1167605    HIV-1                                           species        48.70
11  258006601  DM                                              disease        42.71
12  325454802  CD4+                                            gene/protein   40.97
13  291977503  Glucose                                         drug/chemical  39.99
14  107480901  Breast and epithelial-myoepithelial carcinomas  disease        34.50
15  287734103  Ca2+                                            drug/chemical  28.60
16  107550501  AD                                              disease        25.93
17  261400701  Obesity                                         disease        23.81
18  267406001  Depression                                      disease        23.80
19  106971701  Bronchial asthma                                disease        23.65
20  325464002  p32                                             gene/protein   22.07
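The funding definition in step 3 and the ranking behind Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Article` record, its fields, the function names, and the toy values are hypothetical, and the sketch assumes each output article already carries a funding amount linked from its NIH project record.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Article:
    """Hypothetical record for one NIH-linked output paper."""
    pmid: str
    funding: float  # funding attributed to the paper (USD)
    entities: Dict[str, str] = field(default_factory=dict)  # entity ID -> type


def entity_funding(articles: List[Article]) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Funding of an entity = sum of the funding of all articles it appears in."""
    totals: Dict[str, float] = defaultdict(float)
    types: Dict[str, str] = {}
    for art in articles:
        # Keys of art.entities are unique, so each entity counts once per article.
        for ent_id, ent_type in art.entities.items():
            totals[ent_id] += art.funding
            types[ent_id] = ent_type
    return dict(totals), types


def top_funded(articles: List[Article], n: int = 20):
    """Rank entities by total funding and count entity types among the top n."""
    totals, types = entity_funding(articles)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    type_counts = Counter(types[ent_id] for ent_id, _ in ranked)
    return ranked, type_counts


if __name__ == "__main__":
    # Toy records only; these are not PubMed or NIH figures.
    papers = [
        Article("p1", 2.0e6, {"mice": "species", "tumor": "disease"}),
        Article("p2", 1.0e6, {"mice": "species", "insulin": "gene/protein"}),
        Article("p3", 0.5e6, {"tumor": "disease"}),
    ]
    ranked, counts = top_funded(papers, n=2)
    print(ranked)  # [('mice', 3000000.0), ('tumor', 2500000.0)]
    print(counts)  # Counter({'species': 1, 'disease': 1})
```

Because every entity in an article is credited with that article's full funding, entity totals overlap, so the sum over all entities can exceed the total funding of the underlying projects.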

3.3 Analysis of the relationship between research popularity of entities and government funding
Based on biomedical entities in the four fields (species, disease, gene/protein, and drug/chemical), scatter plots are drawn for the years 1988, 1998, 2008, and 2017, and the relationship between entities' research popularity and government funding is analyzed visually to identify, from the entity perspective, how funding drives research in each field.

Figure 3. Scatterplot of species entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 3, in 1988 a small increase in funding is followed by a significant increase in research popularity. For the three later years, the linear fit reveals that, as time passes and funding amounts increase, the stimulating effect of funding on the popularity of species research slows down.

Figure 4. Scatterplot of disease entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 4, the linear coefficients obtained by fitting linear trends to disease entities in the four years are slightly larger than those obtained for species entities. As with species entities, in 1988 a small increase in funding is followed by a significant increase in research popularity. Over the years, funding grows faster than research popularity and the slope of the fitted line gradually decreases, which means that the stimulating effect of funding on research popularity is gradually weakening. In 2017, there are more entities with high funding and low research popularity, which could be related to the emergence of new types of entities.
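The slope comparisons in Section 3.3 rest on two per-year quantities defined in the methodology: an entity's research popularity (the number of papers mentioning it) and its annual funding (summed as in step 3). The paper does not state how its linear trend is fitted, so the sketch below assumes an ordinary least-squares line of popularity on funding for each year; the `(year, funding, entity_ids)` record layout, the function names, and the toy values are hypothetical. Under this reading, a weakening stimulating effect of funding shows up as a fitted slope that shrinks from 1988 to 2017.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, int]  # (annual funding, research popularity)


def yearly_points(
    papers: Iterable[Tuple[int, float, Sequence[str]]]
) -> Dict[int, List[Point]]:
    """Build one (funding, popularity) point per entity and year.

    Each paper is a hypothetical (year, funding, entity_ids) tuple; papers
    without NIH funding can be passed with funding 0.0, so they raise an
    entity's popularity without adding to its funding.
    """
    popularity: Dict[Tuple[int, str], int] = defaultdict(int)
    funding: Dict[Tuple[int, str], float] = defaultdict(float)
    for year, amount, entity_ids in papers:
        for ent in set(entity_ids):  # count an entity once per paper
            popularity[(year, ent)] += 1
            funding[(year, ent)] += amount
    points: Dict[int, List[Point]] = defaultdict(list)
    for (year, ent), pop in popularity.items():
        points[year].append((funding[(year, ent)], pop))
    return dict(points)


def ols_slope(points: List[Point]) -> float:
    """Ordinary least-squares slope of popularity (y) on funding (x)."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # If every entity has the same funding, the slope is undefined; return 0.0.
    return sxy / sxx if sxx else 0.0


if __name__ == "__main__":
    toy = [  # toy values, not taken from the paper's data
        (1988, 1.0, ["a"]),
        (1988, 1.0, ["b"]), (1988, 1.0, ["b"]), (1988, 1.0, ["b"]),
        (2017, 2.0, ["a"]),
        (2017, 3.0, ["b"]), (2017, 3.0, ["b"]),
    ]
    pts = yearly_points(toy)
    for year in sorted(pts):
        print(year, ols_slope(pts[year]))  # 1988 1.0, then 2017 0.25
```

Plain Python keeps the sketch dependency-free; with NumPy, the first coefficient returned by `numpy.polyfit(x, y, 1)` gives the same slope.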



Figure 5. Scatterplot of gene/protein entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 5, the initial trend of gene/protein entities in 1988 is similar to that of the first two types (species and disease entities), while in later years, especially in 2017, the upper limit of research popularity clearly declines over time.

Figure 6. Scatterplot of drug/chemical entity research funding and research popularity in 1988, 1998, 2008 and 2017

As shown in Figure 6, the pattern of drug/chemical entities is similar to that of gene/protein entities. The upper limit of research popularity declines over time, which indicates that, as the years go by, the amount of funding no longer plays a significant role in the research popularity of drug/chemical entities.
It follows from the above analysis that the factors influencing changes in entity research popularity are multi-faceted and complex, rather than a simple linear effect of research funding, and that this complexity increases over the years.

4 CONCLUSION AND FUTURE WORK

4.1 Conclusion
As far as we know, studies linking entities to government funding and exploring trends from an entity perspective are rare. This study puts forward a preliminary research idea, applying entitymetrics to the biomedical field from the perspective of research funding, and carries out a preliminary exploration of research trends and knowledge discovery. The conclusions are as follows: a) the field of genetic research is in a period of rapid development, while the field of species research is in a “flat period”; b) disease research continuously attracts NIH attention; c) the stimulating effect of government funding on research popularity is decreasing and is affected by various factors. These findings provide the basis for a follow-up study.

4.2 Future work
Inspired by the initial results, our future work will focus on a more in-depth exploration of the relationship between government funding and entity development. In this study, we summarized the trends in four categories of entities in the biomedical field and counted the entities that received the highest funding. However, is there any commonality among these entities? Is research related to entities with certain characteristics always more likely to be funded by the government? In addition, our current results show that the incentive effect of increased government funding on research in various fields is decreasing, while the impact of other factors, such as the continuity of government funding, on research popularity has not been explored. Therefore, further research will be conducted on the characteristics of the funded entities and the rules of government funding.

ACKNOWLEDGMENTS
We thank the editors and the anonymous reviewers for their insightful suggestions on this work.

REFERENCES
[1] Nicolas Fiorini, Kathi Canese, Grisha Starchenko, et al. 2018. Best match: new relevance search for PubMed. PLOS Biology 16, 8 (Aug. 2018), e2005343. DOI: https://doi.org/10.1371/journal.pbio.2005343.
[2] Guanghui Xia, Junlian Li, Baokun Xing, et al. 2019. Study of Named Entity Recognition in Medical Treatment Based on Literatures of Chinese Case Reports. Journal of Medical Intelligence 40, 6 (May 2019), 54-59.
[3] Ying Ding, Min Song, Jia Han, et al. 2013. Entitymetrics: Measuring the impact of entities. PLoS ONE 8, 8 (Aug. 2013), 1-14. DOI: https://doi.org/10.1371/journal.pone.0071416.
[4] Yuan Xu, Yanqiu Ge, Qiang Wang, et al. 2018. Medical Name Entity Recognition and Application in Chinese Admission Record of Stroke Patients Based on CRF and RUTA rule. Journal of Sun Yat-sen University (Medical Sciences) 39, 3 (May 2018), 455-462.
[5] Xiuyan Wang, Lei Cui. 2013. Extract Semantic Relations Between Biomedical Entities Applied Hybrid Method. New Technology of Library and Information Service 29, 3 (Mar. 2013), 77-82.
[6] Yongjian Xu, Jizhong Zhou. 2008. A Study of the Relationship between Government R&D Funding and Business Technology Innovation Activities. China Soft Science, 11 (Nov. 2008), 141-148.
[7] Paul A. David, Bronwyn H. Hall, Andrew A. Toole. 2000. Is Public R&D a Complement or a Substitute for Private R&D? A Review of the Econometric Evidence. Research Policy 29, 4-5 (Apr. 2000), 497-529.
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, et al. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805. Retrieved from https://arxiv.org/abs/1810.04805.
[9] Donghyeon Kim, Jinhyuk Lee, Chan Ho So, et al. 2019. A neural named entity recognition and multi-type normalization tool for biomedical text mining. IEEE Access 7 (Jan. 2019), 73729-73740. DOI: https://doi.org/10.1109/ACCESS.2019.2920708.
[10] Jian Xu, Sunkyu Kim, Min Song, et al. 2020. Building a PubMed knowledge graph. Scientific Data 7, 1 (Jun. 2020), 1-15. DOI: https://doi.org/10.1038/s41597-020-0543-2.