=Paper= {{Paper |id=Vol-1747/BP02_ICBO2016 |storemode=property |title=Disease Named Entity Recognition Using NCBI Corpus |pdfUrl=https://ceur-ws.org/Vol-1747/BP02_ICBO2016.pdf |volume=Vol-1747 |authors=Thomas Hahn,Hidayat Ur Rahman,Richard Segall |dblpUrl=https://dblp.org/rec/conf/icbo/HahnRS16 }} ==Disease Named Entity Recognition Using NCBI Corpus == https://ceur-ws.org/Vol-1747/BP02_ICBO2016.pdf
  Biomedical Disease Name Entity Recognition Using
                    NCBI Corpus

Hidayat Ur Rahman
Lahore Leads University
5 Tipu Block Near Garden Town Near Kalma Chowk, Lahore 54000, Pakistan
+92-3329702722
Hidayat.Rhman@gmail.com

Thomas Hahn
University of Arkansas at Little Rock
2801 South University Avenue, Little Rock, AR 72204
+1 (501) 301 4890
Thomas.F.Hahn3@gmail.com

Dr. Richard Segall
Arkansas State University
Computer Information Technology Department, State University, AR 72404-0130
+1 (870) 972-3989
rsegall@astate.edu



   Abstract— Named Entity Recognition (NER) in biomedical literature is a very active research area. NER is a crucial component of biomedical text mining because it enables information retrieval, reasoning and knowledge discovery. Much research has been carried out in this area using semantic type categories such as "DNA", "RNA", "proteins" and "genes". However, disease NER, and human disease NER in particular, has not yet received the attention it needs. Traditional machine learning approaches lack precision for disease NER because of their dependence on token-level features, sentence-level features and the integration of features such as orthographic, contextual and linguistic features. In this paper a method for disease NER is proposed which utilizes sentence- and token-level features in a Conditional Random Fields model trained on the NCBI disease corpus. Our system uses rich features including orthographic, contextual, affix, bigram, part-of-speech and stem-based features. With these feature sets our approach achieved a maximum F-score of 94% on the training set under 10-fold cross validation for semantic labeling of the NCBI disease corpus. On the testing and development corpora the model achieved F-scores of 88% and 85%, respectively.

   Keywords— NCBI disease corpus, naïve Bayesian, Bayesian networks, Non nested generalized exemplars

                         I.    INTRODUCTION

Biomedical Named Entity Recognition (NER) is based on dictionary-based, rule-based and machine learning approaches [1], [2]. The major limitation of the dictionary-based approach is that not all terms are defined in the dictionary [3]. Rule-based approaches make decisions based on certain rules, which are learned from the data in the form of text terms; however, these rules are not applicable in all cases [3]. Machine learning approaches, on the other hand, require large amounts of annotated data to train the algorithm [4]. Nowadays machine learning approaches are commonly used for NER, e.g. Support Vector Machines (SVM) [5], Maximum Entropy (ME) [6], Hidden Markov Models (HMM) [7] and Conditional Random Fields (CRF) [8]. In [9] an HMM model was proposed to distinguish between DNA, RNA, protein, cell-type and cell-line mentions. Kazama et al. proposed an SVM-based approach to identify DNA, cell-type, cell-line, protein and lipid mentions, achieving an F-score of 73.6% [10]. In [11] a CRF-based NER system was developed to recognize protein mentions, achieving an F-score of 78.4%. Besides CRFs, in [12] the authors used ME to distinguish between 23 different biological categories, achieving an F-score of 72%.

The performance of biomedical NER is not satisfactory compared to general-purpose NER [13]. Many approaches have been used to enhance the performance of biomedical NER systems, e.g. adding biomedical domain knowledge [14], [15], applying post-processing [14] and combining different machine learning classifiers into a hybrid classification scheme [16]. Some of these applications are discussed below.

The same biomedical term can be referred to by abbreviations or synonyms. Therefore, abbreviation and synonym recognition are used to unify and normalize biomedical entities for biomedical NER. For example, in [17] the authors used logistic regression for abbreviation scoring on the Medstract corpus, achieving a recall of 83% and a precision of 80%. In [18] an abbreviation recognition system was developed using the AB3P corpus, achieving a recall of 95.86% and a precision of 86.64%. In [19] pattern-matching rules were developed for matching abbreviations with their respective full terms, obtaining a recall of 70% and a precision of 95%. In [20] a system based on collocations yielded a recall of 88.5% and a precision of 96.3%. In [21] a rule-based synonym recognition system was developed, and in [22] a pattern-matching system was developed to match abbreviations with their corresponding full names. Much current research is concerned with entity recognition and normalization [23]. In the BioCreative III competition, one task focused on gene normalization, i.e. identifying genes and linking them to a standard database [24]. Such a system has also been developed in [25]. Relationships between biomedical entities, e.g. protein-protein and gene-disease interactions, are investigated in [26]. Much work has been done in the field of relationship mining. For example, in [27] a relationship mining system was developed using MetaMap [28] to identify biomedical entities while using linguistic rules to determine the semantic relationships between them. In [29] a gene-disease relationship extraction system was developed from Medline abstracts using a machine learning approach; it performed better than dictionary- and rule-based approaches.
The research in this work focuses on biomedical disease classification using the National Center for Biotechnology Information (NCBI) corpus and applying combinations of machine learning approaches. We found that selecting rich features and combining classifiers contributes to better performance.

                    II.   DATASET DETAILS

Our dataset is the National Center for Biotechnology Information (NCBI) Disease Corpus, available at http://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/. It consists of 793 abstracts containing 2783 sentences, 3224 unique disease names [30] and about 6,900 disease names in total. The NCBI corpus annotators annotated every sentence of the PubMed abstracts, excluding organism names (e.g. human, virus and bacteria), gender (male and female), general terms (deficiencies and syndromes), biological references and nested diseases. Annotations were done using a web-based tool called PubTator [31]. The corpus annotations were assigned four categories based on the nature of the disease mention: 3922 specific disease annotations, 1029 disease class annotations, 1774 modifiers and 173 composite mentions. The dataset is further divided into training, testing and development sets as shown in Table-1.

    Classes           | Training set | Testing set | Development set
    Modifier          | 1292         | 264         | 218
    Specific Disease  | 2959         | 556         | 409
    Composite Mention | 116          | 20          | 37
    Disease Class     | 781          | 121         | 127

    Table-1: Description of the training, testing and development data sets.

                      III. FEATURE SET

To improve classification accuracy, selecting and defining the features is very important. Enriching the feature set can improve the performance of a particular machine learning algorithm. To train our algorithm we used the following features:

    1.   Word Normalization
    2.   Orthographic
    3.   Part of Speech (POS) Tags
    4.   N-grams
    5.   Affixes
    6.   Contextual

Each of these 6 features is explained in more detail below.

  A. Word Normalization

Word normalization attempts to reduce the different forms of a word (noun, adjective, verb, etc.) to its reduced, stemmed or root form. A common technique for word normalization is the use of a stemmer or lemmatizer, which reduces a word to its base form. The following are examples of the patterns analyzed and their reduced root forms:

    •    Colorectal cancer → colorect cancer
    •    Endometrial cancer → endometri cancer
    •    Alzheimer disease → alzheim diseas
    •    Neurological disease → neurolog diseas
    •    Arthritis → arthriti
    •    Deficiency of DPD → defici of DPD
    •    Premenopausal ovarian cancer → premenopaus ovarian cancer
    •    Neurodegeneration → neurodegener
    •    Familial deficiency of the seventh component of complement → famili defici of the seventh compon of complement

  B. Orthographic Features

Orthographic features are related to the geometry and surface form of the text, such as capitalization, digits, numerics, single caps, all caps, two caps, punctuation and symbols. Such features are very effective in NER; their use has been advocated in [32-34].

  C. Part Of Speech (POS) Tags

POS tags usually help define the boundaries of phrases, and in some scenarios they have improved NER performance [34, 35]. Since POS tagging is a challenging and computationally demanding process, some researchers have not used it in NER [36]. We improved performance by including POS tags.

  D. N-grams

An n-gram is a sequence of n tokens or words. The most common n-gram is the unigram, which contains a single token; bigrams and trigrams contain 2 and 3 tokens, respectively. Generally, an n-gram model conditions the probability of a word on the n-1 words that precede it:

    P(w_i | w_(i-n+1), ..., w_(i-1))                              (1)

With n = 1, equation (1) reduces to the unigram probability P(w_i); bigrams add one more word, P(w_i | w_(i-1)); trigrams add two more words, P(w_i | w_(i-2), w_(i-1)); and higher-order n-gram models follow in the same way. In our experiment we only used unigrams and bigrams.

  E. Affixes

Prefix and suffix features have significantly improved performance in the recognition of named entities. In [37] the authors collected the most frequent suffixes and prefixes from the training data, while in [38] the authors grouped prefixes and suffixes into 23 categories. In our experiment, affixes alongside contextual features showed significant improvement.

  F. Contextual Features

Contextual features refer to the words preceding and following a named entity. Let w_i be the current token, i.e. the named entity; for each feature we use two token instances on either side of it, i.e. w_(i-2), w_(i-1), w_(i+1), w_(i+2). For each token appearing in the text the same features are calculated; more specifically, the contextual window is

    c = (w_(i-2), w_(i-1), w_i, w_(i+1), w_(i+2))                 (2)

In our experiment, contextual features combined with affixes were the most important features for the recognition of NEs. Initially two contextual features around the current word were selected for the experiment; upon realizing their importance, four contextual features were selected, as in equation (2), i.e. the two words preceding and the two words following the NE.

                 IV.    CLASSIFICATION SCHEME

In this research Conditional Random Fields (CRF) were applied to the NCBI disease corpus. CRF is a probabilistic model for labeling sequential data that is widely used for part-of-speech tagging and named entity recognition [39, 40]. CRF has several advantages over HMM and SVM: as a discriminative model, it can include a rich set of overlapping features through conditional probability. Given a sequence x = (x_1, ..., x_T) and its labels y = (y_1, ..., y_T), the conditional probability p(y|x) is defined by the CRF as follows [41]:

    p(y|x) = (1 / Z(x)) · exp( Σ_t Σ_k w_k · f_k(y_(t-1), y_t, x, t) )    (3)

where the outer sum runs over the positions t = 1, ..., T and the inner sum over the feature functions k = 1, ..., M; Z(x) is a normalization factor; w = (w_1, ..., w_M) is a weight vector whose length equals the number of feature functions M; and f_k is a feature function. The weights are obtained using the L-BFGS method [42]. In our experiment CRFSUITE was used via its Python interface [43].

                 V.    RESULTS AND DISCUSSION

Table-2 shows the contributions of features and their effects on the performance of CRF. The feature set is divided into Contextual (Cc), Normalized (Nm), Unigram (Un), Bigram (Bg), Affix (Ax), Part of Speech (POS) and Orthographic (O) features. Performance evaluation was carried out using the standard metrics precision, recall and F-score:

    Precision = TP / (TP + FP)

    Recall = TP / (TP + FN)

    F-score = (2 × Precision × Recall) / (Precision + Recall)

where TP, FP and FN are the numbers of true positive, false positive and false negative predictions, respectively. The results in Table-2 were obtained by applying 10-fold cross validation on the training set.

Feature combination                 | Precision | Recall | F-score
O                                   | 0.54      | 0.62   | 0.53
O + Nm                              | 0.77      | 0.76   | 0.74
O + Nm + POS                        | 0.87      | 0.87   | 0.86
O + Nm + POS + Un                   | 0.91      | 0.91   | 0.91
O + Nm + POS + Un + Bg              | 0.92      | 0.92   | 0.91
O + Nm + POS + Un + Bg + Cc         | 0.92      | 0.92   | 0.92
O + Nm + POS + Un + Bg + Cc + Ax    | 0.94      | 0.94   | 0.94

Table-2: Performance evaluation of the feature set.

Table-2 shows combinations of different features for improving CRF performance. Orthographic features were taken as the benchmark, which yielded an F-score of 0.53, a precision of 0.54 and a recall of 0.62. Adding stemmed (normalized) features improved the F-score to 0.74, the precision to 0.77 and the recall to 0.76. Adding part-of-speech tags further improved the F-score by 12 percentage points; nevertheless, part-of-speech tags have been dropped from some recent NER systems. Unigram-based models have been the primary models in NER, and hence we included them in our system; adding the unigram features improved the F-score by 5 percentage points. Adding bigram features did not raise the overall F-score but improved precision and recall by 1 point each. Adding contextual features improved the F-score slightly, by 1 point, but had no effect on precision and recall. Combining all features, i.e. orthographic, normalized, part-of-speech, unigram, bigram, contextual and affix features, yielded 94% for precision, recall and F-score. This performance was achieved with 10-fold cross-validation on the training set, owing to the rich feature selection.

   Figure 1 shows the F-scores for each of the 4 classes. In our experiment the following four classes were defined:

    •   Disease Class = DC
    •   Composite Mention = CM
    •   Specific Disease = SD
    •   Modifier = MD
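To make the feature set of Sections III-IV concrete, the sketch below extracts orthographic, affix and ±2 contextual-window features for each token and emits them as key=value strings, the input format accepted by CRF toolkits such as CRFsuite. The feature names and templates here are illustrative assumptions of ours, not the exact configuration used in this work.

```python
def token_features(tokens, i):
    """Orthographic, affix and +/-2 contextual-window features for tokens[i],
    emitted as key=value strings (the usual input format of CRF toolkits)."""
    w = tokens[i]
    feats = [
        "w=" + w.lower(),
        "prefix3=" + w[:3].lower(),                   # affix features
        "suffix3=" + w[-3:].lower(),
        "istitle=%s" % w.istitle(),                   # orthographic features
        "isupper=%s" % w.isupper(),
        "hasdigit=%s" % any(c.isdigit() for c in w),
    ]
    # contextual window: the two tokens before and after the current one
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(tokens):
            feats.append("w[%+d]=%s" % (offset, tokens[j].lower()))
        else:
            feats.append("w[%+d]=<PAD>" % offset)     # sentence boundary
    return feats

def sentence_features(tokens):
    """Feature sequence for one sentence: one feature list per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Each per-token feature list would then be paired with its class label and passed to the CRF trainer.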
The F-scores of the training, development and testing sets are plotted in Figure 1. The best F-scores were achieved for the Modifier class: an F-score of 0.96 on the training dataset and 0.92 on both the development and testing datasets. The second highest F-scores were achieved for the Specific Disease class: 0.95 on the training set, 0.92 on the testing set and 0.88 on the development set. The third highest F-scores were achieved for the Disease Class: 0.86 on the training set and 0.71 on both the testing and development sets. The F-scores were lowest for the Composite Mention class: 0.72 on the training set, 0.52 on the testing set and 0.62 on the development set. We observed a positive correlation between the size of the training sample sets and the F-score. The largest training sample, comprising over 1,000 instances, was available for the Modifier class, followed by the Specific Disease class, then the Disease Class with the second smallest training sample, and finally the Composite Mention class, which had the smallest training sample. The performance of machine learning algorithms depends on the size of the training sample: too small a training sample increases the risk of underfitting, while too large a training sample increases the risk of overfitting.

Figure-1: F-score comparison of the training, testing and development data sets.

We compared the performance of our approach, which is based on combining features, with that of BANNER using the same dataset and classes. The results of this comparison are shown in Table-3; details about the BANNER results can be found in [30]. The data in Table-3 indicate that our approach yielded much higher F-scores than BANNER for the training, testing and development sets: the F-score obtained with our approach is 10% higher for the training set, 7% higher for the testing set and 4% higher for the development set. Hence, we clearly succeeded in outperforming BANNER.

    System          | Dataset      | Precision | Recall | F-measure
    CRF Result      | Training     | 0.94      | 0.94   | 0.94
                    | Testing      | 0.88      | 0.89   | 0.88
                    | Development  | 0.86      | 0.86   | 0.85
    BANNER Result   | Training     | 0.86      | 0.82   | 0.84
                    | Testing      | 0.83      | 0.80   | 0.81
                    | Development  | 0.82      | 0.81   | 0.81

   Table-3: Comparison of BANNER and CRF results; precision, recall and F-score are reported for both classifiers.

Figure 2 also shows that our F-scores (depicted in blue) are much higher than those of BANNER (depicted in red).

         Figure-2: Plot of BANNER vs. the proposed model.

   In summary, it can be concluded that CRF based on the 6 features clearly outperformed BANNER. This shows that the sequential classifier CRF, given rich features, is well suited for classifying biomedical literature.

                          VI.   CONCLUSION

   This paper presents a machine learning approach for human disease named entity recognition using the NCBI disease corpus. The system takes advantage of the background knowledge provided by the selected features to better distinguish between the four classes. Improvements due to successive feature additions have been demonstrated; the largest improvement was obtained when adding the second feature to the first. However, in order to evaluate the overall benefit of each feature, all possible combinations of feature additions need to be considered.
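The exhaustive evaluation suggested above can be organized by enumerating every non-empty subset of the six features and training a model per subset. A minimal sketch follows; the feature shorthands mirror Table-2's abbreviations, and the actual training/scoring step is left abstract:

```python
from itertools import combinations

# Shorthand for the six features of Section III (as abbreviated in Table-2)
FEATURES = ["Nm", "O", "POS", "Un", "Bg", "Ax"]

def all_feature_subsets(features):
    """Yield every non-empty subset of the feature set, smallest subsets
    first, so each can be trained and scored in an ablation study."""
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            yield subset

subsets = list(all_feature_subsets(FEATURES))
# 2^6 - 1 = 63 candidate feature combinations to train and evaluate
```

For six features this is still cheap (63 training runs); the cost doubles with each additional feature, which is why ablation studies often report only the incremental additions shown in Table-2.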
                                                                                                  REFERENCES
[1]. A.M. Cohen, W.R. Hersh. A survey of current work in biomedical text mining. Brief Bioinform, 6 (2005), pp. 57–71
[2]. L. Li, R. Zhou, D. Huang. Two-phase biomedical named entity recognition using CRFs. Comput Biol Chem, 33 (2009), pp. 334–338
[3]. D. Rebholz-Schuhmann, A.J. Yepes, C. Li, S. Kafkas, I. Lewin, N. Kang, et al. Assessment of NER solutions against the first and second CALBC Silver Standard Corpus. J Biomed Semantics, 2 (Suppl. 5) (2011)
[4]. M. Krallinger, M. Vazquez, F. Leitner, D. Salgado, A. Chatr-Aryamontri, A. Winter, et al. The Protein–Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinformatics, 12 (Suppl. 8) (2011)
[5]. M.S. Habib, J. Kalita. Scalable biomedical named entity recognition: investigation of a database-supported SVM approach. Int J Bioinform Res Appl, 6 (2010), pp. 191–208
[6]. S.K. Saha, S. Sarkar, P. Mitra. Feature selection techniques for maximum entropy based biomedical named entity recognition. J Biomed Inform, 42 (2009), pp. 905–911
[7]. Y.M.N. Ephraim. Hidden Markov processes. IEEE Trans Inform Theory, 48 (2002), pp. 1518–1569
[8]. Y. He, M. Kayaalp. Biological entity recognition with conditional random fields. In: AMIA Annu Symp Proc (2008), pp. 293–297
[9]. G.D. Zhou, J. Su. Exploring deep knowledge resources in biomedical name recognition. In: JNLPBA (2004), pp. 96–99
[10]. J. Kazama, T. Makino, Y. Ohta, J. Tsujii. Tuning support vector machines for biomedical named entity recognition. In: Association for Computational Linguistics, Morristown, NJ, USA (2002), pp. 1–8
[11]. T. Tsai, W.C. Chou, S.H. Wu, T.Y. Sung, J. Hsiang, W.L. Hsu. Integrating linguistic knowledge into a conditional random field framework to identify biomedical named entities. Expert Syst Appl, 30 (2006), pp. 117–128
[12]. Y.F. Lin, T.H. Tsai, W.C. Chou, K.P. Wu, T.Y. Sung, W.L. Hsu. A maximum entropy approach to biomedical named entity recognition. In: The 4th ACM SIGKDD Workshop on Data Mining in Bioinformatics (2004), pp. 56–61
[13]. C.R. Yen-Ching, T.-H. Tsai, W.-L. Hsu. New challenges for biological text-mining in the next decade. J Comput Sci Technol, 25 (2010), pp. 169–179
[14]. Y. Sasaki, Y. Tsuruoka, J. McNaught, S. Ananiadou. How to make the most of NE dictionaries in statistical NER. BMC Bioinformatics, 9 (Suppl. 11) (2008), p. S5
[15]. G.D. Zhou, J. Su. Exploring deep knowledge resources in biomedical name recognition. In: JNLPBA (2004)
[16]. F. Zhu, B. Shen. Combined SVM-CRFs for biological named entity recognition with maximal bidirectional squeezing. PLoS One, 7 (6) (2012), p. e39230
[17]. J.T. Chang, H. Schutze, R.B. Altman. Creating an online dictionary of abbreviations from MEDLINE. J Am Med Inform Assoc, 9 (2002), pp. 612–620
[18]. C.J. Kuo, M.H. Ling, K.T. Lin, C.N. Hsu. BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature. BMC Bioinformatics, 10 (Suppl. 15) (2009), p. S7
[19]. H. Yu, G. Hripcsak, C. Friedman. Mapping abbreviations to full forms in biomedical articles. J Am Med Inform Assoc, 9 (2002), pp. 262–272
[20]. H. Liu, C. Friedman. Mining terminological knowledge in large biomedical corpora. Pac Symp Biocomput (2003), pp. 415–426
[21]. J. McCrae, N. Collier. Synonym set extraction from the biomedical literature by lexical pattern discovery. BMC Bioinformatics, 9 (2008), p. 159
[22]. A.M. Cohen, W.R. Hersh, C. Dubay, K. Spackman. Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts. BMC Bioinformatics, 6 (2005), p. 103
[23]. Z. Lu, H.-Y. Kao, C.-H. Wei, M. Huang, J. Liu, C.-J. Kuo, C.-N. Hsu, et al. The gene normalization task in BioCreative III. BMC Bioinformatics, 12 (2011)
[24]. C.N. Arighi, P.M. Roberts, S. Agarwal, S. Bhattacharya, G. Cesareni, A. Chatr-Aryamontri, et al.
[25]. BioCreative III interactive task: an overview. BMC Bioinformatics, 12 (Suppl. 8) (2011)
[26]. M. Huang, J. Liu, X. Zhu. GeneTUKit: a software for document-level gene normalization. Bioinformatics, 27 (2011), pp. 1032–1033
[27]. C.N. Arighi, Z. Lu, M. Krallinger, K.B. Cohen, W.J. Wilbur, A. Valencia, et al. Overview of the BioCreative III workshop. BMC Bioinformatics, 12 (Suppl. 8) (2011), p. S1
[28]. A. Ben Abacha, P. Zweigenbaum. Automatic extraction of semantic relations between medical entities: a rule based approach. J Biomed Semantics, 2 (Suppl. 5) (2011), p. S4
[29]. A.R. Aronson, F.M. Lang. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc, 17 (2010), pp. 229–236
[30]. R. Islamaj Doğan, Z. Lu. An improved corpus for disease mentions in PubMed citations. In: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing (BioNLP 2012), Montréal, Canada, June 8, 2012, pp. 91–99
[31]. R. Leaman, C. Miller, G. Gonzalez. Enabling recognition of diseases in biomedical text with machine learning: corpus and benchmarks. In: Symposium on Languages in Biology and Medicine (2009), pp. 82–89
[32]. C.-H. Wei, H.-Y. Kao, Z. Lu. PubTator: a PubMed-like interactive curation system for document triage and literature curation. In: Proceedings of the BioCreative Workshop (2012), pp. 145–150
[33]. N. Collier, K. Takeuchi. Comparison of character-level and part of speech features for name recognition in biomedical texts. J Biomed Inform, 37 (2004), pp. 423–435
[34]. D. Shen, J. Zhang, G. Zhou, J. Su, C.-L. Tan. Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain. In: Proceedings of the ACL 2003 Workshop on NLP in Biomedicine, Sapporo, Japan (2003), pp. 49–56
[35]. T.-H. Tsai, S.-H. Wu, W.-L. Hsu. Exploitation of linguistic features using a CRF-based biomedical named entity recognizer. In: ACL Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, Detroit (2005)
[36]. L. Ratinov, D. Roth. Design challenges and misconceptions in named entity recognition. In: CoNLL (2009)
[37]. J. Kazama, T. Makino, Y. Ohta, J. Tsujii. Tuning support vector machines for biomedical named entity recognition. In: Proceedings of the Workshop on NLP in the Biomedical Domain, ACL 2002 (2002), pp. 1–8
[38]. G. Zhou, J. Su. Named entity recognition using an HMM-based chunk tagger. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL) (2002), pp. 473–480
[39]. H.-S. Huang, Y.-S. Lin, K.-T. Lin, C.-J. Kuo, Y.-M. Chang, B.-H. Yang, I.-F. Chung, C.-N. Hsu. High-recall gene mention recognition by unification of multiple background parsing models. In: Proceedings of the 2nd BioCreative Challenge Evaluation Workshop (2007), pp. 109–111
[40]. R. Klinger, C.M. Friedrich, J. Fluck, M. Hofmann-Apitius. Named entity recognition with combinations of conditional random fields. In: Proceedings of the 2nd BioCreative Challenge Evaluation Workshop
[41]. M.F. Porter. Snowball: a language for stemming algorithms. 2001
[42]. A.G. Jivani. A comparative study of stemming algorithms. Int J Comp Tech Appl, 2 (6), pp. 1930–1938
[43]. D. Lin, X. Wu. Phrase clustering for discriminative learning. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, August 2009, pp. 1030–1038